hugging-face-vision-trainer

Train vision models on HF infrastructure

Developer Setup

Setup & Installation

npx skills add https://github.com/huggingface/skills --skill hugging-face-vision-trainer

Or paste this URL into your assistant to install:

https://github.com/huggingface/skills/tree/main/skills/hugging-face-vision-trainer

Overview

What This Skill Does

Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs, covering object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, ViT), and SAM/SAM2 segmentation. Handles COCO-format dataset prep, Albumentations augmentation, mAP/accuracy evaluation, and automatic model persistence to the Hugging Face Hub.
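
COCO annotations store each bounding box as [x_min, y_min, width, height] in pixels, while many detection pipelines expect corner coordinates. The following is a minimal sketch of that conversion, with clipping to the image bounds and a check for boxes that collapse after clipping. The function names `coco_to_xyxy` and `is_degenerate` are hypothetical and are not taken from the skill's scripts, which may handle this step differently.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def coco_to_xyxy(box: Sequence[float], image_width: int, image_height: int) -> Box:
    """Convert a COCO [x_min, y_min, width, height] box to corner format.

    The result is (x_min, y_min, x_max, y_max), clipped to the image bounds.
    Hypothetical helper for illustration only.
    """
    x, y, w, h = box
    x_min = min(max(x, 0.0), image_width)
    y_min = min(max(y, 0.0), image_height)
    x_max = min(max(x + w, 0.0), image_width)
    y_max = min(max(y + h, 0.0), image_height)
    return (x_min, y_min, x_max, y_max)


def is_degenerate(xyxy: Box, min_size: float = 1.0) -> bool:
    """Return True when a corner-format box is narrower or shorter than min_size."""
    x_min, y_min, x_max, y_max = xyxy
    return (x_max - x_min) < min_size or (y_max - y_min) < min_size


if __name__ == "__main__":
    # A box that sticks out past the right and bottom edges of a 100x100 image.
    clipped = coco_to_xyxy([90, 90, 20, 20], image_width=100, image_height=100)
    print(clipped, is_degenerate(clipped))
```

Filtering out degenerate boxes before training is a common precaution, since zero-area boxes can cause augmentation or loss computation to fail.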

Application

When to use this Skill

Configuring integration settings for custom agent workflows.
Optimizing query execution and response latency in production.
Developing clean, standard-compliant implementations for enterprise services.
Troubleshooting connection timeouts and authentication handshakes.
Monitoring API rate limits and execution pipelines programmatically.

Documentation

Vision Model Training on Hugging Face Jobs

Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.

When to Use This Skill

Use this skill when users want to:

Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or locally
Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or locally
Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
Train bounding-box detectors on custom datasets
Train image classifiers on custom datasets
Train segmentation models on custom mask datasets with prompts
Run vision training jobs on Hugging Face Jobs infrastructure
Ensure trained vision models are permanently saved to the Hub
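
For the SAM/SAM2 case, a mask dataset needs a prompt for each mask. One common way to get prompts is to derive them from the ground-truth mask itself. The sketch below assumes masks are plain nested lists of 0/1 values and uses a hypothetical helper, `mask_to_prompts`; it is not the skill's implementation, which may build prompts differently.

```python
from typing import List, Optional, Tuple

BBox = Tuple[int, int, int, int]
Point = Tuple[int, int]


def mask_to_prompts(mask: List[List[int]]) -> Optional[Tuple[BBox, Point]]:
    """Derive a bbox prompt and a point prompt from a binary mask.

    Returns ((x_min, y_min, x_max, y_max), (x, y)) using inclusive pixel
    indices, or None when the mask has no foreground pixels.
    Hypothetical helper for illustration only.
    """
    # Collect (x, y) coordinates of every foreground pixel.
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None

    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    bbox = (min(xs), min(ys), max(xs), max(ys))

    # The centroid of a concave mask can fall on background, so pick the
    # foreground pixel closest to the centroid as the point prompt.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return bbox, point


if __name__ == "__main__":
    example_mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_prompts(example_mask))
```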

Related Skills

hugging-face-jobs — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").
hugging-face-model-trainer — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.

Local Script Execution

Helper scripts use PEP 723 inline dependencies. Run them with uv run:

uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help
