hugging-face-vision-trainer
Train vision models on HF infrastructure
Developer Setup
Setup & Installation
npx skills add https://github.com/huggingface/skills --skill hugging-face-vision-trainernpx skills add https://github.com/huggingface/skills --skill hugging-face-vision-trainerOverview
What This Skill Does
Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs, covering object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, ViT), and SAM/SAM2 segmentation. Handles COCO-format dataset prep, Albumentations augmentation, mAP/accuracy evaluation, and automatic model persistence to the Hugging Face Hub.
Application
When to use this Skill
- Configuring integration settings for custom agent workflows.
- Optimizing query execution and response latency in production.
- Developing clean, standard-compliant implementations for enterprise services.
- Troubleshooting connection timeouts and authentication handshakes.
- Monitoring API rate limits and execution pipelines programmatically.
Documentation
Show Skills.md file
Vision Model Training on Hugging Face Jobs
Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.
When to Use This Skill
Use this skill when users want to:
- Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or local
- Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or local
- Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
- Train bounding-box detectors on custom datasets
- Train image classifiers on custom datasets
- Train segmentation models on custom mask datasets with prompts
- Run vision training jobs on Hugging Face Jobs infrastructure
- Ensure trained vision models are permanently saved to the Hub
Related Skills
hugging-face-jobs— General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").hugging-face-model-trainer— TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.
Local Script Execution
Helper scripts use PEP 723 inline dependencies. Run them with uv run:
uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help
Recommendations