Technical & Development · Intermediate

hugging-face-vision-trainer

Train vision models on HF infrastructure

Developer Setup

Setup & Installation

bash
npx skills add https://github.com/huggingface/skills --skill hugging-face-vision-trainer

Overview

What This Skill Does

Trains and fine-tunes vision models on Hugging Face Jobs cloud GPUs, covering object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models including MobileNetV3, ResNet, ViT), and SAM/SAM2 segmentation. Handles COCO-format dataset prep, Albumentations augmentation, mAP/accuracy evaluation, and automatic model persistence to the Hugging Face Hub.
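
For readers new to the COCO convention mentioned above: a COCO bounding box is stored as `[x_min, y_min, width, height]` in absolute pixels, whereas many detection tools expect corner coordinates `[x_min, y_min, x_max, y_max]`. The sketch below shows the conversion in both directions. The function names are illustrative assumptions and are not taken from this skill's scripts.

```python
from typing import List, Sequence


def coco_to_xyxy(bbox: Sequence[float]) -> List[float]:
    """Convert a COCO box [x_min, y_min, width, height] to corner format."""
    x_min, y_min, width, height = bbox
    if width < 0 or height < 0:
        raise ValueError(f"negative box size: {bbox!r}")
    return [x_min, y_min, x_min + width, y_min + height]


def xyxy_to_coco(box: Sequence[float]) -> List[float]:
    """Convert a corner box [x_min, y_min, x_max, y_max] back to COCO format."""
    x_min, y_min, x_max, y_max = box
    if x_max < x_min or y_max < y_min:
        raise ValueError(f"corners out of order: {box!r}")
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# Hypothetical annotation: a 30x40 pixel box whose top-left corner is (10, 20).
print(coco_to_xyxy([10, 20, 30, 40]))
```

Checking a few annotations this way before launching a job can catch a dataset that mixes the two conventions.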

Documentation

Vision Model Training on Hugging Face Jobs

Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup required—results are automatically saved to the Hugging Face Hub.

When to Use This Skill

Use this skill when users want to:

  • Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or locally
  • Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or locally
  • Fine-tune SAM or SAM2 models for segmentation / image matting using bbox or point prompts
  • Train bounding-box detectors on custom datasets
  • Train image classifiers on custom datasets
  • Train segmentation models on custom mask datasets with prompts
  • Run vision training jobs on Hugging Face Jobs infrastructure
  • Ensure trained vision models are permanently saved to the Hub

Related Skills

  • hugging-face-jobs — General HF Jobs infrastructure: token authentication, hardware flavors, timeout management, cost estimation, secrets, environment variables, scheduled jobs, and result persistence. Refer to the Jobs skill for any non-training-specific Jobs questions (e.g., "how do secrets work?", "what hardware is available?", "how do I pass tokens?").
  • hugging-face-model-trainer — TRL-based language model training (SFT, DPO, GRPO). Use that skill for text/language model fine-tuning.

Local Script Execution

Helper scripts use PEP 723 inline dependencies. Run them with uv run:

uv run scripts/dataset_inspector.py --dataset username/dataset-name --split train
uv run scripts/estimate_cost.py --help
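
As background on PEP 723: a script declares its dependencies in a comment block that opens with `# /// script` and closes with `# ///`, and `uv run` reads that block to build an environment before executing the file. The sketch below embeds a hypothetical header (the `datasets` dependency and Python version are assumptions, since the real headers of the helper scripts are not shown in this excerpt) and extracts its TOML content with the regular expression given in the PEP's reference implementation.

```python
import re
from typing import Optional

# Regular expression from the PEP 723 reference implementation.
METADATA_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script source; the dependency list is an assumption.
SAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "datasets",
# ]
# ///
import datasets
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    matches = [m for m in METADATA_BLOCK.finditer(source) if m.group("type") == "script"]
    if len(matches) > 1:
        raise ValueError("multiple script metadata blocks found")
    if not matches:
        return None
    # Strip the leading "# " (or a bare "#") from every line of the block.
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )


print(extract_script_metadata(SAMPLE_SCRIPT))
```

Because the metadata travels with the file, each helper script can be run on its own without a separate requirements file or manually created virtual environment.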
