hugging-face-evaluation
Model evaluation with vLLM/lighteval and eval tables
Developer Setup
Setup & Installation
npx skills add https://github.com/huggingface/skills --skill hugging-face-evaluation
Overview
What This Skill Does
Adds and manages evaluation results in Hugging Face model cards using the model-index metadata format. Supports extracting benchmark tables from README files, importing scores from the Artificial Analysis API, and running evaluations with vLLM or lighteval on local GPUs or HF Jobs infrastructure.
Application
When to use this Skill
- Configuring integration settings for custom agent workflows.
- Optimizing query execution and response latency in production.
- Developing clean, standard-compliant implementations for enterprise services.
- Troubleshooting connection timeouts and authentication handshakes.
- Monitoring API rate limits and execution pipelines programmatically.
Documentation
Overview
This skill is for running evaluations against models on the Hugging Face Hub on local hardware.
It covers:
- inspect-ai with local inference
- lighteval with local inference
- choosing between vllm, Hugging Face Transformers, and accelerate
- smoke tests, task selection, and backend fallback strategy
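As a rough illustration of what a backend fallback strategy could look like, the sketch below picks the first inference backend whose Python package is importable. It is not taken from this skill's scripts: the function name pick_backend, the preference order, and the is_installed hook are all hypothetical and only show the general idea.

```python
from importlib.util import find_spec

# Hypothetical preference order (an assumption, not the skill's documented order).
# Each entry maps a backend label to the Python module that provides it.
BACKEND_MODULES = [
    ("vllm", "vllm"),
    ("transformers", "transformers"),
    ("accelerate", "accelerate"),
]


def pick_backend(preferred=None, is_installed=None):
    """Return the first available backend label, trying `preferred` first."""
    if is_installed is None:
        # Default check: the module can be located without importing it.
        def is_installed(module):
            return find_spec(module) is not None

    order = list(BACKEND_MODULES)
    if preferred is not None:
        # Stable sort: the preferred backend moves to the front,
        # and the remaining backends keep their relative order.
        order.sort(key=lambda item: item[0] != preferred)

    for name, module in order:
        if is_installed(module):
            return name
    raise RuntimeError("no supported inference backend is installed")


if __name__ == "__main__":
    try:
        print("selected backend:", pick_backend())
    except RuntimeError as err:
        print(err)
```

A caller might run a small smoke test with the selected backend first, and call pick_backend again with a different preferred value if that smoke test fails.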
It does not cover:
- Hugging Face Jobs orchestration
- model-card or model-index edits
- README table extraction
- Artificial Analysis imports
- .eval_results generation or publishing
- PR creation or community-evals automation
If the user wants to run the same eval remotely on Hugging Face Jobs, hand off to the hugging-face-jobs skill and pass it one of the local scripts in this skill.
If the user wants to publish results into the community evals workflow, stop after generating the evaluation run and hand off that publishing step to ~/code/community-evals.
All paths below are relative to the directory containing this SKILL.md.
When To Use Which Script
Recommendations