Creative & Design · Intermediate

speech

Developer Setup

Setup & Installation

npx skills add https://github.com/openai/skills --skill speech

Or paste this URL into your assistant to install:

https://github.com/openai/skills/tree/main/skills/.curated/speech

Overview

What This Skill Does

Generate spoken audio from text using OpenAI's API with built-in voices.

Application

When to use this Skill

Integrating speech into your development workflow.
Following best practices for generating spoken audio from text using OpenAI's API with built-in voices.
Automating repetitive tasks with AI-assisted tooling.
Building production-grade applications with proper standards.
Debugging and troubleshooting common implementation issues.

Documentation

Speech Generation Skill

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

When to use

Generate a single spoken clip from text
Generate a batch of prompts (many lines, many files)

Decision tree (single vs batch)

If the user provides multiple lines/prompts or wants many outputs -> batch
Else -> single
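
The decision tree above can be sketched as a small helper. This is only an illustration: the function name `choose_mode` and the `wants_many_outputs` flag are hypothetical and not part of the skill, and treating every non-empty line as its own prompt is an assumption about how "multiple lines" is meant.

```python
def choose_mode(prompts: list[str], wants_many_outputs: bool = False) -> str:
    """Return "batch" or "single" following the decision tree.

    Hypothetical helper: names and signature are illustrative only.
    """
    # Assumption: each non-empty line counts as a separate prompt.
    lines = [line for p in prompts for line in p.splitlines() if line.strip()]
    if len(lines) > 1 or wants_many_outputs:
        return "batch"
    return "single"


mode_one = choose_mode(["Welcome to the product demo."])
mode_many = choose_mode(["Press 1 for sales.", "Press 2 for support."])
```

If a multi-line input is really one continuous narration, the caller would pass it as a single joined line or decide the mode explicitly.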

Workflow

Decide intent: single vs batch (see decision tree above).
Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
If batch: write a temporary JSONL under tmp/speech/ (one job per line), run once, then delete the JSONL.
Augment instructions into a short labeled spec without rewriting the input text.
Run the bundled CLI (scripts/text_to_speech.py) with sensible defaults (see references/cli.md).
For important clips, validate: intelligibility, pacing, pronunciation, and adherence to constraints.
Iterate with a single targeted change (voice, speed, or instructions), then re-check.
Save/return final outputs and note the final text + instructions + flags used.
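
A minimal sketch of the batch-preparation steps in the workflow above: build a short labeled instruction spec, write one job per line to a temporary JSONL, and delete it afterwards. The JSONL field names (`input`, `instructions`, `out`), the spec labels (`Tone`, `Pacing`), and both helper names are assumptions made for illustration. The real job schema and CLI flags are defined in references/cli.md, which is not shown in this excerpt, so the CLI call itself is left as a comment.

```python
import json
from pathlib import Path


def build_instructions(style: dict[str, str]) -> str:
    """Turn delivery hints into a short labeled spec, one "Label: value" per line."""
    return "\n".join(f"{label}: {value}" for label, value in style.items() if value)


def write_batch(jobs: list[dict], path: Path) -> Path:
    """Write one JSON object per line (JSONL), creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for job in jobs:
            fh.write(json.dumps(job, ensure_ascii=False) + "\n")
    return path


# The input text is passed through verbatim; only the instructions are augmented.
texts = ["Welcome to the demo.", "Press 1 for support."]
instructions = build_instructions({"Tone": "warm", "Pacing": "steady"})

# Hypothetical job schema: field names are placeholders, not the CLI's documented format.
jobs = [
    {"input": text, "instructions": instructions, "out": f"clip_{i:02d}.mp3"}
    for i, text in enumerate(texts, start=1)
]

batch_path = write_batch(jobs, Path("tmp/speech/batch.jsonl"))
try:
    # Here the bundled CLI (scripts/text_to_speech.py) would be run once
    # against batch_path, using the flags documented in references/cli.md.
    written = batch_path.read_text(encoding="utf-8").splitlines()
finally:
    # The JSONL is an intermediate file, so remove it once the run finishes.
    batch_path.unlink()
```

Keeping the final text, instructions, and job list in variables like these also makes it easy to record what was used for the last workflow step.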

Temp and output conventions

Use tmp/speech/ for intermediate files (for example JSONL batches); delete when done.
Write final artifacts under output/speech/ when working in this repo.
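
One possible way to enforce these conventions in a script is a context manager that hands out a path under tmp/speech/ and removes the file on exit, plus a helper that resolves final artifact paths under output/speech/. The names `temp_file` and `final_path` are hypothetical and not part of the skill. The sketch assumes Python 3.8 or later for `unlink(missing_ok=True)`.

```python
from contextlib import contextmanager
from pathlib import Path

TMP_DIR = Path("tmp/speech")     # intermediate files, deleted when done
OUT_DIR = Path("output/speech")  # final artifacts


@contextmanager
def temp_file(name: str):
    """Yield a path under tmp/speech/ and delete the file when the block exits."""
    TMP_DIR.mkdir(parents=True, exist_ok=True)
    path = TMP_DIR / name
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)


def final_path(name: str) -> Path:
    """Return the path for a final artifact, creating output/speech/ if needed."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    return OUT_DIR / name


with temp_file("jobs.jsonl") as tmp:
    tmp.write_text("{}\n", encoding="utf-8")
    existed_inside = tmp.exists()

clip_path = final_path("intro.mp3")
```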
