speech

Setup & Installation

bash
npx skills add https://github.com/openai/skills --skill speech

Overview

What This Skill Does

Generate spoken audio from text using OpenAI's API with built-in voices

Documentation

Speech Generation Skill

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

When to use

  • Generate a single spoken clip from text
  • Generate a batch of prompts (many lines, many files)

Decision tree (single vs batch)

  • If the user provides multiple lines/prompts or wants many outputs -> batch
  • Else -> single
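
The decision tree can be sketched as a small helper. This is an illustrative sketch only: the function name `choose_mode` and the `wants_many_outputs` flag are hypothetical and not part of the skill, and treating every non-empty line as a separate prompt is an assumption about how "multiple lines" should be read.

```python
def choose_mode(prompts: list[str], wants_many_outputs: bool = False) -> str:
    """Return "batch" or "single", following the decision tree above."""
    # Assumption: each non-empty line across all prompts counts as one prompt.
    lines = [line for p in prompts for line in p.splitlines() if line.strip()]
    if len(lines) > 1 or wants_many_outputs:
        return "batch"
    return "single"


print(choose_mode(["Welcome to the demo."]))          # single
print(choose_mode(["First prompt", "Second prompt"]))  # batch
```

If a single narration legitimately spans several lines, the line-splitting rule here would need to be relaxed; the sketch only mirrors the two branches listed above.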

Workflow

  1. Decide intent: single vs batch (see decision tree above).
  2. Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
  3. If batch: write a temporary JSONL file under tmp/speech/ (one job per line), run the batch once, then delete the JSONL file.
  4. Augment instructions into a short labeled spec without rewriting the input text.
  5. Run the bundled CLI (scripts/text_to_speech.py) with sensible defaults (see references/cli.md).
  6. For important clips, validate: intelligibility, pacing, pronunciation, and adherence to constraints.
  7. Iterate with a single targeted change (voice, speed, or instructions), then re-check.
  8. Save/return final outputs and note the final text + instructions + flags used.
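
Step 4 (a short labeled spec that leaves the input text untouched) might look like the sketch below. The label names (`Tone`, `Pacing`, `Constraints`) and the helper `build_instruction_spec` are assumptions for illustration; the skill's own references may prescribe different labels.

```python
def build_instruction_spec(fields: dict[str, str]) -> str:
    """Join non-empty delivery hints into one "Label: value" line each."""
    lines = []
    for label, value in fields.items():
        value = value.strip()
        if value:  # skip labels the user did not specify
            lines.append(f"{label}: {value}")
    return "\n".join(lines)


# The spoken text stays verbatim; only the instructions are augmented.
text = "Welcome to the demo."
spec = build_instruction_spec(
    {"Tone": "warm, confident", "Pacing": "steady", "Constraints": ""}
)
print(spec)
# Tone: warm, confident
# Pacing: steady
```

Keeping the text and the spec as separate values makes it easier to record both in step 8 and to change only one of them per iteration in step 7.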

Temp and output conventions

  • Use tmp/speech/ for intermediate files (for example JSONL batches); delete when done.
  • Write final artifacts under output/speech/ when working in this repo.
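
A minimal sketch of the temporary-batch convention, assuming a context manager is an acceptable way to guarantee cleanup. The file name `batch.jsonl` and the job keys used in the example are hypothetical; the actual JSONL schema expected by scripts/text_to_speech.py is described in references/cli.md and is not reproduced here.

```python
import json
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temp_batch_file(jobs: list[dict], tmp_dir: Path = Path("tmp/speech")):
    """Write one JSON object per line, yield the path, then delete the file."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    path = tmp_dir / "batch.jsonl"  # hypothetical file name
    with path.open("w", encoding="utf-8") as fh:
        for job in jobs:
            fh.write(json.dumps(job, ensure_ascii=False) + "\n")
    try:
        yield path
    finally:
        # Delete the intermediate file even if the run fails.
        path.unlink(missing_ok=True)


# Hypothetical job keys, for illustration only.
jobs = [{"input": "Welcome to the demo."}, {"input": "Thanks for watching."}]
with temp_batch_file(jobs) as batch_path:
    # Run the bundled CLI once against batch_path here (see references/cli.md).
    print(batch_path.read_text(encoding="utf-8"))
```

Final audio would still be written under output/speech/ by the CLI run itself; this helper only manages the intermediate JSONL file.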