skills.vishalvoidskills/vishalvoid
Technical & DevelopmentIntermediate

podcast-generation

AI podcast audio with Azure OpenAI Realtime API

Developer Setup

Setup & Installation

bash
npx skills add https://github.com/microsoft/skills --skill podcast-generation

Overview

What This Skill Does

Connects a React frontend to a Python FastAPI backend over WebSocket to generate spoken audio from text using Azure OpenAI's GPT Realtime Mini model. Takes a text prompt, streams PCM audio chunks, converts them to WAV, and returns base64-encoded audio for browser playback. Includes transcript output alongside the audio.

Application

When to use this Skill

Documentation

Show Skills.md file

Podcast Generation with GPT Realtime Mini

Generate real audio narratives from text content using Azure OpenAI's Realtime API.

Quick Start

  1. Configure environment variables for Realtime API
  2. Connect via WebSocket to Azure OpenAI Realtime endpoint
  3. Send text prompt, collect PCM audio chunks + transcript
  4. Convert PCM to WAV format
  5. Return base64-encoded audio to frontend for playback

Environment Configuration

AZURE_OPENAI_AUDIO_API_KEY=your_realtime_api_key
AZURE_OPENAI_AUDIO_ENDPOINT=https://your-resource.cognitiveservices.azure.com
AZURE_OPENAI_AUDIO_DEPLOYMENT=gpt-realtime-mini

Note: Endpoint should NOT include /openai/v1/ - just the base URL.

Core Workflow

Backend Audio Generation

Lines 1 - 25 of 116

Recommendations

Explore other random skills

All skillsMy patterns