gemini-live-api-dev
Building real-time bidirectional streaming apps with the Gemini Live API
Developer Setup
Setup & Installation
npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev
Overview
What This Skill Does
Covers building real-time, bidirectional streaming apps with the Gemini Live API over WebSockets. Handles audio/video/text input streams, voice activity detection, session management, ephemeral tokens, and function calling. SDKs covered are google-genai (Python) and @google/genai (JavaScript/TypeScript).
Application
When to use this Skill
- Configuring integration settings for custom agent workflows.
- Optimizing query execution and response latency in production.
- Developing clean, standard-compliant implementations for enterprise services.
- Troubleshooting connection timeouts and authentication handshakes.
- Monitoring API rate limits and execution pipelines programmatically.
Documentation
Gemini Live API Development Skill
Overview
The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
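As a rough illustration of the streaming flow described above, the following Python sketch opens one session, sends a single text prompt, and collects the spoken reply with its transcript. It assumes the google-genai SDK is installed and an API key is available in the environment. The helper names `build_config` and `run_once` are hypothetical, and the config keys and session methods reflect one reading of the SDK, so check them against the current reference before relying on them.

```python
MODEL = "gemini-3.1-flash-live-preview"  # model name as listed in this document


def build_config() -> dict:
    """Session config: spoken replies plus a text transcript of them."""
    return {
        "response_modalities": ["AUDIO"],
        "output_audio_transcription": {},
    }


async def run_once(prompt: str) -> tuple[bytes, str]:
    """Send one text prompt and collect the audio and transcript of the reply."""
    # Imported here so build_config() stays usable without the SDK installed.
    from google import genai

    client = genai.Client()  # assumption: API key supplied via the environment
    audio = bytearray()
    transcript = []
    async with client.aio.live.connect(model=MODEL, config=build_config()) as session:
        await session.send_realtime_input(text=prompt)
        # receive() is expected to yield messages until the model's turn completes.
        async for message in session.receive():
            if message.data:
                audio.extend(message.data)  # raw audio bytes from the model
            content = message.server_content
            if content and content.output_transcription and content.output_transcription.text:
                transcript.append(content.output_transcription.text)
    return bytes(audio), "".join(transcript)
```

Calling `asyncio.run(run_once("Hello"))` from a script would run one round trip. A real-time app would instead stream microphone chunks and play audio as it arrives.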
Key capabilities:
- Bidirectional audio streaming — real-time mic-to-speaker conversations
- Video streaming — send camera/screen frames alongside audio
- Text input/output — send and receive text within a live session
- Audio transcriptions — get text transcripts of both input and output audio
- Voice Activity Detection (VAD) — automatic interruption handling
- Native audio — thinking (with configurable thinkingLevel)
- Function calling — synchronous tool use
- Google Search grounding — ground responses in real-time search results
- Session management — context compression, session resumption, GoAway signals
- Ephemeral tokens — secure client-side authentication
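Several of the capabilities listed above are switched on through the session config. The sketch below builds such a config as a plain dictionary. The function `build_live_config` and its parameters are hypothetical. The key names follow one understanding of the google-genai `LiveConnectConfig` fields and should be treated as assumptions to verify. The `thinkingLevel` setting is left out because this document does not give its exact config path.

```python
from typing import Optional


def build_live_config(resume_handle: Optional[str] = None,
                      enable_search: bool = False) -> dict:
    """Assemble a Live API session config as a plain dict (assumed key names)."""
    config = {
        "response_modalities": ["AUDIO"],
        # Transcripts of what the user said and what the model said.
        "input_audio_transcription": {},
        "output_audio_transcription": {},
        # Leave automatic voice activity detection enabled (interruption handling).
        "realtime_input_config": {
            "automatic_activity_detection": {"disabled": False},
        },
        # Sliding-window compression so long sessions stay within the context window.
        "context_window_compression": {"sliding_window": {}},
        # Pass a handle from an earlier session to resume it; None starts fresh.
        "session_resumption": {"handle": resume_handle},
    }
    if enable_search:
        # Google Search grounding is requested as a tool.
        config["tools"] = [{"google_search": {}}]
    return config
```

The resulting dictionary would be passed as the `config` argument when opening a session. Keeping it as plain data makes it easy to log or adjust for each session.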
[!NOTE] The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
Models
gemini-3.1-flash-live-preview — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via thinkingLevel). 128k context window. This is the recommended model for all Live API use cases.