Technical & Development · Intermediate

gemini-live-api-dev

Building real-time bidirectional streaming apps with the Gemini Live API

Developer Setup

Setup & Installation

npx skills add https://github.com/google-gemini/gemini-skills --skill gemini-live-api-dev

Or paste this URL into your assistant to install:

https://github.com/google-gemini/gemini-skills/tree/main/skills/gemini-live-api-dev

Overview

What This Skill Does

Covers building real-time, bidirectional streaming apps with the Gemini Live API over WebSockets. Handles audio/video/text input streams, voice activity detection, session management, ephemeral tokens, and function calling. SDKs covered are google-genai (Python) and @google/genai (JavaScript/TypeScript).

Application

When to use this Skill

Configuring integration settings for custom agent workflows.
Optimizing query execution and response latency in production.
Developing clean, standard-compliant implementations for enterprise services.
Troubleshooting connection timeouts and authentication handshakes.
Monitoring API rate limits and execution pipelines programmatically.

Documentation

Gemini Live API Development Skill

Overview

The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
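The bidirectional pattern described above usually means running a send loop and a receive loop concurrently, so replies can arrive while input is still being streamed. The following is a minimal sketch of that shape only: the `fake_model` coroutine and the two queues are hypothetical stand-ins for the remote end of the WebSocket, not part of the Live API or its SDKs.

```python
import asyncio


async def fake_model(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Hypothetical stand-in for the server: answers each chunk as it arrives."""
    while True:
        chunk = await inbox.get()
        if chunk is None:  # sentinel: the client has finished sending
            await outbox.put(None)
            return
        await outbox.put(f"reply to {chunk}")


async def send_loop(chunks, inbox: asyncio.Queue) -> None:
    """Stream input chunks without waiting for replies."""
    for chunk in chunks:
        await inbox.put(chunk)
        await asyncio.sleep(0)  # yield so the receive loop can interleave
    await inbox.put(None)


async def receive_loop(outbox: asyncio.Queue) -> list:
    """Collect replies until the end-of-stream sentinel arrives."""
    replies = []
    while True:
        message = await outbox.get()
        if message is None:
            return replies
        replies.append(message)


async def run_session(chunks) -> list:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    model = asyncio.create_task(fake_model(inbox, outbox))
    # Sending and receiving run concurrently, which is the core of the pattern.
    _, replies = await asyncio.gather(
        send_loop(chunks, inbox), receive_loop(outbox)
    )
    await model
    return replies


if __name__ == "__main__":
    print(asyncio.run(run_session(["chunk-1", "chunk-2"])))
```

In a real application the two queues would be replaced by the session object's send and receive calls; the concurrency structure is the part intended to carry over.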

Key capabilities:

Bidirectional audio streaming — real-time mic-to-speaker conversations
Video streaming — send camera/screen frames alongside audio
Text input/output — send and receive text within a live session
Audio transcriptions — get text transcripts of both input and output audio
Voice Activity Detection (VAD) — automatic interruption handling
Native audio — thinking (with configurable thinkingLevel)
Function calling — synchronous tool use
Google Search grounding — ground responses in real-time search results
Session management — context compression, session resumption, GoAway signals
Ephemeral tokens — secure client-side authentication
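For the bidirectional audio streaming capability listed above, microphone audio is typically sent in small fixed-duration chunks rather than as one large buffer. The helper below is a sketch under stated assumptions: it assumes raw 16-bit mono PCM at 16 kHz (an assumption about the input format, not stated in this document; check the Live API documentation) and a 100 ms chunk size chosen for illustration. `chunk_pcm` is a hypothetical helper name, not an SDK function.

```python
def chunk_pcm(
    pcm: bytes,
    sample_rate: int = 16000,   # assumed input rate; verify against the API docs
    chunk_ms: int = 100,        # illustrative chunk duration
    bytes_per_sample: int = 2,  # 16-bit samples, mono (assumption)
) -> list[bytes]:
    """Split a raw PCM buffer into fixed-duration chunks for streaming."""
    if sample_rate <= 0 or chunk_ms <= 0 or bytes_per_sample <= 0:
        raise ValueError("sample_rate, chunk_ms and bytes_per_sample must be positive")
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * bytes_per_sample
    if chunk_bytes == 0:
        raise ValueError("chunk_ms is too short for this sample rate")
    # The final chunk may be shorter than the rest.
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # one second of silence
    chunks = chunk_pcm(one_second)
    print(len(chunks), len(chunks[0]))
```

Smaller chunks reduce the delay before the model hears speech at the cost of more messages; the right trade-off depends on the application.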

Note: The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.

Models

gemini-3.1-flash-live-preview — Optimized for low-latency, real-time dialogue. Native audio output, thinking (via thinkingLevel). 128k context window. This is the recommended model for all Live API use cases.
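A session configuration for this model might be assembled as a plain dictionary before connecting. This is a sketch, not the skill's own code: the model name comes from the text above, while `build_live_config` is a hypothetical helper and the keys `response_modalities`, `input_audio_transcription`, and `output_audio_transcription` are assumptions based on the google-genai Python SDK's live configuration that should be verified against the SDK reference.

```python
MODEL = "gemini-3.1-flash-live-preview"  # model name as given in this document


def build_live_config(transcribe: bool = True) -> dict:
    """Build a session config dict (key names are assumptions; verify them)."""
    config: dict = {"response_modalities": ["AUDIO"]}
    if transcribe:
        # Empty dicts are assumed to enable transcription with default settings.
        config["input_audio_transcription"] = {}
        config["output_audio_transcription"] = {}
    return config


if __name__ == "__main__":
    config = build_live_config()
    print(MODEL, config)
    # With the google-genai SDK installed and an API key configured, the
    # config would plausibly be used along these lines (not executed here):
    #
    #   from google import genai
    #   client = genai.Client()
    #   async with client.aio.live.connect(model=MODEL, config=config) as session:
    #       ...
```

Keeping the configuration in one small function makes it easier to toggle transcription on and off while testing.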
