Configuration Guide

Model [Generation]

The model ID to use. Supports stable models like gemini-3.1-flash-live, preview models, and even undocumented alpha/beta variants. Type any model ID — the system will attempt to connect regardless.

Voice [Audio]

The voice persona for speech output. 30 voices available ranging from Puck (upbeat) to Gacrux (mature). Each voice has a distinct personality, pitch, and cadence. You can type any voice name including experimental ones.

System Prompt [Generation]

Instructions that shape the model's behavior and personality. This is sent once at session start and guides every response. Keep it concise for voice — overly long prompts can add latency.
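As a sketch, the model, voice, and system prompt above typically travel together in a single session-setup payload. The key names below are illustrative assumptions, not an exact wire format for any particular API:

```python
# Illustrative session-setup payload. Every key name here is an
# assumption for sketching, not a documented schema.
setup = {
    "model": "gemini-3.1-flash-live",
    "system_instruction": (
        "You are Jen, a concise voice assistant. "
        "Keep answers under two sentences."
    ),
    "speech_config": {"voice": "Puck"},
}
```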

Temperature [Generation]

Controls randomness in the model's output. 0.0 = deterministic and focused. 2.0 = maximum creativity and variation. For voice assistants, 0.5-0.9 is the sweet spot — natural but not chaotic.
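Under the hood, temperature divides the model's logits before the softmax, sharpening or flattening the token distribution. A minimal sketch (note that 0.0 is usually special-cased as greedy argmax rather than a literal division):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it toward uniform."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter
```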

Top P (Nucleus Sampling) [Generation]

Limits the model to sampling from the top cumulative probability mass. 0.1 = only the most likely tokens. 0.95 = broad vocabulary. Works alongside temperature — lower Top P constrains output even at higher temperatures.

Top K [Generation]

Limits the model to sampling from the top K most likely tokens at each step. 1 = greedy (always pick the best). 40 = moderate diversity. Unlike Top P, this is a hard cutoff — useful for preventing rare/odd tokens.
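The difference between the two filters is easiest to see on a toy distribution. A sketch of the standard technique, assuming probabilities are already sorted or sortable:

```python
def top_k_filter(probs, k):
    """Keep the k highest-probability tokens, zero the rest, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    filtered = [probs[i] if i in keep else 0.0 for i in range(len(probs))]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]
greedy = top_k_filter(probs, 1)      # hard cutoff: only the best token
nucleus = top_p_filter(probs, 0.8)   # keeps tokens covering 80% of mass
```

Top K always keeps exactly k tokens regardless of how the mass is spread; Top P keeps fewer tokens when the model is confident and more when it is uncertain.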

Max Output Tokens [Generation]

Caps the length of each model response in tokens. Leave empty for the model's default. Lower values force shorter, snappier answers — ideal for voice where brevity matters.
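The four generation knobs above can be pictured as one config object. The key names are assumptions for illustration and may differ from the real API schema:

```python
# Illustrative generation config for a voice assistant; key names are
# assumptions, not a documented schema.
generation_config = {
    "temperature": 0.7,        # natural but not chaotic
    "top_p": 0.95,             # broad vocabulary
    "top_k": 40,               # moderate diversity, hard cutoff on rare tokens
    "max_output_tokens": 256,  # keep spoken answers short; omit for default
}
```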

Sample Rates [Audio]

Input: The sample rate of your microphone audio sent to the model. 16000 Hz is standard for speech. Output: The sample rate of audio returned by the model. 24000 Hz gives higher quality playback.
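The sample rate determines how many bytes each audio chunk carries over the wire. A small sketch of the arithmetic, assuming 16-bit mono PCM (a common format for speech streaming, but an assumption here):

```python
def pcm_chunk_bytes(sample_rate_hz, chunk_ms, bytes_per_sample=2, channels=1):
    """Size in bytes of one chunk of raw PCM audio.
    Defaults assume 16-bit (2-byte) mono samples."""
    return sample_rate_hz * chunk_ms // 1000 * bytes_per_sample * channels

mic_chunk = pcm_chunk_bytes(16000, 20)  # 20 ms of 16 kHz input audio
out_chunk = pcm_chunk_bytes(24000, 20)  # 20 ms of 24 kHz output audio
```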

Barge-in [VAD]

When enabled, the user can interrupt the model mid-speech. The model will stop talking and listen. When disabled, the model finishes speaking before processing new input — useful for long-form explanations.

Speech Start Sensitivity [VAD]

High: Catches speech quickly, even soft starts — but may false-trigger on background noise. Low: More conservative, ignores faint sounds — better for noisy environments.

Speech End Sensitivity [VAD]

High: Responds immediately when you stop talking — snappier but may cut you off mid-pause. Low: Waits longer to confirm you're done — more patient, handles long pauses better.

Silence Before Response [VAD]

Milliseconds of silence required before the model starts responding. 200ms = ultra-fast (may interrupt you). 1000ms = very patient. Default 500ms balances speed and turn-taking.

Prefix Padding [VAD]

Milliseconds of audio captured before speech is actually detected. This "look-back" ensures the beginning of words isn't clipped. 200ms is a safe default.
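Taken together, the five VAD settings above can be sketched as one config object. Every key name here is an illustrative assumption, not a documented schema:

```python
# Illustrative VAD settings; key names are assumptions for sketching.
vad_config = {
    "barge_in": True,                   # user may interrupt mid-speech
    "start_sensitivity": "HIGH",        # catch soft speech onsets quickly
    "end_sensitivity": "LOW",           # be patient about mid-sentence pauses
    "silence_before_response_ms": 500,  # balance speed and turn-taking
    "prefix_padding_ms": 200,           # look-back so word onsets aren't clipped
}
```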

API Version [Experimental]

v1beta = latest features and models. v1alpha = bleeding-edge, possibly unstable. v1 = stable production API. Some alpha/beta models require specific API versions to work.

Extra Params (JSON) [Experimental]

Raw JSON objects merged into the WebSocket setup and generation config messages. Use these for undocumented features, alpha flags, or any parameter not exposed in the UI. For power users and bleeding-edge testing.
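A common way to combine such raw JSON with the base config is a recursive dictionary merge in which the extra params win on conflicts. This sketch assumes that semantics; the actual merge behavior may differ:

```python
def deep_merge(base, extra):
    """Recursively merge `extra` into a copy of `base`.
    Nested dicts merge key-by-key; any other value in `extra`
    replaces the one in `base`."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical example: layer an undocumented flag and one extra
# sampling knob on top of the UI-built config.
setup = {"model": "gemini-3.1-flash-live",
         "generation_config": {"temperature": 0.7}}
extra = {"generation_config": {"top_k": 40}, "experimental_flag": True}
merged = deep_merge(setup, extra)
```

Note that the merge returns a new object and leaves the base config untouched, so a bad experimental payload can be retried without rebuilding the session config.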

Jen — Dev Console

Configure, test, and deploy voice sessions
