Google launched a real-time video call interpretation feature for Gemini Live on Thursday, supporting 32 languages with what the company says is sub-400-millisecond latency for on-device translations. The feature is rolling out first to Pixel 9, Pixel 9 Pro, Samsung Galaxy S25, and S25 Ultra users, with broader Android availability promised by end of Q3.
The technical approach is interesting: Gemini Live attempts to handle the first 30 seconds of any conversation entirely on the device, using a distilled version of Gemini 3 Flash that fits in roughly 4GB of RAM. Once the conversation clearly continues past that threshold — or if the on-device model encounters a phrase on which it has low confidence — the system falls back to cloud inference. Google says this hybrid approach preserves user privacy for casual interactions while maintaining quality for longer calls.
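Google has not published the routing logic, but the described behavior amounts to a simple two-condition check per utterance. A minimal sketch, assuming a hypothetical confidence threshold (the 0.85 floor, the `Utterance` type, and the `route` function are illustrative, not Google's API):

```python
from dataclasses import dataclass

ON_DEVICE_WINDOW_S = 30.0  # article: first ~30 s handled on device
CONFIDENCE_FLOOR = 0.85    # hypothetical threshold, not from the article


@dataclass
class Utterance:
    elapsed_s: float   # seconds since translation began on this call
    confidence: float  # on-device model's confidence for this phrase, 0-1


def route(utt: Utterance) -> str:
    """Pick an inference path for one utterance.

    The on-device model handles the opening window unless it reports
    low confidence; everything past the window goes to the cloud.
    """
    if utt.elapsed_s > ON_DEVICE_WINDOW_S:
        return "cloud"
    if utt.confidence < CONFIDENCE_FLOOR:
        return "cloud"
    return "on-device"
```

Under this sketch, a confident phrase early in the call stays on device, while a long call or a low-confidence phrase (the jargon cases discussed below) triggers the cloud path and its extra latency.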
What works, and what doesn't
We tested the feature across English-Indonesian, English-Mandarin, and English-Spanish language pairs over the past 48 hours. Quality is genuinely impressive on the on-device side: latency is low enough that conversations feel natural, and accuracy on common phrases is comparable to high-end paid translation services. The cloud fallback is more accurate, but introduces a noticeable lag — usually around 800 milliseconds — that disrupts the rhythm of conversation.
The biggest weakness is technical jargon. Medical terms, legal phrases, and code-switching between languages all caused the model to either drop translations or produce confidently wrong ones. For business calls in domain-specific contexts, the feature is currently a useful aid but not a replacement for a human interpreter. Google acknowledges this limitation and says domain-specific fine-tuning is on the roadmap for later this year.
Why this is the real Gemini story
Multimodal real-time interaction is the use case Google has been pointing toward since the original Gemini launch in 2023, and the one that most clearly differentiates it from text-first competitors. While the headlines this week have focused on the 10M-token context window of Gemini 3 Ultra, the long-term strategic story is probably this one. ChatGPT and Claude both have voice modes; neither has anything close to the latency, language coverage, or device integration that Gemini Live now ships with.