Google AI developer relations lead Logan Kilpatrick announced on April 15 the release of Gemini 3.1 Flash TTS—the latest text-to-speech model from Google. This model supports 70 languages, fine-grained control at the level of scene direction and speakers, and audio tags. It is now available for use in the audio playground in Google AI Studio and in the Gemini API.
Four core features
Gemini 3.1 Flash TTS comes with four notable upgrades compared with its predecessor:
Scene Direction — You can set a context for the voice, such as “speaking softly in a noisy café” or “excitedly announcing good news,” and the model will adjust tone, speaking pace, and emotion based on the scene
Speaker-Level Specificity — In multi-role conversations, you can set different voice characteristics for each character
Audio Tags — Supports inserting sound-effect instructions into text to control details like pauses and tone changes
Support for 70 languages — Significantly expands multilingual coverage, including Chinese
More natural, more expressive voices
Google emphasized improvements in voice naturalness with this model. Traditional TTS models are often criticized for output that “sounds like AI.” Gemini 3.1 Flash TTS aims to narrow the gap with human speech through richer prosody variations and emotional expression. Kilpatrick noted that progress from Gemini 2.5 to 3.1 is “very significant.”
How developers can use it
Developers can use it in two ways:
Google AI Studio Audio Playground — Test and preview voice effects directly in the web interface
Gemini API — Integrate into applications for scenarios such as voice assistants, audiobooks, automatic Podcast generation, and multilingual customer service
Gemini product line keeps expanding
Flash TTS is part of the recent flurry of releases in the Gemini 3.1 series. Previously, Google rolled out Gemini Robotics ER 1.6 (robot vision reasoning), Tab Tab Tab (Vibe Coding prompt completion), and design preview features. Google is expanding Gemini from a “chat model” into a full-modal AI platform spanning text, speech, vision, and robotics.
This article Google releases Gemini 3.1 Flash TTS: Supports 70 languages and scene direction, for more natural AI voices first appeared on Liannews ABMedia.
Related Articles
Honor's Lightning Robot Wins Beijing 2026 Humanoid Robot Half Marathon with 50:26 Finish
Meta Stock Rises 1.73% as Company Plans 8,000-Job Layoff Starting May 20
Google’s annual report says Gemini achieves millisecond interception, blocking 99% of scam ads
Ethereum Co-founder Lubin: AI Will Be Critical Turning Point for Crypto, But Tech Giant Monopoly Poses Systemic Risk
Elon Musk Pushes 'Universal High Income' Checks as Ultimate Solution for AI Unemployment
DeepSeek Reportedly Launches First External Fundraising Round, Targets $10B+ Valuation and $300M+