According to Beating, Resemble AI released DramaBox, a speech generation model, on Hugging Face today. The model offers director-level control through a separated prompt syntax: users put dialogue inside quotation marks and write stage directions, such as sighs, pauses, or whispers, outside the quotes. The model renders these directions as emotionally inflected speech rather than reading them aloud.
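To make the quoted-versus-unquoted convention concrete, here is a minimal Python sketch that splits a prompt into dialogue and stage-direction segments. It is only an illustration of the syntax described above, not DramaBox's actual parser or API: the function name `split_prompt`, the segment labels, and the assumption that dialogue uses straight double quotes are all hypothetical.

```python
import re


def split_prompt(prompt: str) -> list[tuple[str, str]]:
    """Split a prompt into ("dialogue", text) and ("direction", text) segments.

    Hypothetical helper: assumes dialogue is wrapped in straight double
    quotes and everything outside the quotes is a stage direction.
    """
    segments = []
    # Match either a double-quoted span or a run of text containing no quotes.
    for match in re.finditer(r'"([^"]*)"|([^"]+)', prompt):
        quoted, unquoted = match.group(1), match.group(2)
        if quoted is not None:
            kind, text = "dialogue", quoted.strip()
        else:
            kind, text = "direction", unquoted.strip()
        if text:  # skip empty or whitespace-only pieces
            segments.append((kind, text))
    return segments


example = '(sighs) "I told you not to come." pause, then whispering "Not tonight."'
for kind, text in split_prompt(example):
    print(f"{kind:9} | {text}")
```

In this sketch, text with an unmatched quote is treated as a direction; how the real model handles malformed prompts is not stated in the source.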
DramaBox supports zero-shot voice cloning from just 10 seconds of reference audio and accepts natural-language prompts that set a character's age, accent, and emotion. Output is studio-quality 48 kHz stereo audio. To deter deepfake misuse, all generated audio carries an inaudible Perth watermark that is resistant to MP3 compression and standard audio editing.