Xiaomi Releases ControlFoley, an Open-Source Video-to-Audio Framework with Precise Sound Control

According to Beating, Xiaomi's AI team has released and open-sourced ControlFoley, a video-to-audio generation framework that lets creators control sound style precisely through text descriptions or reference audio. Unlike conventional AI dubbing systems, which infer sound from the visuals alone, ControlFoley lets creators change audio characteristics, for example turning a door knock into a metallic strike or giving tennis ball impacts a drum-like tone, while keeping the audio synchronized with the video. The framework uses a spatio-temporal audio-visual encoder with a time-timbre decoupling strategy. The project's technical report, code, model weights, and demo are now available.