Stability AI’s new Stable Audio model allows users to generate various types of high-quality audio by simply inputting text descriptions.
Main Points
- Generative Capabilities: Stable Audio can produce a wide range of audio, from instrumental sounds to ambient noises, using text prompts.
- Technical Specifications: The model is based on a U-Net diffusion architecture, trained on over 19,500 hours of audio data.
- Usage and Accessibility: Users can access the service with a free tier, generating up to 20 audio clips per month, though commercial use is restricted.
Summary
Stability AI has introduced Stable Audio, a generative AI model capable of creating high-quality audio from text descriptions. Leveraging a U-Net-based diffusion model, Stable Audio can produce diverse audio outputs, including single instruments, full ensembles, and ambient sounds. Training on a dataset of over 19,500 hours of audio gives the model high fidelity and responsiveness to prompts.
Stable Audio utilizes a text-to-audio embedding approach similar to that used in other generative models by Stability AI. Users provide a text prompt and specify the desired audio length, which the model then uses to generate the corresponding sound. While currently available in a limited, non-commercial capacity, the tool offers significant potential for future development in audio and music generation. Stability AI plans to release open-source versions and custom training capabilities, enhancing the tool’s accessibility and adaptability for various creative and professional uses.
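The workflow described above, a text prompt plus a desired clip length yielding a waveform, can be sketched as follows. This is an illustrative mock, not Stability AI's actual API: the `generate_audio` function, the sample rate, and the silent placeholder output are all assumptions standing in for the real conditioned diffusion model.

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed CD-quality rate; the real model's rate may differ


def generate_audio(prompt: str, seconds: float) -> np.ndarray:
    """Illustrative stand-in for a text-to-audio call.

    The caller supplies a text prompt and a target length and receives a
    waveform of that length. Here a silent placeholder replaces the actual
    model; in the real system, the prompt would be embedded and used to
    condition a U-Net diffusion process that denoises into audio.
    """
    _ = prompt  # the prompt would condition the diffusion process
    num_samples = int(seconds * SAMPLE_RATE)
    return np.zeros(num_samples, dtype=np.float32)


clip = generate_audio("ambient rainfall with distant thunder", seconds=10.0)
print(clip.shape)  # (441000,)
```

The point of the sketch is the interface shape: generation is parameterized only by the text description and the requested duration, with everything else handled inside the model.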
Source: Stability AI releases a sound generator
Keep up to date on the latest AI news and tools by subscribing to our weekly newsletter, or by following us on Twitter and Facebook.