Music and voice generation.
Google Lyria 2: audio/music generation with strong aesthetic sense and coherence—good for stylized and artistic audio.
Generate full music tracks from a text description (Suno-style): create songs with the structure, style, and length you describe.
Generate full songs from text with Songcraft (Suno). Control genre, mood, and lyrics. Extend songs, create covers, and split stems.
Create or transform audio: text-to-audio, inpainting (edit parts of a clip), or audio-to-audio for sound design and music.
Work with song structure: extract the vocals or the instrumental, or split a track into stems for remixing and production.
Create music with AI lyrics and instrumental options, extend clips, or regenerate segments—full control over style and arrangement.
MiniMax music generation: create tracks from descriptions with a balance of quality and speed for drafts and finished pieces.
Generate music from style and mood prompts plus lyrics. Text-to-music with control over composition and sample rate.
Pixio's music generation: create and shape music from text with integrated controls and workflows.
Compose songs from a prompt or a composition plan. Create instrumentals and full tracks with ElevenLabs Music (Compose).
MiniMax text-to-speech with multiple quality and speed tiers—from fast Turbo to high-fidelity HD for different use cases.
High-quality text-to-speech with MiniMax Speech 02, 2.5, 2.6, and 2.8 (Turbo and HD). Multiple preset voices and natural intonation.
Clone a voice from samples with MiniMax—create a consistent synthetic voice for narration, dialogue, or content at scale.
Convert text to speech with ElevenLabs. Choose from a wide range of voices, adjust stability and style, and use custom voice clones created with Instant Voice Cloning (IVC).
ElevenLabs TTS, voice cloning (IVC), and multi-voice dialogue—natural-sounding speech and character voices.
Generate multi-speaker dialogue from text. Assign a different voice to each speaker for podcasts, storytelling, and presentations.
ElevenLabs music composition and sound effects—generate background music and SFX from text for video and media.
Create a reusable custom voice from a clean 5–30 second audio sample. Use your Kling voice ID for consistent voiceovers and content.
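As a rough illustration of the 5–30 second requirement above, the sketch below checks the length of a local audio sample before it is submitted. It uses only Python's standard `wave` module and assumes the sample is an uncompressed WAV file. The function and constant names are hypothetical and are not part of any Kling API.

```python
import wave

# Duration limits taken from the description above (seconds).
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_sample_length(duration_seconds, minimum=MIN_SECONDS, maximum=MAX_SECONDS):
    """Return True if the duration falls within the inclusive range."""
    return minimum <= duration_seconds <= maximum
```

A caller could run `is_valid_sample_length(wav_duration_seconds("sample.wav"))` and reject the file early if it returns False. This only checks length; whether the recording is "clean" (free of noise and other speakers) still needs a separate review.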