Audio & Music Models | Pixio


Visualize the Future: Crafted by AI, Inspired by You

© Copyright 2026 Pixio. All Rights Reserved.

Models: Pixio audio model system, built for voice, music, and structure.

Audio & Music

Music and voice generation.

Audio is easier to navigate when you split the problem by function: speech, composition, dialogue, or transformation. The right tool usually becomes obvious once the role is clear.

Browse by output role first, then use the model page to get into prompting, structure, and workflow details.

19 models in Audio & Music

Open any card for the full model brief

Convert text to speech with ElevenLabs. Choose from a wide range of voices, adjust stability and style, and use custom voice clones (IVC).

ElevenLabs Music

Compose songs from a prompt or a composition plan. Create instrumentals and full tracks with ElevenLabs Music (Compose).

ElevenLabs Text to Dialogue

Generate multi-speaker dialogue from text. Assign different voices to each speaker for podcasts, storytelling, and presentations.

Gemini 3.1 Flash TTS Preview

Google text-to-speech with natural single-speaker narration, selectable voices, and prompt-controlled style.

Kling Create Voice

Create a reusable custom voice from a clean 5–30 second audio sample. Use your Kling voice ID for consistent voiceovers and content.

Google Lyria 2: audio/music generation with strong aesthetic sense and coherence—good for stylized and artistic audio.

MiniMax Music V2

Generate music from style and mood prompts plus lyrics. Text-to-music with control over composition and sample rate.

High-quality text-to-speech with MiniMax Speech 02, 2.5, 2.6, and 2.8 (Turbo and HD). Multiple preset voices and natural intonation.

Create music with AI lyrics and instrumental options, extend clips, or regenerate segments—full control over style and arrangement.

Music (Compose) / Sound Effects

ElevenLabs music composition and sound effects—generate background music and SFX from text for video and media.

MiniMax music generation: create tracks from descriptions with a balance of quality and speed for drafts and finished pieces.

Pixio's music generation: create and shape music from text with integrated controls and workflows.

Generate full songs from text with Songcraft (Suno). Control genre, mood, and lyrics. Extend songs, create covers, and split stems.

Songcraft Generate

Generate full music tracks from a text description (Suno-style)—create songs with structure, style, and length you describe.

Speech 02/2.5/2.6/2.8 Turbo & HD

MiniMax text-to-speech with multiple quality and speed tiers—from fast Turbo to high-fidelity HD for different use cases.

Stable Audio 2.5

Create or transform audio: text-to-audio, inpainting (edit parts of a clip), or audio-to-audio for sound design and music.

Work with song structure: extract vocals, instrumental, or split into stems for remixing and production.

Text to Speech / Voice Clone (IVC) / Text to Dialogue

ElevenLabs TTS, voice cloning (IVC), and multi-voice dialogue—natural-sounding speech and character voices.

Clone a voice from samples with MiniMax—create a consistent synthetic voice for narration, dialogue, or content at scale.
