Build with Velma-2 Models

Explore Modulate's Velma-2 models — Deepfake Detection, Speech-to-text Transcription, Emotion Detection, and more.

Transcription - Low Cost, Low WER

Multilingual batch and streaming transcription at just $0.03 / hr. Includes emotion detection, PII redaction, accent detection, and specialized vocabulary for medical, geographic, and political terms.

Deepfake Detection

#1 Ranked Deepfake Detection Model on 🤗 Hugging Face's Deepfake Speech Arena Leaderboard. Just $0.25 / hr, at a cost over 100x lower than other providers.
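
For a rough sense of scale, the per-hour rates quoted above ($0.03 / hr for transcription, $0.25 / hr for deepfake detection) can be multiplied out for a given volume. The sketch below is only back-of-the-envelope arithmetic: the function and variable names are made up for illustration, and it assumes a flat per-hour rate, ignoring the free tier, credit packaging, and any rounding or minimum billing increments that may apply.

```shell
#!/usr/bin/env bash
# Rates per hour, copied from the text above. Flat-rate billing is an assumption.
STT_RATE_USD_PER_HOUR=0.03
DEEPFAKE_RATE_USD_PER_HOUR=0.25

# estimate_cost HOURS RATE: prints HOURS * RATE in USD with two decimals.
# (Hypothetical helper, not part of any Modulate tooling.)
estimate_cost() {
  awk -v hours="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", hours * rate }'
}

estimate_cost 1000 "$STT_RATE_USD_PER_HOUR"       # 1,000 hours of transcription
estimate_cost 1000 "$DEEPFAKE_RATE_USD_PER_HOUR"  # 1,000 hours of deepfake detection
```

Under those assumptions, 1,000 hours comes to about $30 for transcription and $250 for deepfake detection.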

Conversation Understanding

Coming soon: the only voice-native model that delivers true conversation understanding, combining emotion detection, behavior identification, intent signals, and more in a single API call.

Built for Developers

REST and WebSocket APIs
Simple API key authentication
Credit-based pricing with free tier
Usage dashboard and billing controls

Example API call

curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
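
In a script, the same call is usually wrapped so the API key stays out of the command line and obvious mistakes are caught before any request is sent. The sketch below is one way to do that, not an official client: the `stt_batch` function name, the `MODULATE_API_KEY` environment variable, and the exit codes are assumptions made for illustration. The endpoint, header, and form fields are taken unchanged from the example call above. The response format is not described here, so the function simply prints whatever the API returns.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the batch transcription call shown above.
# MODULATE_API_KEY is an assumed variable name, not an official one.

stt_batch() {
  local audio_file="${1:-}"

  # Refuse to run without a key rather than sending an unauthenticated request.
  if [ -z "${MODULATE_API_KEY:-}" ]; then
    echo "error: MODULATE_API_KEY is not set" >&2
    return 2
  fi

  # Catch a missing or mistyped path before curl tries to upload it.
  if [ ! -f "$audio_file" ]; then
    echo "error: audio file not found: $audio_file" >&2
    return 3
  fi

  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
  # --silent --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error -X POST \
    "https://modulate-developer-apis.com/api/velma-2-stt-batch" \
    -H "X-API-Key: ${MODULATE_API_KEY}" \
    -F "upload_file=@${audio_file}" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true"
}
```

With the key exported in the environment, `stt_batch audio.mp3 > result.txt` would save the raw response for later inspection.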

Learn More

Modulate.ai

Learn about Modulate's mission and technology

Ensemble Listening Models

Multi-model voice analysis for conversations

Velma Preview

Try Modulate's voice analysis in the browser

Ready to Build?

Create a free account and start making API calls in minutes.

Get Started