Build with Velma-2 Models
Explore Modulate's Velma-2 models: Deepfake Detection, Speech-to-text Transcription, Emotion Detection, and more.
Transcription - Low Cost, Low WER
Multilingual batch and streaming transcription at just $0.03 / hr. Includes emotion detection, PII redaction, accent detection, and specialized vocabulary for medical, geographic, and political terms.
Deepfake Detection
#1 Ranked Deepfake Detection Model on 🤗 Hugging Face's Deepfake Speech Arena Leaderboard. Just $0.25 / hr, over 100x lower cost than other providers.
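Both services above are priced per hour of audio, so a budget estimate is just rate × hours. The sketch below assumes charges scale linearly with audio duration and ignores the free tier, credit packaging, and any rounding or minimums, none of which this page specifies. The function name `estimate_cost` and the variable names are hypothetical; only the two rates come from the text above.

```shell
#!/usr/bin/env bash
# Rough cost estimate for Velma-2 usage, assuming purely linear per-hour billing.
# Rates are the list prices quoted on this page (USD per hour of audio) and may change.
# Free tier, credits, and rounding rules are NOT modeled.

STT_RATE=0.03        # transcription, USD per audio hour
DEEPFAKE_RATE=0.25   # deepfake detection, USD per audio hour

# estimate_cost RATE HOURS: print RATE * HOURS in USD, rounded to two decimals.
# awk is used because bash arithmetic is integer-only.
estimate_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

estimate_cost "$STT_RATE" 1000        # 1,000 hours of transcription  -> prints 30.00
estimate_cost "$DEEPFAKE_RATE" 1000   # 1,000 hours of deepfake checks -> prints 250.00
```

Swap in your own monthly audio volume for `1000`; check the usage dashboard for actual billed amounts.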
Conversation Understanding
Coming soon: the only voice-native model that delivers true conversation understanding by combining emotion detection, behavior identification, intent signals, and much more into a single API call.
Built for Developers
- REST and WebSocket APIs
- Simple API key authentication
- Credit-based pricing with free tier
- Usage dashboard and billing controls
Example API call
curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
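To keep the API key out of scripts and shell history, the same batch request can be wrapped in a small bash function that reads the key from an environment variable. This is a sketch: the function name `velma_stt_batch` and the variable `MODULATE_API_KEY` are hypothetical choices for illustration. Only the endpoint, the `X-API-Key` header, and the form fields come from the example call. `--fail-with-body` requires curl 7.76.0 or later, and since this page does not document the response format, the function simply prints whatever body the server returns.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Velma-2 batch STT call shown above.
# Reads the key from MODULATE_API_KEY (a name chosen for this sketch).

velma_stt_batch() {
  local audio_file="${1:-}"

  if [[ -z "${MODULATE_API_KEY:-}" ]]; then
    echo "MODULATE_API_KEY is not set" >&2
    return 1
  fi
  if [[ ! -f "$audio_file" ]]; then
    echo "No such file: $audio_file" >&2
    return 1
  fi

  # -F sends multipart/form-data and implies POST, so -X POST is not needed.
  # -sS hides the progress meter but still reports errors.
  # --fail-with-body returns a non-zero exit code on HTTP >= 400 while
  # still printing the server's error body.
  curl -sS --fail-with-body \
    https://modulate-developer-apis.com/api/velma-2-stt-batch \
    -H "X-API-Key: ${MODULATE_API_KEY}" \
    -F "upload_file=@${audio_file}" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true"
}
```

Usage: `export MODULATE_API_KEY="your key"`, then `velma_stt_batch audio.mp3`. The function's exit status can be checked in scripts to detect missing keys, missing files, or HTTP errors.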