Voices Enables Fast Text-to-Speech in Java Applications
Date: 2025-11-16 10:16 Zhejiang

Voices enables Java applications to quickly implement text-to-speech (TTS) conversion.

---
Overview
Voices is an open-source TTS project designed for Java 17+.
Key features:
- No external APIs required.
- No manual installation of additional software.
- Supports multiple languages via dictionaries or OpenVoice.
- Generates audio files directly within Java applications.
Background
- Creator: Henry Coles, also known for the mutation testing tools Pitest and Arcmutate.
- First introduced on Bluesky: September 2025.
- Latest release: Version 0.0.8 (October 2025).
Technology
- Built on ONNX Runtime, a cross-platform machine learning engine optimized for both training and inference.
- ONNX Runtime runs models exported from TensorFlow, PyTorch, and other frameworks.
- Uses hardware acceleration (for example, a GPU) where available.
---
Getting Started
Maven Configuration
Include the following dependencies in your `pom.xml`:
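
    <dependency>
        <groupId>org.pitest.voices</groupId>
        <artifactId>chorus</artifactId>
        <version>0.0.8</version>
    </dependency>
    <dependency>
        <groupId>org.pitest.voices</groupId>
        <artifactId>alba</artifactId>
        <version>0.0.8</version>
    </dependency>
    <dependency>
        <groupId>org.pitest.voices</groupId>
        <artifactId>en_uk</artifactId>
        <version>0.0.8</version>
    </dependency>
    <dependency>
        <groupId>com.microsoft.onnxruntime</groupId>
        <artifactId>onnxruntime</artifactId>
        <version>1.22.0</version>
    </dependency>
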
Tips:
- Replace `en_uk` with `en_us` for American English pronunciation.
- Swap `onnxruntime` for `onnxruntime_gpu` to enable GPU acceleration.
- Use a single `Chorus` instance to minimize model loading overhead.
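
The last tip can be sketched with the JDK alone. The example below does not use the Voices API: `SpeechEngine` is a hypothetical stand-in for an expensive, closeable object such as `Chorus`, and `SharedEngine` is a hypothetical holder that creates it once on first use and returns the same instance to every caller. This article does not say whether `Chorus` is safe to share across threads, so treat this only as an illustration of the reuse pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder: creates the engine lazily and reuses it. */
final class SharedEngine {
    private static SpeechEngine instance;

    private SharedEngine() { }

    /** Returns the shared engine, constructing it on the first call only. */
    static synchronized SpeechEngine get() {
        if (instance == null) {
            instance = new SpeechEngine();
        }
        return instance;
    }

    /** Closes the shared engine, for example at application shutdown. */
    static synchronized void shutdown() {
        if (instance != null) {
            instance.close();
            instance = null;
        }
    }

    public static void main(String[] args) {
        SpeechEngine first = SharedEngine.get();
        SpeechEngine second = SharedEngine.get();
        System.out.println(first == second);          // prints true
        System.out.println(SpeechEngine.LOADS.get()); // prints 1
        SharedEngine.shutdown();
    }
}

/** Hypothetical stand-in for an engine that is costly to construct. */
final class SpeechEngine implements AutoCloseable {
    // Counts constructions so the reuse can be observed.
    static final AtomicInteger LOADS = new AtomicInteger();

    SpeechEngine() {
        LOADS.incrementAndGet(); // a real engine would load its models here
    }

    @Override
    public void close() {
        // a real engine would release its resources here
    }
}
```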
---
Example: English Text to Audio
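
    // Configure Voices with the UK English dictionary
    ChorusConfig config = chorusConfig(EnUkDictionary.en_uk());
    // try-with-resources closes the Chorus when the block exits
    try (Chorus chorus = new Chorus(config)) {
        Voice alba = chorus.voice(Alba.albaMedium());
        Audio audio = alba.say("This is the InfoQ article about the Voices library");
        // Write the generated audio to disk
        Path path = Paths.get("InfoQ_English");
        audio.save(path);
    }

---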
Runtime Model Download
Add the downloader dependency:

    <dependency>
        <groupId>org.pitest.voices</groupId>
        <artifactId>model-downloader</artifactId>
        <version>0.0.8</version>
    </dependency>

Available factory classes:
- `org.pitest.voices.download.Models`
- `org.pitest.voices.download.UsModels`
- `org.pitest.voices.download.NonEnglishModels`
Example: Dutch Text to Speech

    // Dutch model from the model-downloader factory, fetched at runtime
    Model nlModel = NonEnglishModels.nlNLRonnie();
    ChorusConfig config = chorusConfig(EnUkDictionary.en_uk());
    try (Chorus chorus = new Chorus(config)) {
        Voice ronnie = chorus.voice(nlModel);
        Audio audio = ronnie.say("Dit is een Nederlandse tekst Scheveningen");
        Path path = Paths.get("Dutch");
        audio.save(path);
    }

---
Using OpenVoice (No Dictionaries Required)
Dependency

    <dependency>
        <groupId>org.pitest.voices</groupId>
        <artifactId>openvoice-phonemizer</artifactId>
        <version>0.0.8</version>
    </dependency>

Note:
- File size: 50 MB, compared with 3 MB for a dictionary.
- Higher computational requirements than the dictionary approach.
Example

    ChorusConfig config = chorusConfig(Dictionaries.empty())
        .withModel(new OpenVoiceSupplier());
    try (Chorus chorus = new Chorus(config)) {
        Voice alba = chorus.voice(Alba.albaMedium());
        Audio audio = alba.say("This is the InfoQ article about the Voices library");
        Path path = Paths.get("InfoQ_English_OpenVoice");
        audio.save(path);
    }

---
GPU Support
To run on a GPU:
- Remove the `onnxruntime` dependency.
- Add the `onnxruntime_gpu` dependency.
- Create the configuration with `gpuChorusConfig` instead of `chorusConfig`:

      ChorusConfig config = gpuChorusConfig(EnUkDictionary.en_uk());

GPU 0 is used by default. To pass custom CUDA options, call `.withCudaOptions()` on the `ChorusConfig`.

---
Pause & Markdown Handling
- Markdown characters (`#`, `---`, em or en dashes) trigger pauses in audio output.
- Pause behavior can be customized via `ChorusConfig`.
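
If those pauses are unwanted, one option is to clean the text before it reaches the library. The sketch below is plain JDK code and not part of the Voices API: the class `SpeechText` and the method `stripPauseMarkers` are hypothetical names, and the characters it removes are taken only from the list above. It drops `#` heading markers and `---` rule lines, replaces en and em dashes with a space, and collapses the remaining whitespace.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper; not part of the Voices API. */
final class SpeechText {
    // '#' heading markers at the start of a line.
    private static final Pattern HEADING = Pattern.compile("(?m)^\\s*#+\\s*");
    // A line made up only of three or more hyphens (a Markdown rule).
    private static final Pattern RULE = Pattern.compile("(?m)^\\s*-{3,}\\s*$");
    // En dash (U+2013) and em dash (U+2014), with any surrounding spaces.
    private static final Pattern DASH = Pattern.compile("\\s*[\\u2013\\u2014]\\s*");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Returns the text with the pause-triggering markers removed. */
    static String stripPauseMarkers(String text) {
        String result = RULE.matcher(text).replaceAll(" ");
        result = HEADING.matcher(result).replaceAll("");
        result = DASH.matcher(result).replaceAll(" ");
        return SPACES.matcher(result).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String input = "# Title\n---\nFast \u2014 and local";
        System.out.println(stripPauseMarkers(input)); // prints: Title Fast and local
    }
}
```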
---
Alternatives
Other TTS libraries like Sherpa Onnx or MaryTTS may have:
- More difficult Maven integration.
- Lower audio quality.
---
Interview Highlights: Henry Coles on Voices
Common Use Cases
> “Whenever you need to quickly generate natural-sounding speech without external services.”
Motivation
> “I needed a Java-native way to run piper models without relying on Python services.”
Challenges
- No linguistics background.
- Porting TypeScript logic to Java.
- Handling phonemisation exceptions with dictionaries.
Future Improvements
- Cleaner API design.
- Adjustable pause handling and speech pace.
Testing Recommendation
- Focus tests on the input boundary, that is, on the text handed to Voices.
- Add a moderate number of output tests that confirm audio is generated.
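
One way to read this advice is to put the speech call behind a small interface and assert on the text that crosses it, then keep a simple check that a file was produced. The sketch below assumes nothing about the Voices API: `Speaker`, `RecordingSpeaker`, `Greeter`, and `audioWasWritten` are hypothetical names, and a production `Speaker` would delegate to a `Voice`. The output check only confirms that a non-empty file exists and says nothing about how the audio sounds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class VoiceTestingSketch {
    /** Hypothetical boundary between application code and the TTS library. */
    interface Speaker {
        void say(String text);
    }

    /** Test double that records the text instead of producing audio. */
    static final class RecordingSpeaker implements Speaker {
        final List<String> spoken = new ArrayList<>();

        @Override
        public void say(String text) {
            spoken.add(text);
        }
    }

    /** Example application code that decides what should be spoken. */
    static final class Greeter {
        private final Speaker speaker;

        Greeter(Speaker speaker) {
            this.speaker = speaker;
        }

        void greet(String name) {
            speaker.say("Hello, " + name.trim() + ".");
        }
    }

    /** Output check: true if the path is a regular, non-empty file. */
    static boolean audioWasWritten(Path file) {
        try {
            return Files.isRegularFile(file) && Files.size(file) > 0;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        RecordingSpeaker speaker = new RecordingSpeaker();
        new Greeter(speaker).greet("  Ada ");
        // The assertion targets the input handed to the speech layer.
        if (!speaker.spoken.equals(List.of("Hello, Ada."))) {
            throw new AssertionError("unexpected text: " + speaker.spoken);
        }
        System.out.println("boundary test passed");
    }
}
```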
---
References
- Original Article: Voices Enables Fast Text-to-Speech for Java Applications
- Disclaimer: This is an InfoQ translation — reproduction without permission is prohibited.