Voices Enables Fast Text-to-Speech in Java Applications

Date: 2025-11-16 10:16 Zhejiang

Voices enables Java applications to quickly implement text-to-speech (TTS) conversion.

---

Overview

Voices is an open-source TTS project designed for Java 17+.

Key features:

  • No external APIs required.
  • No manual installation of additional software.
  • Supports multiple languages via dictionaries or OpenVoice.
  • Generates audio files directly within Java applications.

Background

  • Creator: Henry Coles, also known for the Pitest and Arcmutate mutation testing tools.
  • First introduced on Bluesky: September 2025.
  • Latest release: Version 0.0.8 (October 2025).

Technology

  • Built on ONNX Runtime, a cross-platform engine for running (and training) machine-learning models.
  • ONNX Runtime runs models from frameworks such as TensorFlow and PyTorch once they are exported to the ONNX format.
  • It can use hardware accelerators, such as GPUs, where available.

---

Getting Started

Maven Configuration

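Include the following dependencies in your `pom.xml`:

```xml
<dependency>
    <groupId>org.pitest.voices</groupId>
    <artifactId>chorus</artifactId>
    <version>0.0.8</version>
</dependency>
<dependency>
    <groupId>org.pitest.voices</groupId>
    <artifactId>alba</artifactId>
    <version>0.0.8</version>
</dependency>
<dependency>
    <groupId>org.pitest.voices</groupId>
    <artifactId>en_uk</artifactId>
    <version>0.0.8</version>
</dependency>
<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime</artifactId>
    <version>1.22.0</version>
</dependency>
```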

Tips:

  • Replace `en_uk` with `en_us` for American English pronunciation.
  • Swap `onnxruntime` for `onnxruntime_gpu` to enable GPU acceleration.
  • Use a single `Chorus` instance to minimize model loading overhead.
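The single-instance tip follows a general pattern: build the expensive object once and share it. The sketch below shows one common way to do that in plain Java, the lazy-holder idiom. It does not use the Voices API; `ExpensiveEngine` is a hypothetical stand-in for `Chorus`, and the construction counter exists only to make the sharing observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedEngineDemo {

    // Hypothetical stand-in for an expensive object such as a Chorus instance
    static final class ExpensiveEngine implements AutoCloseable {
        // Counts constructions so the sharing can be observed
        static final AtomicInteger CREATED = new AtomicInteger();

        ExpensiveEngine() {
            // A real TTS engine would load its models here
            CREATED.incrementAndGet();
        }

        String speak(String text) {
            return "audio for: " + text;
        }

        @Override
        public void close() {
            // A real engine would release its resources here
        }
    }

    // Lazy-holder idiom: the JVM initializes Holder once, on first access, in a thread-safe way
    private static final class Holder {
        static final ExpensiveEngine ENGINE = new ExpensiveEngine();

        static {
            // Close the shared engine once, when the JVM shuts down
            Runtime.getRuntime().addShutdownHook(new Thread(ENGINE::close));
        }
    }

    static ExpensiveEngine engine() {
        return Holder.ENGINE;
    }

    public static void main(String[] args) {
        System.out.println(engine().speak("first"));
        System.out.println(engine().speak("second"));
        System.out.println("engines created: " + ExpensiveEngine.CREATED.get());
    }
}
```

With the real library, the holder would create a `Chorus` from a `ChorusConfig` instead; the idiom itself is independent of Voices.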

---

Example: English Text to Audio

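```java
import java.nio.file.Path;
import java.nio.file.Paths;
// Voices imports (Chorus, ChorusConfig, Voice, Audio, Alba, EnUkDictionary, static chorusConfig) omitted

// Configure Chorus with the British English pronunciation dictionary
ChorusConfig config = chorusConfig(EnUkDictionary.en_uk());
// try-with-resources closes the Chorus instance when done
try (Chorus chorus = new Chorus(config)) {
    Voice alba = chorus.voice(Alba.albaMedium()); // the Alba (medium) voice
    Audio audio = alba.say("This is the InfoQ article about the Voices library");
    Path path = Paths.get("InfoQ_English");
    audio.save(path); // write the generated speech to disk
}
```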

---

Runtime Model Download

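Add the downloader dependency:

```xml
<dependency>
    <groupId>org.pitest.voices</groupId>
    <artifactId>model-downloader</artifactId>
    <version>0.0.8</version>
</dependency>
```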

Available factory classes:

  • `org.pitest.voices.download.Models`
  • `org.pitest.voices.download.UsModels`
  • `org.pitest.voices.download.NonEnglishModels`

Example: Dutch Text to Speech

```java
// Dutch "Ronnie" model from the model-downloader's NonEnglishModels factory
Model nlModel = NonEnglishModels.nlNLRonnie();
ChorusConfig config = chorusConfig(EnUkDictionary.en_uk());
try (Chorus chorus = new Chorus(config)) {
    Voice voice = chorus.voice(nlModel);
    // "This is a Dutch text Scheveningen"
    Audio audio = voice.say("Dit is een Nederlandse tekst Scheveningen");
    Path path = Paths.get("Dutch");
    audio.save(path);
}
```

---

Using OpenVoice (No Dictionaries Required)

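Dependency

```xml
<dependency>
    <groupId>org.pitest.voices</groupId>
    <artifactId>openvoice-phonemizer</artifactId>
    <version>0.0.8</version>
</dependency>
```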

Note:

  • The `openvoice-phonemizer` dependency is 50 MB, versus 3 MB for a dictionary.
  • It has higher computational requirements than dictionary-based phonemisation.

Example

```java
// Empty dictionary: the OpenVoice phonemizer handles all text
ChorusConfig config = chorusConfig(Dictionaries.empty())
        .withModel(new OpenVoiceSupplier());
try (Chorus chorus = new Chorus(config)) {
    Voice alba = chorus.voice(Alba.albaMedium());
    Audio audio = alba.say("This is the InfoQ article about the Voices library");
    Path path = Paths.get("InfoQ_English_OpenVoice");
    audio.save(path);
}
```

---

GPU Support

To run on a GPU:

  • Remove the `onnxruntime` dependency.
  • Add the `onnxruntime_gpu` dependency.
  • Create the configuration with `gpuChorusConfig` instead of `chorusConfig`:

```java
ChorusConfig config = gpuChorusConfig(EnUkDictionary.en_uk());
```

By default, GPU 0 is used. To customize the CUDA options, call `withCudaOptions()` on the `ChorusConfig`.

---

Pause & Markdown Handling

  • Markdown markers (`#`, `---`) as well as em and en dashes trigger pauses in the audio output.
  • Pause behavior can be customized via `ChorusConfig`.
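To see where such pauses would fall in a given text, the sketch below splits it at the same markers. It is a hypothetical pre-processing illustration in plain Java, not how Voices implements pauses; the regular expression and the `segments` method are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PauseSegments {

    // Pause markers: '---', '#', em dash (U+2014), en dash (U+2013)
    private static final Pattern PAUSE_MARKERS = Pattern.compile("---|#|\u2014|\u2013");

    // Splits text at pause markers and drops empty or whitespace-only segments
    static List<String> segments(String text) {
        List<String> result = new ArrayList<>();
        for (String part : PAUSE_MARKERS.split(text)) {
            String trimmed = part.strip();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "# Voices\n---\nFast TTS for Java \u2013 no external APIs";
        // Each printed segment would be followed by a pause before the next one
        segments(text).forEach(System.out::println);
    }
}
```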

---

Alternatives

Compared with Voices, other TTS options such as Sherpa Onnx or MaryTTS may:

  • Be harder to integrate through Maven.
  • Produce lower-quality audio.

---

Interview Highlights: Henry Coles on Voices

Common Use Cases

> “Whenever you need to quickly generate natural-sounding speech without external services.”

Motivation

> “I needed a Java-native way to run Piper models without relying on Python services.”

Challenges

  • No linguistics background.
  • Porting TypeScript logic to Java.
  • Handling phonemisation exceptions with dictionaries.
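Dictionary-based phonemisation typically consults a table of known exceptions before applying general rules. The sketch below shows only that lookup order. The two entries (ARPAbet-style, as in CMUdict), the `phonemize` method, and the naive letter-by-letter `fallback` are hypothetical placeholders, not Voices' dictionaries or rules.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

public class ExceptionDictionary {

    // Hypothetical exceptions dictionary: words that general rules would mispronounce
    private static final Map<String, String> EXCEPTIONS = Map.of(
            "colonel", "K ER1 N AH0 L",
            "yacht", "Y AA1 T");

    // Placeholder for rule-based phonemisation: spells the word letter by letter
    static String fallback(String word) {
        StringJoiner joiner = new StringJoiner(" ");
        for (char c : word.toCharArray()) {
            joiner.add(String.valueOf(c));
        }
        return joiner.toString();
    }

    // Consults the exceptions dictionary first and only then falls back to the rules
    static String phonemize(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String known = EXCEPTIONS.get(key);
        return known != null ? known : fallback(key);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Colonel", "yacht", "Java"}) {
            System.out.println(word + " -> " + phonemize(word));
        }
    }
}
```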

Future Improvements

  • Cleaner API design.
  • Adjustable pause handling and speech pace.

Testing Recommendation

  • Focus on tests at the input boundaries.
  • Keep output tests moderate: confirm that audio is generated rather than asserting its exact content.
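A moderate output test can confirm that audio was produced without comparing samples. A minimal sketch, assuming the saved file is a WAV file (the article does not state the format): `looksLikeWav` is a hypothetical helper that checks the RIFF/WAVE header and that data extends past a canonical 44-byte header. The in-memory bytes in `main` stand in for reading a file written by `audio.save(path)`.

```java
import java.nio.charset.StandardCharsets;

public class AudioOutputCheck {

    // Size of a canonical PCM WAV header (RIFF descriptor, "fmt " chunk, "data" chunk header)
    private static final int CANONICAL_HEADER_BYTES = 44;

    // True if the bytes start with "RIFF", carry "WAVE" at offset 8, and extend past the header
    static boolean looksLikeWav(byte[] bytes) {
        if (bytes.length <= CANONICAL_HEADER_BYTES) {
            return false;
        }
        String riff = new String(bytes, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(bytes, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && wave.equals("WAVE");
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(path) on a generated audio file
        byte[] sample = new byte[64];
        System.arraycopy("RIFF".getBytes(StandardCharsets.US_ASCII), 0, sample, 0, 4);
        System.arraycopy("WAVE".getBytes(StandardCharsets.US_ASCII), 0, sample, 8, 4);
        System.out.println("sample looks like WAV: " + looksLikeWav(sample));
        System.out.println("empty file looks like WAV: " + looksLikeWav(new byte[0]));
    }
}
```

Checking the header and a non-trivial length keeps the test stable across model or voice changes, which would alter the exact samples.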
