mirror of https://github.com/supertone-inc/supertonic.git synced 2026-06-02 01:38:48 +02:00

Supertonic Web Example

This example demonstrates how to use Supertonic in a web browser using ONNX Runtime Web.

📰 Update News

2025.11.23 - Enhanced text preprocessing with comprehensive normalization, emoji removal, symbol replacement, and punctuation handling for improved synthesis quality.

2025.11.19 - Added speed control slider to adjust speech synthesis speed (default: 1.05, recommended range: 0.9-1.5).

2025.11.19 - Added automatic text chunking for long-form inference. Long texts are split into chunks and synthesized with natural pauses.
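The preprocessing and chunking described in these updates can be pictured with a minimal sketch. It is not the demo's actual code: the function names, the specific replacement rules, and the 300-character limit are all assumptions chosen for illustration.

```javascript
// Hypothetical helpers; names, rules, and the default limit are assumptions.

// Clean up text before synthesis.
function normalizeText(text) {
  let out = text
    .normalize('NFKC')                           // fold compatibility forms (e.g. full-width letters)
    .replace(/\p{Extended_Pictographic}/gu, '')  // drop emoji and pictographs
    .replace(/[\u200D\uFE0F]/g, '')              // drop leftover joiners and variation selectors
    .replace(/&/g, ' and ')                      // one example of a symbol replacement
    .replace(/\s+/g, ' ')                        // collapse runs of whitespace
    .trim();
  // Make sure the text ends with sentence punctuation.
  if (out && !/[.!?]$/.test(out)) out += '.';
  return out;
}

// Group sentences into chunks of roughly maxLen characters.
// A single sentence longer than maxLen is kept whole rather than cut mid-sentence.
function chunkText(text, maxLen = 300) {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  const chunks = [];
  let current = '';
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (!sentence) continue;
    if (current && current.length + 1 + sentence.length > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + ' ' + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be synthesized separately and the resulting audio joined with a short silence between chunks.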

Features

🌐 Runs entirely in the browser (no server required for inference)
🚀 WebGPU support with automatic fallback to WebAssembly
⚡ Pre-extracted voice styles for instant generation
🎨 Modern, responsive UI
🎭 Multiple voice style presets (2 Male, 2 Female)
💾 Download generated audio as WAV files
📊 Detailed generation statistics (audio length, generation time)
⏱️ Real-time progress tracking
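
As a rough illustration of the generation statistics, the audio length follows from the number of samples and the sample rate. The helper below is a hypothetical sketch; its name is an assumption, and the `speedup` field is an extra figure not mentioned in the feature list.

```javascript
// Hypothetical helper: summarise one generation run.
// sampleCount: number of PCM samples produced
// sampleRate:  samples per second (Hz), assumed to be known by the caller
// elapsedMs:   wall-clock generation time, e.g. measured with performance.now()
function generationStats(sampleCount, sampleRate, elapsedMs) {
  const audioSeconds = sampleCount / sampleRate;
  const generationSeconds = elapsedMs / 1000;
  return {
    audioSeconds,
    generationSeconds,
    // Values above 1 mean audio is produced faster than real time.
    speedup: audioSeconds / generationSeconds,
  };
}
```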

Requirements

Node.js (for development server)
Modern web browser (Chrome, Edge, Firefox, Safari)

Installation

Install dependencies:

npm install

Running the Demo

Start the development server:

npm run dev

This will start a local development server (usually at http://localhost:3000) and open the demo in your browser.

Usage

1. Wait for Models to Load: The app automatically loads the models and the default voice style (M1).
2. Select Voice Style: Choose one of the available voice presets:
   - Male 1 (M1): Default male voice
   - Male 2 (M2): Alternative male voice
   - Female 1 (F1): Default female voice
   - Female 2 (F2): Alternative female voice
3. Enter Text: Type or paste the text you want to convert to speech.
4. Adjust Settings (optional):
   - Total Steps: Number of denoising steps. More steps give better quality but slower generation (default: 5).
   - Speed: Speech synthesis speed (default: 1.05, recommended range: 0.9-1.5).
5. Generate Speech: Click the "Generate Speech" button.
6. View Results:
   - See the full input text
   - View audio length and generation time statistics
   - Play the generated audio in the browser
   - Download as WAV file
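
The WAV download step can be sketched as follows. This is an illustrative encoder rather than the demo's own implementation: it assumes mono audio supplied as floating-point samples in the range -1 to 1 and writes 16-bit PCM. The function name `encodeWav` is hypothetical.

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
// Returns an ArrayBuffer holding the 44-byte header followed by the sample data.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // remaining file size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size
  view.setUint16(20, 1, true);                           // format 1 = PCM
  view.setUint16(22, 1, true);                           // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const value = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(44 + i * bytesPerSample, value, true);
  }
  return buffer;
}
```

In a browser, the returned buffer can be wrapped in `new Blob([buffer], { type: 'audio/wav' })` and offered through a link created with `URL.createObjectURL`.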

Technical Details

Browser Compatibility

This demo uses:

ONNX Runtime Web: For running models in the browser
Web Audio API: For playing generated audio
Vite: For development and bundling

Notes

The ONNX models must be accessible at assets/onnx/ relative to the web root
Voice style JSON files must be accessible at assets/voice_styles/ relative to the web root
Pre-extracted voice styles enable instant generation without audio processing
Four voice style presets are provided (M1, M2, F1, F2)
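
A minimal sketch of how a page might fetch one of these presets is shown below. The file naming (`M1.json`, `F2.json`, and so on) is an assumption based on the preset IDs, and both function names are hypothetical; check the actual contents of assets/voice_styles/ before relying on them.

```javascript
// Assumed layout: one JSON file per preset, named after its ID (e.g. M1.json).
const VOICE_STYLES = ['M1', 'M2', 'F1', 'F2'];

// Build the URL of a preset relative to the web root.
function voiceStyleUrl(styleId, baseUrl = 'assets/voice_styles/') {
  if (!VOICE_STYLES.includes(styleId)) {
    throw new Error(`Unknown voice style: ${styleId}`);
  }
  return `${baseUrl}${styleId}.json`;
}

// Fetch and parse a preset, reporting the HTTP status if the file is unreachable.
// fetchFn is injectable so the function can be exercised without a network.
async function loadVoiceStyle(styleId, fetchFn = fetch) {
  const url = voiceStyleUrl(styleId);
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status}`);
  }
  return response.json();
}
```

An error message that includes the URL and HTTP status makes a wrong asset path easy to spot in the browser console.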

Troubleshooting

Models not loading

Check browser console for errors
Ensure assets/onnx/ path is correct and models are accessible
Check CORS settings if serving from a different domain

WebGPU not available

WebGPU is enabled by default in recent Chrome and Edge releases (version 113+); availability in other browsers depends on the browser version and platform
The app will automatically fall back to WebAssembly if WebGPU is not available
Check the backend badge to see which execution provider is being used
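
The fallback behaviour can be sketched as below. This is an assumption about how such logic might look, not the demo's actual code. The session factory is passed in as a parameter; with ONNX Runtime Web it could be `(url, options) => ort.InferenceSession.create(url, options)`. The names `providerOrder` and `createSessionWithFallback` are hypothetical.

```javascript
// Preferred execution providers, most capable first.
function providerOrder(hasWebGpu) {
  return hasWebGpu ? ['webgpu', 'wasm'] : ['wasm'];
}

// Try each provider in turn and return the first session that loads,
// together with the provider name (useful for a backend badge).
async function createSessionWithFallback(createSession, modelUrl, hasWebGpu) {
  for (const provider of providerOrder(hasWebGpu)) {
    try {
      const session = await createSession(modelUrl, {
        executionProviders: [provider],
      });
      return { session, provider };
    } catch (err) {
      console.warn(`Execution provider "${provider}" failed:`, err);
    }
  }
  throw new Error('No execution provider could load the model');
}
```

In a browser, `hasWebGpu` could be computed as `typeof navigator !== 'undefined' && 'gpu' in navigator`, although the presence of `navigator.gpu` alone does not guarantee that a usable adapter exists, which is why the sketch still catches load failures.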

Out of memory errors

Try shorter text inputs
Reduce denoising steps
Use a browser with more available memory
Close other tabs to free up memory

Audio quality issues

Try different voice style presets
Increase denoising steps for better quality

Slow generation

If using WebAssembly, try a browser that supports WebGPU
Ensure no other heavy processes are running
Consider using fewer denoising steps for faster (but lower quality) results
