Supertonic Web Example

This example demonstrates how to run Supertonic in a web browser with ONNX Runtime Web.

📰 Update News

2025.11.23 - Enhanced text preprocessing with comprehensive normalization, emoji removal, symbol replacement, and punctuation handling for improved synthesis quality.

2025.11.19 - Added speed control slider to adjust speech synthesis speed (default: 1.05, recommended range: 0.9-1.5).

2025.11.19 - Added automatic text chunking for long-form inference. Long texts are split into chunks and synthesized with natural pauses.
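
The preprocessing and chunking steps described in these updates can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function names `normalizeText` and `chunkText`, the specific replacement rules, and the 300-character chunk limit are all assumptions made for the example.

```javascript
// Hypothetical helpers. Names, rules, and the default limit are assumptions
// for illustration; the demo's real preprocessing may differ.

function normalizeText(text) {
  return text
    .normalize("NFKC")                            // fold compatibility forms (e.g. full-width letters)
    .replace(/\p{Extended_Pictographic}/gu, "")   // drop emoji and other pictographs
    .replace(/[\u200D\uFE0F]/g, "")               // drop leftover joiners and variation selectors
    .replace(/&/g, " and ")                       // one example of a symbol replacement
    .replace(/\s+/g, " ")                         // collapse runs of whitespace
    .trim();
}

function chunkText(text, maxLen = 300) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxLen || current === "") {
      // Still fits, or a single sentence longer than maxLen (kept whole).
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk would then be synthesized separately, with a short stretch of silence placed between the resulting audio segments to produce the pause.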

Features

  • 🌐 Runs entirely in the browser (no server required for inference)
  • 🚀 WebGPU support with automatic fallback to WebAssembly
  • Pre-extracted voice styles for instant generation
  • 🎨 Modern, responsive UI
  • 🎭 Multiple voice style presets (2 Male, 2 Female)
  • 💾 Download generated audio as WAV files
  • 📊 Detailed generation statistics (audio length, generation time)
  • ⏱️ Real-time progress tracking
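
The two statistics mentioned above (audio length and generation time) can be derived as in the sketch below. The helper name `timeGeneration` is hypothetical, and it assumes the synthesis call resolves to a `Float32Array` of mono samples at a known sample rate.

```javascript
// Hypothetical helper: times an async synthesis call and reports both statistics.
// `synthesize` is assumed to resolve to a Float32Array of mono samples.
// `now` is injectable so the function can be exercised with a fake clock.
async function timeGeneration(synthesize, sampleRate, now = () => performance.now()) {
  const start = now();
  const samples = await synthesize();
  const end = now();
  return {
    audioSeconds: samples.length / sampleRate,   // duration of the generated audio
    generationSeconds: (end - start) / 1000,     // wall-clock time spent generating
  };
}
```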

Requirements

  • Node.js (for development server)
  • Modern web browser (Chrome, Edge, Firefox, Safari)

Installation

Install dependencies:

npm install

Running the Demo

Start the development server:

npm run dev

This will start a local development server (usually at http://localhost:3000) and open the demo in your browser.

Usage

  1. Wait for Models to Load: The app will automatically load models and the default voice style (M1)
  2. Select Voice Style: Choose from available voice presets
    • Male 1 (M1): Default male voice
    • Male 2 (M2): Alternative male voice
    • Female 1 (F1): Default female voice
    • Female 2 (F2): Alternative female voice
  3. Enter Text: Type or paste the text you want to convert to speech
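  4. Adjust Settings (optional):
    • Total Steps: Number of denoising steps. More steps give better quality but slower generation (default: 5)
    • Speed: Speech synthesis speed (default: 1.05, recommended range: 0.9-1.5)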
  5. Generate Speech: Click the "Generate Speech" button
  6. View Results:
    • See the full input text
    • View audio length and generation time statistics
    • Play the generated audio in the browser
    • Download as WAV file
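
For the WAV download in the last step, one common approach is to wrap the generated samples in a 44-byte RIFF header as 16-bit PCM. The sketch below assumes mono `Float32Array` samples in the range -1 to 1; the function name `encodeWav` is hypothetical, and the sample rate should be whatever the model actually outputs.

```javascript
// Hypothetical encoder: mono float samples -> 16-bit PCM WAV in an ArrayBuffer.
function encodeWav(samples, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset, s) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);                 // file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);                           // fmt chunk size
  view.setUint16(20, 1, true);                            // format 1 = PCM
  view.setUint16(22, 1, true);                            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));      // clamp to [-1, 1]
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return buffer;
}
```

In the browser, the result can be offered for download by wrapping it in `new Blob([buffer], { type: "audio/wav" })` and passing that to `URL.createObjectURL` for use as a link's `href`.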

Technical Details

Browser Compatibility

This demo uses:

  • ONNX Runtime Web: For running models in the browser
  • Web Audio API: For playing generated audio
  • Vite: For development and bundling

Notes

  • The ONNX models must be accessible at assets/onnx/ relative to the web root
  • Voice style JSON files must be accessible at assets/voice_styles/ relative to the web root
  • Pre-extracted voice styles enable instant generation without audio processing
  • Four voice style presets are provided (M1, M2, F1, F2)

Troubleshooting

Models not loading

  • Check browser console for errors
  • Ensure assets/onnx/ path is correct and models are accessible
  • Check CORS settings if serving from a different domain
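
To narrow down which asset is failing, a quick check like the following can be pasted into the browser console. It is a sketch only: `checkAssets` is a hypothetical helper, it assumes the server answers `HEAD` requests, and the URLs you pass in should be the model and voice style files your build actually ships.

```javascript
// Hypothetical diagnostic: reports the HTTP status of each asset URL.
// `fetchFn` defaults to the global fetch and is injectable for testing.
async function checkAssets(urls, fetchFn = globalThis.fetch) {
  const results = [];
  for (const url of urls) {
    try {
      const res = await fetchFn(url, { method: "HEAD" });
      results.push({ url, ok: res.ok, status: res.status });
    } catch (err) {
      // Network and CORS failures reject instead of returning a response.
      results.push({ url, ok: false, status: null, error: String(err) });
    }
  }
  return results;
}
```

A 404 status points to a wrong path, while a rejected request (status `null`) on a cross-origin URL usually points to a CORS or network problem.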

WebGPU not available

  • WebGPU is enabled by default in Chrome and Edge 113+; support in other browsers depends on the browser version and platform
  • The app will automatically fall back to WebAssembly if WebGPU is not available
  • Check the backend badge to see which execution provider is being used
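
The fallback described above can be approximated with standard WebGPU feature detection. The sketch below is an assumption about how such a check might look, not the demo's actual logic; `detectBackend` is a hypothetical name, while `navigator.gpu.requestAdapter()` is the standard WebGPU entry point.

```javascript
// Hypothetical check: returns "webgpu" when an adapter can be obtained, otherwise "wasm".
// `nav` defaults to the browser's navigator and is injectable for testing.
async function detectBackend(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return "wasm";           // WebGPU API not exposed at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";          // a null adapter means no usable GPU
  } catch {
    return "wasm";                               // treat any failure as "not available"
  }
}
```

The returned name could then be used to order the `executionProviders` option passed to ONNX Runtime Web when creating an inference session, for example `["webgpu", "wasm"]` or just `["wasm"]`.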

Out of memory errors

  • Try shorter text inputs
  • Reduce the number of denoising steps (the Total Steps setting)
  • Use a browser with more available memory
  • Close other tabs to free up memory

Audio quality issues

  • Try different voice style presets
  • Increase Total Steps for better quality

Slow generation

  • If using WebAssembly, try a browser that supports WebGPU
  • Ensure no other heavy processes are running
  • Consider lowering Total Steps for faster (but lower quality) results