ttsfm / CHANGELOG.md
NitinBot001's picture
Upload 38 files
3ca5f72 verified

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[3.2.3] - 2025-06-27

πŸ”„ Enhanced OpenAI API Compatibility

This release consolidates the OpenAI-compatible API endpoints and introduces intelligent auto-combine functionality.

✨ Added

  • Auto-Combine Parameter: New optional auto_combine parameter in /v1/audio/speech endpoint (default: true)
  • Intelligent Text Handling: Automatically detects long text and combines audio chunks when auto_combine=true
  • Enhanced Error Messages: Better error handling for long text when auto-combine is disabled
  • Response Headers: Added X-Auto-Combine and X-Chunks-Combined headers for transparency

πŸ”„ Changed

  • Unified Endpoint: Combined /v1/audio/speech and /v1/audio/speech-combined into single endpoint
  • Backward Compatibility: Maintains full OpenAI API compatibility while adding TTSFM-specific features
  • Default Behavior: Long text is now automatically split and combined by default (can be disabled)

πŸ—‘οΈ Removed

  • Deprecated Endpoint: Removed /v1/audio/speech-combined endpoint (functionality moved to main endpoint)
  • Legacy Web Options: Removed confusing batch processing options from web interface for cleaner UX
  • Complex UI Elements: Simplified playground interface to focus on auto-combine

🧹 Streamlined Web Experience

  • User-Focused Design: Web interface now emphasizes auto-combine as the primary approach
  • Developer Features Preserved: All advanced functionality remains in Python package
  • Clear Separation: Web for users, Python package for developers

πŸ“‹ Migration Guide

  • No Breaking Changes: Existing API calls continue to work unchanged
  • Long Text: Now automatically handled by default - no need to use separate endpoint
  • Disable Auto-Combine: Add "auto_combine": false to request body to get original behavior

[3.2.2] - 2025-06-26

🎡 Combined Audio Functionality

This release introduces the revolutionary combined audio feature that allows generating single, seamless audio files from long text content.

✨ Added

  • Combined Audio Endpoints: New /api/generate-combined and /v1/audio/speech-combined endpoints
  • Intelligent Text Splitting: Smart algorithm that splits text at sentence boundaries, then word boundaries, preserving natural speech flow
  • Seamless Audio Combination: Professional audio processing to merge chunks into single continuous files
  • OpenAI Compatibility: Full OpenAI TTS API compatibility for combined audio generation
  • Advanced Fallback System: Multiple fallback mechanisms for audio combination (PyDub β†’ WAV concatenation β†’ raw concatenation)
  • Rich Metadata: Response headers with chunk count, file size, and processing information
  • Comprehensive Testing: Full test suite with unit tests, integration tests, and performance benchmarks

πŸ”„ Changed

  • Extended Character Limits: No longer limited to 4096 characters per request
  • Enhanced Web Interface: Updated documentation with combined audio endpoint information
  • Improved Error Handling: Better validation and error messages for long text processing

πŸ› οΈ Technical Features

  • Concurrent Processing: Parallel chunk processing for faster generation
  • Memory Optimization: Efficient memory usage for large text processing
  • Format Support: Works with all supported audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)
  • Performance Monitoring: Built-in performance tracking and optimization
  • Unicode Support: Full Unicode text handling for international content

πŸ“‹ Use Cases

  • Long Articles: Convert blog posts and articles to single audio files
  • Audiobooks: Generate chapters as continuous audio
  • Educational Content: Transform learning materials to audio format
  • Accessibility: Enhanced support for visually impaired users
  • Podcast Creation: Convert scripts to professional audio content

[3.1.0] - 2024-12-19

πŸ”§ Format Support Improvements

This release focuses on fixing audio format handling and improving format delivery optimization.

✨ Added

  • Smart Header Selection: Intelligent HTTP header selection to optimize format delivery from openai.fm service
  • Format Mapping Functions: Helper functions for better format handling and optimization
  • Enhanced Web Interface: Improved format selection with detailed descriptions for each format
  • Comprehensive Format Documentation: Updated README and documentation with complete format information

πŸ”„ Changed

  • File Naming Logic: Files are now saved with extensions based on the actual returned format, not the requested format
  • Enhanced Logging: Added format-specific log messages for better debugging
  • Web API Enhancement: /api/formats endpoint now provides detailed information about all supported formats
  • Documentation Updates: README and package documentation now include comprehensive format guides

πŸ› Fixed

  • MAJOR FIX: Resolved file naming issue where files were saved with incorrect double extensions (e.g., test.wav.mp3, test.opus.wav)
  • Correct File Extensions: Files now save with proper single extensions based on actual audio format (e.g., test.mp3, test.wav)
  • Format Optimization: Improved format delivery through smart request optimization
  • Format Handling: Better handling of all supported audio formats

πŸ“ Technical Details

  • Format Optimization: Smart request optimization to deliver the best quality for each format
  • Backward Compatibility: Existing code continues to work unchanged
  • Enhanced Format Support: Improved support for all 6 audio formats (MP3, WAV, OPUS, AAC, FLAC, PCM)

[3.0.0] - 2025-06-06

πŸŽ‰ First Python Package Release

This is the first release of TTSFM as an installable Python package. Previous versions (v1.x and v2.x) were service-only releases that provided the API server but not a pip-installable package.

✨ Added

  • Complete Package Restructure: Modern Python package structure with proper typing
  • Async Support: Full asynchronous client implementation with asyncio
  • OpenAI API Compatibility: Drop-in replacement for OpenAI TTS API
  • Type Hints: Complete type annotation support throughout the codebase
  • CLI Interface: Command-line tool for easy TTS generation
  • Web Application: Optional Flask-based web interface
  • Docker Support: Multi-architecture Docker images (linux/amd64, linux/arm64)
  • Comprehensive Error Handling: Detailed exception hierarchy
  • Multiple Audio Formats: Support for MP3, WAV, FLAC, and more
  • Voice Options: Multiple voice models (alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer)
  • Text Processing: Automatic text length validation and splitting
  • Rate Limiting: Built-in rate limiting and retry mechanisms
  • Configuration: Environment variable and configuration file support

πŸ”§ Technical Improvements

  • Modern Build System: Using pyproject.toml with setuptools
  • GitHub Actions: Automated Docker builds and PyPI publishing
  • Development Tools: Pre-commit hooks, linting, testing setup
  • Documentation: Comprehensive README and inline documentation
  • Package Management: Proper dependency management with optional extras

🌐 API Changes

  • Breaking: Complete API redesign for better usability
  • OpenAI Compatible: /v1/audio/speech endpoint compatibility
  • RESTful Design: Clean REST API design
  • Health Checks: Built-in health check endpoints
  • CORS Support: Cross-origin resource sharing enabled

πŸ“¦ Installation Options

# Basic installation
pip install ttsfm

# With web application support
pip install ttsfm[web]

# With development tools
pip install ttsfm[dev]

# Docker
docker run -p 8000:8000 ghcr.io/dbccccccc/ttsfm:latest

πŸš€ Quick Start

from ttsfm import TTSClient, Voice

client = TTSClient()
response = client.generate_speech(
    text="Hello! This is TTSFM v3.0.0",
    voice=Voice.CORAL
)

with open("speech.mp3", "wb") as f:
    f.write(response.audio_data)

πŸ“¦ Package vs Service History

Important Note: This v3.0.0 is the first release of TTSFM as a Python package available on PyPI. Previous versions (v1.x and v2.x) were service/API server releases only and were not available as installable packages.

  • v1.x - v2.x: Service releases (API server only, not pip-installable)
  • v3.0.0+: Full Python package releases (pip-installable with service capabilities)

πŸ› Bug Fixes

  • Fixed Docker build issues with dependency resolution
  • Improved error handling and user feedback
  • Better handling of long text inputs
  • Enhanced stability and performance

πŸ“š Documentation

  • Complete API documentation
  • Usage examples and tutorials
  • Docker deployment guide
  • Development setup instructions

Previous Service Releases (Not Available as Python Packages)

The following versions were service/API server releases only and were not available as pip-installable packages:

[2.0.0-alpha9] - 2025-04-09

  • Service improvements (alpha release)

[2.0.0-alpha8] - 2025-04-09

  • Service improvements (alpha release)

[2.0.0-alpha7] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha6] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha5] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha4] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha3] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha2] - 2025-04-07

  • Service improvements (alpha release)

[2.0.0-alpha1] - 2025-04-07

  • Alpha release (DO NOT USE)

[1.3.0] - 2025-03-28

  • Support for additional audio file formats in the API
  • Alignment with formats supported by the official API

[1.2.2] - 2025-03-28

  • Fixed Docker support

[1.2.1] - 2025-03-28

  • Color change for indicator for status
  • Voice preview on webpage for each voice

[1.2.0] - 2025-03-26

  • Enhanced stability and availability by implementing advanced request handling mechanisms
  • Removed the proxy pool

[1.1.2] - 2025-03-26

  • Version display on webpage
  • Last version of 1.1.x

[1.1.1] - 2025-03-26

  • Build fixes

[1.1.0] - 2025-03-26

  • Project restructuring for better future development experiences
  • Added .env settings

[1.0.0] - 2025-03-26

  • First service release