---
title: Petite LLM 3
emoji: πŸ’ƒπŸ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: SmolLM3 for French Understanding
---

# πŸ€– Petite Elle L'Aime 3 - Chat Interface

A complete Gradio application for the Petite Elle L'Aime 3 model, featuring the full fine-tuned version for maximum performance and quality.

## πŸš€ Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads the full model at build time

## πŸ“‹ Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k tokens
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic
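
For reference, a minimal sketch of loading a checkpoint like this with `transformers`; the repo id below is a placeholder, not the actual repository name:

```python
# Minimal loading sketch, assuming a standard transformers checkpoint.
# MODEL_ID is hypothetical -- substitute the actual Petite Elle L'Aime 3 repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/petite-elle-laime-3"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # requires the accelerate package
)
```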

πŸ› οΈ Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd Petite-LLM-3
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## πŸš€ Usage

### Local Development

Run the application locally:

```bash
python app.py
```

The application will be available at http://localhost:7860
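
Port 7860 is Gradio's default. As a rough sketch of how an app like this exposes itself (assumed skeleton, not the actual `app.py`):

```python
# Assumed skeleton of a Gradio entry point; the real app.py builds the
# full chat interface described below.
import gradio as gr

with gr.Blocks(title="Petite Elle L'Aime 3") as demo:
    gr.Markdown("# πŸ€– Petite Elle L'Aime 3 - Chat Interface")
    # ... chat components, settings panels, event handlers ...

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```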

### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

1. **Build Process**: The `build.py` script automatically downloads the full model during the Space build
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs

πŸŽ›οΈ Interface Features

### Layout Structure

The interface follows the reference layout with:

- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display

### System Prompt

- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant." ("You are TonicIA, a rigorous and kind French-speaking assistant.")
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations
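
Under the hood, the system prompt presumably becomes the first message in the conversation passed to the chat template; a minimal sketch (message structure assumed):

```python
# Sketch: the system prompt leads the message list (structure assumed).
system_prompt = (
    "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bonjour, peux-tu m'aider ?"},
]
```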

### Generation Parameters

- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Toggles reasoning mode with thinking tags (see the sketch after this list)
- **Advanced Settings**: Collapsible panel for fine-tuning
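
These sliders map onto standard `transformers` generation arguments. A hedged sketch of how the values might be wired into `generate()` (variable names are illustrative):

```python
# Sketch: feeding the UI parameters into generation (names illustrative).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,  # "Max Length" slider (64-2048)
    temperature=0.7,     # "Temperature" slider (0.01-1.0)
    top_p=0.9,           # "Top-p" slider (0.1-1.0)
    do_sample=True,      # sampling is required for temperature/top-p to apply
)
reply = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)
```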

## πŸ”§ Technical Details

### Model Loading Strategy

The application uses a smart loading strategy:

1. **Local Check**: First checks whether full model files exist locally
2. **Local Loading**: If available, loads from the `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses the main repo for the chat template and configuration (see the sketch below)
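
A minimal sketch of this strategy, assuming the `./model` path and a placeholder repo id:

```python
# Sketch of the local-first loading strategy (paths/repo id assumed).
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

LOCAL_DIR = "./model"
REPO_ID = "your-org/petite-elle-laime-3"  # hypothetical repo id

# Steps 1-3: prefer local files, otherwise fall back to the Hub.
source = LOCAL_DIR if os.path.isdir(LOCAL_DIR) else REPO_ID
model = AutoModelForCausalLM.from_pretrained(source)

# Step 4: the tokenizer (chat template + configuration) always comes
# from the main repo.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
```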

### Build Process

For Hugging Face Spaces deployment:

1. **Build Script**: `build.py` runs during the Space build
2. **Model Download**: `download_model.py` downloads the full model files (sketched below)
3. **Local Storage**: Model files are stored in the `./model` directory
4. **Fast Loading**: Subsequent runs use the local files
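
A plausible core of `download_model.py`, using `huggingface_hub` (the repo id is a placeholder):

```python
# Sketch: downloading the model snapshot at build time (repo id assumed).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/petite-elle-laime-3",  # hypothetical repo id
    local_dir="./model",
)
```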

### Chat Template Integration

The application uses the custom chat template from the model, which supports:

- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags (see the sketch after this list)
- Proper conversation flow management
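
Extra keyword arguments to `apply_chat_template` are forwarded to the Jinja template, which is presumably how thinking mode is toggled; a sketch, assuming the template accepts an `enable_thinking` flag as SmolLM3's does:

```python
# Sketch: toggling thinking mode through the chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # the model can then emit <think>...</think> blocks
)
```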

### Memory Optimization

- Uses the full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance

πŸ“ Example Usage

1. **Basic Conversation**:
   - Add context in the system prompt area
   - Type your message in the user input box
   - Click the generate button to start chatting

2. **Customizing the System Prompt**:
   - Edit the context in the dedicated text area
   - Changes apply to new messages immediately
   - Example: "Tu es un expert en programmation Python." ("You are an expert in Python programming.")

3. **Advanced Settings**:
   - Check the "Advanced Settings" checkbox
   - Adjust generation parameters as needed
   - Enable/disable thinking mode

4. **Real-time Chat**:
   - Messages appear in the chat interface
   - Conversation history is maintained
   - Responses are generated using the model's chat template

πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM (8GB+ recommended)
   - Check your internet connection for the model download
   - Verify all dependencies are installed

2. **Generation Errors**:
   - Try reducing the "Max Length" parameter
   - Adjust the temperature and top-p values
   - Check the console for detailed error messages

3. **Performance Issues**:
   - The full model provides maximum quality but requires more memory
   - GPU acceleration is recommended for optimal performance
   - Consider reducing the generation parameters if memory is limited

4. **System Prompt Issues**:
   - Ensure the system prompt is not too long (max 1000 characters)
   - Check that the prompt follows the expected format

5. **Build Process Issues**:
   - Check that `download_model.py` runs successfully
   - Verify that model files are downloaded to the `./model` directory
   - Ensure sufficient storage space for the model files

## πŸ“„ License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.

πŸ™ Acknowledgments

πŸ”— Links


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference