---
title: Petite LLM 3
emoji: ππ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: Smollm3 for French Understanding
---
# Petite Elle L'Aime 3 - Chat Interface
A complete Gradio application for the Petite Elle L'Aime 3 model, featuring the full fine-tuned version for maximum performance and quality.
## Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality at full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads the full model at build time
## Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic
## Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd Petite-LLM-3
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

### Local Development

Run the application locally:

```bash
python app.py
```

The application will be available at http://localhost:7860.
### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

- **Build Process**: The `build.py` script automatically downloads the full model during the Space build
- **Model Loading**: Uses local model files when available; falls back to downloading from Hugging Face
- **Caching**: Model files are cached for faster subsequent runs
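The caching logic described above can be sketched as follows. This is an illustrative sketch, not the repo's actual `build.py`: the `needs_download` helper and `MODEL_DIR` constant are hypothetical names.

```python
import os

MODEL_DIR = "./model"  # hypothetical; matches the local folder described in this README

def needs_download(model_dir: str) -> bool:
    """Return True when the local model cache is missing or incomplete."""
    # config.json is used here as a cheap proxy for a complete model snapshot
    return not os.path.isfile(os.path.join(model_dir, "config.json"))

if __name__ == "__main__":
    if needs_download(MODEL_DIR):
        # On a Space, the build script would fetch the weights here, e.g.:
        # from huggingface_hub import snapshot_download
        # snapshot_download("Tonic/petite-elle-L-aime-3-sft", local_dir=MODEL_DIR)
        print("model not cached; download required")
```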
## Interface Features

### Layout Structure

The interface follows the reference layout with:

- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display
### System Prompt

- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant." ("You are TonicIA, a rigorous and kind French-speaking assistant.")
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations
### Generation Parameters

- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Enables reasoning mode with thinking tags
- **Advanced Settings**: Collapsible panel for fine-tuning
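The parameter ranges above could be enforced before calling `model.generate` with a small helper like this; `build_gen_kwargs` is a hypothetical name, not a function from the app:

```python
def build_gen_kwargs(max_length: int, temperature: float, top_p: float) -> dict:
    """Clamp UI values to the documented ranges and build generate() kwargs."""
    return {
        "max_new_tokens": max(64, min(2048, max_length)),   # 64-2048
        "temperature": max(0.01, min(1.0, temperature)),    # 0.01-1.0
        "top_p": max(0.1, min(1.0, top_p)),                 # 0.1-1.0
        "do_sample": True,  # sampling must be on for temperature/top_p to apply
    }

# Usage sketch: out = model.generate(**inputs, **build_gen_kwargs(512, 0.7, 0.9))
```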
## Technical Details

### Model Loading Strategy

The application uses a smart loading strategy:

- **Local Check**: First checks whether the full model files exist locally
- **Local Loading**: If available, loads from the `./model` folder
- **Fallback Download**: If not available, downloads from Hugging Face
- **Tokenizer**: Always loaded from the main repo for the chat template and configuration
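The local-first fallback strategy can be sketched roughly like this. `resolve_sources` is a hypothetical helper, and the repo id is taken from the Acknowledgments section below:

```python
import os

REPO_ID = "Tonic/petite-elle-L-aime-3-sft"  # main model repo

def resolve_sources(local_dir: str = "./model") -> dict:
    """Pick the model source (local folder or Hub repo). The tokenizer
    is always taken from the main repo so the chat template stays current."""
    has_local = os.path.isfile(os.path.join(local_dir, "config.json"))
    return {
        "model": local_dir if has_local else REPO_ID,
        "tokenizer": REPO_ID,
    }

# Usage sketch:
# src = resolve_sources()
# model = AutoModelForCausalLM.from_pretrained(src["model"])
# tokenizer = AutoTokenizer.from_pretrained(src["tokenizer"])
```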
### Build Process

For Hugging Face Spaces deployment:

- **Build Script**: `build.py` runs during the Space build
- **Model Download**: `download_model.py` downloads the full model files
- **Local Storage**: Model files are stored in the `./model` directory
- **Fast Loading**: Subsequent runs use the local files
### Chat Template Integration

The application uses the model's custom chat template, which supports:

- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags
- Proper conversation flow management
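Assembling the message list that feeds the chat template might look like this. `build_messages` is a hypothetical helper, and the `enable_thinking` keyword shown in the comment is an assumption about how this template toggles `<think>` mode:

```python
def build_messages(system_prompt: str, history: list, user_msg: str) -> list:
    """Assemble the chat-template message list:
    system prompt first, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_msg})
    return messages

# Applying the template (sketch):
# prompt = tokenizer.apply_chat_template(messages, tokenize=False,
#                                        add_generation_prompt=True,
#                                        enable_thinking=True)
```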
### Memory Optimization

- Uses the full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance
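The device/dtype choice above reduces to a simple rule; `select_dtype` is an illustrative pure function (the torch-based equivalent is shown in the comment):

```python
def select_dtype(cuda_available: bool) -> str:
    """float16 on GPU for speed and memory, float32 on CPU for numerical safety."""
    return "float16" if cuda_available else "float32"

# In the app this would typically be driven by torch, e.g.:
# import torch
# dtype = torch.float16 if torch.cuda.is_available() else torch.float32
# model = AutoModelForCausalLM.from_pretrained(source, torch_dtype=dtype,
#                                              device_map="auto")
```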
## Example Usage

**Basic Conversation**:
1. Add context in the system prompt area
2. Type your message in the user input box
3. Click the generate button to start chatting

**Customizing the System Prompt**:
- Edit the context in the dedicated text area
- Changes apply to new messages immediately
- Example: "Tu es un expert en programmation Python." ("You are an expert in Python programming.")

**Advanced Settings**:
- Check the "Advanced Settings" checkbox
- Adjust generation parameters as needed
- Enable or disable thinking mode

**Real-time Chat**:
- Messages appear in the chat interface
- Conversation history is maintained
- Responses are generated using the model's chat template
## Troubleshooting

### Common Issues

**Model Loading Errors**:
- Ensure you have sufficient RAM (8 GB+ recommended)
- Check your internet connection for the model download
- Verify that all dependencies are installed

**Generation Errors**:
- Try reducing the "Max Length" parameter
- Adjust the temperature and top-p values
- Check the console for detailed error messages

**Performance Issues**:
- The full model provides maximum quality but requires more memory
- GPU acceleration is recommended for optimal performance
- Consider reducing generation parameters (e.g. Max Length) if memory is limited

**System Prompt Issues**:
- Ensure the system prompt is not too long (max 1000 characters)
- Check that the prompt follows the expected format
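The 1000-character limit mentioned above can be enforced with a small guard before the prompt reaches the template; `validate_system_prompt` is a hypothetical helper, not code from the app:

```python
MAX_PROMPT_CHARS = 1000  # limit noted in the troubleshooting section

def validate_system_prompt(prompt: str) -> str:
    """Strip surrounding whitespace and truncate overly long system prompts."""
    prompt = prompt.strip()
    if len(prompt) > MAX_PROMPT_CHARS:
        prompt = prompt[:MAX_PROMPT_CHARS]
    return prompt
```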
**Build Process Issues**:
- Check that `download_model.py` runs successfully
- Verify that the model files are downloaded to the `./model` directory
- Ensure sufficient storage space for the model files
## License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.
## Acknowledgments

- **Model**: Tonic/petite-elle-L-aime-3-sft
- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Data**: legmlai/openhermes-fr
- **Framework**: Gradio, Transformers, PyTorch
- **Layout Reference**: Tonic/Nvidia-OpenReasoning
## Links

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.