Spaces:
Running
on
Zero
Running
on
Zero
title: Petite LLM 3 | |
emoji: ππ» | |
colorFrom: green | |
colorTo: purple | |
sdk: gradio | |
sdk_version: 5.38.2 | |
app_file: app.py | |
pinned: true | |
license: mit | |
short_description: Smollm3 for French Understanding | |
# π€ Petite Elle L'Aime 3 - Chat Interface | |
A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality. | |
## π Features | |
- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic | |
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision | |
- **Interactive Chat Interface**: Real-time conversation with the model | |
- **Customizable System Prompt**: Define the assistant's personality and behavior | |
- **Thinking Mode**: Enable reasoning mode with thinking tags | |
- **Responsive Design**: Modern UI following the reference layout | |
- **Chat Template Integration**: Proper Jinja template formatting | |
- **Automatic Model Download**: Downloads full model at build time | |
## π Model Information | |
- **Base Model**: HuggingFaceTB/SmolLM3-3B | |
- **Parameters**: ~3B | |
- **Context Length**: 128k | |
- **Precision**: Full fine-tuned model (float16/float32) | |
- **Performance**: Maximum quality and accuracy | |
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic | |
## π οΈ Installation | |
1. Clone this repository: | |
```bash | |
git clone <repository-url> | |
cd Petite-LLM-3 | |
``` | |
2. Install dependencies: | |
```bash | |
pip install -r requirements.txt | |
``` | |
## π Usage | |
### Local Development | |
Run the application locally: | |
```bash | |
python app.py | |
``` | |
The application will be available at `http://localhost:7860` | |
### Hugging Face Spaces | |
This application is configured for deployment on Hugging Face Spaces with automatic model download: | |
1. **Build Process**: The `build.py` script automatically downloads the int4 model during Space build | |
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download | |
3. **Caching**: Model files are cached for faster subsequent runs | |
## ποΈ Interface Features | |
### Layout Structure | |
The interface follows the reference layout with: | |
- **Title Section**: Main heading and description | |
- **Information Panels**: Features and model information | |
- **Input Section**: Context and user input areas | |
- **Advanced Settings**: Collapsible parameter controls | |
- **Chat Interface**: Real-time conversation display | |
### System Prompt | |
- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant." | |
- **Editable**: Users can customize the system prompt to define the assistant's personality | |
- **Real-time**: Changes take effect immediately for new conversations | |
### Generation Parameters | |
- **Max Length**: Maximum number of tokens to generate (64-2048) | |
- **Temperature**: Controls randomness in generation (0.01-1.0) | |
- **Top-p**: Nucleus sampling parameter (0.1-1.0) | |
- **Enable Thinking**: Enable reasoning mode with thinking tags | |
- **Advanced Settings**: Collapsible panel for fine-tuning | |
## π§ Technical Details | |
### Model Loading Strategy | |
The application uses a smart loading strategy: | |
1. **Local Check**: First checks if full model files exist locally | |
2. **Local Loading**: If available, loads from `./model` folder | |
3. **Fallback Download**: If not available, downloads from Hugging Face | |
4. **Tokenizer**: Always uses main repo for chat template and configuration | |
### Build Process | |
For Hugging Face Spaces deployment: | |
1. **Build Script**: `build.py` runs during Space build | |
2. **Model Download**: `download_model.py` downloads full model files | |
3. **Local Storage**: Model files stored in `./model` directory | |
4. **Fast Loading**: Subsequent runs use local files | |
### Chat Template Integration | |
The application uses the custom chat template from the model, which supports: | |
- System prompt integration | |
- User and assistant message formatting | |
- Thinking mode with `<think>` tags | |
- Proper conversation flow management | |
### Memory Optimization | |
- Uses full fine-tuned model for maximum quality | |
- Automatic device detection (CUDA/CPU) | |
- Efficient tokenization and generation | |
- Float16 precision on GPU for optimal performance | |
## π Example Usage | |
1. **Basic Conversation**: | |
- Add context in the system prompt area | |
- Type your message in the user input box | |
- Click the generate button to start chatting | |
2. **Customizing System Prompt**: | |
- Edit the context in the dedicated text area | |
- Changes apply to new messages immediately | |
- Example: "Tu es un expert en programmation Python." | |
3. **Advanced Settings**: | |
- Check the "Advanced Settings" checkbox | |
- Adjust generation parameters as needed | |
- Enable/disable thinking mode | |
4. **Real-time Chat**: | |
- Messages appear in the chat interface | |
- Conversation history is maintained | |
- Responses are generated using the model's chat template | |
## π Troubleshooting | |
### Common Issues | |
1. **Model Loading Errors**: | |
- Ensure you have sufficient RAM (8GB+ recommended) | |
- Check your internet connection for model download | |
- Verify all dependencies are installed | |
2. **Generation Errors**: | |
- Try reducing the "Max Length" parameter | |
- Adjust temperature and top-p values | |
- Check the console for detailed error messages | |
3. **Performance Issues**: | |
- The full model provides maximum quality but requires more memory | |
- GPU acceleration recommended for optimal performance | |
- Consider reducing model parameters if memory is limited | |
4. **System Prompt Issues**: | |
- Ensure the system prompt is not too long (max 1000 characters) | |
- Check that the prompt follows the expected format | |
5. **Build Process Issues**: | |
- Check that `download_model.py` runs successfully | |
- Verify that model files are downloaded to `./int4` directory | |
- Ensure sufficient storage space for model files | |
## π License | |
This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0. | |
## π Acknowledgments | |
- **Model**: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) | |
- **Base Model**: SmolLM3-3B by HuggingFaceTB | |
- **Training Data**: legmlai/openhermes-fr | |
- **Framework**: Gradio, Transformers, PyTorch | |
- **Layout Reference**: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning) | |
## π Links | |
- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) | |
- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja) | |
- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning) | |
--- | |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |