---
title: Petite LLM 3
emoji: πŸ’ƒπŸ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: SmolLM3 for French Understanding
---
# πŸ€– Petite Elle L'Aime 3 - Chat Interface
A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality.
## πŸš€ Features
- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads full model at build time
## πŸ“‹ Model Information
- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic
## πŸ› οΈ Installation
1. Clone this repository:
```bash
git clone <repository-url>
cd Petite-LLM-3
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
## πŸš€ Usage
### Local Development
Run the application locally:
```bash
python app.py
```
The application will be available at `http://localhost:7860`.
### Hugging Face Spaces
This application is configured for deployment on Hugging Face Spaces with automatic model download:
1. **Build Process**: The `build.py` script automatically downloads the full model during Space build
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs
## πŸŽ›οΈ Interface Features
### Layout Structure
The interface follows the reference layout with:
- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display
### System Prompt
- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations
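The way a system prompt is combined with the conversation history can be sketched as follows (the helper name and history format are illustrative, not the app's actual code; the default prompt is the one shown above):

```python
# Sketch: assembling a chat-template-ready message list with a
# customizable system prompt. `build_messages` is a hypothetical helper.
DEFAULT_SYSTEM_PROMPT = "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."

def build_messages(user_text, history=None, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """System prompt first, then prior turns, then the new user message."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in (history or []):
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_text})
    return messages
```

Because the list is rebuilt for every generation, editing the system prompt takes effect on the next message, as noted above.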
### Generation Parameters
- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Enable reasoning mode with thinking tags
- **Advanced Settings**: Collapsible panel for fine-tuning
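The slider ranges above can be collected into keyword arguments for `model.generate` roughly like this (a sketch; the helper and its defaults are illustrative):

```python
# Sketch: clamping the UI's generation parameters to the ranges listed
# above before passing them to `model.generate`.
def generation_kwargs(max_length=512, temperature=0.7, top_p=0.9):
    return {
        "max_new_tokens": max(64, min(int(max_length), 2048)),
        "temperature": max(0.01, min(float(temperature), 1.0)),
        "top_p": max(0.1, min(float(top_p), 1.0)),
        "do_sample": True,  # temperature/top_p only apply when sampling
    }
```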
## πŸ”§ Technical Details
### Model Loading Strategy
The application uses a smart loading strategy:
1. **Local Check**: First checks if full model files exist locally
2. **Local Loading**: If available, loads from `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses main repo for chat template and configuration
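The local-first fallback logic can be sketched like this (the constants, file checks, and helper name are assumptions for illustration, not the app's actual code):

```python
import os

MODEL_REPO = "Tonic/petite-elle-L-aime-3-sft"
LOCAL_MODEL_DIR = "./model"

def resolve_model_source(local_dir=LOCAL_MODEL_DIR, repo_id=MODEL_REPO):
    """Prefer weights downloaded at build time; otherwise return the
    Hugging Face repo id, which `from_pretrained` will download."""
    has_config = os.path.isfile(os.path.join(local_dir, "config.json"))
    has_weights = any(
        name.endswith((".safetensors", ".bin"))
        for name in (os.listdir(local_dir) if os.path.isdir(local_dir) else [])
    )
    return local_dir if (has_config and has_weights) else repo_id
```

Either return value can be passed directly to `AutoModelForCausalLM.from_pretrained`, so the calling code does not need to know which path was taken.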
### Build Process
For Hugging Face Spaces deployment:
1. **Build Script**: `build.py` runs during Space build
2. **Model Download**: `download_model.py` downloads full model files
3. **Local Storage**: Model files stored in `./model` directory
4. **Fast Loading**: Subsequent runs use local files
### Chat Template Integration
The application uses the custom chat template from the model, which supports:
- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags
- Proper conversation flow management
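Rendering a conversation through the template typically looks like the sketch below; the `enable_thinking` keyword follows SmolLM3's chat-template convention and is an assumption here:

```python
# Sketch: formatting messages with the model's chat template.
# `enable_thinking` toggles the <think> reasoning mode (assumed keyword).
def format_with_template(tokenizer, messages, enable_thinking=False):
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the formatted string, not token ids
        add_generation_prompt=True,  # append the assistant header for generation
        enable_thinking=enable_thinking,
    )
```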
### Memory Optimization
- Uses full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance
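The device/precision policy described above amounts to a simple rule; a minimal sketch (in the app this would be fed `torch.cuda.is_available()`, but the policy is written as a plain function here so it can be read without torch):

```python
# Sketch: device and dtype selection policy, as a pure function.
def pick_device_and_dtype(cuda_available):
    # float16 on GPU halves memory use and speeds up inference;
    # float32 on CPU avoids poorly supported half-precision kernels.
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")
```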
## πŸ“ Example Usage
1. **Basic Conversation**:
- Add context in the system prompt area
- Type your message in the user input box
- Click the generate button to start chatting
2. **Customizing System Prompt**:
- Edit the context in the dedicated text area
- Changes apply to new messages immediately
- Example: "Tu es un expert en programmation Python."
3. **Advanced Settings**:
- Check the "Advanced Settings" checkbox
- Adjust generation parameters as needed
- Enable/disable thinking mode
4. **Real-time Chat**:
- Messages appear in the chat interface
- Conversation history is maintained
- Responses are generated using the model's chat template
## πŸ› Troubleshooting
### Common Issues
1. **Model Loading Errors**:
- Ensure you have sufficient RAM (8GB+ recommended)
- Check your internet connection for model download
- Verify all dependencies are installed
2. **Generation Errors**:
- Try reducing the "Max Length" parameter
- Adjust temperature and top-p values
- Check the console for detailed error messages
3. **Performance Issues**:
- The full model provides maximum quality but requires more memory
- GPU acceleration recommended for optimal performance
   - Reduce the "Max Length" setting if memory is limited
4. **System Prompt Issues**:
- Ensure the system prompt is not too long (max 1000 characters)
- Check that the prompt follows the expected format
5. **Build Process Issues**:
- Check that `download_model.py` runs successfully
   - Verify that model files are downloaded to the `./model` directory
- Ensure sufficient storage space for model files
## πŸ“„ License
This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.
## πŸ™ Acknowledgments
- **Model**: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Data**: legmlai/openhermes-fr
- **Framework**: Gradio, Transformers, PyTorch
- **Layout Reference**: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)
## πŸ”— Links
- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja)
- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference