---
title: Petite LLM 3
emoji: πŸ’ƒπŸ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: Smollm3 for French Understanding
---

# πŸ€– Petite Elle L'Aime 3 - Chat Interface

A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality.

## πŸš€ Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads full model at build time

## πŸ“‹ Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic

## πŸ› οΈ Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd Petite-LLM-3
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## πŸš€ Usage

### Local Development

Run the application locally:
```bash
python app.py
```

The application will be available at `http://localhost:7860`.

### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

1. **Build Process**: The `build.py` script automatically downloads the full model during Space build
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs

## πŸŽ›οΈ Interface Features

### Layout Structure
The interface follows the reference layout with:
- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display

### System Prompt
- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations

### Generation Parameters
- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Enable reasoning mode with thinking tags
- **Advanced Settings**: Collapsible panel for fine-tuning
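
As a rough sketch, the slider values above might map to `model.generate` keyword arguments like this (the helper name and clamping logic are illustrative, not the app's actual code):

```python
def build_generation_kwargs(max_length=512, temperature=0.7, top_p=0.9):
    # Clamp values to the ranges exposed by the UI sliders
    max_length = max(64, min(2048, int(max_length)))
    temperature = max(0.01, min(1.0, float(temperature)))
    top_p = max(0.1, min(1.0, float(top_p)))
    return {
        "max_new_tokens": max_length,
        "temperature": temperature,
        "top_p": top_p,
        "do_sample": True,  # sampling is required for temperature/top_p to take effect
    }
```

Out-of-range values (e.g. a max length of 5000) are clamped back into the slider ranges before reaching the model.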

## πŸ”§ Technical Details

### Model Loading Strategy
The application uses a smart loading strategy:

1. **Local Check**: First checks if full model files exist locally
2. **Local Loading**: If available, loads from `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses main repo for chat template and configuration
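
A minimal sketch of this strategy, assuming the `./model` directory and repo id used elsewhere in this README (the helper names are illustrative, not the app's actual code):

```python
import os

MODEL_REPO = "Tonic/petite-elle-L-aime-3-sft"
LOCAL_DIR = "./model"

def resolve_model_source(local_dir=LOCAL_DIR, repo_id=MODEL_REPO):
    # Prefer local files when the build step has populated ./model,
    # otherwise fall back to downloading from the Hugging Face Hub
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir
    return repo_id

def load_model():
    # Imported lazily so the path logic above stays testable without heavy deps
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained(resolve_model_source())
    # The tokenizer always comes from the main repo so the chat template stays current
    tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO)
    return model, tokenizer
```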

### Build Process
For Hugging Face Spaces deployment:

1. **Build Script**: `build.py` runs during Space build
2. **Model Download**: `download_model.py` downloads full model files
3. **Local Storage**: Model files stored in `./model` directory
4. **Fast Loading**: Subsequent runs use local files

### Chat Template Integration
The application uses the custom chat template from the model, which supports:
- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags
- Proper conversation flow management
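
A hedged sketch of how a conversation might be flattened into the message format that `apply_chat_template` expects (`build_messages` is an illustrative helper, not the app's actual code):

```python
def build_messages(system_prompt, history, user_message):
    # Flatten the system prompt, prior turns, and the new user input
    # into the role/content list the chat template consumes
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_message})
    return messages

# With a loaded tokenizer, the template would then be applied roughly as
# (not run here; `enable_thinking` toggles the <think> reasoning mode):
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
# )
```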

### Memory Optimization
- Uses full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance
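
An illustrative sketch of the device and precision selection described above (`select_runtime` is a hypothetical helper; in `app.py` the flag would come from `torch.cuda.is_available()`):

```python
def select_runtime(cuda_available):
    # float16 on GPU roughly halves memory for the ~3B model;
    # float32 keeps CPU inference numerically stable
    if cuda_available:
        return {"device_map": "cuda", "torch_dtype": "float16"}
    return {"device_map": "cpu", "torch_dtype": "float32"}

# Example wiring (not run here):
# import torch
# kwargs = select_runtime(torch.cuda.is_available())
# model = AutoModelForCausalLM.from_pretrained(source, **kwargs)
```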

## πŸ“ Example Usage

1. **Basic Conversation**:
   - Add context in the system prompt area
   - Type your message in the user input box
   - Click the generate button to start chatting

2. **Customizing System Prompt**:
   - Edit the context in the dedicated text area
   - Changes apply to new messages immediately
   - Example: "Tu es un expert en programmation Python."

3. **Advanced Settings**:
   - Open the "Advanced Settings" panel
   - Adjust generation parameters as needed
   - Enable/disable thinking mode

4. **Real-time Chat**:
   - Messages appear in the chat interface
   - Conversation history is maintained
   - Responses are generated using the model's chat template

## πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM (8GB+ recommended)
   - Check your internet connection for model download
   - Verify all dependencies are installed

2. **Generation Errors**:
   - Try reducing the "Max Length" parameter
   - Adjust temperature and top-p values
   - Check the console for detailed error messages

3. **Performance Issues**:
   - The full model provides maximum quality but requires more memory
   - GPU acceleration recommended for optimal performance
   - Consider reducing generation parameters (e.g. Max Length) if memory is limited

4. **System Prompt Issues**:
   - Ensure the system prompt is not too long (max 1000 characters)
   - Check that the prompt follows the expected format

5. **Build Process Issues**:
   - Check that `download_model.py` runs successfully
   - Verify that model files are downloaded to the `./model` directory
   - Ensure sufficient storage space for model files

## πŸ“„ License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.

## πŸ™ Acknowledgments

- **Model**: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Data**: legmlai/openhermes-fr
- **Framework**: Gradio, Transformers, PyTorch
- **Layout Reference**: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

## πŸ”— Links

- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja)
- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference