---
title: Petite LLM 3
emoji: πŸ’ƒπŸ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: Smollm3 for French Understanding
---

# πŸ€– Petite Elle L'Aime 3 - Chat Interface

A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality.

## πŸš€ Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads full model at build time

## πŸ“‹ Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic

## πŸ› οΈ Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd Petite-LLM-3
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

## πŸš€ Usage

### Local Development

Run the application locally:
```bash
python app.py
```

The application will be available at `http://localhost:7860`.

### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

1. **Build Process**: The `build.py` script automatically downloads the full model during Space build
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs

## πŸŽ›οΈ Interface Features

### Layout Structure
The interface follows the reference layout with:
- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display

### System Prompt
- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations

### Generation Parameters
- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Enable reasoning mode with thinking tags
- **Advanced Settings**: Collapsible panel for fine-tuning
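
As a rough sketch, the slider values above might map to `model.generate` keyword arguments like this (the helper name and clamping logic are illustrative, not the app's actual code):

```python
def build_generation_kwargs(max_length=512, temperature=0.7, top_p=0.9):
    # Clamp values to the ranges exposed by the UI sliders
    max_length = max(64, min(2048, int(max_length)))
    temperature = max(0.01, min(1.0, float(temperature)))
    top_p = max(0.1, min(1.0, float(top_p)))
    return {
        "max_new_tokens": max_length,
        "temperature": temperature,
        "top_p": top_p,
        "do_sample": True,  # sampling is required for temperature/top_p to take effect
    }
```

Out-of-range values (e.g. a max length of 5000) are clamped back into the slider ranges before reaching the model.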

## πŸ”§ Technical Details

### Model Loading Strategy
The application uses a smart loading strategy:

1. **Local Check**: First checks if full model files exist locally
2. **Local Loading**: If available, loads from `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses main repo for chat template and configuration
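
A minimal sketch of this strategy, assuming the `./model` directory and repo id used elsewhere in this README (the helper names are illustrative, not the app's actual code):

```python
import os

MODEL_REPO = "Tonic/petite-elle-L-aime-3-sft"
LOCAL_DIR = "./model"

def resolve_model_source(local_dir=LOCAL_DIR, repo_id=MODEL_REPO):
    # Prefer local files when the build step has populated ./model,
    # otherwise fall back to downloading from the Hugging Face Hub
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir
    return repo_id

def load_model():
    # Imported lazily so the path logic above stays testable without heavy deps
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained(resolve_model_source())
    # The tokenizer always comes from the main repo so the chat template stays current
    tokenizer = AutoTokenizer.from_pretrained(MODEL_REPO)
    return model, tokenizer
```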

### Build Process
For Hugging Face Spaces deployment:

1. **Build Script**: `build.py` runs during Space build
2. **Model Download**: `download_model.py` downloads full model files
3. **Local Storage**: Model files stored in `./model` directory
4. **Fast Loading**: Subsequent runs use local files

### Chat Template Integration
The application uses the custom chat template from the model, which supports:
- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags
- Proper conversation flow management
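
A hedged sketch of how a conversation might be flattened into the message format that `apply_chat_template` expects (`build_messages` is an illustrative helper, not the app's actual code):

```python
def build_messages(system_prompt, history, user_message):
    # Flatten the system prompt, prior turns, and the new user input
    # into the role/content list the chat template consumes
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    messages.append({"role": "user", "content": user_message})
    return messages

# With a loaded tokenizer, the template would then be applied roughly as
# (not run here; `enable_thinking` toggles the <think> reasoning mode):
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
# )
```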

### Memory Optimization
- Uses full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance
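
An illustrative sketch of the device and precision selection described above (`select_runtime` is a hypothetical helper; in `app.py` the flag would come from `torch.cuda.is_available()`):

```python
def select_runtime(cuda_available):
    # float16 on GPU roughly halves memory for the ~3B model;
    # float32 keeps CPU inference numerically stable
    if cuda_available:
        return {"device_map": "cuda", "torch_dtype": "float16"}
    return {"device_map": "cpu", "torch_dtype": "float32"}

# Example wiring (not run here):
# import torch
# kwargs = select_runtime(torch.cuda.is_available())
# model = AutoModelForCausalLM.from_pretrained(source, **kwargs)
```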

## πŸ“ Example Usage

1. **Basic Conversation**:
   - Add context in the system prompt area
   - Type your message in the user input box
   - Click the generate button to start chatting

2. **Customizing System Prompt**:
   - Edit the context in the dedicated text area
   - Changes apply to new messages immediately
   - Example: "Tu es un expert en programmation Python."

3. **Advanced Settings**:
   - Open the "Advanced Settings" panel
   - Adjust generation parameters as needed
   - Enable/disable thinking mode

4. **Real-time Chat**:
   - Messages appear in the chat interface
   - Conversation history is maintained
   - Responses are generated using the model's chat template

## πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM (8GB+ recommended)
   - Check your internet connection for model download
   - Verify all dependencies are installed

2. **Generation Errors**:
   - Try reducing the "Max Length" parameter
   - Adjust temperature and top-p values
   - Check the console for detailed error messages

3. **Performance Issues**:
   - The full model provides maximum quality but requires more memory
   - GPU acceleration recommended for optimal performance
   - Consider reducing generation parameters (e.g. Max Length) if memory is limited

4. **System Prompt Issues**:
   - Ensure the system prompt is not too long (max 1000 characters)
   - Check that the prompt follows the expected format

5. **Build Process Issues**:
   - Check that `download_model.py` runs successfully
   - Verify that model files are downloaded to the `./model` directory
   - Ensure sufficient storage space for model files

## πŸ“„ License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.

## πŸ™ Acknowledgments

- **Model**: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- **Base Model**: SmolLM3-3B by HuggingFaceTB
- **Training Data**: legmlai/openhermes-fr
- **Framework**: Gradio, Transformers, PyTorch
- **Layout Reference**: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

## πŸ”— Links

- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja)
- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference