---
title: Petite LLM 3
emoji: πŸ’ƒπŸ»
colorFrom: green
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: true
license: mit
short_description: SmolLM3 for French Understanding
---

# πŸ€– Petite Elle L'Aime 3 - Chat Interface

A complete Gradio application for the Petite Elle L'Aime 3 model, featuring the full fine-tuned version for maximum performance and quality.

## πŸš€ Features

- **Multilingual Support**: English, French, Italian, Portuguese, Chinese, Arabic
- **Full Fine-Tuned Model**: Maximum performance and quality with full precision
- **Interactive Chat Interface**: Real-time conversation with the model
- **Customizable System Prompt**: Define the assistant's personality and behavior
- **Thinking Mode**: Enable reasoning mode with thinking tags
- **Responsive Design**: Modern UI following the reference layout
- **Chat Template Integration**: Proper Jinja template formatting
- **Automatic Model Download**: Downloads the full model at build time

## πŸ“‹ Model Information

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Parameters**: ~3B
- **Context Length**: 128k tokens
- **Precision**: Full fine-tuned model (float16/float32)
- **Performance**: Maximum quality and accuracy
- **Languages**: English, French, Italian, Portuguese, Chinese, Arabic
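
For reference, a minimal sketch of loading a checkpoint like this with `transformers`; the repo id below is a placeholder, not the actual repository name:

```python
# Minimal loading sketch, assuming a standard transformers checkpoint.
# MODEL_ID is hypothetical -- substitute the actual Petite Elle L'Aime 3 repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/petite-elle-laime-3"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # requires the accelerate package
)
```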

πŸ› οΈ Installation

1. Clone this repository:

   ```bash
   git clone <repository-url>
   cd Petite-LLM-3
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## πŸš€ Usage

### Local Development

Run the application locally:

```bash
python app.py
```

The application will be available at http://localhost:7860
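
Port 7860 is Gradio's default. As a rough sketch of how an app like this exposes itself (assumed skeleton, not the actual `app.py`):

```python
# Assumed skeleton of a Gradio entry point; the real app.py builds the
# full chat interface described below.
import gradio as gr

with gr.Blocks(title="Petite Elle L'Aime 3") as demo:
    gr.Markdown("# πŸ€– Petite Elle L'Aime 3 - Chat Interface")
    # ... chat components, settings panels, event handlers ...

if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7860)
```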

### Hugging Face Spaces

This application is configured for deployment on Hugging Face Spaces with automatic model download:

1. **Build Process**: The `build.py` script automatically downloads the full model during the Space build
2. **Model Loading**: Uses local model files when available, falls back to Hugging Face download
3. **Caching**: Model files are cached for faster subsequent runs

πŸŽ›οΈ Interface Features

### Layout Structure

The interface follows the reference layout with:

- **Title Section**: Main heading and description
- **Information Panels**: Features and model information
- **Input Section**: Context and user input areas
- **Advanced Settings**: Collapsible parameter controls
- **Chat Interface**: Real-time conversation display

### System Prompt

- **Default**: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant." ("You are TonicIA, a rigorous and kind French-speaking assistant.")
- **Editable**: Users can customize the system prompt to define the assistant's personality
- **Real-time**: Changes take effect immediately for new conversations
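
Under the hood, the system prompt presumably becomes the first message in the conversation passed to the chat template; a minimal sketch (message structure assumed):

```python
# Sketch: the system prompt leads the message list (structure assumed).
system_prompt = (
    "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bonjour, peux-tu m'aider ?"},
]
```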

### Generation Parameters

- **Max Length**: Maximum number of tokens to generate (64-2048)
- **Temperature**: Controls randomness in generation (0.01-1.0)
- **Top-p**: Nucleus sampling parameter (0.1-1.0)
- **Enable Thinking**: Toggles reasoning mode with thinking tags (see the sketch after this list)
- **Advanced Settings**: Collapsible panel for fine-tuning
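
These sliders map onto standard `transformers` generation arguments. A hedged sketch of how the values might be wired into `generate()` (variable names are illustrative):

```python
# Sketch: feeding the UI parameters into generation (names illustrative).
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=512,  # "Max Length" slider (64-2048)
    temperature=0.7,     # "Temperature" slider (0.01-1.0)
    top_p=0.9,           # "Top-p" slider (0.1-1.0)
    do_sample=True,      # sampling is required for temperature/top-p to apply
)
reply = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)
```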

## πŸ”§ Technical Details

### Model Loading Strategy

The application uses a smart loading strategy:

1. **Local Check**: First checks whether full model files exist locally
2. **Local Loading**: If available, loads from the `./model` folder
3. **Fallback Download**: If not available, downloads from Hugging Face
4. **Tokenizer**: Always uses the main repo for the chat template and configuration (see the sketch below)
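
A minimal sketch of this strategy, assuming the `./model` path and a placeholder repo id:

```python
# Sketch of the local-first loading strategy (paths/repo id assumed).
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

LOCAL_DIR = "./model"
REPO_ID = "your-org/petite-elle-laime-3"  # hypothetical repo id

# Steps 1-3: prefer local files, otherwise fall back to the Hub.
source = LOCAL_DIR if os.path.isdir(LOCAL_DIR) else REPO_ID
model = AutoModelForCausalLM.from_pretrained(source)

# Step 4: the tokenizer (chat template + configuration) always comes
# from the main repo.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
```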

### Build Process

For Hugging Face Spaces deployment:

1. **Build Script**: `build.py` runs during the Space build
2. **Model Download**: `download_model.py` downloads the full model files (sketched below)
3. **Local Storage**: Model files are stored in the `./model` directory
4. **Fast Loading**: Subsequent runs use the local files
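
A plausible core of `download_model.py`, using `huggingface_hub` (the repo id is a placeholder):

```python
# Sketch: downloading the model snapshot at build time (repo id assumed).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/petite-elle-laime-3",  # hypothetical repo id
    local_dir="./model",
)
```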

### Chat Template Integration

The application uses the custom chat template from the model, which supports:

- System prompt integration
- User and assistant message formatting
- Thinking mode with `<think>` tags (see the sketch after this list)
- Proper conversation flow management
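
Extra keyword arguments to `apply_chat_template` are forwarded to the Jinja template, which is presumably how thinking mode is toggled; a sketch, assuming the template accepts an `enable_thinking` flag as SmolLM3's does:

```python
# Sketch: toggling thinking mode through the chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # the model can then emit <think>...</think> blocks
)
```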

### Memory Optimization

- Uses the full fine-tuned model for maximum quality
- Automatic device detection (CUDA/CPU)
- Efficient tokenization and generation
- Float16 precision on GPU for optimal performance

πŸ“ Example Usage

1. **Basic Conversation**:
   - Add context in the system prompt area
   - Type your message in the user input box
   - Click the generate button to start chatting

2. **Customizing the System Prompt**:
   - Edit the context in the dedicated text area
   - Changes apply to new messages immediately
   - Example: "Tu es un expert en programmation Python." ("You are an expert in Python programming.")

3. **Advanced Settings**:
   - Check the "Advanced Settings" checkbox
   - Adjust generation parameters as needed
   - Enable/disable thinking mode

4. **Real-time Chat**:
   - Messages appear in the chat interface
   - Conversation history is maintained
   - Responses are generated using the model's chat template

πŸ› Troubleshooting

### Common Issues

1. **Model Loading Errors**:
   - Ensure you have sufficient RAM (8GB+ recommended)
   - Check your internet connection for the model download
   - Verify all dependencies are installed

2. **Generation Errors**:
   - Try reducing the "Max Length" parameter
   - Adjust the temperature and top-p values
   - Check the console for detailed error messages

3. **Performance Issues**:
   - The full model provides maximum quality but requires more memory
   - GPU acceleration is recommended for optimal performance
   - Consider reducing the generation parameters if memory is limited

4. **System Prompt Issues**:
   - Ensure the system prompt is not too long (max 1000 characters)
   - Check that the prompt follows the expected format

5. **Build Process Issues**:
   - Check that `download_model.py` runs successfully
   - Verify that model files are downloaded to the `./model` directory
   - Ensure sufficient storage space for the model files

## πŸ“„ License

This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.

πŸ™ Acknowledgments

πŸ”— Links


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference