Spaces:

Tonic
/

Petite-LLM-3

Running on Zero

App Files Files Community

Petite-LLM-3 / README.md

Tonic

Update README.md

a4b7295 verified about 1 month ago

preview code

raw

history blame contribute delete

6.7 kB

	---
	title: Petite LLM 3
	emoji: 💃🏻
	colorFrom: green
	colorTo: purple
	sdk: gradio
	sdk_version: 5.38.2
	app_file: app.py
	pinned: true
	license: mit
	short_description: Smollm3 for French Understanding
	---

	# 🤖 Petite Elle L'Aime 3 - Chat Interface

	A complete Gradio application for the [Petite Elle L'Aime 3](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft) model, featuring the full fine-tuned version for maximum performance and quality.

	## 🚀 Features

	- Multilingual Support: English, French, Italian, Portuguese, Chinese, Arabic
	- Full Fine-Tuned Model: Maximum performance and quality with full precision
	- Interactive Chat Interface: Real-time conversation with the model
	- Customizable System Prompt: Define the assistant's personality and behavior
	- Thinking Mode: Enable reasoning mode with thinking tags
	- Responsive Design: Modern UI following the reference layout
	- Chat Template Integration: Proper Jinja template formatting
	- Automatic Model Download: Downloads full model at build time

	## 📋 Model Information

	- Base Model: HuggingFaceTB/SmolLM3-3B
	- Parameters: ~3B
	- Context Length: 128k
	- Precision: Full fine-tuned model (float16/float32)
	- Performance: Maximum quality and accuracy
	- Languages: English, French, Italian, Portuguese, Chinese, Arabic

	## 🛠️ Installation

	1. Clone this repository:
	```bash
	git clone <repository-url>
	cd Petite-LLM-3
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	## 🚀 Usage

	### Local Development

	Run the application locally:
	```bash
	python app.py
	```

	The application will be available at `http://localhost:7860`

	### Hugging Face Spaces

	This application is configured for deployment on Hugging Face Spaces with automatic model download:

	1. Build Process: The `build.py` script automatically downloads the int4 model during Space build
	2. Model Loading: Uses local model files when available, falls back to Hugging Face download
	3. Caching: Model files are cached for faster subsequent runs

	## 🎛️ Interface Features

	### Layout Structure
	The interface follows the reference layout with:
	- Title Section: Main heading and description
	- Information Panels: Features and model information
	- Input Section: Context and user input areas
	- Advanced Settings: Collapsible parameter controls
	- Chat Interface: Real-time conversation display

	### System Prompt
	- Default: "Tu es TonicIA, un assistant francophone rigoureux et bienveillant."
	- Editable: Users can customize the system prompt to define the assistant's personality
	- Real-time: Changes take effect immediately for new conversations

	### Generation Parameters
	- Max Length: Maximum number of tokens to generate (64-2048)
	- Temperature: Controls randomness in generation (0.01-1.0)
	- Top-p: Nucleus sampling parameter (0.1-1.0)
	- Enable Thinking: Enable reasoning mode with thinking tags
	- Advanced Settings: Collapsible panel for fine-tuning

	## 🔧 Technical Details

	### Model Loading Strategy
	The application uses a smart loading strategy:

	1. Local Check: First checks if full model files exist locally
	2. Local Loading: If available, loads from `./model` folder
	3. Fallback Download: If not available, downloads from Hugging Face
	4. Tokenizer: Always uses main repo for chat template and configuration

	### Build Process
	For Hugging Face Spaces deployment:

	1. Build Script: `build.py` runs during Space build
	2. Model Download: `download_model.py` downloads full model files
	3. Local Storage: Model files stored in `./model` directory
	4. Fast Loading: Subsequent runs use local files

	### Chat Template Integration
	The application uses the custom chat template from the model, which supports:
	- System prompt integration
	- User and assistant message formatting
	- Thinking mode with `<think>` tags
	- Proper conversation flow management

	### Memory Optimization
	- Uses full fine-tuned model for maximum quality
	- Automatic device detection (CUDA/CPU)
	- Efficient tokenization and generation
	- Float16 precision on GPU for optimal performance

	## 📝 Example Usage

	1. Basic Conversation:
	- Add context in the system prompt area
	- Type your message in the user input box
	- Click the generate button to start chatting

	2. Customizing System Prompt:
	- Edit the context in the dedicated text area
	- Changes apply to new messages immediately
	- Example: "Tu es un expert en programmation Python."

	3. Advanced Settings:
	- Check the "Advanced Settings" checkbox
	- Adjust generation parameters as needed
	- Enable/disable thinking mode

	4. Real-time Chat:
	- Messages appear in the chat interface
	- Conversation history is maintained
	- Responses are generated using the model's chat template

	## 🐛 Troubleshooting

	### Common Issues

	1. Model Loading Errors:
	- Ensure you have sufficient RAM (8GB+ recommended)
	- Check your internet connection for model download
	- Verify all dependencies are installed

	2. Generation Errors:
	- Try reducing the "Max Length" parameter
	- Adjust temperature and top-p values
	- Check the console for detailed error messages

	3. Performance Issues:
	- The full model provides maximum quality but requires more memory
	- GPU acceleration recommended for optimal performance
	- Consider reducing model parameters if memory is limited

	4. System Prompt Issues:
	- Ensure the system prompt is not too long (max 1000 characters)
	- Check that the prompt follows the expected format

	5. Build Process Issues:
	- Check that `download_model.py` runs successfully
	- Verify that model files are downloaded to `./int4` directory
	- Ensure sufficient storage space for model files

	## 📄 License

	This project is licensed under the MIT License. The underlying model is licensed under Apache 2.0.

	## 🙏 Acknowledgments

	- Model: [Tonic/petite-elle-L-aime-3-sft](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
	- Base Model: SmolLM3-3B by HuggingFaceTB
	- Training Data: legmlai/openhermes-fr
	- Framework: Gradio, Transformers, PyTorch
	- Layout Reference: [Tonic/Nvidia-OpenReasoning](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

	## 🔗 Links

	- [Model on Hugging Face](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft)
	- [Chat Template](https://huggingface.co/Tonic/petite-elle-L-aime-3-sft/blob/main/chat_template.jinja)
	- [Original App Reference](https://huggingface.co/spaces/Tonic/Nvidia-OpenReasoning)

	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference