---
title: CCA
emoji: 🐨
colorFrom: purple
colorTo: yellow
sdk: docker
pinned: false
---

# Multi-Model API Space

This Hugging Face Space provides API access to multiple Lyon28 models through both a REST API and a web interface.

## Available Models

- `Lyon28/Tinny-Llama` - Small language model
- `Lyon28/Pythia` - Pythia-based model
- `Lyon28/Bert-Tinny` - BERT variant
- `Lyon28/Albert-Base-V2` - ALBERT model
- `Lyon28/T5-Small` - T5 text-to-text model
- `Lyon28/GPT-2` - GPT-2 variant
- `Lyon28/GPT-Neo` - GPT-Neo model
- `Lyon28/Distilbert-Base-Uncased` - DistilBERT model
- `Lyon28/Distil_GPT-2` - Distilled GPT-2
- `Lyon28/GPT-2-Tinny` - Tiny GPT-2
- `Lyon28/Electra-Small` - ELECTRA model

## Features

### 🚀 REST API Endpoints

1. **GET /api/models** - List available and loaded models
2. **POST /api/load_model** - Load a specific model
3. **POST /api/generate** - Generate text using loaded models
4. **GET /health** - Health check

### 🎯 Web Interface

- Model management interface
- Interactive text generation
- Parameter tuning (temperature, top_p, max_length)
- Real-time model loading status

## API Usage

### Load a Model

```bash
curl -X POST "https://your-space-url/api/load_model" \
  -H "Content-Type: application/json" \
  -d '{"model_name": "Lyon28/GPT-2"}'
```

### Generate Text

```bash
curl -X POST "https://your-space-url/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "Lyon28/GPT-2",
    "prompt": "Hello world",
    "max_length": 100,
    "temperature": 0.7,
    "top_p": 0.9
  }'
```

### Python Example

```python
import requests

# Load model
response = requests.post("https://your-space-url/api/load_model",
                         json={"model_name": "Lyon28/GPT-2"})

# Generate text
response = requests.post("https://your-space-url/api/generate", json={
    "model_name": "Lyon28/GPT-2",
    "prompt": "The future of AI is",
    "max_length": 150,
    "temperature": 0.8
})
result = response.json()
print(result["generated_text"])
```

## Model Types

- **Causal LM**: GPT-2, GPT-Neo, Pythia, Tinny-Llama variants
- **Text-to-Text**: T5-Small
- **Feature Extraction**: BERT, ALBERT, DistilBERT, ELECTRA

## Performance Notes

- Models are loaded on demand to optimize memory usage
- GPU acceleration is used when available
- Models are cached for faster subsequent loads
- Both CPU and GPU inference are supported

## Rate Limits

This is a free public API. Please use it responsibly:

- Max 100 requests per minute per IP
- Max 500 tokens per generation
- Models auto-unload after 1 hour of inactivity

## Support

For issues or questions about specific Lyon28 models, please contact the model authors.

---

*Powered by Hugging Face Transformers and Gradio*
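
## Appendix: Listing Models and Health Check

The curl and Python examples above cover `/api/load_model` and `/api/generate`; the `/api/models` and `/health` endpoints can be called the same way. Below is a minimal sketch, assuming only the endpoint paths listed under "REST API Endpoints" (the response fields are not documented here, so the script simply prints the raw JSON):

```python
import requests

BASE_URL = "https://your-space-url"  # replace with your Space URL

# Confirm the Space is up before loading models or generating text.
health = requests.get(f"{BASE_URL}/health", timeout=10)
health.raise_for_status()
print("health:", health.json())

# List the available models and which ones are currently loaded.
models = requests.get(f"{BASE_URL}/api/models", timeout=10)
models.raise_for_status()
print("models:", models.json())
```

Both calls are plain GET requests, so they also work directly in a browser or with `curl`.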