---
title: CCA
emoji: 🎨
colorFrom: purple
colorTo: yellow
sdk: docker
pinned: false
---
# Multi-Model API Space

This Hugging Face Space provides API access to multiple Lyon28 models through both a REST API and a web interface.
## Available Models

- **Lyon28/Tinny-Llama** - Small language model
- **Lyon28/Pythia** - Pythia-based model
- **Lyon28/Bert-Tinny** - BERT variant
- **Lyon28/Albert-Base-V2** - ALBERT model
- **Lyon28/T5-Small** - T5 text-to-text model
- **Lyon28/GPT-2** - GPT-2 variant
- **Lyon28/GPT-Neo** - GPT-Neo model
- **Lyon28/Distilbert-Base-Uncased** - DistilBERT model
- **Lyon28/Distil_GPT-2** - Distilled GPT-2
- **Lyon28/GPT-2-Tinny** - Tiny GPT-2
- **Lyon28/Electra-Small** - ELECTRA model
## Features

### REST API Endpoints

- `GET /api/models` - List available and loaded models
- `POST /api/load_model` - Load a specific model
- `POST /api/generate` - Generate text using loaded models
- `GET /health` - Health check
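As an example, `GET /api/models` can drive a small client helper. A minimal sketch, assuming the endpoint returns a JSON object with `available` and `loaded` lists (that response shape is an assumption, not documented above):

```python
import json
from urllib.request import urlopen


def fetch_model_listing(base_url):
    """GET /api/models from the Space (network call)."""
    with urlopen(f"{base_url}/api/models") as resp:
        return json.load(resp)


def models_to_load(listing):
    """Return models that are available but not yet loaded.

    Assumes a listing shaped like {"available": [...], "loaded": [...]}.
    """
    loaded = set(listing.get("loaded", []))
    return [m for m in listing.get("available", []) if m not in loaded]
```

With a hypothetical listing of `{"available": ["Lyon28/GPT-2", "Lyon28/T5-Small"], "loaded": ["Lyon28/GPT-2"]}`, `models_to_load` would return only `Lyon28/T5-Small`.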
### 🎯 Web Interface
- Model management interface
- Interactive text generation
- Parameter tuning (temperature, top_p, max_length)
- Real-time model loading status
## API Usage

### Load a Model

```bash
curl -X POST "https://your-space-url/api/load_model" \
  -H "Content-Type: application/json" \
  -d '{"model_name": "Lyon28/GPT-2"}'
```
### Generate Text

```bash
curl -X POST "https://your-space-url/api/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "Lyon28/GPT-2",
    "prompt": "Hello world",
    "max_length": 100,
    "temperature": 0.7,
    "top_p": 0.9
  }'
```
### Python Example

```python
import requests

BASE_URL = "https://your-space-url"

# Load the model
response = requests.post(f"{BASE_URL}/api/load_model",
                         json={"model_name": "Lyon28/GPT-2"})
response.raise_for_status()

# Generate text
response = requests.post(f"{BASE_URL}/api/generate",
                         json={
                             "model_name": "Lyon28/GPT-2",
                             "prompt": "The future of AI is",
                             "max_length": 150,
                             "temperature": 0.8,
                         })
response.raise_for_status()
result = response.json()
print(result["generated_text"])
```
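The load-then-generate flow above can also be wrapped in a one-call helper. A standard-library-only sketch; the `generated_text` response field follows the example above, while the payload builder's clamping of `max_length` to the 500-token cap (listed under Rate Limits) is a client-side convention, not server behavior:

```python
import json
from urllib.request import Request, urlopen


def _post_json(url, payload):
    """POST a JSON body and decode the JSON response (network call)."""
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def build_generate_payload(model_name, prompt, max_length=100,
                           temperature=0.7, top_p=0.9):
    """Assemble the /api/generate body, clamping max_length to the
    500-token cap mentioned under Rate Limits."""
    return {"model_name": model_name, "prompt": prompt,
            "max_length": min(max_length, 500),
            "temperature": temperature, "top_p": top_p}


def generate_text(base_url, model_name, prompt, **params):
    """Load the model, then generate text, in one call."""
    _post_json(f"{base_url}/api/load_model", {"model_name": model_name})
    result = _post_json(f"{base_url}/api/generate",
                        build_generate_payload(model_name, prompt, **params))
    return result["generated_text"]
```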
## Model Types
- Causal LM: GPT-2, GPT-Neo, Pythia, Tinny-Llama variants
- Text-to-Text: T5-Small
- Feature Extraction: BERT, ALBERT, DistilBERT, ELECTRA
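These groupings map onto the standard `transformers` pipeline task names. A sketch of that mapping, inferred from the list above rather than taken from the server code:

```python
# Model-to-task mapping inferred from the "Model Types" list above.
# Task names are the standard transformers pipeline identifiers.
MODEL_TASKS = {
    "Lyon28/Tinny-Llama": "text-generation",
    "Lyon28/Pythia": "text-generation",
    "Lyon28/GPT-2": "text-generation",
    "Lyon28/GPT-Neo": "text-generation",
    "Lyon28/Distil_GPT-2": "text-generation",
    "Lyon28/GPT-2-Tinny": "text-generation",
    "Lyon28/T5-Small": "text2text-generation",
    "Lyon28/Bert-Tinny": "feature-extraction",
    "Lyon28/Albert-Base-V2": "feature-extraction",
    "Lyon28/Distilbert-Base-Uncased": "feature-extraction",
    "Lyon28/Electra-Small": "feature-extraction",
}


def task_for(model_name):
    """Look up the pipeline task for a model; default to text generation."""
    return MODEL_TASKS.get(model_name, "text-generation")
```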
## Performance Notes

- Models are loaded on demand to optimize memory usage
- GPU acceleration is used when available
- Models are cached for faster subsequent loads
- Both CPU and GPU inference are supported
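The on-demand loading and caching behavior described above can be sketched as follows. This is an illustration of the pattern, not the Space's actual server code; the loader and clock are injectable stand-ins:

```python
import time


class ModelCache:
    """Load models lazily, cache them, and evict idle ones after a TTL."""

    def __init__(self, ttl_seconds=3600, loader=None, clock=time.monotonic):
        self.ttl = ttl_seconds
        # Placeholder loader; a real server would construct a pipeline here.
        self.loader = loader or (lambda name: f"<model {name}>")
        self.clock = clock
        self._models = {}      # name -> model object
        self._last_used = {}   # name -> last-access timestamp

    def get(self, name):
        """Load on first use, then serve from the cache."""
        self._evict_idle()
        if name not in self._models:
            self._models[name] = self.loader(name)
        self._last_used[name] = self.clock()
        return self._models[name]

    def _evict_idle(self):
        """Drop models unused for longer than the TTL."""
        now = self.clock()
        for name in [n for n, t in self._last_used.items()
                     if now - t > self.ttl]:
            del self._models[name]
            del self._last_used[name]
```

With the default `ttl_seconds=3600`, this matches the one-hour auto-unload noted under Rate Limits.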
## Rate Limits

This is a free public API. Please use it responsibly:
- Max 100 requests per minute per IP
- Max 500 tokens per generation
- Models auto-unload after 1 hour of inactivity
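To stay under the 100 requests/minute limit, a client can pace its own calls. A minimal sliding-window sketch (client-side courtesy only; the Space's actual enforcement details are not documented here):

```python
import time


class RateLimiter:
    """Block until a request can be sent without exceeding the limit."""

    def __init__(self, max_requests=100, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self._timestamps = []  # send times within the current window

    def acquire(self):
        """Wait, if necessary, then record one request slot."""
        now = self.clock()
        # Drop send times that have fallen out of the window.
        self._timestamps = [t for t in self._timestamps
                            if now - t < self.window]
        if len(self._timestamps) >= self.max_requests:
            # Sleep until the oldest recorded request leaves the window.
            self.sleep(self.window - (now - self._timestamps[0]))
            now = self.clock()
            self._timestamps = [t for t in self._timestamps
                                if now - t < self.window]
        self._timestamps.append(now)
```

Calling `limiter.acquire()` before each `requests.post(...)` keeps a busy client within the published limit.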
## Support

For issues or questions about specific Lyon28 models, please contact the model authors.

Powered by Hugging Face Transformers and Gradio