---
title: CosmicCat AI Assistant
emoji: 🐱
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: "1.24.0"
app_file: app.py
pinned: false
---
|
|
|
# CosmicCat AI Assistant 🐱

Your personal AI-powered life coaching assistant with a cosmic twist.

## Features

- Personalized life coaching conversations with a space-cat theme
- Redis-based conversation memory
- Multiple LLM provider support (Ollama, Hugging Face, OpenAI)
- Dynamic model selection
- Remote Ollama integration via ngrok
- Automatic fallback between providers
- Cosmic Cascade mode for enhanced responses
|
|
|
## How to Use

1. Select a user from the sidebar
2. Configure your Ollama connection (if using remote Ollama)
3. Choose your preferred model
4. Start chatting with your CosmicCat AI Assistant!
|
|
|
## Requirements

All dependencies are specified in requirements.txt. The app includes:

- Streamlit UI
- FastAPI backend (for future expansion)
- Redis connection for persistent memory
- Multiple LLM integrations
|
|
|
## Environment Variables

Configure these in your Hugging Face Space secrets or local .env file:

- OLLAMA_HOST: your Ollama server URL (default: the ngrok URL)
- LOCAL_MODEL_NAME: default model name (default: mistral)
- HF_TOKEN: Hugging Face API token (for Hugging Face models)
- HF_API_ENDPOINT_URL: Hugging Face inference API endpoint
- USE_FALLBACK: whether to use fallback providers (true/false)

Note: Redis configuration is now hardcoded for reliability.
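
A minimal sketch of how these variables might be loaded at startup (assuming python-dotenv; the actual logic lives in utils/config.py and may differ):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is in requirements.txt

load_dotenv()  # reads a local .env file; HF Spaces injects secrets as env vars

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "https://your-ngrok-url.ngrok-free.app")
LOCAL_MODEL_NAME = os.getenv("LOCAL_MODEL_NAME", "mistral")
HF_TOKEN = os.getenv("HF_TOKEN")
HF_API_ENDPOINT_URL = os.getenv("HF_API_ENDPOINT_URL")
USE_FALLBACK = os.getenv("USE_FALLBACK", "true").lower() == "true"
```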
|
|
|
## Provider Details

### Ollama (Primary Local Provider)

Setup:

1. Install Ollama: https://ollama.com/download
2. Pull a model: `ollama pull mistral`
3. Start the server: `ollama serve`
4. Configure ngrok: `ngrok http 11434`
5. Set OLLAMA_HOST to your ngrok URL
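
Once the tunnel is up, a quick sanity check is to list models through it via Ollama's /api/tags endpoint (the URL below is a placeholder; test_ollama_connection.py covers this more thoroughly):

```python
import requests

OLLAMA_HOST = "https://your-ngrok-url.ngrok-free.app"  # placeholder; use your tunnel URL

# Ollama lists locally pulled models at GET /api/tags.
resp = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=10)
resp.raise_for_status()
print([m["name"] for m in resp.json().get("models", [])])  # e.g. ['mistral:latest']
```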
|
|
|
Advantages:

- No cost for inference
- Full control over models
- Fast response times
- Privacy: all processing stays local
|
|
|
### Hugging Face Inference API (Fallback)

Current Endpoint: https://zxzbfrlg3ssrk7d9.us-east-1.aws.endpoints.huggingface.cloud

Important Scaling Behavior:

- ⚠️ Scale-to-Zero: the endpoint automatically scales to zero after 15 minutes of inactivity
- ⏱️ Cold Start: takes approximately 4 minutes to initialize when first requested
- 🔄 Automatic Wake-up: sending any request automatically starts the endpoint
- 💰 Cost: $0.536/hour while running (not billed when scaled to zero)
- 📍 Location: AWS us-east-1 (Intel Sapphire Rapids, 16 vCPUs, 32 GB RAM)

Handling 503 Errors:

When using the Hugging Face fallback, you may encounter 503 errors at first; this means the endpoint is still initializing. Retry your request after 30-60 seconds, or wait for initialization to complete (typically 4 minutes).
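
A small retry helper illustrating this pattern (hypothetical code, not the app's actual implementation; the built-in fallback already handles 503s):

```python
import time

import requests

def query_with_cold_start(url: str, token: str, payload: dict, max_wait_s: int = 300) -> dict:
    """POST to an HF inference endpoint, retrying on 503 while it scales up from zero."""
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.monotonic() + max_wait_s
    while True:
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 503:  # anything but 503 means the endpoint is awake
            resp.raise_for_status()
            return resp.json()
        if time.monotonic() > deadline:
            raise TimeoutError("endpoint did not finish initializing in time")
        time.sleep(30)  # cold start takes ~4 minutes; poll every 30 seconds
```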
|
|
|
Model: OpenAI GPT OSS 20B (Uncensored variant)
|
|
|
### OpenAI (Alternative Fallback)

Configure with the OPENAI_API_KEY environment variable.

## Switching Between Providers
|
|
|
### For Local Development (Windows/Ollama)

1. Install Ollama (download from https://ollama.com/download/OllamaSetup.exe).

2. Pull and run models:

   ```bash
   ollama pull mistral
   ollama pull llama3
   ollama serve
   ```

3. Start an ngrok tunnel:

   ```bash
   ngrok http 11434
   ```

4. Update environment variables:

   ```bash
   OLLAMA_HOST=https://your-ngrok-url.ngrok-free.app
   LOCAL_MODEL_NAME=mistral
   USE_FALLBACK=false
   ```
|
### For Production Deployment

The application automatically handles provider fallback:

- Primary: Ollama (via ngrok)
- Secondary: Hugging Face Inference API
- Tertiary: OpenAI (if configured)
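
In code, the chain amounts to trying each provider in order until one succeeds. A simplified sketch (the real logic lives in core/llm.py):

```python
from typing import Callable, Sequence

def generate_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful response."""
    last_error: Exception | None = None
    for provider in providers:  # e.g. (ollama_generate, hf_generate, openai_generate)
        try:
            return provider(prompt)
        except Exception as err:  # timeouts, 503s, connection errors, ...
            last_error = err
    raise RuntimeError("All providers failed") from last_error
```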
|
## Architecture

This application consists of:

- Streamlit frontend (app.py)
- Core LLM abstraction (core/llm.py)
- Memory management (core/memory.py)
- Configuration management (utils/config.py)
- API endpoints (in the api/ directory, for future expansion)

Built with Python, Streamlit, FastAPI, and Redis.
|
|
|
## Troubleshooting Common Issues

503 Errors with Hugging Face Fallback:

- Wait 4 minutes for cold-start initialization
- Retry the request after the endpoint warms up

Ollama Connection Issues:

- Verify `ollama serve` is running locally
- Check the ngrok tunnel status
- Confirm the ngrok URL matches OLLAMA_HOST
- Test with test_ollama_connection.py

Redis Connection Problems:

- The Redis configuration is hardcoded for maximum reliability
- If issues persist, check network connectivity to Redis Cloud

Model Not Found:

- Pull the required model: `ollama pull <model-name>`
- Check available models: `ollama list`

Diagnostic Scripts:

- Run `python test_ollama_connection.py` to verify Ollama connectivity.
- Run `python diagnose_ollama.py` for detailed connection diagnostics.
- Run `python test_hardcoded_redis.py` to verify Redis connectivity with the hardcoded configuration.
|
## Redis Database Configuration

The application now uses a non-SSL connection to Redis Cloud for maximum compatibility:

```python
import redis

r = redis.Redis(
    host="redis-16717.c85.us-east-1-2.ec2.redns.redis-cloud.com",
    port=16717,
    username="default",
    password="<REDIS_PASSWORD>",  # placeholder; keep the real credential out of the README
    decode_responses=True,
    socket_connect_timeout=15,
    socket_timeout=15,
    health_check_interval=30,
    retry_on_timeout=True,
)
```

Note: SSL is disabled due to record-layer failures with Redis Cloud. The connection remains secure over the cloud provider's private network.
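
With that client, conversation memory reduces to simple Redis list operations. An illustrative example (the key schema is hypothetical, not necessarily what core/memory.py uses):

```python
# Append a conversation turn for a user, then read the history back.
r.rpush("chat:astro_cat", "user: Hello!", "assistant: Greetings from orbit!")
print(r.lrange("chat:astro_cat", 0, -1))
```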
|
|
|
## 🚀 Hugging Face Space Deployment

This application is designed for deployment on Hugging Face Spaces with the following configuration.

Required HF Space Secrets:

- OLLAMA_HOST: your ngrok tunnel to the Ollama server
- LOCAL_MODEL_NAME: default mistral:latest
- HF_TOKEN: Hugging Face API token (for HF endpoint access)
- HF_API_ENDPOINT_URL: your custom HF inference endpoint
- TAVILY_API_KEY: for web search capabilities
- OPENWEATHER_API_KEY: for weather data integration

Redis Configuration: the application uses hardcoded Redis Cloud credentials for persistent storage.
|
|
|
## Multi-Model Coordination

- Primary: Ollama (fast responses, local processing)
- Secondary: Hugging Face Endpoint (deep analysis, cloud processing)
- Coordination: the two models work together rather than as simple fallbacks
|
## System Architecture

The coordinated AI system automatically handles:

- External data gathering (web search, weather, time)
- Fast initial responses from Ollama
- Background HF endpoint initialization
- Deep analysis coordination
- Session persistence with Redis
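
A sketch of the coordination pattern (hypothetical; the function names are illustrative, not the app's actual API):

```python
import threading

import requests

def wake_hf_endpoint(endpoint_url: str) -> None:
    """Fire-and-forget ping so the scale-to-zero endpoint starts initializing."""
    try:
        requests.post(endpoint_url, json={"inputs": "ping"}, timeout=5)
    except requests.RequestException:
        pass  # a timeout or 503 is expected during cold start

def coordinated_reply(prompt: str, ollama_generate, endpoint_url: str) -> str:
    """Return a fast Ollama answer while the HF endpoint warms up for deep analysis."""
    threading.Thread(target=wake_hf_endpoint, args=(endpoint_url,), daemon=True).start()
    return ollama_generate(prompt)  # fast local answer returned immediately
```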
|
Once these variables are configured in your HF Space, the coordinated system runs as described above; running the local demo exercises the same architecture before deployment.
|
|