# ColPali Hono Proxy Server
A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. It handles caching, rate limiting, and CORS, and provides a clean API interface.
## Features
- Image Retrieval: Serves base64 images from Vespa as actual image files with proper caching
- Search Proxy: Forwards search requests with result caching
- Chat SSE Proxy: Handles Server-Sent Events for streaming chat responses
- Rate Limiting: Protects backend from overload
- Caching: In-memory cache for search results and images
- Health Checks: Kubernetes-ready health endpoints
- CORS Handling: Configurable CORS for frontend integration
- Request Logging: Detailed request/response logging with request IDs
## Architecture

```
Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                                  ↓
                                             Vespa Cloud
```
## API Endpoints

### Search

`POST /api/search` - Search documents

```json
{ "query": "annual report 2023", "limit": 10, "ranking": "hybrid" }
```
### Image Retrieval

- `GET /api/search/image/:docId/thumbnail` - Get thumbnail image
- `GET /api/search/image/:docId/full` - Get full-size image
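Because the proxy serves Vespa's base64 images as plain image URLs, the frontend can reference them directly. A small client-side helper (hypothetical name, assuming the proxy's `/api` base URL from the integration section below) sketches how those URLs are built:

```typescript
// Hypothetical helper for building image URLs against the proxy.
// `baseUrl` is assumed to be the proxy's /api prefix, e.g. http://localhost:4000/api.
type ImageSize = 'thumbnail' | 'full';

function buildImageUrl(baseUrl: string, docId: string, size: ImageSize): string {
  // encodeURIComponent guards against docIds containing slashes or spaces
  return `${baseUrl}/search/image/${encodeURIComponent(docId)}/${size}`;
}
```

The resulting string can be used directly in an `<img src=...>` tag, letting the browser benefit from the proxy's image cache headers.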
### Chat

`POST /api/chat` - Stream chat responses (SSE)

```json
{ "messages": [{"role": "user", "content": "Tell me about..."}], "context": [] }
```
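On the client, the streamed response arrives as Server-Sent Events. A minimal sketch of parsing the wire format this endpoint speaks (assuming the proxy forwards the backend's `data:` lines unchanged; this is not the proxy's internal implementation):

```typescript
// Extract the payload of each `data:` line from an SSE text chunk.
// Per the SSE format, an optional single space may follow the colon.
function parseSseChunk(chunk: string): string[] {
  const events: string[] = [];
  for (const line of chunk.split('\n')) {
    if (line.startsWith('data:')) {
      events.push(line.slice(5).replace(/^ /, ''));
    }
  }
  return events;
}
```

In practice you would feed this from a `ReadableStream` reader on the `fetch` response body, accumulating partial lines between chunks.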
### Similarity Map

`POST /api/search/similarity-map` - Generate similarity visualization
### Health

- `GET /health` - Detailed health status
- `GET /health/live` - Liveness probe
- `GET /health/ready` - Readiness probe
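The detailed `/health` endpoint aggregates dependency checks into one status. A sketch of the kind of payload it might return (field names here are illustrative, not the proxy's exact schema):

```typescript
// Illustrative shape for a detailed health response.
interface HealthStatus {
  status: 'ok' | 'degraded';
  uptimeSeconds: number;
  checks: Record<string, boolean>;
}

function buildHealthStatus(
  uptimeSeconds: number,
  checks: Record<string, boolean>,
): HealthStatus {
  // "degraded" if any dependency check (e.g. backend reachability) failed
  const allHealthy = Object.values(checks).every(Boolean);
  return { status: allHealthy ? 'ok' : 'degraded', uptimeSeconds, checks };
}
```

Kubernetes probes typically only need the HTTP status code, while the detailed payload is useful for dashboards and debugging.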
## Setup

### Development

1. Install dependencies:

   ```bash
   npm install
   ```

2. Copy environment variables:

   ```bash
   cp .env.example .env
   ```

3. Update `.env` with your configuration.

4. Run in development mode:

   ```bash
   npm run dev
   ```
### Production

Build:

```bash
npm run build
```

Run:

```bash
npm start
```
### Docker

Build and run with Docker:

```bash
docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy
```

Or use docker-compose:

```bash
docker-compose up
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `PORT` | Server port | `4000` |
| `BACKEND_URL` | ColPali backend URL | `http://localhost:7860` |
| `CORS_ORIGIN` | Allowed CORS origin | `http://localhost:3000` |
| `ENABLE_CACHE` | Enable caching | `true` |
| `CACHE_TTL` | Cache TTL in seconds | `300` |
| `RATE_LIMIT_WINDOW` | Rate limit window (ms) | `60000` |
| `RATE_LIMIT_MAX` | Max requests per window | `100` |
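A sketch of how the proxy might load these variables with their documented defaults (the variable names match the table; the helper itself is illustrative, not the proxy's actual code):

```typescript
// Parse the documented environment variables, falling back to the defaults
// from the table above when a variable is unset.
interface ProxyConfig {
  port: number;
  backendUrl: string;
  corsOrigin: string;
  enableCache: boolean;
  cacheTtl: number;
  rateLimitWindow: number;
  rateLimitMax: number;
}

function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const num = (v: string | undefined, fallback: number) =>
    v !== undefined ? Number(v) : fallback;
  return {
    port: num(env.PORT, 4000),
    backendUrl: env.BACKEND_URL ?? 'http://localhost:7860',
    corsOrigin: env.CORS_ORIGIN ?? 'http://localhost:3000',
    enableCache: (env.ENABLE_CACHE ?? 'true') === 'true',
    cacheTtl: num(env.CACHE_TTL, 300),
    rateLimitWindow: num(env.RATE_LIMIT_WINDOW, 60000),
    rateLimitMax: num(env.RATE_LIMIT_MAX, 100),
  };
}
```

Centralizing defaults in one typed object keeps the rest of the code free of `process.env` lookups.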
## Integration with Next.js

Update your Next.js app to use the proxy:

```bash
# .env.local
NEXT_PUBLIC_API_URL=http://localhost:4000/api
```

```typescript
// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});
```
## Caching Strategy

- Search Results: Cached for 5 minutes (configurable via `CACHE_TTL`)
- Images: Cached for 24 hours
- Cache Keys: Derived from query parameters
- Cache Headers: `X-Cache: HIT` or `X-Cache: MISS`
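The strategy above can be sketched as a minimal in-memory TTL cache (illustrative only; the proxy's real cache may differ in detail):

```typescript
// In-memory TTL cache: entries expire after ttlMs, and lookups report the
// X-Cache header value a proxy would emit. The injectable clock (`now`)
// exists only to make expiry testable.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): { value: V | undefined; cacheHeader: 'HIT' | 'MISS' } {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.store.delete(key); // evict stale entry on read
      return { value: undefined, cacheHeader: 'MISS' };
    }
    return { value: entry.value, cacheHeader: 'HIT' };
  }
}

// Cache keys derived from query parameters, e.g.:
const cacheKey = (query: string, limit: number, ranking: string) =>
  `search:${ranking}:${limit}:${query}`;
```

Deriving the key from all request parameters ensures that two searches differing only in `ranking` or `limit` never share a cached result.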
## Rate Limiting

- Default: 100 requests per minute per IP
- Headers included:
  - `X-RateLimit-Limit`
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Reset`
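A fixed-window limiter like the one described above (100 requests per 60 000 ms window, keyed by IP) can be sketched as follows; this is an illustration of the technique, not necessarily the proxy's exact algorithm:

```typescript
// Fixed-window rate limiter: each IP gets `max` requests per `windowMs`.
// The injectable clock (`now`) exists only to make the logic testable.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private windowMs: number,
    private max: number,
    private now: () => number = Date.now,
  ) {}

  // Returns whether the request is allowed, plus the X-RateLimit-* header values.
  check(ip: string): { allowed: boolean; limit: number; remaining: number; reset: number } {
    const t = this.now();
    let entry = this.counts.get(ip);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      entry = { windowStart: t, count: 0 }; // start a fresh window
      this.counts.set(ip, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.max,
      limit: this.max,
      remaining: Math.max(0, this.max - entry.count),
      reset: entry.windowStart + this.windowMs, // epoch ms when the window resets
    };
  }
}
```

When `allowed` is false, the proxy would respond with HTTP 429 while still setting the three `X-RateLimit-*` headers.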
## Monitoring
The proxy includes:
- Request logging with correlation IDs
- Performance timing
- Error tracking
- Health endpoints for monitoring
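Correlation-ID logging means every log line a request produces carries the same ID, so one request's lines can be grepped out of interleaved traffic. A sketch (the log format shown is illustrative, not the proxy's exact output):

```typescript
import { randomUUID } from 'node:crypto';

// One ID per incoming request, attached at the start of the middleware chain.
function newRequestId(): string {
  return randomUUID();
}

// Every log line for that request embeds the ID plus timing information.
function logLine(
  requestId: string,
  method: string,
  path: string,
  status: number,
  ms: number,
): string {
  return `[${requestId}] ${method} ${path} ${status} ${ms}ms`;
}
```

Returning the ID to the client (e.g. in an `X-Request-Id` response header) also lets users quote it in bug reports.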
## Deployment Options

### Railway/Fly.io

```toml
# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
```
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
        - name: proxy
          image: colpali-proxy:latest
          ports:
            - containerPort: 4000
          livenessProbe:
            httpGet:
              path: /health/live
              port: 4000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 4000
```
## Performance

- Built with Hono for minimal per-request overhead
- Efficient streaming for SSE
- Connection pooling for backend requests
- In-memory caching reduces backend load
- Brotli/gzip compression enabled
## Security
- Rate limiting prevents abuse
- Secure headers enabled
- CORS properly configured
- Request ID tracking
- No sensitive data logging