# ColPali Hono Proxy Server
A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. The proxy handles caching, rate limiting, and CORS, and exposes a clean API to the frontend.
## Features
- **Image Retrieval**: Serves base64 images from Vespa as actual image files with proper caching
- **Search Proxy**: Forwards search requests with result caching
- **Chat SSE Proxy**: Handles Server-Sent Events for streaming chat responses
- **Rate Limiting**: Protects the backend from overload
- **Caching**: In-memory cache for search results and images
- **Health Checks**: Kubernetes-ready health endpoints
- **CORS Handling**: Configurable CORS for frontend integration
- **Request Logging**: Detailed request/response logging with request IDs
## Architecture
```
Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                       ↘ Vespa Cloud
```
## API Endpoints
### Search
- `POST /api/search` - Search documents
```json
{
  "query": "annual report 2023",
  "limit": 10,
  "ranking": "hybrid"
}
```
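A proxy usually validates this body before forwarding it. The following is a minimal sketch of such a check, not the proxy's actual code: `parseSearchRequest` is a hypothetical helper, and the fallback values (`limit` of 10, `ranking` of `"hybrid"`) are assumptions taken from the example above rather than documented server defaults.

```typescript
// Shape of the search request body shown in the example above.
interface SearchRequest {
  query: string;
  limit: number;
  ranking: string;
}

// Hypothetical helper: validates an untrusted JSON body and fills in defaults.
// The defaults mirror the example request and are assumptions.
function parseSearchRequest(body: unknown): SearchRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { query, limit, ranking } = body as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("`query` must be a non-empty string");
  }
  if (
    limit !== undefined &&
    (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1)
  ) {
    throw new Error("`limit` must be a positive integer");
  }
  if (ranking !== undefined && typeof ranking !== "string") {
    throw new Error("`ranking` must be a string");
  }
  return {
    query: query.trim(),
    limit: (limit as number | undefined) ?? 10,
    ranking: (ranking as string | undefined) ?? "hybrid",
  };
}
```

Rejecting malformed bodies at the proxy keeps invalid requests from reaching the backend or occupying cache entries.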
### Image Retrieval
- `GET /api/search/image/:docId/thumbnail` - Get thumbnail image
- `GET /api/search/image/:docId/full` - Get full-size image
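Since Vespa returns images as base64 strings, serving them as files means decoding the payload and attaching image headers. A rough sketch of that step follows. `decodeBase64Image` is a hypothetical name, the magic-byte sniffing is an assumption (the stored image format is not stated here), and the 24-hour `max-age` mirrors the image cache duration described under Caching Strategy.

```typescript
// Hypothetical helper: turns a base64 string into raw bytes plus response headers.
// atob throws if the payload is not valid base64.
function decodeBase64Image(base64: string): {
  bytes: Uint8Array;
  headers: Record<string, string>;
} {
  // Strip an optional data-URL prefix such as "data:image/png;base64,".
  const payload = base64.replace(/^data:[^;]+;base64,/, "");
  const binary = atob(payload);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }

  // Assumption: detect PNG or JPEG from the leading magic bytes.
  const isPng =
    bytes.length >= 4 &&
    bytes[0] === 0x89 && bytes[1] === 0x50 && bytes[2] === 0x4e && bytes[3] === 0x47;
  const isJpeg =
    bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
  const contentType = isPng
    ? "image/png"
    : isJpeg
      ? "image/jpeg"
      : "application/octet-stream";

  return {
    bytes,
    headers: {
      "Content-Type": contentType,
      // 24 hours, expressed in seconds.
      "Cache-Control": "public, max-age=86400",
    },
  };
}
```

The returned bytes and headers can be passed to whatever response constructor the server framework provides.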
### Chat
- `POST /api/chat` - Stream chat responses (SSE)
```json
{
  "messages": [{"role": "user", "content": "Tell me about..."}],
  "context": []
}
```
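Because this endpoint is a `POST`, a browser client cannot use `EventSource` and has to read the response stream itself. The sketch below shows one way to split buffered stream text into complete SSE events. `parseSseEvents` is a hypothetical helper, and what each `data:` payload contains (plain text or JSON) is not specified here, so the function returns raw strings.

```typescript
// Hypothetical helper: splits buffered SSE text into complete events plus
// the trailing partial text that should be kept until more data arrives.
function parseSseEvents(buffer: string): { events: string[]; rest: string } {
  // Normalize CRLF so the blank-line separator is always "\n\n".
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  // The last block has no terminating blank line yet, so it is incomplete.
  const rest = blocks.pop() ?? "";

  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // Drop the "data:" field name and the single optional space after it.
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) {
      // Multiple data lines in one event are joined with newlines.
      events.push(dataLines.join("\n"));
    }
  }
  return { events, rest };
}
```

A client would append each decoded chunk from `response.body` to `rest`, call the function again, and render the returned events as they arrive.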
### Similarity Map
- `POST /api/search/similarity-map` - Generate similarity visualization
### Health
- `GET /health` - Detailed health status
- `GET /health/live` - Liveness probe
- `GET /health/ready` - Readiness probe
## Setup
### Development
1. Install dependencies:
```bash
npm install
```
2. Copy environment variables:
```bash
cp .env.example .env
```
3. Update `.env` with your configuration
4. Run in development mode:
```bash
npm run dev
```
### Production
1. Build:
```bash
npm run build
```
2. Run:
```bash
npm start
```
### Docker
Build and run with Docker:
```bash
docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy
```
Or use docker-compose:
```bash
docker-compose up
```
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | Server port | 4000 |
| `BACKEND_URL` | ColPali backend URL | http://localhost:7860 |
| `CORS_ORIGIN` | Allowed CORS origin | http://localhost:3000 |
| `ENABLE_CACHE` | Enable caching | true |
| `CACHE_TTL` | Cache TTL in seconds | 300 |
| `RATE_LIMIT_WINDOW` | Rate limit window (ms) | 60000 |
| `RATE_LIMIT_MAX` | Max requests per window | 100 |
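As an illustration of how these variables and defaults might be read at startup, here is a minimal sketch. `ProxyConfig`, `intFromEnv`, and `loadConfig` are hypothetical names, and the rule that only the literal string `false` disables caching is an assumption.

```typescript
interface ProxyConfig {
  port: number;
  backendUrl: string;
  corsOrigin: string;
  enableCache: boolean;
  cacheTtlSeconds: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
}

// Parses an integer variable, falling back when it is unset or not numeric.
function intFromEnv(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Hypothetical loader: the defaults match the table above.
function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  return {
    port: intFromEnv(env.PORT, 4000),
    backendUrl: env.BACKEND_URL ?? "http://localhost:7860",
    corsOrigin: env.CORS_ORIGIN ?? "http://localhost:3000",
    // Assumption: any value other than "false" leaves the cache enabled.
    enableCache: (env.ENABLE_CACHE ?? "true").toLowerCase() !== "false",
    cacheTtlSeconds: intFromEnv(env.CACHE_TTL, 300),
    rateLimitWindowMs: intFromEnv(env.RATE_LIMIT_WINDOW, 60000),
    rateLimitMax: intFromEnv(env.RATE_LIMIT_MAX, 100),
  };
}
```

In a Node.js process this would be called once as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.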
## Integration with Next.js
Update your Next.js app to use the proxy:
```typescript
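// .env.local should define:
//   NEXT_PUBLIC_API_URL=http://localhost:4000/api

// API calls (`query` and `limit` come from the calling code)
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});
if (!response.ok) {
  throw new Error(`Search failed with status ${response.status}`);
}
const results = await response.json();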
```
## Caching Strategy
- **Search Results**: Cached for 5 minutes by default (set `CACHE_TTL`, in seconds)
- **Images**: Cached for 24 hours
- **Cache Keys**: Based on query parameters
- **Cache Headers**: Responses carry `X-Cache: HIT` or `X-Cache: MISS`
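The strategy above can be sketched as a small in-memory TTL cache with a deterministic key builder. This is an illustrative sketch rather than the proxy's actual implementation: `TtlCache` and `cacheKey` are hypothetical names, and the injectable clock exists only to make expiry testable.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache that expires entries after a fixed TTL.
class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined on a miss or an expired entry.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Builds a stable key: sorting parameter names makes the key independent
// of the order in which the caller listed them.
function cacheKey(prefix: string, params: Record<string, string | number>): string {
  const parts = Object.keys(params)
    .sort()
    .map((name) => `${name}=${encodeURIComponent(String(params[name]))}`);
  return `${prefix}?${parts.join("&")}`;
}
```

A handler would look up `cacheKey("search", { query, limit })` first, answer with `X-Cache: HIT` when `get` returns a value, and otherwise call the backend, store the result, and answer with `X-Cache: MISS`. Search results would use a TTL of `CACHE_TTL * 1000` milliseconds.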
## Rate Limiting
- Default: 100 requests per minute per IP (configurable via `RATE_LIMIT_MAX` and `RATE_LIMIT_WINDOW`)
- Headers included:
  - `X-RateLimit-Limit`
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Reset`
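One simple way to produce these headers is a fixed-window counter keyed by client IP, sketched below. Whether the proxy uses a fixed or a sliding window is not stated, so the algorithm is an assumption, as are the class name `FixedWindowRateLimiter` and the choice to report `X-RateLimit-Reset` in Unix seconds.

```typescript
interface RateLimitResult {
  allowed: boolean;
  headers: Record<string, string>;
}

// Hypothetical fixed-window limiter: each key gets `max` requests per window.
class FixedWindowRateLimiter {
  private windows = new Map<string, { count: number; resetAt: number }>();
  private windowMs: number;
  private max: number;
  private now: () => number;

  constructor(windowMs = 60000, max = 100, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now;
  }

  // Counts one request for `key` and reports whether it is within the limit.
  check(key: string): RateLimitResult {
    const time = this.now();
    let current = this.windows.get(key);
    // Start a fresh window on first use or once the old window has ended.
    if (!current || current.resetAt <= time) {
      current = { count: 0, resetAt: time + this.windowMs };
      this.windows.set(key, current);
    }
    current.count += 1;

    return {
      allowed: current.count <= this.max,
      headers: {
        "X-RateLimit-Limit": String(this.max),
        "X-RateLimit-Remaining": String(Math.max(0, this.max - current.count)),
        // Assumption: reset time is reported as Unix seconds.
        "X-RateLimit-Reset": String(Math.ceil(current.resetAt / 1000)),
      },
    };
  }
}
```

When `allowed` is false the server would typically respond with HTTP 429. This sketch never evicts idle keys, so a long-running process would also need periodic cleanup of the map.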
## Monitoring
The proxy includes:
- Request logging with correlation IDs
- Performance timing
- Error tracking
- Health endpoints for monitoring
## Deployment Options
### Railway/Fly.io
```toml
# fly.toml
app = "colpali-proxy"
primary_region = "ord"
[http_service]
internal_port = 4000
force_https = true
auto_stop_machines = true
auto_start_machines = true
```
### Kubernetes
```yaml
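apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value below is an example.
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
        - name: proxy
          image: colpali-proxy:latest
          ports:
            - containerPort: 4000
          livenessProbe:
            httpGet:
              path: /health/live
              port: 4000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 4000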
```
## Performance
- Built on Hono, a lightweight web framework with low routing overhead
- Efficient streaming for SSE
- Connection pooling for backend requests
- In-memory caching reduces backend load
- Brotli/gzip compression enabled
## Security
- Rate limiting prevents abuse
- Secure headers enabled
- CORS restricted to the origin set in `CORS_ORIGIN`
- Request ID tracking
- No sensitive data logging