ColPali Hono Proxy Server

A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. It handles caching, rate limiting, and CORS, and exposes a clean API interface.

Features

  • Image Retrieval: Serves base64-encoded images from Vespa as actual image files with proper caching
  • Search Proxy: Forwards search requests with result caching
  • Chat SSE Proxy: Handles Server-Sent Events for streaming chat responses
  • Rate Limiting: Protects backend from overload
  • Caching: In-memory cache for search results and images
  • Health Checks: Kubernetes-ready health endpoints
  • CORS Handling: Configurable CORS for frontend integration
  • Request Logging: Detailed request/response logging with request IDs

Architecture

Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                       ↘ Vespa Cloud
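
A minimal wiring sketch of this flow, using Hono's built-in cors and logger middleware (illustrative only, not the actual source):

import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { logger } from 'hono/logger';

const app = new Hono();

// Request logging for every route, CORS only for the API surface.
app.use('*', logger());
app.use('/api/*', cors({ origin: process.env.CORS_ORIGIN ?? 'http://localhost:3000' }));

// Search, image, chat, and health routes are mounted here and forward
// requests to the ColPali backend at BACKEND_URL.

export default app;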

API Endpoints

Search

  • POST /api/search - Search documents
    {
      "query": "annual report 2023",
      "limit": 10,
      "ranking": "hybrid"
    }
    
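A minimal pass-through sketch for this route (caching omitted here; see the Caching Strategy section). The upstream path, /search on the ColPali backend, is an assumption:

import { Hono } from 'hono';

const search = new Hono();

search.post('/', async (c) => {
  const body = await c.req.json<{ query: string; limit?: number; ranking?: string }>();

  // Forward the search request to the ColPali backend as-is.
  const upstream = await fetch(`${process.env.BACKEND_URL ?? 'http://localhost:7860'}/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });

  return c.json(await upstream.json(), upstream.ok ? 200 : 502);
});

export default search;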

Image Retrieval

  • GET /api/search/image/:docId/thumbnail - Get thumbnail image
  • GET /api/search/image/:docId/full - Get full-size image
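
A sketch of how such a route can turn a base64 payload into a real image response with long-lived cache headers, shown for the thumbnail variant (the full-size route works the same way). The backend response shape ({ image: <base64> }) and upstream path are assumptions:

import { Hono } from 'hono';

const images = new Hono();

images.get('/:docId/thumbnail', async (c) => {
  const docId = c.req.param('docId');

  // Assumed upstream endpoint and response shape; adjust to the real backend API.
  const upstream = await fetch(`${process.env.BACKEND_URL ?? 'http://localhost:7860'}/document/${docId}/thumbnail`);
  if (!upstream.ok) return c.json({ error: 'image not found' }, 404);

  const { image } = await upstream.json() as { image: string };
  const bytes = Uint8Array.from(atob(image), (ch) => ch.charCodeAt(0));

  return new Response(bytes, {
    headers: {
      'Content-Type': 'image/jpeg',
      'Cache-Control': 'public, max-age=86400', // images are cached for 24 hours
    },
  });
});

export default images;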

Chat

  • POST /api/chat - Stream chat responses (SSE)
    {
      "messages": [{"role": "user", "content": "Tell me about..."}],
      "context": []
    }
    
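On the client, the stream can be consumed with fetch and a stream reader; a minimal sketch (SSE event parsing is simplified to plain chunk logging):

async function streamChat(messages: { role: string; content: string }[]) {
  const res = await fetch('http://localhost:4000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, context: [] }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk contains one or more "data: ..." SSE lines.
    console.log(decoder.decode(value, { stream: true }));
  }
}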

Similarity Map

  • POST /api/search/similarity-map - Generate similarity visualization

Health

  • GET /health - Detailed health status
  • GET /health/live - Liveness probe
  • GET /health/ready - Readiness probe
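
A sketch of what these routes could look like, assuming the readiness check simply pings BACKEND_URL (the actual checks may differ):

import { Hono } from 'hono';

const health = new Hono();

// Liveness only confirms the process is responding.
health.get('/live', (c) => c.json({ status: 'ok' }));

// Readiness also verifies the ColPali backend is reachable.
health.get('/ready', async (c) => {
  try {
    const res = await fetch(process.env.BACKEND_URL ?? 'http://localhost:7860');
    return c.json({ status: res.ok ? 'ready' : 'degraded' }, res.ok ? 200 : 503);
  } catch {
    return c.json({ status: 'backend unreachable' }, 503);
  }
});

export default health;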

Setup

Development

  1. Install dependencies:

    npm install
    
  2. Copy environment variables:

    cp .env.example .env
    
  3. Update .env with your configuration

  4. Run in development mode:

    npm run dev
    

Production

  1. Build:

    npm run build
    
  2. Run:

    npm start
    

Docker

Build and run with Docker:

docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy

Or use docker-compose:

docker-compose up
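
The compose file only needs to build the image, expose port 4000, and pass in the env file; a minimal example (illustrative, the actual docker-compose.yml may differ):

services:
  proxy:
    build: .
    ports:
      - "4000:4000"
    env_file: .env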

Environment Variables

Variable            Description                Default
PORT                Server port                4000
BACKEND_URL         ColPali backend URL        http://localhost:7860
CORS_ORIGIN         Allowed CORS origin        http://localhost:3000
ENABLE_CACHE        Enable caching             true
CACHE_TTL           Cache TTL in seconds       300
RATE_LIMIT_WINDOW   Rate limit window (ms)     60000
RATE_LIMIT_MAX      Max requests per window    100
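
With the defaults above, a .env file looks like this (adjust BACKEND_URL and CORS_ORIGIN for your environment):

PORT=4000
BACKEND_URL=http://localhost:7860
CORS_ORIGIN=http://localhost:3000
ENABLE_CACHE=true
CACHE_TTL=300
RATE_LIMIT_WINDOW=60000
RATE_LIMIT_MAX=100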

Integration with Next.js

Update your Next.js app to use the proxy:

// .env.local
NEXT_PUBLIC_API_URL=http://localhost:4000/api

// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});

Caching Strategy

  • Search Results: Cached for 5 minutes (configurable)
  • Images: Cached for 24 hours
  • Cache Keys: Based on query parameters
  • Cache Headers: X-Cache: HIT/MISS
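
An illustrative sketch of this strategy (names are hypothetical, not the actual module):

type Entry = { value: unknown; expires: number };

const cache = new Map<string, Entry>();
const ttlMs = Number(process.env.CACHE_TTL ?? '300') * 1000;

export function getCached(key: string): unknown | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expires) {
    cache.delete(key); // expired entries are evicted lazily on read
    return undefined;
  }
  return entry.value;
}

export function setCached(key: string, value: unknown): void {
  cache.set(key, { value, expires: Date.now() + ttlMs });
}

// Inside a route: key on the search parameters and report the result.
//   const key = JSON.stringify({ query, limit, ranking });
//   const hit = getCached(key);
//   c.header('X-Cache', hit ? 'HIT' : 'MISS');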

Rate Limiting

  • Default: 100 requests per minute per IP
  • Headers included:
    • X-RateLimit-Limit
    • X-RateLimit-Remaining
    • X-RateLimit-Reset
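
A fixed-window, per-IP limiter along these lines could look as follows (an illustrative sketch, not the actual middleware):

import type { MiddlewareHandler } from 'hono';

const WINDOW_MS = Number(process.env.RATE_LIMIT_WINDOW ?? '60000');
const MAX = Number(process.env.RATE_LIMIT_MAX ?? '100');

const buckets = new Map<string, { count: number; resetAt: number }>();

export const rateLimit: MiddlewareHandler = async (c, next) => {
  const ip = c.req.header('x-forwarded-for') ?? 'unknown';
  const now = Date.now();

  let bucket = buckets.get(ip);
  if (!bucket || now >= bucket.resetAt) {
    bucket = { count: 0, resetAt: now + WINDOW_MS };
    buckets.set(ip, bucket);
  }
  bucket.count++;

  c.header('X-RateLimit-Limit', String(MAX));
  c.header('X-RateLimit-Remaining', String(Math.max(0, MAX - bucket.count)));
  c.header('X-RateLimit-Reset', String(Math.ceil(bucket.resetAt / 1000)));

  if (bucket.count > MAX) {
    return c.json({ error: 'Too many requests' }, 429);
  }
  await next();
};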

Monitoring

The proxy includes:

  • Request logging with correlation IDs
  • Performance timing
  • Error tracking
  • Health endpoints for monitoring
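
For example, a correlation-ID logging middleware in this spirit (the log format is an assumption):

import { randomUUID } from 'node:crypto';
import type { MiddlewareHandler } from 'hono';

export const requestLogger: MiddlewareHandler = async (c, next) => {
  // Reuse an incoming request ID if present, otherwise generate one.
  const requestId = c.req.header('x-request-id') ?? randomUUID();
  const start = performance.now();

  c.header('X-Request-Id', requestId);
  await next();

  const ms = (performance.now() - start).toFixed(1);
  console.log(`[${requestId}] ${c.req.method} ${c.req.path} -> ${c.res.status} (${ms}ms)`);
};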

Deployment Options

Railway/Fly.io

# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
      - name: proxy
        image: colpali-proxy:latest
        ports:
        - containerPort: 4000
        livenessProbe:
          httpGet:
            path: /health/live
            port: 4000
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 4000

Performance

  • Built on Hono, a lightweight framework with minimal per-request overhead
  • Efficient streaming for SSE
  • Connection pooling for backend requests
  • In-memory caching reduces backend load
  • Brotli/gzip compression enabled

Security

  • Rate limiting prevents abuse
  • Secure headers enabled
  • CORS properly configured
  • Request ID tracking
  • No sensitive data logging