ColPali Hono Proxy Server

A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. It handles caching, rate limiting, and CORS, and exposes a clean API interface.

Features

  • Image Retrieval: Serves base64-encoded images from Vespa as actual image files with proper caching
  • Search Proxy: Forwards search requests with result caching
  • Chat SSE Proxy: Handles Server-Sent Events for streaming chat responses
  • Rate Limiting: Protects backend from overload
  • Caching: In-memory cache for search results and images
  • Health Checks: Kubernetes-ready health endpoints
  • CORS Handling: Configurable CORS for frontend integration
  • Request Logging: Detailed request/response logging with request IDs

Architecture

Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                       ↘ Vespa Cloud
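
A minimal wiring sketch of this flow, using Hono's built-in cors and logger middleware (illustrative only, not the actual source):

import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { logger } from 'hono/logger';

const app = new Hono();

// Request logging for every route, CORS only for the API surface.
app.use('*', logger());
app.use('/api/*', cors({ origin: process.env.CORS_ORIGIN ?? 'http://localhost:3000' }));

// Search, image, chat, and health routes are mounted here and forward
// requests to the ColPali backend at BACKEND_URL.

export default app;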

API Endpoints

Search

  • POST /api/search - Search documents
    {
      "query": "annual report 2023",
      "limit": 10,
      "ranking": "hybrid"
    }
    
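A minimal pass-through sketch for this route (caching omitted here; see the Caching Strategy section). The upstream path, /search on the ColPali backend, is an assumption:

import { Hono } from 'hono';

const search = new Hono();

search.post('/', async (c) => {
  const body = await c.req.json<{ query: string; limit?: number; ranking?: string }>();

  // Forward the search request to the ColPali backend as-is.
  const upstream = await fetch(`${process.env.BACKEND_URL ?? 'http://localhost:7860'}/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });

  return c.json(await upstream.json(), upstream.ok ? 200 : 502);
});

export default search;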

Image Retrieval

  • GET /api/search/image/:docId/thumbnail - Get thumbnail image
  • GET /api/search/image/:docId/full - Get full-size image
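
A sketch of how such a route can turn a base64 payload into a real image response with long-lived cache headers, shown for the thumbnail variant (the full-size route works the same way). The backend response shape ({ image: <base64> }) and upstream path are assumptions:

import { Hono } from 'hono';

const images = new Hono();

images.get('/:docId/thumbnail', async (c) => {
  const docId = c.req.param('docId');

  // Assumed upstream endpoint and response shape; adjust to the real backend API.
  const upstream = await fetch(`${process.env.BACKEND_URL ?? 'http://localhost:7860'}/document/${docId}/thumbnail`);
  if (!upstream.ok) return c.json({ error: 'image not found' }, 404);

  const { image } = await upstream.json() as { image: string };
  const bytes = Uint8Array.from(atob(image), (ch) => ch.charCodeAt(0));

  return new Response(bytes, {
    headers: {
      'Content-Type': 'image/jpeg',
      'Cache-Control': 'public, max-age=86400', // images are cached for 24 hours
    },
  });
});

export default images;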

Chat

  • POST /api/chat - Stream chat responses (SSE)
    {
      "messages": [{"role": "user", "content": "Tell me about..."}],
      "context": []
    }
    
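On the client, the stream can be consumed with fetch and a stream reader; a minimal sketch (SSE event parsing is simplified to plain chunk logging):

async function streamChat(messages: { role: string; content: string }[]) {
  const res = await fetch('http://localhost:4000/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, context: [] }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Each chunk contains one or more "data: ..." SSE lines.
    console.log(decoder.decode(value, { stream: true }));
  }
}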

Similarity Map

  • POST /api/search/similarity-map - Generate similarity visualization

Health

  • GET /health - Detailed health status
  • GET /health/live - Liveness probe
  • GET /health/ready - Readiness probe
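
A sketch of what these routes could look like, assuming the readiness check simply pings BACKEND_URL (the actual checks may differ):

import { Hono } from 'hono';

const health = new Hono();

// Liveness only confirms the process is responding.
health.get('/live', (c) => c.json({ status: 'ok' }));

// Readiness also verifies the ColPali backend is reachable.
health.get('/ready', async (c) => {
  try {
    const res = await fetch(process.env.BACKEND_URL ?? 'http://localhost:7860');
    return c.json({ status: res.ok ? 'ready' : 'degraded' }, res.ok ? 200 : 503);
  } catch {
    return c.json({ status: 'backend unreachable' }, 503);
  }
});

export default health;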

Setup

Development

  1. Install dependencies:

    npm install
    
  2. Copy environment variables:

    cp .env.example .env
    
  3. Update .env with your configuration

  4. Run in development mode:

    npm run dev
    

Production

  1. Build:

    npm run build
    
  2. Run:

    npm start
    

Docker

Build and run with Docker:

docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy

Or use docker-compose:

docker-compose up
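
The compose file only needs to build the image, expose port 4000, and pass in the env file; a minimal example (illustrative, the actual docker-compose.yml may differ):

services:
  proxy:
    build: .
    ports:
      - "4000:4000"
    env_file: .env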

Environment Variables

Variable            Description                Default
PORT                Server port                4000
BACKEND_URL         ColPali backend URL        http://localhost:7860
CORS_ORIGIN         Allowed CORS origin        http://localhost:3000
ENABLE_CACHE        Enable caching             true
CACHE_TTL           Cache TTL in seconds       300
RATE_LIMIT_WINDOW   Rate limit window (ms)     60000
RATE_LIMIT_MAX      Max requests per window    100
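
With the defaults above, a .env file looks like this (adjust BACKEND_URL and CORS_ORIGIN for your environment):

PORT=4000
BACKEND_URL=http://localhost:7860
CORS_ORIGIN=http://localhost:3000
ENABLE_CACHE=true
CACHE_TTL=300
RATE_LIMIT_WINDOW=60000
RATE_LIMIT_MAX=100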

Integration with Next.js

Update your Next.js app to use the proxy:

// .env.local
NEXT_PUBLIC_API_URL=http://localhost:4000/api

// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});

Caching Strategy

  • Search Results: Cached for 5 minutes (configurable)
  • Images: Cached for 24 hours
  • Cache Keys: Based on query parameters
  • Cache Headers: X-Cache: HIT/MISS
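
An illustrative sketch of this strategy (names are hypothetical, not the actual module):

type Entry = { value: unknown; expires: number };

const cache = new Map<string, Entry>();
const ttlMs = Number(process.env.CACHE_TTL ?? '300') * 1000;

export function getCached(key: string): unknown | undefined {
  const entry = cache.get(key);
  if (!entry) return undefined;
  if (Date.now() > entry.expires) {
    cache.delete(key); // expired entries are evicted lazily on read
    return undefined;
  }
  return entry.value;
}

export function setCached(key: string, value: unknown): void {
  cache.set(key, { value, expires: Date.now() + ttlMs });
}

// Inside a route: key on the search parameters and report the result.
//   const key = JSON.stringify({ query, limit, ranking });
//   const hit = getCached(key);
//   c.header('X-Cache', hit ? 'HIT' : 'MISS');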

Rate Limiting

  • Default: 100 requests per minute per IP
  • Headers included:
    • X-RateLimit-Limit
    • X-RateLimit-Remaining
    • X-RateLimit-Reset
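
A fixed-window, per-IP limiter along these lines could look as follows (an illustrative sketch, not the actual middleware):

import type { MiddlewareHandler } from 'hono';

const WINDOW_MS = Number(process.env.RATE_LIMIT_WINDOW ?? '60000');
const MAX = Number(process.env.RATE_LIMIT_MAX ?? '100');

const buckets = new Map<string, { count: number; resetAt: number }>();

export const rateLimit: MiddlewareHandler = async (c, next) => {
  const ip = c.req.header('x-forwarded-for') ?? 'unknown';
  const now = Date.now();

  let bucket = buckets.get(ip);
  if (!bucket || now >= bucket.resetAt) {
    bucket = { count: 0, resetAt: now + WINDOW_MS };
    buckets.set(ip, bucket);
  }
  bucket.count++;

  c.header('X-RateLimit-Limit', String(MAX));
  c.header('X-RateLimit-Remaining', String(Math.max(0, MAX - bucket.count)));
  c.header('X-RateLimit-Reset', String(Math.ceil(bucket.resetAt / 1000)));

  if (bucket.count > MAX) {
    return c.json({ error: 'Too many requests' }, 429);
  }
  await next();
};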

Monitoring

The proxy includes:

  • Request logging with correlation IDs
  • Performance timing
  • Error tracking
  • Health endpoints for monitoring
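
For example, a correlation-ID logging middleware in this spirit (the log format is an assumption):

import { randomUUID } from 'node:crypto';
import type { MiddlewareHandler } from 'hono';

export const requestLogger: MiddlewareHandler = async (c, next) => {
  // Reuse an incoming request ID if present, otherwise generate one.
  const requestId = c.req.header('x-request-id') ?? randomUUID();
  const start = performance.now();

  c.header('X-Request-Id', requestId);
  await next();

  const ms = (performance.now() - start).toFixed(1);
  console.log(`[${requestId}] ${c.req.method} ${c.req.path} -> ${c.res.status} (${ms}ms)`);
};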

Deployment Options

Railway/Fly.io

# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
      - name: proxy
        image: colpali-proxy:latest
        ports:
        - containerPort: 4000
        livenessProbe:
          httpGet:
            path: /health/live
            port: 4000
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 4000

Performance

  • Built on Hono, a lightweight framework with minimal per-request overhead
  • Efficient streaming for SSE
  • Connection pooling for backend requests
  • In-memory caching reduces backend load
  • Brotli/gzip compression enabled

Security

  • Rate limiting prevents abuse
  • Secure headers enabled
  • CORS properly configured
  • Request ID tracking
  • No sensitive data logging