# ColPali Hono Proxy Server

A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. The proxy handles caching, rate limiting, and CORS, and exposes a clean API interface.

## Features

- **Image Retrieval**: Serves base64 images from Vespa as actual image files with proper caching
- **Search Proxy**: Forwards search requests with result caching
- **Chat SSE Proxy**: Handles Server-Sent Events for streaming chat responses
- **Rate Limiting**: Protects backend from overload
- **Caching**: In-memory cache for search results and images
- **Health Checks**: Kubernetes-ready health endpoints
- **CORS Handling**: Configurable CORS for frontend integration
- **Request Logging**: Detailed request/response logging with request IDs

## Architecture

```
Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                      ↘ Vespa Cloud
```

## API Endpoints

### Search
- `POST /api/search` - Search documents
  ```json
  {
    "query": "annual report 2023",
    "limit": 10,
    "ranking": "hybrid"
  }
  ```

### Image Retrieval
- `GET /api/search/image/:docId/thumbnail` - Get thumbnail image
- `GET /api/search/image/:docId/full` - Get full-size image
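
The image endpoints turn base64 payloads from Vespa into binary image data. The following is a minimal sketch of that conversion, not the proxy's actual code: the helper names `base64ToBytes` and `toImagePayload`, the `image/jpeg` default, and the `Cache-Control` value are assumptions made for illustration (the 24-hour lifetime mirrors the image cache duration listed under Caching Strategy).

```typescript
// Hypothetical helper: decode a base64 string into raw bytes.
// `atob` is a global in browsers and in Node 16+.
function base64ToBytes(b64: string) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: pair the decoded bytes with the headers an image
// response would need. The content type default is an assumption.
function toImagePayload(b64: string, contentType = "image/jpeg") {
  return {
    body: base64ToBytes(b64),
    headers: {
      "Content-Type": contentType,
      // Assumed value: 24 hours, expressed in seconds.
      "Cache-Control": "public, max-age=86400",
    } as Record<string, string>,
  };
}
```

The returned `body` and `headers` can then be handed to whatever response API the server uses.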

### Chat
- `POST /api/chat` - Stream chat responses (SSE)
  ```json
  {
    "messages": [{"role": "user", "content": "Tell me about..."}],
    "context": []
  }
  ```
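
Because `/api/chat` is a `POST` endpoint, the browser's `EventSource` (which only issues `GET` requests) cannot consume it directly; a client usually reads the `fetch` response body and splits the SSE frames itself. The sketch below shows one way to do that splitting. `createSseParser` is a hypothetical name, and the payload of each `data:` line is treated as an opaque string because this README does not specify the event format.

```typescript
// Incremental SSE parser: feed it decoded text chunks, get back the `data`
// payload of every event that has been completely received so far.
function createSseParser() {
  let buffer = "";
  return {
    push(chunk: string): string[] {
      // Normalise CRLF on the whole buffer so a "\r\n" split across
      // two chunks is still handled.
      buffer = (buffer + chunk).replace(/\r\n/g, "\n");
      const events: string[] = [];
      let boundary = buffer.indexOf("\n\n");
      while (boundary !== -1) {
        const frame = buffer.slice(0, boundary);
        buffer = buffer.slice(boundary + 2);
        // Keep only `data:` lines; comments (": ...") and other fields are ignored.
        const data = frame
          .split("\n")
          .filter((line) => line.startsWith("data:"))
          .map((line) => line.slice(5).replace(/^ /, ""))
          .join("\n");
        if (data.length > 0) events.push(data);
        boundary = buffer.indexOf("\n\n");
      }
      return events;
    },
  };
}
```

In a client, the chunks would typically come from `response.body.getReader()` decoded with a `TextDecoder`, with each returned payload appended to the chat UI as it arrives.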

### Similarity Map
- `POST /api/search/similarity-map` - Generate similarity visualization

### Health
- `GET /health` - Detailed health status
- `GET /health/live` - Liveness probe
- `GET /health/ready` - Readiness probe

## Setup

### Development

1. Install dependencies:
   ```bash
   npm install
   ```

2. Copy environment variables:
   ```bash
   cp .env.example .env
   ```

3. Update `.env` with your configuration

4. Run in development mode:
   ```bash
   npm run dev
   ```

### Production

1. Build:
   ```bash
   npm run build
   ```

2. Run:
   ```bash
   npm start
   ```

### Docker

Build and run with Docker:
```bash
docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy
```

Or use docker-compose:
```bash
docker-compose up
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | Server port | 4000 |
| `BACKEND_URL` | ColPali backend URL | http://localhost:7860 |
| `CORS_ORIGIN` | Allowed CORS origin | http://localhost:3000 |
| `ENABLE_CACHE` | Enable caching | true |
| `CACHE_TTL` | Cache TTL in seconds | 300 |
| `RATE_LIMIT_WINDOW` | Rate limit window (ms) | 60000 |
| `RATE_LIMIT_MAX` | Max requests per window | 100 |
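
As an illustration of how these variables and defaults could be read at startup, here is a small sketch. The `ProxyConfig` shape, `loadConfig`, and `readInt` are hypothetical names, and the rule that only the literal `false` disables caching is an assumption; the proxy's own parsing may differ.

```typescript
interface ProxyConfig {
  port: number;
  backendUrl: string;
  corsOrigin: string;
  enableCache: boolean;
  cacheTtlSeconds: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the documented default.
function readInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ProxyConfig {
  return {
    port: readInt(env, "PORT", 4000),
    backendUrl: env.BACKEND_URL ?? "http://localhost:7860",
    corsOrigin: env.CORS_ORIGIN ?? "http://localhost:3000",
    // Assumption: anything other than the literal "false" leaves caching on.
    enableCache: (env.ENABLE_CACHE ?? "true").toLowerCase() !== "false",
    cacheTtlSeconds: readInt(env, "CACHE_TTL", 300),
    rateLimitWindowMs: readInt(env, "RATE_LIMIT_WINDOW", 60000),
    rateLimitMax: readInt(env, "RATE_LIMIT_MAX", 100),
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`; failing fast on a malformed number avoids silently running with a zero-length cache TTL or rate-limit window.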

## Integration with Next.js

Update your Next.js app to use the proxy:

```typescript
// .env.local (an env file, not TypeScript):
//   NEXT_PUBLIC_API_URL=http://localhost:4000/api

// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});
```

## Caching Strategy

- **Search Results**: Cached for 5 minutes by default (configurable via `CACHE_TTL`, in seconds)
- **Images**: Cached for 24 hours
- **Cache Keys**: Based on query parameters
- **Cache Headers**: `X-Cache: HIT/MISS`
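
The strategy above can be sketched as a small in-memory TTL cache. This is an illustrative sketch rather than the proxy's implementation: `cacheKey` and `TtlCache` are hypothetical names, and sorting the parameters to build the key is an assumption about what "based on query parameters" means.

```typescript
// Build a stable cache key: the same parameters in any order give the same key.
function cacheKey(prefix: string, params: Record<string, string | number>): string {
  const parts = Object.keys(params)
    .sort()
    .map((k) => `${k}=${encodeURIComponent(String(params[k]))}`);
  return `${prefix}?${parts.join("&")}`;
}

class TtlCache<V> {
  private readonly store = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the value plus the status a handler could report in `X-Cache`.
  get(key: string): { value: V | undefined; status: "HIT" | "MISS" } {
    const entry = this.store.get(key);
    if (entry === undefined || entry.expiresAt <= this.now()) {
      this.store.delete(key); // drop expired entries lazily
      return { value: undefined, status: "MISS" };
    }
    return { value: entry.value, status: "HIT" };
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A search handler would look up `cacheKey("search", { query, limit })`, forward the request to the backend on a `MISS`, store the result, and set `X-Cache` from the returned status.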

## Rate Limiting

- Default: 100 requests per minute per IP (configurable via `RATE_LIMIT_MAX` and `RATE_LIMIT_WINDOW`)
- Headers included:
  - `X-RateLimit-Limit`
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Reset`
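
To make the header semantics concrete, here is a sketch of a per-IP limiter that produces these three headers. It assumes a fixed-window algorithm and reports `X-RateLimit-Reset` as a Unix timestamp in seconds; this README does not state which algorithm or reset format the proxy uses, and `FixedWindowLimiter` is a hypothetical name.

```typescript
interface RateLimitResult {
  allowed: boolean;
  headers: Record<string, string>;
}

class FixedWindowLimiter {
  private readonly windows = new Map<string, { count: number; resetAt: number }>();
  private readonly max: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // Defaults mirror RATE_LIMIT_MAX and RATE_LIMIT_WINDOW; `now` is injectable for tests.
  constructor(max = 100, windowMs = 60000, now: () => number = Date.now) {
    this.max = max;
    this.windowMs = windowMs;
    this.now = now;
  }

  check(clientKey: string): RateLimitResult {
    const t = this.now();
    let w = this.windows.get(clientKey);
    // Start a fresh window on first contact or once the old one has elapsed.
    if (w === undefined || w.resetAt <= t) {
      w = { count: 0, resetAt: t + this.windowMs };
      this.windows.set(clientKey, w);
    }
    w.count += 1;
    return {
      allowed: w.count <= this.max,
      headers: {
        "X-RateLimit-Limit": String(this.max),
        "X-RateLimit-Remaining": String(Math.max(0, this.max - w.count)),
        // Assumption: reset is reported as Unix time in seconds.
        "X-RateLimit-Reset": String(Math.ceil(w.resetAt / 1000)),
      },
    };
  }
}
```

A middleware built on this would typically call `check` with the client IP, copy the headers onto the response, and answer with HTTP 429 when `allowed` is false.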

## Monitoring

The proxy includes:
- Request logging with correlation IDs
- Performance timing
- Error tracking
- Health endpoints for monitoring

## Deployment Options

### Railway/Fly.io
```toml
# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
```

### Kubernetes
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
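spec:
  replicas: 3
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec: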
      containers:
      - name: proxy
        image: colpali-proxy:latest
        ports:
        - containerPort: 4000
        livenessProbe:
          httpGet:
            path: /health/live
            port: 4000
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 4000
```

## Performance

- Built on Hono, a lightweight web framework with low per-request overhead
- Efficient streaming for SSE
- Connection pooling for backend requests
- In-memory caching reduces backend load
- Brotli/gzip compression enabled

## Security

- Rate limiting prevents abuse
- Secure headers enabled
- CORS restricted to the origin set in `CORS_ORIGIN`
- Request ID tracking
- No sensitive data logging