# ColPali Hono Proxy Server

A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. It handles caching, rate limiting, and CORS, and provides a clean API interface.

## Features

- **Image Retrieval**: Serves base64 images from Vespa as actual image files with proper caching
- **Search Proxy**: Forwards search requests with result caching
- **Chat SSE Proxy**: Handles Server-Sent Events for streaming chat responses
- **Rate Limiting**: Protects backend from overload
- **Caching**: In-memory cache for search results and images
- **Health Checks**: Kubernetes-ready health endpoints
- **CORS Handling**: Configurable CORS for frontend integration
- **Request Logging**: Detailed request/response logging with request IDs

## Architecture

```
Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
                                       ↘ Vespa Cloud
```

## API Endpoints

### Search
- `POST /api/search` - Search documents
  ```json
  {
    "query": "annual report 2023",
    "limit": 10,
    "ranking": "hybrid"
  }
  ```

### Image Retrieval
- `GET /api/search/image/:docId/thumbnail` - Get thumbnail image
- `GET /api/search/image/:docId/full` - Get full-size image
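
Serving a base64 string as an image file comes down to decoding it and attaching the right headers. The following is a minimal sketch of that step, not the proxy's actual code: `base64ToImagePayload` and `ImagePayload` are hypothetical names, the optional `data:` prefix is an assumption about how values might be stored, and the 24-hour `max-age` mirrors the image cache lifetime described under Caching Strategy.

```typescript
// Hypothetical helper: illustrates the base64-to-image step only.
interface ImagePayload {
  body: Uint8Array;
  headers: Record<string, string>;
}

// 24 hours, matching the image cache lifetime in this README.
const IMAGE_MAX_AGE_SECONDS = 24 * 60 * 60;

function base64ToImagePayload(base64: string, contentType: string): ImagePayload {
  // Assumption: stored values may carry a "data:<mime>;base64," prefix; drop it if present.
  const raw = base64.replace(/^data:[^;]+;base64,/, '');

  // atob yields a binary string; copy each char code into a byte array.
  const binary = atob(raw);
  const body = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    body[i] = binary.charCodeAt(i);
  }

  return {
    body,
    headers: {
      'Content-Type': contentType,
      'Content-Length': String(body.length),
      'Cache-Control': `public, max-age=${IMAGE_MAX_AGE_SECONDS}`,
    },
  };
}
```

A route handler could then return `new Response(payload.body, { headers: payload.headers })`, so browsers and CDNs cache the image like any static file.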

### Chat
- `POST /api/chat` - Stream chat responses (SSE)
  ```json
  {
    "messages": [{"role": "user", "content": "Tell me about..."}],
    "context": []
  }
  ```
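
Because the chat endpoint streams Server-Sent Events, a client has to reassemble events from arbitrary network chunks. The sketch below shows one way to do that under stated assumptions: `SseParser` is a hypothetical helper, and it only extracts `data:` fields, since the exact event payload format of this endpoint is not documented here.

```typescript
// Hypothetical client-side helper; the payload carried in each event is an assumption.
class SseParser {
  private buffer = '';

  // Feed a decoded text chunk; returns the `data` payload of every complete event.
  push(chunk: string): string[] {
    // Normalize CRLF on the whole buffer so a "\r\n" split across chunks is handled.
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, '\n');

    const events: string[] = [];
    let boundary = this.buffer.indexOf('\n\n'); // a blank line ends an event
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);

      // Keep only data lines; strip the field name and one optional leading space.
      const dataLines = rawEvent
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).replace(/^ /, ''));

      if (dataLines.length > 0) {
        events.push(dataLines.join('\n'));
      }
      boundary = this.buffer.indexOf('\n\n');
    }
    return events;
  }
}
```

In a browser, chunks would typically come from `response.body.getReader()` decoded with a `TextDecoder` in streaming mode, with each chunk passed to `push`.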

### Similarity Map
- `POST /api/search/similarity-map` - Generate similarity visualization

### Health
- `GET /health` - Detailed health status
- `GET /health/live` - Liveness probe
- `GET /health/ready` - Readiness probe

## Setup

### Development

1. Install dependencies:
   ```bash
   npm install
   ```

2. Copy environment variables:
   ```bash
   cp .env.example .env
   ```

3. Update `.env` with your configuration

4. Run in development mode:
   ```bash
   npm run dev
   ```

### Production

1. Build:
   ```bash
   npm run build
   ```

2. Run:
   ```bash
   npm start
   ```

### Docker

Build and run with Docker:
```bash
docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy
```

Or use docker-compose:
```bash
docker-compose up
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | Server port | 4000 |
| `BACKEND_URL` | ColPali backend URL | http://localhost:7860 |
| `CORS_ORIGIN` | Allowed CORS origin | http://localhost:3000 |
| `ENABLE_CACHE` | Enable caching | true |
| `CACHE_TTL` | Cache TTL in seconds | 300 |
| `RATE_LIMIT_WINDOW` | Rate limit window (ms) | 60000 |
| `RATE_LIMIT_MAX` | Max requests per window | 100 |
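
As an illustration of how these variables and defaults fit together, here is a small sketch of a typed config loader. `loadConfig`, `ProxyConfig`, and the parsing rules (for example, treating only the string `false` as disabling the cache) are assumptions for this example; the proxy's real configuration code may differ.

```typescript
// Hypothetical loader: variable names and defaults come from the table above.
interface ProxyConfig {
  port: number;
  backendUrl: string;
  corsOrigin: string;
  enableCache: boolean;
  cacheTtlSeconds: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back when it is unset or not a number.
function toInt(value: string | undefined, fallback: number): number {
  const parsed = parseInt(value ?? '', 10);
  return isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env: Env): ProxyConfig {
  return {
    port: toInt(env['PORT'], 4000),
    backendUrl: env['BACKEND_URL'] ?? 'http://localhost:7860',
    corsOrigin: env['CORS_ORIGIN'] ?? 'http://localhost:3000',
    // Assumption: any value other than the string "false" leaves caching on.
    enableCache: (env['ENABLE_CACHE'] ?? 'true') !== 'false',
    cacheTtlSeconds: toInt(env['CACHE_TTL'], 300),
    rateLimitWindowMs: toInt(env['RATE_LIMIT_WINDOW'], 60000),
    rateLimitMax: toInt(env['RATE_LIMIT_MAX'], 100),
  };
}
```

In a Node.js process this would be called once at startup as `loadConfig(process.env)`.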

## Integration with Next.js

Update your Next.js app to use the proxy:

```typescript
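// In .env.local (an environment file, not TypeScript):
//   NEXT_PUBLIC_API_URL=http://localhost:4000/api

// Example values; in a real app these come from user input
const query = 'annual report 2023';
const limit = 10;

// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});
if (!response.ok) {
  throw new Error(`Search failed with status ${response.status}`);
}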
```

## Caching Strategy

- **Search Results**: Cached for 5 minutes (configurable)
- **Images**: Cached for 24 hours
- **Cache Keys**: Based on query parameters
- **Cache Headers**: `X-Cache: HIT/MISS`
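
The strategy above can be pictured as a small in-memory TTL cache keyed by the request parameters. This is a sketch under assumptions, not the proxy's implementation: `TtlCache` and `searchCacheKey` are hypothetical names, and the injectable clock exists only to make expiry easy to test.

```typescript
// Hypothetical in-memory cache with per-entry expiry.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined on a miss or an expired entry.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Assumption: the key is derived from the search body fields shown in this README.
function searchCacheKey(params: { query: string; limit?: number; ranking?: string }): string {
  return JSON.stringify([params.query, params.limit ?? null, params.ranking ?? null]);
}
```

A handler would call `get` first and set `X-Cache: HIT` when a value comes back; otherwise it would query the backend, call `set`, and respond with `X-Cache: MISS`.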

## Rate Limiting

- Default: 100 requests per minute per IP
- Headers included:
  - `X-RateLimit-Limit`
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Reset`
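
To make the headers concrete, the sketch below implements a per-IP fixed-window counter. The README does not state which algorithm the proxy uses, so the fixed window, the `FixedWindowRateLimiter` name, and reporting `X-RateLimit-Reset` as a Unix timestamp in seconds are all assumptions for illustration.

```typescript
// Hypothetical limiter: a fixed-window counter per client IP.
interface RateLimitResult {
  allowed: boolean;
  headers: Record<string, string>;
}

class FixedWindowRateLimiter {
  private windows = new Map<string, { count: number; resetAt: number }>();
  private max: number;
  private windowMs: number;
  private now: () => number;

  constructor(max: number, windowMs: number, now: () => number = () => Date.now()) {
    this.max = max;
    this.windowMs = windowMs;
    this.now = now;
  }

  check(ip: string): RateLimitResult {
    const now = this.now();
    let win = this.windows.get(ip);
    // Start a fresh window on first contact or once the previous one has ended.
    if (!win || win.resetAt <= now) {
      win = { count: 0, resetAt: now + this.windowMs };
      this.windows.set(ip, win);
    }
    win.count += 1;

    return {
      allowed: win.count <= this.max,
      headers: {
        'X-RateLimit-Limit': String(this.max),
        'X-RateLimit-Remaining': String(Math.max(0, this.max - win.count)),
        // Assumption: reset time is expressed in Unix seconds.
        'X-RateLimit-Reset': String(Math.ceil(win.resetAt / 1000)),
      },
    };
  }
}
```

With the defaults from the environment table this would be `new FixedWindowRateLimiter(100, 60000)`; a request for which `allowed` is false would normally receive HTTP 429. A long-running process would also need to evict stale entries, which this sketch omits.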

## Monitoring

The proxy includes:
- Request logging with correlation IDs
- Performance timing
- Error tracking
- Health endpoints for monitoring

## Deployment Options

### Railway/Fly.io
```toml
# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
```

### Kubernetes
```yaml
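apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels;
  # the label value here is an example
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
      - name: proxy
        image: colpali-proxy:latest
        ports:
        - containerPort: 4000
        livenessProbe:
          httpGet:
            path: /health/live
            port: 4000
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 4000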
```

## Performance

- Built with Hono, a lightweight web framework with low routing overhead
- Efficient streaming for SSE
- Connection pooling for backend requests
- In-memory caching reduces backend load
- Brotli/gzip compression enabled

## Security

- Rate limiting per IP to mitigate abuse
- Secure headers enabled
- CORS restricted to the configured origin (`CORS_ORIGIN`)
- Request ID tracking
- Sensitive data is not logged