# ColPali Hono Proxy Server
A high-performance proxy server built with Hono that sits between your Next.js frontend and the ColPali/Vespa backend. The proxy handles caching, rate limiting, and CORS, and exposes a clean API interface.
## Features
- **Image Retrieval**: Serves base64 images from Vespa as actual image files with proper caching
- **Search Proxy**: Forwards search requests with result caching
- **Chat SSE Proxy**: Handles Server-Sent Events for streaming chat responses
- **Rate Limiting**: Protects backend from overload
- **Caching**: In-memory cache for search results and images
- **Health Checks**: Kubernetes-ready health endpoints
- **CORS Handling**: Configurable CORS for frontend integration
- **Request Logging**: Detailed request/response logging with request IDs
## Architecture
```
Next.js App (3000) → Hono Proxy (4000) → ColPali Backend (7860)
    → Vespa Cloud
```
## API Endpoints
### Search
- `POST /api/search` - Search documents
```json
{
  "query": "annual report 2023",
  "limit": 10,
  "ranking": "hybrid"
}
```
### Image Retrieval
- `GET /api/search/image/:docId/thumbnail` - Get thumbnail image
- `GET /api/search/image/:docId/full` - Get full-size image
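The conversion from a base64 string to a servable image can be sketched roughly as follows. This is a hedged illustration, not the proxy's actual code: `decodeBase64Image`, `sniffContentType`, and `ImagePayload` are hypothetical names, and the `Cache-Control` value is an assumption chosen to match a 24-hour image cache.

```typescript
// Hypothetical helpers: turn a base64 string into raw bytes plus response
// headers. Names and header values are assumptions for illustration.
interface ImagePayload {
  body: Uint8Array;
  headers: Record<string, string>;
}

function sniffContentType(bytes: Uint8Array): string {
  // PNG files start with 0x89 'P' 'N' 'G'; JPEG files start with 0xFF 0xD8.
  if (
    bytes.length >= 4 &&
    bytes[0] === 0x89 && bytes[1] === 0x50 &&
    bytes[2] === 0x4e && bytes[3] === 0x47
  ) {
    return "image/png";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xd8) {
    return "image/jpeg";
  }
  return "application/octet-stream";
}

function decodeBase64Image(base64: string): ImagePayload {
  // Tolerate an optional data-URL prefix such as "data:image/png;base64,".
  const comma = base64.indexOf(",");
  const raw =
    base64.startsWith("data:") && comma !== -1 ? base64.slice(comma + 1) : base64;

  // atob yields a binary string; copy each char code into a byte array.
  const binary = atob(raw);
  const body = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    body[i] = binary.charCodeAt(i);
  }

  return {
    body,
    headers: {
      "Content-Type": sniffContentType(body),
      "Content-Length": String(body.length),
      "Cache-Control": "public, max-age=86400", // assumed: 24 hours
    },
  };
}
```

A route handler could pass `body` and `headers` straight into a `Response`, so the browser receives a real image file rather than a base64 string embedded in JSON.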
### Chat
- `POST /api/chat` - Stream chat responses (SSE)
```json
{
  "messages": [{"role": "user", "content": "Tell me about..."}],
  "context": []
}
```
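Because this endpoint uses `POST`, the browser's `EventSource` (which only issues `GET` requests) cannot consume it directly; a client typically reads the `fetch` response body and parses the event stream itself. The parser below is a minimal sketch of that step. `SseDataParser` is a hypothetical helper, and the content of each `data:` payload is not specified by this README.

```typescript
// Incremental parser for the `data:` fields of a Server-Sent Events stream.
// Hypothetical helper for illustration; handles "\n" and "\r\n" line
// endings but not bare "\r".
class SseDataParser {
  private buffer = "";

  // Feed one decoded text chunk. Returns the data payloads of every event
  // completed by this chunk; incomplete events stay buffered.
  push(chunk: string): string[] {
    this.buffer = (this.buffer + chunk).replace(/\r\n/g, "\n");
    const events: string[] = [];

    // Events are separated by a blank line.
    let boundary = this.buffer.indexOf("\n\n");
    while (boundary !== -1) {
      const rawEvent = this.buffer.slice(0, boundary);
      this.buffer = this.buffer.slice(boundary + 2);

      const dataLines: string[] = [];
      for (const line of rawEvent.split("\n")) {
        if (line.startsWith("data:")) {
          // A single leading space after the colon is not part of the value.
          const value = line.slice(5);
          dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
        }
      }
      // Multiple data lines in one event are joined with newlines;
      // comment-only or data-less events are skipped.
      if (dataLines.length > 0) {
        events.push(dataLines.join("\n"));
      }
      boundary = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}
```

In a browser, the chunks would come from `response.body.getReader()`, decoded with `new TextDecoder().decode(value, { stream: true })` before being passed to `push`.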
### Similarity Map
- `POST /api/search/similarity-map` - Generate similarity visualization
### Health
- `GET /health` - Detailed health status
- `GET /health/live` - Liveness probe
- `GET /health/ready` - Readiness probe
## Setup
### Development
1. Install dependencies:
```bash
npm install
```
2. Copy environment variables:
```bash
cp .env.example .env
```
3. Update `.env` with your configuration
4. Run in development mode:
```bash
npm run dev
```
### Production
1. Build:
```bash
npm run build
```
2. Run:
```bash
npm start
```
### Docker
Build and run with Docker:
```bash
docker build -t colpali-hono-proxy .
docker run -p 4000:4000 --env-file .env colpali-hono-proxy
```
Or use docker-compose:
```bash
docker-compose up
```
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| `PORT` | Server port | 4000 |
| `BACKEND_URL` | ColPali backend URL | http://localhost:7860 |
| `CORS_ORIGIN` | Allowed CORS origin | http://localhost:3000 |
| `ENABLE_CACHE` | Enable caching | true |
| `CACHE_TTL` | Cache TTL in seconds | 300 |
| `RATE_LIMIT_WINDOW` | Rate limit window (ms) | 60000 |
| `RATE_LIMIT_MAX` | Max requests per window | 100 |
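
The defaults in the table could be applied at startup along these lines. This is a minimal sketch under stated assumptions rather than the proxy's real loader: `loadConfig`, `numberFromEnv`, and `ProxyConfig` are hypothetical names, and treating every `ENABLE_CACHE` value other than the string `false` as enabled is an assumption.

```typescript
// Hypothetical config loader. Only the variable names and default values
// come from the table; names and parsing rules are assumptions.
interface ProxyConfig {
  port: number;
  backendUrl: string;
  corsOrigin: string;
  enableCache: boolean;
  cacheTtlSeconds: number;
  rateLimitWindowMs: number;
  rateLimitMax: number;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back to the default when it is unset,
// empty, or not a finite number.
function numberFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

function loadConfig(env: Env): ProxyConfig {
  return {
    port: numberFromEnv(env, "PORT", 4000),
    backendUrl: env["BACKEND_URL"] ?? "http://localhost:7860",
    corsOrigin: env["CORS_ORIGIN"] ?? "http://localhost:3000",
    // Assumed rule: caching stays on unless explicitly set to "false".
    enableCache: (env["ENABLE_CACHE"] ?? "true") !== "false",
    cacheTtlSeconds: numberFromEnv(env, "CACHE_TTL", 300),
    rateLimitWindowMs: numberFromEnv(env, "RATE_LIMIT_WINDOW", 60000),
    rateLimitMax: numberFromEnv(env, "RATE_LIMIT_MAX", 100),
  };
}
```

In a Node.js process this would be called once as `loadConfig(process.env)`, so an invalid value falls back to the documented default.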
## Integration with Next.js
Update your Next.js app to use the proxy:
```typescript
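// .env.local (an environment file, not TypeScript):
// NEXT_PUBLIC_API_URL=http://localhost:4000/api

// API calls
const response = await fetch(`${process.env.NEXT_PUBLIC_API_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query, limit })
});
// fetch only rejects on network errors, so check the HTTP status explicitly
if (!response.ok) {
  throw new Error(`Search request failed with status ${response.status}`);
}
const results = await response.json();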
```
## Caching Strategy
- **Search Results**: Cached for 5 minutes by default (configurable via `CACHE_TTL`)
- **Images**: Cached for 24 hours
- **Cache Keys**: Based on query parameters
- **Cache Headers**: `X-Cache: HIT/MISS`
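
One way to picture this strategy is a small in-memory cache with a time-to-live, keyed on the request parameters. The sketch below is an assumption-laden illustration, not the proxy's implementation: `TtlCache`, `cacheKey`, and `lookup` are hypothetical names, and sorting parameter names to build the key is one possible choice.

```typescript
// Hypothetical in-memory TTL cache, for illustration only.
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key); // expired entries are dropped lazily
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Build a stable key by sorting parameter names, so that
// { query, limit } and { limit, query } map to the same entry.
function cacheKey(prefix: string, params: Record<string, string | number>): string {
  const parts = Object.keys(params)
    .sort()
    .map((name) => `${name}=${String(params[name])}`);
  return `${prefix}:${parts.join("&")}`;
}

// Look up a value and report what the X-Cache header would say.
function lookup<T>(
  cache: TtlCache<T>,
  key: string,
  now?: number
): { value: T | undefined; xCache: "HIT" | "MISS" } {
  const value = cache.get(key, now);
  return { value, xCache: value === undefined ? "MISS" : "HIT" };
}
```

With a 300-second TTL, a repeated search inside the window would be answered from memory and reported as `X-Cache: HIT`; the first request after expiry is a `MISS` and goes to the backend again.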
## Rate Limiting
- Default: 100 requests per minute per IP (set by `RATE_LIMIT_MAX` and `RATE_LIMIT_WINDOW`)
- Headers included:
  - `X-RateLimit-Limit`
  - `X-RateLimit-Remaining`
  - `X-RateLimit-Reset`
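
A per-IP limit like this is often implemented as a fixed-window counter. The sketch below shows that approach as an assumption, since this README does not say which algorithm the proxy uses. `FixedWindowLimiter` is a hypothetical name, and reporting `X-RateLimit-Reset` as epoch seconds is also an assumption.

```typescript
// Hypothetical fixed-window limiter; one possible reading of the
// behaviour described above, not the proxy's documented algorithm.
interface WindowState {
  count: number;
  windowStart: number; // epoch milliseconds
}

interface RateLimitDecision {
  allowed: boolean;
  headers: Record<string, string>;
}

class FixedWindowLimiter {
  private windows = new Map<string, WindowState>();
  private windowMs: number;
  private max: number;

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  check(clientIp: string, now: number = Date.now()): RateLimitDecision {
    let state = this.windows.get(clientIp);
    // Start a fresh window on the first request or once the old one elapsed.
    if (state === undefined || now - state.windowStart >= this.windowMs) {
      state = { count: 0, windowStart: now };
      this.windows.set(clientIp, state);
    }
    state.count += 1;

    const remaining = Math.max(0, this.max - state.count);
    const resetSeconds = Math.ceil((state.windowStart + this.windowMs) / 1000);
    return {
      allowed: state.count <= this.max,
      headers: {
        "X-RateLimit-Limit": String(this.max),
        "X-RateLimit-Remaining": String(remaining),
        "X-RateLimit-Reset": String(resetSeconds), // assumed: epoch seconds
      },
    };
  }
}
```

When `allowed` is false, a proxy would typically answer with HTTP 429 and still attach the three headers so clients know when to retry.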
## Monitoring
The proxy includes:
- Request logging with correlation IDs
- Performance timing
- Error tracking
- Health endpoints for monitoring
## Deployment Options
### Railway/Fly.io
```toml
# fly.toml
app = "colpali-proxy"
primary_region = "ord"

[http_service]
  internal_port = 4000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
```
### Kubernetes
```yaml
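apiVersion: apps/v1
kind: Deployment
metadata:
  name: colpali-proxy
spec:
  replicas: 3
  # apps/v1 requires a selector matching the pod template labels;
  # the label value below is an assumed example
  selector:
    matchLabels:
      app: colpali-proxy
  template:
    metadata:
      labels:
        app: colpali-proxy
    spec:
      containers:
        - name: proxy
          image: colpali-proxy:latest
          ports:
            - containerPort: 4000
          livenessProbe:
            httpGet:
              path: /health/live
              port: 4000
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 4000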
```
## Performance
- Built on Hono, a lightweight web framework with low per-request overhead
- Efficient streaming for SSE
- Connection pooling for backend requests
- In-memory caching reduces backend load
- Brotli/gzip compression enabled
## Security
- Rate limiting prevents abuse
- Secure headers enabled
- CORS restricted to the configured origin (`CORS_ORIGIN`)
- Request ID tracking
- No sensitive data logging