# LLaVA API Documentation

## Overview

The LLaVA API exposes the LLaVA model through a Gradio web interface. Users upload an image, enter a prompt, and receive an AI-generated response about the image content.

## API Endpoints

### Web Interface

The main interface is served at the root URL (`/`) and provides the following components:

#### Input Components

1. **Image Upload**
   - Type: Image uploader
   - Format: PIL Image
   - Purpose: Upload an image for analysis

2. **Prompt Input**
   - Type: Text input
   - Purpose: Enter questions or prompts about the image
   - Default placeholder: "What can you see in this image?"

3. **Generation Parameters**
   - Max New Tokens (64-2048, default: 512)
   - Temperature (0.1-1.0, default: 0.7)
   - Top P (0.1-1.0, default: 0.9)

#### Output Components

1. **Response**
   - Type: Text output
   - Purpose: Displays the model's response
   - Features: Copy button, scrollable

## Usage Examples

### Basic Usage

1. Upload an image using the image uploader
2. Enter a prompt in the text input
3. Click "Generate Response"
4. View the response in the output box

### Example Prompts

- "What can you see in this image?"
- "Describe this scene in detail"
- "What emotions does this image convey?"
- "What's happening in this picture?"
- "Can you identify any objects or people in this image?"

## Error Handling

The API handles the following error cases:

1. **Invalid Images**
   - Returns an error message if the image is invalid or corrupted
   - Supports common image formats (JPEG, PNG, etc.)

2. **Empty Prompts**
   - Returns an error message if no prompt is provided
   - Prompts should be non-empty strings

3. **Model Errors**
   - Returns descriptive error messages for model-related issues
   - Includes logging for debugging
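
The first two checks above might look roughly like the sketch below. It is an illustration under stated assumptions: `validate_inputs` and the error strings are hypothetical, and the image is duck-typed on the `size` attribute that PIL images expose, so the example needs no third-party imports.

```python
from typing import Any, Optional


def validate_inputs(image: Any, prompt: Any) -> Optional[str]:
    """Return an error message for bad input, or None if the input looks usable."""
    if image is None:
        return "Error: no image provided."

    # PIL images carry a (width, height) tuple in `size`; anything without a
    # positive two-element integer size is treated as invalid here.
    size = getattr(image, "size", None)
    size_ok = (
        isinstance(size, tuple)
        and len(size) == 2
        and all(isinstance(dim, int) and dim > 0 for dim in size)
    )
    if not size_ok:
        return "Error: invalid or corrupted image."

    # Whitespace-only prompts count as empty.
    if not isinstance(prompt, str) or not prompt.strip():
        return "Error: prompt must be a non-empty string."

    return None
```

Returning a message rather than raising keeps the result easy to show in the response text box; whether the real application does this is not stated in this document.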

## Configuration

The API can be configured through environment variables or the settings file:

- `API_HOST`: Server host (default: "0.0.0.0")
- `API_PORT`: Server port (default: 7860)
- `GRADIO_THEME`: Interface theme (default: "soft")
- `DEFAULT_MAX_NEW_TOKENS`: Default token limit (default: 512)
- `DEFAULT_TEMPERATURE`: Default temperature (default: 0.7)
- `DEFAULT_TOP_P`: Default top-p value (default: 0.9)
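
As an illustration, the variables above could be read into a typed settings object as follows. This is a sketch, assuming environment variables take precedence over the built-in defaults; `Settings` and `load_settings` are hypothetical names, and the settings file is not modeled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Documented configuration values with their documented defaults."""
    api_host: str = "0.0.0.0"
    api_port: int = 7860
    gradio_theme: str = "soft"
    default_max_new_tokens: int = 512
    default_temperature: float = 0.7
    default_top_p: float = 0.9


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        api_host=env.get("API_HOST", defaults.api_host),
        # Environment values arrive as strings, so numeric fields are converted.
        api_port=int(env.get("API_PORT", defaults.api_port)),
        gradio_theme=env.get("GRADIO_THEME", defaults.gradio_theme),
        default_max_new_tokens=int(
            env.get("DEFAULT_MAX_NEW_TOKENS", defaults.default_max_new_tokens)
        ),
        default_temperature=float(
            env.get("DEFAULT_TEMPERATURE", defaults.default_temperature)
        ),
        default_top_p=float(env.get("DEFAULT_TOP_P", defaults.default_top_p)),
    )
```

Passing an explicit mapping, for example `load_settings({"API_PORT": "8080"})`, makes the loader easy to test without touching the real environment.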

## Development

### Running Locally

```bash
python src/api/app.py
```

### Running Tests

```bash
pytest tests/
```

### Code Style

The project follows PEP 8 guidelines. Lint with `flake8` and format with `black`:

```bash
flake8 src/
black src/
```

## Security Considerations

1. The API is designed for public use, but it has no built-in authentication or rate limiting, so deploy it behind appropriate security measures
2. Input validation is performed on all user inputs
3. Large file uploads are handled safely
4. Error messages are sanitized to prevent information leakage
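
One common way to sanitize error messages is to log the full exception on the server and return only a generic message to the user. The sketch below shows that pattern; it is not taken from the project's code, and `run_sanitized`, `GENERIC_ERROR`, and the logger name are hypothetical.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("llava_api")  # hypothetical logger name

GENERIC_ERROR = "Error: the request could not be processed."


def run_sanitized(handler: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call handler; on failure, log the traceback and return a generic message."""
    try:
        return handler(*args, **kwargs)
    except Exception:
        # The full traceback goes to the server log only, never to the user.
        logger.exception(
            "Unhandled error in %s", getattr(handler, "__name__", "handler")
        )
        return GENERIC_ERROR
```

The user sees the same message regardless of the underlying failure, so file paths, stack traces, and model internals stay out of the response.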

## Rate Limiting

Currently, no rate limiting is implemented. Consider implementing rate limiting for production deployments.
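
For reference, a per-client sliding-window limiter is one simple option. The sketch below is an in-memory, single-process illustration; `SlidingWindowLimiter` is a hypothetical name, and how a client is identified (IP address, session, API key) is left as an assumption.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record the request and return True if the client is under its limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

With `SlidingWindowLimiter(max_requests=10, window_seconds=60)`, each client gets ten requests per minute. State is lost on restart and is not shared between worker processes, so a multi-process deployment would need a shared store or a limit enforced at the proxy.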

## Future Improvements

1. Add authentication
2. Implement rate limiting
3. Add batch processing capabilities
4. Support for video input
5. Real-time streaming responses