	# LLaVA API Documentation

	## Overview

	The LLaVA API exposes the LLaVA model through a Gradio web interface. Users upload an image, enter a prompt, and receive an AI-generated response about the image content.

	## API Endpoints

	### Web Interface

	The main interface is served at the root URL (`/`) and provides the following components:

	#### Input Components

	1. Image Upload
	- Type: Image uploader
	- Format: PIL Image (the form in which the upload reaches the model code)
	- Purpose: Upload an image for analysis

	2. Prompt Input
	- Type: Text input
	- Purpose: Enter questions or prompts about the image
	- Default placeholder: "What can you see in this image?"

	3. Generation Parameters
	- Max New Tokens (64-2048, default: 512)
	- Temperature (0.1-1.0, default: 0.7)
	- Top P (0.1-1.0, default: 0.9)
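The ranges and defaults listed for the generation parameters can be captured in a small validation helper. This is a minimal sketch, not the Space's actual code: `GenerationParams` and `clamp` are hypothetical names, and clamping out-of-range values (rather than rejecting them) is an assumption.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for the three documented parameters."""

    max_new_tokens: int = 512   # documented range: 64-2048
    temperature: float = 0.7    # documented range: 0.1-1.0
    top_p: float = 0.9          # documented range: 0.1-1.0

    @classmethod
    def from_user_input(cls, max_new_tokens=512, temperature=0.7, top_p=0.9):
        # Assumption: out-of-range values are clamped, not rejected.
        return cls(
            max_new_tokens=int(clamp(max_new_tokens, 64, 2048)),
            temperature=float(clamp(temperature, 0.1, 1.0)),
            top_p=float(clamp(top_p, 0.1, 1.0)),
        )


# Values outside the documented ranges are pulled back to the nearest bound.
params = GenerationParams.from_user_input(max_new_tokens=4096, temperature=0.0)
print(params)
```

In the web interface the sliders already constrain these values, so a helper like this mainly matters if the generation function is called from other code.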

	#### Output Components

	1. Response
	- Type: Text output
	- Purpose: Displays the model's response
	- Features: Copy button, scrollable

	## Usage Examples

	### Basic Usage

	1. Upload an image using the image uploader
	2. Enter a prompt in the text input
	3. Click "Generate Response"
	4. View the response in the output box

	### Example Prompts

	- "What can you see in this image?"
	- "Describe this scene in detail"
	- "What emotions does this image convey?"
	- "What's happening in this picture?"
	- "Can you identify any objects or people in this image?"

	## Error Handling

	The API handles the following error cases:

	1. Invalid Images
	- Returns an error message if the image is invalid or corrupted
	- Supports common image formats (JPEG, PNG, etc.)

	2. Empty Prompts
	- Returns an error message if no prompt is provided
	- Prompts should be non-empty strings

	3. Model Errors
	- Returns descriptive error messages for model-related issues
	- Includes logging for debugging
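The three error cases above could be handled in a single wrapper around the inference call. The following is only a sketch under stated assumptions: `generate_response`, the `model_fn` argument, the logger name, and the exact message texts are all hypothetical, and the image check is reduced to a `None` test because the real decoding step is not shown in this document.

```python
import logging

logger = logging.getLogger("llava_api")  # hypothetical logger name


def generate_response(image, prompt, model_fn):
    """Return the model's answer, or a user-facing error message.

    `model_fn` stands in for the real inference call, which this
    document does not show.
    """
    # Case 1: missing image (a failed decode is assumed to yield None).
    if image is None:
        return "Error: please upload a valid image."

    # Case 2: empty or whitespace-only prompt.
    if not isinstance(prompt, str) or not prompt.strip():
        return "Error: please enter a prompt."

    # Case 3: model failure. Log the details, return a generic message.
    try:
        return model_fn(image, prompt.strip())
    except Exception:
        logger.exception("Model inference failed")
        return "Error: the model could not generate a response."


def broken_model(image, prompt):
    """Simulated model that always fails, used to exercise case 3."""
    raise RuntimeError("simulated inference failure")


print(generate_response(None, "What is this?", broken_model))
print(generate_response(object(), "   ", broken_model))
print(generate_response(object(), "What is this?", broken_model))
```

Keeping the full traceback in the log while returning a generic message is one way to reconcile debugging needs with not leaking internal details to users.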

	## Configuration

	The API can be configured through environment variables or the settings file:

	- `API_HOST`: Server host (default: "0.0.0.0")
	- `API_PORT`: Server port (default: 7860)
	- `GRADIO_THEME`: Interface theme (default: "soft")
	- `DEFAULT_MAX_NEW_TOKENS`: Default token limit (default: 512)
	- `DEFAULT_TEMPERATURE`: Default temperature (default: 0.7)
	- `DEFAULT_TOP_P`: Default top-p value (default: 0.9)
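As an illustration, the environment variables above could be read into a typed settings object as follows. This is a sketch that assumes nothing about the project's real settings file: the `Settings` class and `load_settings` function are hypothetical, and only the environment-variable path is shown.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Documented configuration keys with their documented defaults."""

    api_host: str = "0.0.0.0"
    api_port: int = 7860
    gradio_theme: str = "soft"
    default_max_new_tokens: int = 512
    default_temperature: float = 0.7
    default_top_p: float = 0.9


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Passing a plain dict instead of os.environ makes this easy to test.
    """
    env = os.environ if env is None else env
    d = Settings()  # source of fallback values
    return Settings(
        api_host=env.get("API_HOST", d.api_host),
        api_port=int(env.get("API_PORT", d.api_port)),
        gradio_theme=env.get("GRADIO_THEME", d.gradio_theme),
        default_max_new_tokens=int(
            env.get("DEFAULT_MAX_NEW_TOKENS", d.default_max_new_tokens)
        ),
        default_temperature=float(
            env.get("DEFAULT_TEMPERATURE", d.default_temperature)
        ),
        default_top_p=float(env.get("DEFAULT_TOP_P", d.default_top_p)),
    )


# Override two values; the rest fall back to the documented defaults.
settings = load_settings({"API_PORT": "8080", "DEFAULT_TEMPERATURE": "0.2"})
print(settings)
```

Environment variables are always strings, so the numeric settings are converted explicitly; a malformed value such as `API_PORT=abc` raises `ValueError` at startup rather than failing later.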

	## Development

	### Running Locally

	```bash
	python src/api/app.py
	```

	### Running Tests

	```bash
	pytest tests/
	```

	### Code Style

	The project follows PEP 8 guidelines. To lint your code with flake8 and reformat it with black:

	```bash
	flake8 src/
	black src/
	```

	## Security Considerations

	1. The API is designed for public use but has no built-in authentication or rate limiting, so deploy it behind security measures that provide them
	2. Input validation is performed on all user inputs
	3. Large file uploads are handled safely
	4. Error messages are sanitized to prevent information leakage

	## Rate Limiting

	Currently, no rate limiting is implemented. Consider implementing rate limiting for production deployments.
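One possible starting point is an in-memory sliding-window limiter keyed by client. The sketch below is not part of the project: `SlidingWindowLimiter` is a hypothetical class, the limits are placeholder values, and the approach assumes a single server process (state is not shared across workers).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id):
        """Record the request and return True if it is within the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Placeholder limits: 2 requests per client per 60 seconds.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
print([limiter.allow("client-a") for _ in range(3)])
```

The client identifier could be, for example, the caller's IP address. A deployment with several workers would need a shared store instead of the in-process dictionary used here.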

	## Future Improvements

	1. Add authentication
	2. Implement rate limiting
	3. Add batch processing capabilities
	4. Support video input
	5. Stream responses in real time