---
title: L Operator Demo
emoji: πŸ“Š
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 5.44.0
app_file: app.py
pinned: true
license: gpl
short_description: demo of l-operator with no commands
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# πŸ€– L-Operator: Android Device Control Demo
A complete multimodal Gradio demo for the [L-Operator model](https://huggingface.co/Tonic/l-android-control), a fine-tuned multimodal AI agent based on LiquidAI's LFM2-VL-1.6B model, optimized for Android device control through visual understanding and action generation.
## 🌟 Features
- **Multimodal Interface**: Upload Android screenshots and provide text instructions
- **Chat Interface**: Interactive chat with the model using Gradio's ChatInterface component
- **Action Generation**: Generate JSON actions for Android device control
- **Example Episodes**: Pre-loaded examples from extracted training episodes
- **Real-time Processing**: Optimized for real-time inference
- **Beautiful UI**: Modern, responsive interface with comprehensive documentation
- **⚑ ZeroGPU Compatible**: Dynamic GPU allocation for cost-effective deployment
## πŸ“‹ Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) |
| **Architecture** | LFM2-VL (1.6B parameters) |
| **Fine-tuning** | LoRA (Low-Rank Adaptation) |
| **Training Data** | Android control episodes with screenshots and actions |
| **License** | Proprietary (Investment Access Required) |
## πŸš€ Quick Start
### Prerequisites
1. **Python 3.8+**: Ensure you have Python 3.8 or higher installed
2. **Hugging Face Access**: Request access to the [L-Operator model](https://huggingface.co/Tonic/l-android-control)
3. **Authentication**: Login to Hugging Face using `huggingface-cli login`
### Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd l-operator-demo
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Authenticate with Hugging Face**:
```bash
huggingface-cli login
```
### Running the Demo
1. **Start the demo**:
```bash
python app.py
```
2. **Open your browser** and navigate to `http://localhost:7860`
3. **Load the model** by clicking the "πŸš€ Load L-Operator Model" button
4. **Upload an Android screenshot** and provide instructions
5. **Generate actions** or use the chat interface
## ⚑ ZeroGPU Deployment
This demo is optimized for [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-zerogpu), providing dynamic GPU allocation for cost-effective deployment.
### ZeroGPU Features
- **πŸ†“ Free GPU Access**: Dynamic NVIDIA H200 GPU allocation
- **⚑ On-Demand Resources**: GPUs allocated only when needed
- **πŸ’° Cost Efficient**: Optimized resource utilization
- **πŸ”„ Multi-GPU Support**: Leverage multiple GPUs concurrently
- **πŸ›‘οΈ Automatic Management**: Resources released after function completion
### ZeroGPU Specifications
| Specification | Value |
|---------------|-------|
| **GPU Type** | NVIDIA H200 slice |
| **Available VRAM** | 70GB per workload |
| **Supported Gradio** | 4+ |
| **Supported PyTorch** | 2.1.2, 2.2.2, 2.4.0, 2.5.1 |
| **Supported Python** | 3.10.13 |
| **Function Duration** | Up to 120 seconds per request |
### Deploying to Hugging Face Spaces
1. **Create a new Space** on Hugging Face:
- Choose **Gradio SDK**
- Select **ZeroGPU** in hardware options
- Upload your code
2. **Space Configuration**: the Space reads its settings from the YAML frontmatter at the top of this README:
```yaml
sdk: gradio
sdk_version: 5.44.0
app_file: app.py
```
`requirements.txt` is installed automatically, and ZeroGPU is enabled through the Space's hardware setting.
3. **Access Requirements**:
- **Personal accounts**: PRO subscription required
- **Organizations**: Enterprise Hub subscription required
- **Usage limits**: 10 Spaces (personal) / 50 Spaces (organization)
### ZeroGPU Integration Details
The demo automatically detects ZeroGPU availability and optimizes accordingly:
```python
# Automatic ZeroGPU detection, with a no-op fallback so the same code
# also runs locally where the `spaces` package is unavailable
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

    class spaces:  # stand-in that turns @spaces.GPU into a no-op
        @staticmethod
        def GPU(duration=60):
            def decorator(fn):
                return fn
            return decorator

# GPU-optimized functions
@spaces.GPU(duration=120)  # up to 2 minutes for action generation
def generate_action(image, goal, instruction):
    # GPU-accelerated inference
    ...

@spaces.GPU(duration=90)  # up to 1.5 minutes for chat responses
def chat_with_model(message, history, image):
    # Interactive chat with GPU acceleration
    ...
```
## 🎯 How to Use
### Basic Usage
1. **Load Model**: Click "πŸš€ Load L-Operator Model" to initialize the model
2. **Upload Screenshot**: Upload an Android device screenshot
3. **Provide Instructions**:
- **Goal**: Describe what you want to achieve
- **Step**: Provide specific step instructions
4. **Generate Action**: Click "🎯 Generate Action" to get JSON output
### Chat Interface
1. **Upload Screenshot**: Upload an Android screenshot
2. **Send Message**: Use structured format:
```
Goal: Open the Settings app and navigate to Display settings
Step: Tap on the Settings app icon on the home screen
```
3. **Get Response**: The model will generate JSON actions
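The structured `Goal:`/`Step:` message can be split into its fields with a few lines of Python. This is an illustrative sketch; `parse_instruction` is a hypothetical helper, not part of the demo code:

```python
# Hypothetical helper: split a structured chat message into its fields.
def parse_instruction(message: str) -> dict:
    fields = {"goal": "", "step": ""}
    for line in message.splitlines():
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key in fields:
            fields[key] = value.strip()
    return fields

# Example usage with the message format shown above
print(parse_instruction(
    "Goal: Open the Settings app and navigate to Display settings\n"
    "Step: Tap on the Settings app icon on the home screen"
))
```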
### Example Episodes
The demo includes pre-loaded examples from the training episodes:
- **Episode 13**: Cruise deals app navigation
- **Episode 53**: Pinterest search for sustainability art
- **Episode 73**: Moon phases app usage
## πŸ“Š Expected Output Format
The model generates JSON actions in the following format:
```json
{
"action_type": "tap",
"x": 540,
"y": 1200,
"text": "Settings",
"app_name": "com.android.settings",
"confidence": 0.92
}
```
### Action Types
- `tap`: Tap at specific coordinates
- `click`: Click at specific coordinates
- `scroll`: Scroll in a direction (up/down/left/right)
- `input_text`: Input text
- `open_app`: Open a specific app
- `wait`: Wait for a moment
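As an illustration of how these action types could be executed on a device, here is a hedged sketch that maps an action dict to an `adb` shell command. `action_to_adb` and the `direction` field for scrolls are assumptions for this example, not part of the demo:

```python
# Hypothetical mapping from generated actions to adb commands; the
# action dicts follow the JSON format shown above.
def action_to_adb(action: dict) -> str:
    t = action["action_type"]
    if t in ("tap", "click"):
        return f"adb shell input tap {action['x']} {action['y']}"
    if t == "input_text":
        return f"adb shell input text '{action['text']}'"
    if t == "open_app":
        return f"adb shell monkey -p {action['app_name']} 1"
    if t == "scroll":
        swipes = {  # rough swipe coordinates per direction (assumed screen size)
            "up": (540, 1500, 540, 500),
            "down": (540, 500, 540, 1500),
            "left": (900, 1000, 200, 1000),
            "right": (200, 1000, 900, 1000),
        }
        x1, y1, x2, y2 = swipes[action["direction"]]
        return f"adb shell input swipe {x1} {y1} {x2} {y2}"
    if t == "wait":
        return "sleep 1"
    raise ValueError(f"unknown action_type: {t}")

print(action_to_adb({"action_type": "tap", "x": 540, "y": 1200}))
# adb shell input tap 540 1200
```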
## πŸ› οΈ Technical Details
### Model Configuration
- **Device**: Automatically detects CUDA/CPU
- **Precision**: bfloat16 for CUDA, float32 for CPU
- **Generation**: Temperature 0.7, Top-p 0.9
- **Max Tokens**: 128 for action generation
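The configuration above can be sketched in code. The Auto class name and loading flags are assumptions; consult the model card for the exact loading snippet:

```python
import torch

MODEL_ID = "Tonic/l-android-control"  # gated: requires approved HF access

def load_model():
    """Load processor and model with the device/precision policy above.

    `AutoModelForImageTextToText` is an assumption about the right Auto
    class for LFM2-VL; verify against the model card.
    """
    from transformers import AutoModelForImageTextToText, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if device == "cuda" else torch.float32
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForImageTextToText.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    return processor, model, device

# Sampling settings from the Model Configuration table above
GEN_KWARGS = dict(max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9)
```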
### Architecture
- **Base Model**: LFM2-VL-1.6B from LiquidAI
- **Fine-tuning**: LoRA with rank 16, alpha 32
- **Target Modules**: q_proj, v_proj, fc1, fc2, linear, gate_proj, up_proj, down_proj
### Performance
- **Model Size**: ~1.6B parameters
- **Memory Usage**: ~4GB VRAM (CUDA) / ~8GB RAM (CPU)
- **Inference Speed**: Optimized for real-time use
- **Accuracy**: 98% action accuracy on test episodes
## 🎯 Use Cases
### 1. Mobile App Testing
- Automated UI testing for Android applications
- Cross-device compatibility validation
- Regression testing with visual verification
### 2. Accessibility Applications
- Voice-controlled device navigation
- Assistive technology integration
- Screen reader enhancement tools
### 3. Remote Support
- Remote device troubleshooting
- Automated device configuration
- Support ticket automation
### 4. Development Workflows
- UI/UX testing automation
- User flow validation
- Performance testing integration
## ⚠️ Important Notes
### Access Requirements
- **Investment Access**: This model is proprietary technology available exclusively to qualified investors under NDA
- **Authentication Required**: Must be authenticated with Hugging Face
- **Evaluation Only**: Access granted solely for investment evaluation purposes
- **Confidentiality**: All technical details are confidential
### ZeroGPU Limitations
- **Compatibility**: Currently exclusive to Gradio SDK
- **PyTorch Versions**: Limited to supported versions (2.1.2, 2.2.2, 2.4.0, 2.5.1)
- **Function Duration**: 60 seconds by default, customizable per function (this demo uses up to 120 seconds)
- **Queue Priority**: PRO users get 5× more daily usage and the highest queue priority
### General Limitations
- **Market Hours**: Some features may be limited during market hours
- **Device Requirements**: Requires sufficient RAM/VRAM for model loading
- **Network**: Requires internet connection for model download
- **Authentication**: Must have approved access to the model
## πŸ”§ Troubleshooting
### Common Issues
1. **Model Loading Error**:
- Ensure you're authenticated: `huggingface-cli login`
- Check internet connection
- Verify model access approval
2. **Memory Issues**:
- Use CPU if GPU memory is insufficient
- Close other applications
- Consider using smaller batch sizes
3. **Authentication Errors**:
- Re-login to Hugging Face
- Check access approval status
- Contact support if issues persist
4. **ZeroGPU Issues**:
- Verify ZeroGPU is selected in Space settings
- Check PyTorch version compatibility
- Ensure function duration is within limits
### Performance Optimization
- **GPU Usage**: Use CUDA for faster inference
- **Memory Management**: Monitor VRAM usage
- **Batch Processing**: Process multiple images efficiently
- **ZeroGPU Optimization**: Specify appropriate function durations
## πŸ“ž Support
- **Investment Inquiries**: For investment-related questions and due diligence
- **Technical Support**: For technical issues with the demo
- **Model Access**: For access requests to the L-Operator model
- **ZeroGPU Support**: [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
## πŸ“„ License
This demo is provided under the same terms as the L-Operator model:
- **Proprietary Technology**: Owned by Tonic
- **Investment Evaluation**: Access granted solely for investment evaluation
- **NDA Required**: All access is subject to Non-Disclosure Agreement
- **No Commercial Use**: Commercial use is prohibited without written consent
## πŸ™ Acknowledgments
- **LiquidAI**: For the base LFM2-VL model
- **Hugging Face**: For the transformers library, hosting, and ZeroGPU infrastructure
- **Gradio**: For the excellent UI framework
## πŸ”— Links
- [L-Operator Model](https://huggingface.co/Tonic/l-android-control)
- [Base Model (LFM2-VL-1.6B)](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
- [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
- [LiquidAI](https://liquid.ai/)
- [Tonic](https://tonic.ai/)
---
**Made with ❀️ by Tonic**