---
title: L Operator Demo
emoji: πŸ“Š
colorFrom: purple
colorTo: green
sdk: gradio
sdk_version: 5.44.0
app_file: app.py
pinned: true
license: gpl
short_description: demo of l-operator with no commands
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# πŸ€– L-Operator: Android Device Control Demo

A multimodal Gradio demo for L-Operator, a fine-tuned AI agent based on LiquidAI's LFM2-VL-1.6B model and optimized for Android device control through visual understanding and action generation.

## 🌟 Features

- **Multimodal Interface**: Upload Android screenshots and provide text instructions
- **Chat Interface**: Interactive chat with the model using Gradio's `ChatInterface` component
- **Action Generation**: Generate JSON actions for Android device control
- **Example Episodes**: Pre-loaded examples from extracted training episodes
- **Real-time Processing**: Optimized for real-time inference
- **Modern UI**: Responsive interface with comprehensive documentation
- **⚑ ZeroGPU Compatible**: Dynamic GPU allocation for cost-effective deployment

## πŸ“‹ Model Details

| Property | Value |
|----------|-------|
| Base Model | LiquidAI/LFM2-VL-1.6B |
| Architecture | LFM2-VL (1.6B parameters) |
| Fine-tuning | LoRA (Low-Rank Adaptation) |
| Training Data | Android control episodes with screenshots and actions |
| License | Proprietary (Investment Access Required) |

## πŸš€ Quick Start

### Prerequisites

1. **Python 3.8+**: Ensure you have Python 3.8 or higher installed
2. **Hugging Face Access**: Request access to the L-Operator model
3. **Authentication**: Log in to Hugging Face using `huggingface-cli login`

### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd l-operator-demo
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Authenticate with Hugging Face:

   ```bash
   huggingface-cli login
   ```

### Running the Demo

1. Start the demo:

   ```bash
   python app.py
   ```

2. Open your browser and navigate to http://localhost:7860

3. Load the model by clicking the "πŸš€ Load L-Operator Model" button

4. Upload an Android screenshot and provide instructions

5. Generate actions or use the chat interface

## ⚑ ZeroGPU Deployment

This demo is optimized for Hugging Face Spaces ZeroGPU, providing dynamic GPU allocation for cost-effective deployment.

### ZeroGPU Features

- **πŸ†“ Free GPU Access**: Dynamic NVIDIA H200 GPU allocation
- **⚑ On-Demand Resources**: GPUs allocated only when needed
- **πŸ’° Cost Efficient**: Optimized resource utilization
- **πŸ”„ Multi-GPU Support**: Leverage multiple GPUs concurrently
- **πŸ›‘οΈ Automatic Management**: Resources released after function completion

### ZeroGPU Specifications

| Specification | Value |
|---------------|-------|
| GPU Type | NVIDIA H200 slice |
| Available VRAM | 70 GB per workload |
| Supported Gradio | 4+ |
| Supported PyTorch | 2.1.2, 2.2.2, 2.4.0, 2.5.1 |
| Supported Python | 3.10.13 |
| Function Duration | Up to 120 seconds per request |

### Deploying to Hugging Face Spaces

1. **Create a new Space** on Hugging Face:
   - Choose the Gradio SDK
   - Select ZeroGPU in the hardware options
   - Upload your code

2. **Space configuration**:
   - `app.py` is automatically detected
   - `requirements.txt` is automatically installed
   - ZeroGPU is automatically configured

3. **Access requirements**:
   - Personal accounts: PRO subscription required
   - Organizations: Enterprise Hub subscription required
   - Usage limits: 10 Spaces (personal) / 50 Spaces (organization)

### ZeroGPU Integration Details

The demo automatically detects ZeroGPU availability and optimizes accordingly:

```python
# Automatic ZeroGPU detection
try:
    import spaces
    ZEROGPU_AVAILABLE = True
except ImportError:
    ZEROGPU_AVAILABLE = False

# GPU-optimized methods: each decorator requests a GPU for the given duration
@spaces.GPU(duration=120)  # up to 2 minutes for action generation
def generate_action(self, image, goal, instruction):
    # GPU-accelerated inference
    ...

@spaces.GPU(duration=90)   # up to 1.5 minutes for chat responses
def chat_with_model(self, message, history, image):
    # Interactive chat with GPU acceleration
    ...
```

## 🎯 How to Use

### Basic Usage

1. **Load Model**: Click "πŸš€ Load L-Operator Model" to initialize the model
2. **Upload Screenshot**: Upload an Android device screenshot
3. **Provide Instructions**:
   - **Goal**: Describe what you want to achieve
   - **Step**: Provide specific step instructions
4. **Generate Action**: Click "🎯 Generate Action" to get JSON output

### Chat Interface

1. **Upload Screenshot**: Upload an Android screenshot
2. **Send Message**: Use the structured format:

   ```text
   Goal: Open the Settings app and navigate to Display settings
   Step: Tap on the Settings app icon on the home screen
   ```

3. **Get Response**: The model will generate JSON actions
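As a sketch, the structured message above can be assembled programmatically before it is sent to the chat interface (the helper name below is illustrative, not part of the demo's API):

```python
# Hypothetical helper that assembles the structured chat message shown
# above; the function name is illustrative, not part of the demo's API.
def build_message(goal: str, step: str) -> str:
    return f"Goal: {goal}\nStep: {step}"

message = build_message(
    "Open the Settings app and navigate to Display settings",
    "Tap on the Settings app icon on the home screen",
)
print(message)
```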

### Example Episodes

The demo includes pre-loaded examples from the training episodes:

- **Episode 13**: Cruise deals app navigation
- **Episode 53**: Pinterest search for sustainability art
- **Episode 73**: Moon phases app usage

## πŸ“Š Expected Output Format

The model generates JSON actions in the following format:

```json
{
  "action_type": "tap",
  "x": 540,
  "y": 1200,
  "text": "Settings",
  "app_name": "com.android.settings",
  "confidence": 0.92
}
```

### Action Types

- `tap`: Tap at specific coordinates
- `click`: Click at specific coordinates
- `scroll`: Scroll in a direction (up/down/left/right)
- `input_text`: Input text
- `open_app`: Open a specific app
- `wait`: Wait for a moment
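As a sketch of how a client might consume this output, the snippet below parses and sanity-checks a generated action. The validator is illustrative, not part of the demo; it assumes the fields and action types listed above:

```python
import json

# Hypothetical validator for model-generated actions. The recognised
# types come from the Action Types list above; the helper itself is
# illustrative, not part of the demo.
VALID_ACTION_TYPES = {"tap", "click", "scroll", "input_text", "open_app", "wait"}

def parse_action(raw: str) -> dict:
    """Parse a JSON action string and check its basic shape."""
    action = json.loads(raw)
    if action.get("action_type") not in VALID_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action.get('action_type')!r}")
    if action["action_type"] in ("tap", "click"):
        # Tap/click actions need integer screen coordinates.
        if not isinstance(action.get("x"), int) or not isinstance(action.get("y"), int):
            raise ValueError("tap/click requires integer x and y")
    return action

action = parse_action('{"action_type": "tap", "x": 540, "y": 1200}')
print(action["action_type"])  # tap
```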

πŸ› οΈ Technical Details

Model Configuration

  • Device: Automatically detects CUDA/CPU
  • Precision: bfloat16 for CUDA, float32 for CPU
  • Generation: Temperature 0.7, Top-p 0.9
  • Max Tokens: 128 for action generation

### Architecture

- **Base Model**: LFM2-VL-1.6B from LiquidAI
- **Fine-tuning**: LoRA with rank 16, alpha 32
- **Target Modules**: `q_proj`, `v_proj`, `fc1`, `fc2`, `linear`, `gate_proj`, `up_proj`, `down_proj`
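For reference, these hyperparameters could be collected into the keyword arguments one would pass to peft's `LoraConfig`. This is a sketch of the values from this section; the variable name is illustrative:

```python
# Hypothetical summary of the LoRA setup described above, shaped like
# the keyword arguments for peft's LoraConfig. Values come from this
# section; the variable name is illustrative.
LORA_KWARGS = {
    "r": 16,           # LoRA rank
    "lora_alpha": 32,  # scaling factor
    "target_modules": [
        "q_proj", "v_proj", "fc1", "fc2",
        "linear", "gate_proj", "up_proj", "down_proj",
    ],
}
print(len(LORA_KWARGS["target_modules"]))  # 8
```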

### Performance

- **Model Size**: ~1.6B parameters
- **Memory Usage**: ~4 GB VRAM (CUDA) / ~8 GB RAM (CPU)
- **Inference Speed**: Optimized for real-time use
- **Accuracy**: 98% action accuracy on test episodes

## 🎯 Use Cases

### 1. Mobile App Testing

- Automated UI testing for Android applications
- Cross-device compatibility validation
- Regression testing with visual verification

### 2. Accessibility Applications

- Voice-controlled device navigation
- Assistive technology integration
- Screen reader enhancement tools

### 3. Remote Support

- Remote device troubleshooting
- Automated device configuration
- Support ticket automation

### 4. Development Workflows

- UI/UX testing automation
- User flow validation
- Performance testing integration

## ⚠️ Important Notes

### Access Requirements

- **Investment Access**: This model is proprietary technology available exclusively to qualified investors under NDA
- **Authentication Required**: Must be authenticated with Hugging Face
- **Evaluation Only**: Access granted solely for investment evaluation purposes
- **Confidentiality**: All technical details are confidential

### ZeroGPU Limitations

- **Compatibility**: Currently exclusive to the Gradio SDK
- **PyTorch Versions**: Limited to supported versions (2.1.2, 2.2.2, 2.4.0, 2.5.1)
- **Function Duration**: 60 seconds by default, customizable up to 120 seconds
- **Queue Priority**: PRO users get 5x more daily usage and the highest priority

### General Limitations

- **Market Hours**: Some features may be limited during market hours
- **Device Requirements**: Requires sufficient RAM/VRAM for model loading
- **Network**: Requires an internet connection for model download
- **Authentication**: Must have approved access to the model

## πŸ”§ Troubleshooting

### Common Issues

1. **Model Loading Error**:
   - Ensure you're authenticated: `huggingface-cli login`
   - Check internet connection
   - Verify model access approval

2. **Memory Issues**:
   - Use the CPU if GPU memory is insufficient
   - Close other applications
   - Consider using smaller batch sizes

3. **Authentication Errors**:
   - Re-login to Hugging Face
   - Check access approval status
   - Contact support if issues persist

4. **ZeroGPU Issues**:
   - Verify ZeroGPU is selected in the Space settings
   - Check PyTorch version compatibility
   - Ensure function durations are within limits

### Performance Optimization

- **GPU Usage**: Use CUDA for faster inference
- **Memory Management**: Monitor VRAM usage
- **Batch Processing**: Process multiple images efficiently
- **ZeroGPU Optimization**: Specify appropriate function durations

## πŸ“ž Support

- **Investment Inquiries**: For investment-related questions and due diligence
- **Technical Support**: For technical issues with the demo
- **Model Access**: For access requests to the L-Operator model
- **ZeroGPU Support**: See the ZeroGPU documentation

## πŸ“„ License

This demo is provided under the same terms as the L-Operator model:

- **Proprietary Technology**: Owned by Tonic
- **Investment Evaluation**: Access granted solely for investment evaluation
- **NDA Required**: All access is subject to a Non-Disclosure Agreement
- **No Commercial Use**: No commercial use without written consent

πŸ™ Acknowledgments

  • LiquidAI: For the base LFM2-VL model
  • Hugging Face: For the transformers library, hosting, and ZeroGPU infrastructure
  • Gradio: For the excellent UI framework

## πŸ”— Links


Made with ❀️ by Tonic