
🦙 Unlimited Llama (AiLo Core) - AI Desktop Assistant


A complete AI desktop assistant with chat, web search, speech synthesis, and OCR.



✨ Features

  • 💬 Smart chat with local GGUF models
  • 🌐 Integrated web search for up-to-date information
  • 🔊 Text-to-Speech (TTS) and Speech Recognition (STT)
  • 📷 OCR to extract text from images
  • 💾 Advanced session management
  • 🎛️ Supports LLM models of any size
  • 🔌 OpenAI-compatible API server
  • 📤 Export to JSON, TXT, and Markdown
  • 🌐 Integrated distributed computing

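The OpenAI-compatible API server can be exercised with any standard client. Below is a minimal sketch using only the Python standard library; the port (8000) and model name (local-gguf) are assumptions, so check the server settings shown in the app:

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible server.

    The base URL and model name below are assumptions; adjust them to
    match the app's server configuration.
    """
    payload = {
        "model": "local-gguf",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client library that speaks the OpenAI chat-completions format should work the same way against this endpoint.
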
📦 Download

Download on Hugging Face


🚀 Quick Start Guide

First Launch

  1. Load a model → 🤖 Model → 📁 Load Model
  2. Start chatting → type in the box below and press Enter
  3. Sessions are saved automatically

πŸ” Web Search

  • Enable/disable using the 🌐 Web Search toggle
  • Automatically searches for news, recent info, or local data
  • Displays the sources used

🔊 Speech Synthesis (TTS)

  • Enable via 🔊 TTS in the sidebar
  • The assistant reads responses aloud
  • Use 🔇 STOP to interrupt

🎤 Speech Recognition

  • 🎤 Voice Input for single input
  • 🎤 Start Listening for continuous mode

📷 OCR from Images

  • Click 📷 Image OCR
  • Select an image (PNG, JPG, etc.)
  • Extracted text is automatically inserted into the chat

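In scripts, the same extraction step can be sketched with the pytesseract wrapper. The helper below is illustrative, not the app's own API, and assumes the pytesseract and Pillow packages are installed alongside the Tesseract binary:

```python
from pathlib import Path

# Formats commonly supported by Pillow/Tesseract (illustrative list)
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on an image file and return the extracted text."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    # Imported here so the helper fails cleanly if OCR deps are missing.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))
```
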
πŸ› οΈ Troubleshooting

❌ β€œModel not found”

  • Make sure the GGUF file is in the /models folder
  • Verify the file format is .gguf
  • Check that you have enough disk space

❌ β€œTesseract not found”

  • Install Tesseract OCR following the instructions below
  • Restart the application after installation


βš™οΈ Configuration Memory Optimization Memory Mapping (MMAP) What it does: Maps model directly from disk instead of loading entirely into RAM

Benefits: Reduces RAM usage by up to 70%, faster startup

Use when: Limited RAM, large models (>7GB)

Performance: Slightly slower inference, much less RAM usage

Memory Locking (MLOCK) What it does: Locks model in RAM preventing swap to disk

Benefits: Maximum performance, consistent response times

Use when: Abundant RAM, performance-critical applications

Performance: Fastest inference, permanent RAM occupation

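The tradeoff above can be captured as a small rule of thumb. This sketch produces flags in the style of llama-cpp-python's use_mmap/use_mlock options; the 1.5× RAM-headroom factor is an assumption for illustration, not a value taken from the app:

```python
def memory_flags(model_size_gb: float, free_ram_gb: float) -> dict:
    """Pick MMAP/MLOCK settings from available RAM (illustrative heuristic)."""
    if free_ram_gb >= model_size_gb * 1.5:  # assumed headroom factor
        # Abundant RAM: lock the model in memory for fastest inference.
        return {"use_mmap": False, "use_mlock": True}
    # Limited RAM: map the model from disk to cut RAM usage.
    return {"use_mmap": True, "use_mlock": False}

# Example: a 13 GB model on a machine with only 8 GB free RAM
# falls back to memory mapping.
```
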
βš™οΈ System Requirements

Minimum

  • OS: Windows 10/11, macOS 10.15+, Linux (Ubuntu 18.04+)
  • RAM: 8 GB (16 GB recommended)
  • Disk Space: 2 GB + space for models
  • CPU: Modern 64-bit processor

Recommended

  • RAM: 16 GB+ for large models
  • GPU: NVIDIA/AMD with CUDA or Metal (optional)
  • Disk Space: 10 GB+ for large models

🔧 Installation

1. Install Tesseract OCR (Required for OCR)

Windows

# Using Chocolatey (recommended)
choco install tesseract
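
After installing, you can confirm the binary is visible before launching the app. This check only inspects the system PATH, so a Tesseract installed to a custom location may still need to be added to PATH:

```python
import shutil

def tesseract_available() -> bool:
    """Return True if the tesseract binary is on the system PATH."""
    return shutil.which("tesseract") is not None

if not tesseract_available():
    print("Tesseract not found - install it and restart the application.")
```
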