
🦙 Unlimited Llama (AiLo Core) - AI Desktop Assistant


A complete AI desktop assistant with chat, web search, speech synthesis, and OCR.



✨ Features

  • 💬 Smart chat with local GGUF models
  • 🌐 Integrated web search for up-to-date information
  • 🔊 Text-to-Speech (TTS) and Speech Recognition (STT)
  • 📷 OCR to extract text from images
  • 💾 Advanced session management
  • 🎛️ Supports LLM models of any size
  • 🔌 OpenAI-compatible API server
  • 📤 Export to JSON, TXT, and Markdown
  • 🌐 Integrated distributed computing

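The OpenAI-compatible API server can be exercised with any standard client. Below is a minimal sketch using only the Python standard library; the port (8000) and model name (local-gguf) are assumptions, so check the server settings shown in the app:

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible server.

    The base URL and model name below are assumptions; adjust them to
    match the app's server configuration.
    """
    payload = {
        "model": "local-gguf",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client library that speaks the OpenAI chat-completions format should work the same way against this endpoint.
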
📦 Download

Download on Hugging Face


🚀 Quick Start Guide

First Launch

  1. Load a model → 🤖 Model → 📁 Load Model
  2. Start chatting → type in the box below and press Enter
  3. Sessions are saved automatically

πŸ” Web Search

  • Enable/disable using the 🌐 Web Search toggle
  • Automatically searches for news, recent info, or local data
  • Displays the sources used

🔊 Speech Synthesis (TTS)

  • Enable via 🔊 TTS in the sidebar
  • The assistant reads responses aloud
  • Use 🔇 STOP to interrupt

🎤 Speech Recognition

  • 🎤 Voice Input for single input
  • 🎤 Start Listening for continuous mode

📷 OCR from Images

  • Click 📷 Image OCR
  • Select an image (PNG, JPG, etc.)
  • Extracted text is automatically inserted into the chat

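In scripts, the same extraction step can be sketched with the pytesseract wrapper. The helper below is illustrative, not the app's own API, and assumes the pytesseract and Pillow packages are installed alongside the Tesseract binary:

```python
from pathlib import Path

# Formats commonly supported by Pillow/Tesseract (illustrative list)
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}

def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on an image file and return the extracted text."""
    path = Path(image_path)
    if path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported image format: {path.suffix}")
    # Imported here so the helper fails cleanly if OCR deps are missing.
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))
```
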
πŸ› οΈ Troubleshooting

❌ β€œModel not found”

  • Make sure the GGUF file is in the /models folder
  • Verify the file format is .gguf
  • Check that you have enough disk space

❌ β€œTesseract not found”

  • Install Tesseract OCR following the instructions below
  • Restart the application after installation


βš™οΈ Configuration Memory Optimization Memory Mapping (MMAP) What it does: Maps model directly from disk instead of loading entirely into RAM

Benefits: Reduces RAM usage by up to 70%, faster startup

Use when: Limited RAM, large models (>7GB)

Performance: Slightly slower inference, much less RAM usage

Memory Locking (MLOCK) What it does: Locks model in RAM preventing swap to disk

Benefits: Maximum performance, consistent response times

Use when: Abundant RAM, performance-critical applications

Performance: Fastest inference, permanent RAM occupation

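The tradeoff above can be captured as a small rule of thumb. This sketch produces flags in the style of llama-cpp-python's use_mmap/use_mlock options; the 1.5× RAM-headroom factor is an assumption for illustration, not a value taken from the app:

```python
def memory_flags(model_size_gb: float, free_ram_gb: float) -> dict:
    """Pick MMAP/MLOCK settings from available RAM (illustrative heuristic)."""
    if free_ram_gb >= model_size_gb * 1.5:  # assumed headroom factor
        # Abundant RAM: lock the model in memory for fastest inference.
        return {"use_mmap": False, "use_mlock": True}
    # Limited RAM: map the model from disk to cut RAM usage.
    return {"use_mmap": True, "use_mlock": False}

# Example: a 13 GB model on a machine with only 8 GB free RAM
# falls back to memory mapping.
```
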
βš™οΈ System Requirements

Minimum

  • OS: Windows 10/11, macOS 10.15+, Linux (Ubuntu 18.04+)
  • RAM: 8 GB (16 GB recommended)
  • Disk Space: 2 GB + space for models
  • CPU: Modern 64-bit processor

Recommended

  • RAM: 16 GB+ for large models
  • GPU: NVIDIA/AMD with CUDA or Metal (optional)
  • Disk Space: 10 GB+ for large models

🔧 Installation

1. Install Tesseract OCR (Required for OCR)

Windows

# Using Chocolatey (recommended)
choco install tesseract
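
After installing, you can confirm the binary is visible before launching the app. This check only inspects the system PATH, so a Tesseract installed to a custom location may still need to be added to PATH:

```python
import shutil

def tesseract_available() -> bool:
    """Return True if the tesseract binary is on the system PATH."""
    return shutil.which("tesseract") is not None

if not tesseract_available():
    print("Tesseract not found - install it and restart the application.")
```
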