Documentation - MoA Chat

What is MoA Chat?

MoA (Model of Agents) is a method that lets multiple AI agents (different LLMs) collaborate to generate a higher-quality response than a single model could.

MoA Chat implements a simple version of this architecture by:

Querying several different models (LLM-A, LLM-B, LLM-C) at once.
Combining their answers using another model (LLM-D, the aggregator).
Delivering a smart, single, structured reply to the user.

How MoA Works (Visual)

MoA Architecture

Source: Together MoA Architecture Concept

How to Use

Click ⚙️ to open the configuration panel.
Select your preferred models for LLM-A, LLM-B, LLM-C, and Aggregator (LLM-D).
Type your message in the input box.
Press Send.
Watch multiple models collaborate for the best response!

Features

Parallel querying of multiple free or premium models (via OpenRouter, Together, Grok, etc.).
Structured prompts for each model to encourage quality responses.
Aggregator model intelligently fuses outputs into one reply.
Dynamic Light/Dark mode (Gruvbox Material Theme).
Minimal, fast, and secure frontend with no API keys exposed.

Deployment

MoA Chat is optimized to run on Hugging Face Spaces or any platform that supports Python 3.11+, Flask, and Docker-based containers.

Requires setting your API keys via Hugging Face's Secrets system (never expose them to the frontend).

File Structure

app.py — Flask backend server.
llm/agents.py — Query and aggregation logic for MoA system.
llm/model_config.json — Define available models and providers.
templates/ — Contains index.html and docs.html.
static/ — Contains style.css and script.js.

Credits

Made with ❤️ in Panamá by Until Dot. Inspired by Together's MoA architecture.