Entz 
posted an update 2 days ago
London Property Market Analyst — GPT + HM Land Registry + Gradio (full pipeline breakdown)

---

Built a production RAG-style chatbot over 32 million rows of official UK property transaction data. Sharing the architecture in case it's useful for others building structured-data Q&A systems.

This is the latest step in a project that started in 2021 as a basic Python analyst script, grew into automatic PDF report generation, then became a fully automated daily pipeline — and now a conversational AI anyone can query.

**Live demo:** https://uk-property-app.entzai.com/

**My Space:** https://huggingface.co/spaces/Entz/uk-property-app

---

### The problem with LLMs over tabular data

The naive approach — dump your CSV into the context window — breaks down fast at scale. The raw HM Land Registry file is a 4–5GB CSV covering 32 million transactions across England & Wales. I filtered it to ~1.76M London transactions (2010–2026), but even that doesn't fit in any context window, and even if it did, asking GPT to do GROUP BY in its head is asking for hallucinations.

The solution: **a structured analytics layer between the LLM and the data.**

---

### Architecture

```
User query (natural language)
        ↓
  Triage Agent (GPT)
  — classifies user intent
  — extracts structured params: district, property type,
    new/old build, time window, metric (median/mean/count)
        ↓
  Analytics Engine (pure Python + Pandas)
  — queries pre-aggregated Parquet files (not raw CSV)
  — returns structured JSON
        ↓
  Synthesis Agent (GPT)
  — receives structured JSON, writes prose analysis
  — hard rules prevent hallucination of years, ranges, stats
        ↓
  Chart Agent (Matplotlib)
  — 10 chart types, including line, multi-line, stacked bar, h-bar,
    diverging bar, band trend, growth ranking, and table
  — returned as base64 PNG → gr.Image on frontend
```
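The whole flow can be sketched end to end in a few lines. This is an illustrative skeleton with the GPT agents stubbed out — function names like `triage`, `run_analytics`, and `synthesize` are my own, not the project's actual code:

```python
# Minimal sketch of the four-stage pipeline with the GPT agents stubbed.
# A tiny dict stands in for the pre-aggregated Parquet layer:
# (district, property_type, year) -> median price
AGGREGATES = {
    ("Camden", "flat", 2023): 550_000,
    ("Camden", "flat", 2024): 565_000,
}

def triage(question: str) -> dict:
    """Stub for the Triage Agent: in production a GPT call extracts these params."""
    return {"district": "Camden", "property_type": "flat",
            "years": [2023, 2024], "metric": "median"}

def run_analytics(params: dict) -> dict:
    """Analytics engine: deterministic lookups, no LLM involved."""
    series = {y: AGGREGATES[(params["district"], params["property_type"], y)]
              for y in params["years"]}
    return {"district": params["district"], "metric": params["metric"],
            "series": series}

def synthesize(result: dict) -> str:
    """Stub for the Synthesis Agent: writes prose only from the JSON it is given."""
    years = sorted(result["series"])
    first, last = result["series"][years[0]], result["series"][years[-1]]
    return (f"{result['district']} {result['metric']} price moved from "
            f"£{first:,} in {years[0]} to £{last:,} in {years[-1]}.")

answer = synthesize(run_analytics(triage("How have Camden flat prices changed?")))
```

The point of the shape: the LLM never touches raw rows. It only sees parameters on the way in and aggregated JSON on the way out.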

### Key design decisions

**Pre-aggregated Parquets, not raw CSV**
The raw CSV is 4–5GB. Instead, the engine queries aggregated data (under 100MB total) covering every dimension combination × 16 years. Query time: <50ms. The Parquets are generated by a Python script that runs daily as an automated ETL step, the same pipeline I built for the earlier PDF report system.
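The pre-aggregation step is essentially one groupby. A minimal sketch with pandas (column names here are illustrative; in the real pipeline the result would be written out with something like `agg.to_parquet(...)`):

```python
import pandas as pd

# Toy stand-in for the raw transaction table.
raw = pd.DataFrame({
    "district": ["Camden", "Camden", "Hackney", "Hackney"],
    "property_type": ["flat", "flat", "terraced", "terraced"],
    "year": [2024, 2024, 2024, 2024],
    "price": [500_000, 600_000, 700_000, 750_000],
})

# One row per dimension combo, with every metric the chatbot can be asked for.
agg = (raw.groupby(["district", "property_type", "year"])["price"]
          .agg(median="median", mean="mean", count="count")
          .reset_index())

# At query time the engine filters this small table instead of the 4-5GB CSV.
camden = agg[(agg.district == "Camden") & (agg.year == 2024)]
```

Because every (district × property type × year × metric) combination is materialized up front, a "query" at chat time is a filter on a small table, which is how the <50ms latency falls out.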

**Triage before synthesis**
A smaller GPT model (fast, cheap) handles intent classification and parameter extraction; a second GPT call runs only the final synthesis step, on clean structured data. Cost: ~$0.01–0.02/query.
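One reason the triage output stays cheap and safe is that it is a strict, flat schema the deterministic engine can validate before running anything. A sketch of that idea (field names and ranges are assumptions for illustration; the 2010–2026 window and metric set come from the post):

```python
ALLOWED_METRICS = {"median", "mean", "count"}
ALLOWED_TYPES = {"flat", "terraced", "semi-detached", "detached"}

def validate_params(params: dict) -> dict:
    """Reject anything the analytics engine cannot serve, before it runs."""
    if params["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {params['metric']}")
    if params["property_type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown property type: {params['property_type']}")
    lo, hi = params["years"]
    if not (2010 <= lo <= hi <= 2026):  # the dataset's coverage window
        raise ValueError("year range outside 2010-2026")
    return params

params = validate_params({"district": "Hackney", "property_type": "flat",
                          "metric": "median", "years": (2015, 2024)})
```

Validating at this boundary means a mis-extraction from the triage model fails loudly instead of producing a plausible-looking but wrong chart.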

**Hard rules in the synthesis prompt, not soft suggestions**
LLMs reliably ignore "try not to..." instructions when they think context justifies it. For data accuracy I use hard "NEVER cite X unless the user's question contains word Y" phrasing, confirmed more reliable via A/B testing on edge cases.
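Hard rules can also be backed by a cheap post-hoc check. A sketch of that belt-and-braces pattern (the rule wording is illustrative, not the production prompt):

```python
import re

HARD_RULE = ("NEVER mention a year unless it appears in the `series` field of "
             "the JSON below. NEVER estimate values that are not in the JSON.")

def cites_only_known_years(prose: str, series: dict) -> bool:
    """Flag any four-digit year in the answer that is not in the data."""
    cited = {int(y) for y in re.findall(r"\b(20\d{2})\b", prose)}
    return cited <= set(series)

ok = cites_only_known_years("Prices rose through 2023 and 2024.",
                            {2023: 550_000, 2024: 565_000})
bad = cites_only_known_years("By 2030 prices could double.",
                             {2023: 550_000})
```

If the check fails, the answer can be regenerated or the offending sentence dropped, so a prompt slip never reaches the user.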

**Public/private Space split**
A private backend (data, agents, pipeline) plus a public frontend: a thin Gradio UI that talks to the backend via gradio_client, supporting both UI and API access. The frontend's requirements.txt contains only Pillow.

**Partial-year handling**
The data includes 2026 (a partial year). Charts always show it. Surprisingly, this was one of the hardest things to get right.
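One simple way to handle it (an assumption about the approach, not the project's exact code) is to derive which year is incomplete from the current date and label it, so neither the charts nor the Synthesis Agent treat it as a full year:

```python
from datetime import date
from typing import Optional

def year_label(year: int, today: Optional[date] = None) -> str:
    """Label the current (incomplete) year so it is never read as a full year."""
    today = today or date.today()
    if year == today.year:
        return f"{year} (partial, to {today:%b})"
    return str(year)

# Fixed date used so the example is deterministic.
labels = [year_label(y, today=date(2026, 3, 1)) for y in (2024, 2025, 2026)]
```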

**Automated data pipeline**
A scheduled notebook checks daily whether the raw gov.uk dataset has been updated; if so, it regenerates the Parquets and uploads them to the data server, and the Space detects the git commit and auto-restarts. Zero manual effort once deployed.
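The freshness check can be as simple as comparing a content hash (or the HTTP Last-Modified header) against the value stored from the previous run. A sketch of the hash variant; the state handling here is illustrative:

```python
import hashlib

def dataset_changed(new_bytes: bytes, last_hash: str) -> bool:
    """True when the downloaded file differs from the previous run's hash."""
    return hashlib.sha256(new_bytes).hexdigest() != last_hash

previous = hashlib.sha256(b"old snapshot").hexdigest()
changed = dataset_changed(b"new snapshot", previous)    # -> regenerate Parquets
unchanged = dataset_changed(b"old snapshot", previous)  # -> do nothing today
```

Gating the expensive ETL on this check is what makes a daily schedule cheap: most days the job downloads, compares, and exits.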
