Lorentz Yeung PRO

Entz

AI & ML interests

None yet

Recent Activity

upvoted a changelog about 24 hours ago
Protected Spaces with Public URLs
replied to their post 1 day ago
London Property Market Analyst: GPT + HM Land Registry + Gradio (full pipeline breakdown)

---

Built a production RAG-style chatbot over 32 million rows of official UK property transaction data. Sharing the architecture in case it's useful for others building structured-data Q&A systems.

This is the latest step in a project that started in 2021 as a basic Python analyst script, grew into automatic PDF report generation, then became a fully automatic daily pipeline, and is now a conversational AI anyone can query.

**Live demo:** https://uk-property-app.entzai.com/
**My Space:** https://huggingface.co/spaces/Entz/uk-property-app

---

### The problem with LLMs over tabular data

The naive approach (dump your CSV into the context window) breaks down fast at scale. The raw HM Land Registry file is a 4–5GB CSV covering 32 million transactions across England & Wales. I filtered it to ~1.76M London transactions (2010–2026), but even that doesn't fit in any context window, and even if it did, asking GPT to do GROUP BY in its head is asking for hallucinations.

The solution: **a structured analytics layer between the LLM and the data.**

---

### Architecture

```
User query (natural language)
    ↓
Triage Agent (GPT)
    - classifies intent
    - extracts structured params: district, property type, new/old build,
      time window, metric (median/mean/count)
    ↓
Analytics Engine (pure Python + Pandas)
    - queries pre-aggregated Parquet files (not raw CSV)
    - returns structured JSON
    ↓
Synthesis Agent (GPT)
    - receives structured JSON, writes prose analysis
    - hard rules prevent hallucination of years, ranges, stats
    ↓
Chart Agent (Matplotlib)
    - 10 chart types, including line, multi-line, stacked bar, h-bar,
      diverging bar, band trend, growth ranking, table
    - returned as base64 PNG → gr.Image on frontend
```
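To make the middle of that pipeline concrete, here is a minimal sketch of the idea behind the analytics layer: the LLM only produces structured parameters, and deterministic code computes the numbers and returns JSON. It is not the author's implementation. The `QueryParams` fields, the `run_query` function, and the toy `ROWS` table are all hypothetical, and the sketch uses only the standard library to stay self-contained, whereas the post describes Pandas over pre-aggregated Parquet files.

```python
import json
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass(frozen=True)
class QueryParams:
    """Hypothetical shape of what the triage step extracts from a question."""
    district: str
    property_type: Optional[str]  # None means "all property types"
    start_year: int
    end_year: int
    metric: str  # "median", "mean" or "count"


# Toy stand-in for the real data: (district, property type, year, price).
# The values are made up for illustration only.
ROWS = [
    ("CAMDEN", "F", 2022, 600_000),
    ("CAMDEN", "F", 2022, 700_000),
    ("CAMDEN", "F", 2023, 650_000),
    ("CAMDEN", "T", 2023, 1_200_000),
    ("HACKNEY", "F", 2023, 500_000),
]

# Only whitelisted metrics can run, so the LLM cannot request arbitrary code.
METRICS = {"median": median, "mean": mean, "count": len}


def run_query(params: QueryParams, rows=ROWS) -> str:
    """Filter rows by the params, aggregate per year, and return JSON."""
    if params.metric not in METRICS:
        raise ValueError(f"unsupported metric: {params.metric}")

    prices_by_year = {}
    for district, ptype, year, price in rows:
        if district != params.district:
            continue
        if params.property_type is not None and ptype != params.property_type:
            continue
        if not params.start_year <= year <= params.end_year:
            continue
        prices_by_year.setdefault(year, []).append(price)

    aggregate = METRICS[params.metric]
    series = [
        {"year": year, "value": aggregate(prices), "n": len(prices)}
        for year, prices in sorted(prices_by_year.items())
    ]
    return json.dumps(
        {"district": params.district, "metric": params.metric, "series": series}
    )


if __name__ == "__main__":
    params = QueryParams("CAMDEN", "F", 2022, 2023, "median")
    print(run_query(params))
```

In a setup like this, the synthesis step would see only the returned JSON, so every figure it quotes can be traced back to a computed value rather than to the model's recall.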
posted an update 1 day ago
London Property Market Analyst: GPT + HM Land Registry + Gradio (full pipeline breakdown)

Organizations

None yet