title: The Programming Framework
emoji: 🛠️
colorFrom: yellow
colorTo: red
sdk: static
pinned: true
license: mit
🛠️ The Programming Framework
A Universal Method for Process Analysis
Summary
The Programming Framework is a universal meta-tool for analyzing complex processes across any discipline by combining Large Language Models (LLMs) with visual flowchart representation. The Framework transforms textual process descriptions into structured, interactive Mermaid flowcharts stored as JSON, enabling systematic analysis, visualization, and integration with knowledge systems.
The Framework has been demonstrated through GLMP (Genome Logic Modeling Project) with 50+ biological processes and applied across Chemistry, Mathematics, Physics, and Computer Science. It serves as the foundational methodology for the CopernicusAI Knowledge Engine, enabling domain-specific process visualization and analysis.
Prior Work & Research Contributions
Overview
The Programming Framework represents prior work that demonstrates a novel methodology for analyzing complex processes by combining Large Language Models (LLMs) with visual flowchart representation. This research establishes a universal, domain-agnostic approach to process analysis that transforms textual descriptions into structured, interactive visualizations.
🔬 Research Contributions
- Universal Process Analysis: Domain-agnostic methodology applicable across biology, chemistry, software engineering, business processes, and more
- LLM-Powered Extraction: Automated extraction of process steps, decision points, and logic flows using Google Gemini 2.0 Flash
- Structured Visualization: Mermaid.js-based flowchart generation encoded as JSON for programmatic access and integration
- Iterative Refinement: Systematic approach enabling continuous improvement through visualization and LLM-assisted refinement
⚙️ Technical Achievements
- Meta-Tool Architecture: Framework for creating specialized process analysis tools (demonstrated by GLMP)
- JSON-Based Storage: Structured data format enabling version control, cross-referencing, and API integration
- Multi-Domain Application: Applied to biological processes (GLMP) and to Chemistry, Physics, Mathematics, and Computer Science, with extensions planned for software, business, and engineering domains
- Integration Framework: Designed for integration with knowledge engines, research databases, and collaborative platforms
🎯 Position Within CopernicusAI Knowledge Engine
The Programming Framework serves as the foundational meta-tool of the CopernicusAI Knowledge Engine, providing the underlying methodology that enables specialized applications:
- GLMP (Genome Logic Modeling Project) - First specialized application demonstrating biological process visualization
- CopernicusAI - Main knowledge engine integrating Framework outputs with AI podcasts and research synthesis
- Research Tools Dashboard (✅ Implemented December 2025) - Fully operational web interface with knowledge graph visualization, vector search, RAG queries, and content browsing. Processes from Chemistry, Physics, Mathematics, and Computer Science are accessible through the unified dashboard. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine
- Public Project Interface (✅ Implemented January 2025) - Comprehensive public-facing page providing access to all CopernicusAI Knowledge Engine components. Live at: https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html
- Research Papers Metadata Database - Integration for linking processes to source literature (12,000+ papers indexed)
- Science Video Database - Potential integration for multi-modal process explanations
This work establishes a proof-of-concept for AI-assisted process analysis, demonstrating how LLMs can systematically extract and visualize complex logic from textual sources across diverse domains. The Knowledge Engine now provides a unified interface for exploring processes alongside research papers, podcasts, and other content types.
🎯 Overview
The Programming Framework is a meta-tool: a tool for creating tools. It provides a systematic method for analyzing any complex process by combining the analytical power of Large Language Models with the clarity of visual flowcharts.
💡 The Core Idea
Problem: Complex processes are difficult to understand because they involve many steps, decision points, and interactions. Traditional text descriptions are hard to follow.
Solution: Use LLMs to extract process logic from literature, then encode it as Mermaid flowcharts stored in JSON.
Result: Clear, interactive visualizations that reveal hidden patterns and enable systematic analysis.
⚙️ How It Works
- Input Process - Provide scientific papers, documentation, or process descriptions
- LLM Analysis - AI extracts steps, decisions, branches, and logic flow
- Generate Flowchart - Create Mermaid diagram encoded as JSON structure
- Visualize & Iterate - Interactive flowchart reveals insights and enables refinement
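The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Framework's actual implementation: `fake_llm_extract` is a hypothetical stand-in for the LLM call (it only splits sentences), and the record keys `title`, `mermaid`, and `step_count` are assumed names, not the project's documented schema.

```python
import json


def fake_llm_extract(text: str) -> list[str]:
    """Hypothetical stand-in for the LLM step: one sentence = one process step."""
    return [s.strip() for s in text.split(".") if s.strip()]


def steps_to_mermaid(steps: list[str]) -> str:
    """Chain the steps into a linear top-down Mermaid flowchart."""
    lines = ["flowchart TD"]
    for i, step in enumerate(steps):
        label = step.replace('"', "'")  # double quotes would end the node label
        lines.append(f'    S{i}["{label}"]')
    for i in range(len(steps) - 1):
        lines.append(f"    S{i} --> S{i + 1}")
    return "\n".join(lines)


def build_record(title: str, text: str) -> dict:
    """Bundle the flowchart and basic metadata into a JSON-serializable dict."""
    steps = fake_llm_extract(text)
    return {
        "title": title,
        "mermaid": steps_to_mermaid(steps),
        "step_count": len(steps),
    }


record = build_record(
    "DNA replication (toy example)",
    "Helicase unwinds the DNA. Primase adds primers. Polymerase extends the strand.",
)
print(json.dumps(record, indent=2))
```

A real extraction would also produce decision nodes and branches; the linear chain here only shows how text, flowchart, and JSON fit together.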
Core Principles
Domain Agnostic
Works across any field: biology, chemistry, software engineering, business processes, legal workflows, manufacturing, and beyond.
Iterative Refinement
Start with a rough analysis, visualize it, identify gaps, refine with the LLM, and repeat until the process logic is clear.
Structured Data
JSON storage enables programmatic access, version control, cross-referencing, and integration with other tools and databases.
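As an illustration of programmatic access, the sketch below round-trips a record through JSON and pulls the edge list out of its Mermaid source. The record and its key names (`id`, `discipline`, `mermaid`) are hypothetical, and the regular expression only handles plain `-->` arrows with optional `|label|` text, not the full Mermaid grammar.

```python
import json
import re

# Hypothetical record; the key names are illustrative, not the project's schema.
record = {
    "id": "example-process",
    "discipline": "biology",
    "mermaid": (
        "flowchart TD\n"
        "    A[Start] --> B{Ready?}\n"
        "    B -->|yes| C[Proceed]\n"
        "    B -->|no| A"
    ),
}

stored = json.dumps(record)  # what would be written to a .json file
loaded = json.loads(stored)  # what a downstream tool would read back

# Matches "X --> Y" and "X -->|label| Y" when both ends are on one line.
EDGE = re.compile(
    r"^[ \t]*(\w+)[^\n]*?-->[ \t]*(?:\|[^|\n]*\|[ \t]*)?(\w+)",
    re.MULTILINE,
)

edges = EDGE.findall(loaded["mermaid"])
print(edges)  # [('A', 'B'), ('B', 'C'), ('B', 'A')]
```

With the edges available as plain tuples, cross-referencing or diffing two versions of a process becomes an ordinary data problem.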
Applications
🧬 GLMP - Genome Logic Modeling (Live)
First specialized application: visualizing biochemical processes like DNA replication, metabolic pathways, and cell signaling.
Process Diagram Collections
The Programming Framework has been applied across multiple scientific disciplines. Explore interactive flowchart collections organized by domain:
Process Database Statistics (As of January 2025)
| Discipline | Processes | Subcategories | Status | Database Table |
|---|---|---|---|---|
| Biology | 52 | 8 | ✅ Complete | View Database |
| Chemistry | 91 | 14 | ✅ Complete | View Database |
| Physics | 21 | 7 | ✅ Complete | View Database |
| Computer Science | 21 | 7 | ✅ Complete | View Database |
| Mathematics | 20 | 7 | ✅ Complete | View Database |
| GLMP (Molecular Biology) | 108 | 10+ | ✅ Complete | View Database |
| Total | 313 | 53+ | ✅ Operational | All databases publicly accessible |
Note: All processes include Mermaid flowcharts, source citations, and comprehensive metadata. See individual database tables for detailed statistics, complexity metrics, and process details. Statistics are dynamically updated - see Public Project Interface for current counts.
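The totals in the table can be checked with a short script. The figures are copied from the rows above; GLMP's "10+" subcategories are counted as 10, so the subcategory sum is a lower bound.

```python
# (processes, subcategories) per discipline, copied from the table above.
rows = {
    "Biology": (52, 8),
    "Chemistry": (91, 14),
    "Physics": (21, 7),
    "Computer Science": (21, 7),
    "Mathematics": (20, 7),
    "GLMP (Molecular Biology)": (108, 10),  # table lists "10+"; 10 is the floor
}

total_processes = sum(processes for processes, _ in rows.values())
total_subcategories = sum(subcats for _, subcats in rows.values())

print(total_processes, total_subcategories)  # 313 53
```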
🧬 Biology
- Biology Processes Database - Interactive database with 52 higher-level organismal processes across 8 categories (reproduction, development, behavior, defense, nutrition, sensory, transport, coordination)
- GLMP Database Table - Genome Logic Modeling Project: Biochemical/molecular processes database (108 processes)
- Note: Biology Processes Database focuses on organismal, developmental, behavioral, and ecological processes. GLMP focuses on molecular-level biochemical processes. Together they provide comprehensive biological process coverage.
⚗️ Chemistry
- Chemistry Database Table - Interactive database with 91 processes across 14 subcategories
🔢 Mathematics
- Mathematics Database Table - Interactive database with 20 processes across 7 subcategories
⚛️ Physics
- Physics Database Table - Interactive database with 21 processes across 7 subcategories
💻 Computer Science
- Computer Science Database Table - Interactive database with 21 processes across 7 subcategories
⚠️ Limitations & Future Directions
Current Limitations
- Process Validation: Flowcharts are LLM-generated and benefit from expert validation for domain-specific accuracy (validation process ongoing)
- Source Linking: Not all processes yet linked to specific research papers (work in progress per Quality Standards)
- Scale: Current database (313 processes) represents proof-of-concept; target is 1,000+ processes
- Domain Coverage: Some disciplines better represented than others; actively expanding coverage
- LLM Dependency: Framework requires LLM access (Google Gemini 2.0 Flash); alternative models may produce different results
- Complexity Limits: Very complex processes (>100 nodes) may require manual refinement
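One way to flag flowcharts that exceed the 100-node limit mentioned above is to count distinct node identifiers in the Mermaid source. The sketch below is a rough heuristic, not the project's tooling: it assumes simple flowcharts that use only `-->` arrows and no `subgraph`, `classDef`, or `style` lines, and the function names are hypothetical.

```python
import re

MAX_NODES = 100  # threshold quoted in the limitation above


def node_ids(mermaid: str) -> set[str]:
    """Collect distinct node identifiers from a simple Mermaid flowchart."""
    ids: set[str] = set()
    for line in mermaid.splitlines()[1:]:  # skip the "flowchart TD" header
        line = re.sub(r"\|[^|]*\|", " ", line)  # drop edge labels such as |yes|
        # Drop node text in [...], {...}, or (...) so only identifiers remain.
        line = re.sub(r"\[[^\]]*\]|\{[^}]*\}|\([^)]*\)", " ", line)
        ids.update(re.findall(r"[A-Za-z_]\w*", line))
    return ids


def needs_manual_refinement(mermaid: str) -> bool:
    """True when the chart has more nodes than MAX_NODES."""
    return len(node_ids(mermaid)) > MAX_NODES


small = (
    "flowchart TD\n"
    "    A[Start] --> B{Ready?}\n"
    "    B -->|yes| C[Proceed]\n"
    "    B -->|no| A"
)
print(len(node_ids(small)), needs_manual_refinement(small))  # 3 False
```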
Future Work
- Expansion: Scale to 1,000+ processes across all disciplines (see DISCIPLINE_DATABASES_PLAN.md)
- Validation: Implement systematic peer review process for process flowcharts
- Source Integration: Enhanced linking to research papers using vector search from 23,246+ indexed papers
- Automation: Automated source paper suggestion and linking
- Quality Assurance: Systematic validation framework for flowchart accuracy
- Multi-LLM Support: Extend to support multiple LLM providers for comparison and validation
- Interactive Refinement: User interface for iterative flowchart improvement
Known Areas for Improvement
- Accuracy Validation: Not all flowcharts yet validated by domain experts; systematic validation in progress
- Source Citations: Some processes need additional source paper citations (work in progress)
- Cross-Discipline Links: Enhanced cross-referencing between related processes across disciplines
🔧 Technical Architecture
LLM Integration
- Primary Model: Google Gemini 2.0 Flash for process analysis
- Deployment: Vertex AI for enterprise-scale deployment
- Prompt Engineering: Custom prompts optimized for process extraction and structured output
- Output Format: Structured JSON with Mermaid flowchart syntax
- Version: Framework tested with Gemini 2.0 Flash; other LLMs can be substituted, though they may produce different results
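The prompt-and-parse step might look like the sketch below. The prompt wording, the `title` and `mermaid` keys, and both function names are assumptions made for illustration; the project's actual prompts are not reproduced here, and no model is called (the reply is a canned string).

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def build_prompt(text: str) -> str:
    """Assemble a hypothetical extraction prompt requesting structured output."""
    return (
        "Extract the process described below as a Mermaid flowchart.\n"
        'Reply with JSON only, using the keys "title" and "mermaid".\n\n'
        f"Process description:\n{text}"
    )


def parse_reply(reply: str) -> dict:
    """Strip an optional code fence, parse the JSON, and check basic shape."""
    body = reply.strip()
    if body.startswith(FENCE):
        body = body.split("\n", 1)[1]    # drop the opening fence line
        body = body.rsplit(FENCE, 1)[0]  # drop the closing fence
    data = json.loads(body)
    missing = {"title", "mermaid"} - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    if not data["mermaid"].lstrip().startswith(("flowchart", "graph")):
        raise ValueError("mermaid field does not start with a flowchart header")
    return data


# Canned reply standing in for a model response.
canned = (
    FENCE + "json\n"
    + json.dumps({"title": "Toy process", "mermaid": "flowchart TD\n    A --> B"})
    + "\n" + FENCE
)
parsed = parse_reply(canned)
print(parsed["title"])
```

Validating the reply before storage keeps malformed output from reaching the JSON files, whichever model produced it.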
Visualization Stack
- Rendering Engine: Mermaid.js for flowchart visualization
- Data Validation: JSON schema for data validation and consistency
- Output Formats: Interactive SVG output with export to PNG/PDF supported
- Color Schemes: Discipline-based color coding following Programming Framework standards
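Discipline-based color coding could be expressed with Mermaid's `classDef` and `class` statements, as in the sketch below. The palette is invented for illustration and does not reproduce the Programming Framework color standards; the function name is also hypothetical.

```python
# Invented palette; the Framework's real color standards are not shown here.
DISCIPLINE_COLORS = {
    "biology": "#d4edda",
    "chemistry": "#fff3cd",
    "physics": "#d1ecf1",
}


def apply_discipline_color(mermaid: str, discipline: str, nodes: list[str]) -> str:
    """Append a classDef for the discipline and assign it to the given nodes."""
    fill = DISCIPLINE_COLORS[discipline]
    return "\n".join(
        [
            mermaid.rstrip(),
            f"    classDef {discipline} fill:{fill},stroke:#333333",
            f"    class {','.join(nodes)} {discipline}",
        ]
    )


chart = "flowchart TD\n    A[Start] --> B[End]"
colored = apply_discipline_color(chart, "biology", ["A", "B"])
print(colored)
```

Keeping the palette in one dictionary means a color change touches a single place and every regenerated chart picks it up.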
Data Storage
- Primary Storage: Google Cloud Storage for JSON process files
- Metadata Indexing: Firestore for metadata indexing and search
- Version Control: Git for code and documentation versioning
- Cross-Referencing: Integration with research papers database (23,246+ papers indexed)
Integration Points
- GLMP: Specialized biological process collections
- CopernicusAI: Knowledge graph integration for unified exploration
- Research Papers Database: Cross-linking with 23,246+ indexed papers
- API Endpoints: Programmatic access for integration with other systems
- Research Tools Dashboard: Unified interface for exploring processes alongside papers and other content
How to Cite This Work
BibTeX Format
@article{welz2025programming,
title={The Programming Framework: A General Method for Process Analysis Using LLMs and Mermaid Visualization},
author={Welz, Gary},
journal={Nature Communications},
year={2025},
note={Submitted; preprint available upon publication},
url={https://huggingface.co/spaces/garywelz/programming_framework}
}
Standard Citation Format
Welz, G. (2024–2025). The Programming Framework: A Universal Method for Process Analysis. Hugging Face Spaces. https://huggingface.co/spaces/garywelz/programming_framework
Welz, G. (2024). From Inspiration to AI: Biology as Visual Programming. Medium. https://medium.com/@garywelz_47126/from-inspiration-to-ai-biology-as-visual-programming-520ee523029a
Note: When published, this citation will be updated with DOI and publication details from Nature Communications.
This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains.
The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications.
Data Availability
Research Data:
- Process Flowcharts: All process flowcharts are publicly available in Google Cloud Storage with interactive database tables:
  - Biology Processes Database - 52 processes across 8 subcategories
  - Chemistry Processes Database - 91 processes across 14 subcategories
  - Physics Processes Database - 21 processes across 7 subcategories
  - Mathematics Processes Database - 20 processes across 7 subcategories
  - Computer Science Processes Database - 21 processes across 7 subcategories
  - GLMP Database - 108+ molecular biology processes
- Process Metadata: Each process includes JSON metadata with Mermaid flowchart syntax, source citations, complexity metrics, and related process links.
- Current Statistics: Dynamically updated statistics available at Public Project Interface.
Source Code & Methodology:
- Methodology: Fully documented in this README and the Programming Framework paper (submitted to Nature Communications).
- Process Generation: LLM-powered extraction using Google Gemini 2.0 Flash via Vertex AI, with custom prompts for process extraction and structured JSON output formatting.
- Visualization: Mermaid.js-based flowchart generation with JSON schema for data validation.
- Data Format: Standardized JSON structure documented in project files (see Technical Architecture section).
- Database Schemas: Process database schemas and metadata structures documented in project documentation.
Access:
- Public Access: All process databases and database tables are publicly accessible (no authentication required).
- Individual Process Viewers: Each process has a dedicated viewer accessible via links in database tables.
- Research Tools Dashboard: Processes are integrated into the Research Tools Dashboard for unified exploration alongside research papers and other content.
- Hugging Face Spaces: Framework documentation and examples available at Programming Framework Space.
Reproducibility:
- All process flowcharts include source citations linking to research papers used to create each flowchart.
- Methodology is fully documented and can be replicated using Google Gemini 2.0 Flash or compatible LLMs.
- JSON schema and data structures are standardized and documented.
- Process generation workflow is transparent: input (textual process description) → LLM analysis → Mermaid flowchart generation → JSON storage.
- All components are publicly accessible for verification, reuse, and extension to other domains.
Process Database Statistics:
- Total Processes: 313+ processes across 6 databases, all validated for Mermaid syntax (expert validation is ongoing)
- Disciplines Covered: Biology, Chemistry, Physics, Mathematics, Computer Science, Molecular Biology (GLMP)
- Validation: 100% syntax accuracy, ≥85% metadata quality, all processes include source citations
- Format: All processes stored as JSON files with Mermaid flowchart syntax, publicly accessible via Google Cloud Storage
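The source does not define how the metadata quality figure is computed. One plausible reading, sketched below purely as an assumption, is the average share of expected metadata fields that are filled in. The field names are hypothetical keys chosen to mirror the metadata listed under Process Metadata (Mermaid syntax, source citations, complexity metrics, related process links).

```python
# Assumed key names mirroring the metadata items listed in this README.
EXPECTED_FIELDS = ("mermaid", "sources", "complexity", "related")


def completeness(record: dict) -> float:
    """Fraction of expected fields that are present and non-empty."""
    filled = sum(1 for key in EXPECTED_FIELDS if record.get(key))
    return filled / len(EXPECTED_FIELDS)


def collection_quality(records: list[dict]) -> float:
    """Mean completeness across a collection (an assumed definition)."""
    return sum(completeness(r) for r in records) / len(records)


# Toy records with placeholder values, not real project data.
records = [
    {
        "mermaid": "flowchart TD\n    A --> B",
        "sources": ["placeholder citation"],
        "complexity": {"nodes": 2},
        "related": ["another-process"],
    },
    {
        "mermaid": "flowchart TD\n    X --> Y",
        "sources": ["placeholder citation"],
        "complexity": {"nodes": 2},
        "related": [],  # empty, so this field counts as missing
    },
]

quality = collection_quality(records)
print(f"{quality:.1%}", quality >= 0.85)  # 87.5% True
```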
Related Projects
🧬 GLMP - Genome Logic Modeling
First specialized application of the Programming Framework to biochemical processes. 100+ biological pathways visualized.
🔬 CopernicusAI
Knowledge engine integrating the Programming Framework with AI podcasts, research papers, and knowledge graph for scientific discovery.
🎨 Interactive Demo
The space includes interactive examples showing the framework applied to:
- Scientific Method
- Software Deployment Pipeline
- Customer Support Workflow
- Research Paper Publication
Each example demonstrates how LLMs extract process logic and encode it as visual flowcharts.
💻 Technology Stack
- LLM: Google Gemini 2.0 Flash, Vertex AI
- Visualization: Mermaid.js
- Storage: Google Cloud Storage, Firestore
- Format: JSON with Mermaid syntax
- Frontend: Static HTML + Tailwind CSS
Vision
As AI systems become more capable of understanding complex processes, the Programming Framework provides the bridge between human comprehension and machine analysis. It is a tool for truth-seeking, transforming complexity into clarity.
© 2025 Gary Welz. All rights reserved.