title: The Programming Framework
emoji: 🛠️
colorFrom: yellow
colorTo: red
sdk: static
pinned: true
license: mit

🛠️ The Programming Framework

A Universal Method for Process Analysis

Summary

The Programming Framework is a universal meta-tool for analyzing complex processes across any discipline by combining Large Language Models (LLMs) with visual flowchart representation. The Framework transforms textual process descriptions into structured, interactive Mermaid flowcharts stored as JSON, enabling systematic analysis, visualization, and integration with knowledge systems.

The Framework has been demonstrated through GLMP (Genome Logic Modeling Project) with 100+ biological processes and applied across Chemistry, Mathematics, Physics, and Computer Science. It serves as the foundational methodology for the CopernicusAI Knowledge Engine, enabling domain-specific process visualization and analysis.

📚 Prior Work & Research Contributions

Overview

The Programming Framework represents prior work that demonstrates a novel methodology for analyzing complex processes by combining Large Language Models (LLMs) with visual flowchart representation. This research establishes a universal, domain-agnostic approach to process analysis that transforms textual descriptions into structured, interactive visualizations.

🔬 Research Contributions

  • Universal Process Analysis: Domain-agnostic methodology applicable across biology, chemistry, software engineering, business processes, and more
  • LLM-Powered Extraction: Automated extraction of process steps, decision points, and logic flows using Google Gemini 2.0 Flash
  • Structured Visualization: Mermaid.js-based flowchart generation encoded as JSON for programmatic access and integration
  • Iterative Refinement: Systematic approach enabling continuous improvement through visualization and LLM-assisted refinement

⚙️ Technical Achievements

  • Meta-Tool Architecture: Framework for creating specialized process analysis tools (demonstrated by GLMP)
  • JSON-Based Storage: Structured data format enabling version control, cross-referencing, and API integration
  • Multi-Domain Application: Applied to biological processes (GLMP) and to chemistry, physics, mathematics, and computer science, with extensions planned for software, business, and engineering domains
  • Integration Framework: Designed for integration with knowledge engines, research databases, and collaborative platforms

🎯 Position Within CopernicusAI Knowledge Engine

The Programming Framework serves as the foundational meta-tool of the CopernicusAI Knowledge Engine, providing the underlying methodology that enables specialized applications:

  • GLMP (Genome Logic Modeling Project) - First specialized application demonstrating biological process visualization
  • CopernicusAI - Main knowledge engine integrating Framework outputs with AI podcasts and research synthesis
  • Research Tools Dashboard (✅ Implemented December 2025) - Fully operational web interface with knowledge graph visualization, vector search, RAG queries, and content browsing. Processes from Chemistry, Physics, Mathematics, and Computer Science are accessible through the unified dashboard. Live at: https://copernicus-frontend-phzp4ie2sq-uc.a.run.app/knowledge-engine
  • Public Project Interface (✅ Implemented January 2025) - Comprehensive public-facing page providing access to all CopernicusAI Knowledge Engine components. Live at: https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/copernicusai-public-reviewer.html
  • Research Papers Metadata Database - Integration for linking processes to source literature (23,246+ papers indexed)
  • Science Video Database - Potential integration for multi-modal process explanations

This work establishes a proof-of-concept for AI-assisted process analysis, demonstrating how LLMs can systematically extract and visualize complex logic from textual sources across diverse domains. The Knowledge Engine now provides a unified interface for exploring processes alongside research papers, podcasts, and other content types.

🎯 Overview

The Programming Framework is a meta-tool: a tool for creating tools. It provides a systematic method for analyzing any complex process by combining the analytical power of Large Language Models with the clarity of visual flowcharts.

💡 The Core Idea

Problem: Complex processes are difficult to understand because they involve many steps, decision points, and interactions. Traditional text descriptions are hard to follow.

Solution: Use LLMs to extract process logic from literature, then encode it as Mermaid flowcharts stored in JSON.

Result: Clear, interactive visualizations that reveal hidden patterns and enable systematic analysis.

⚙️ How It Works

  1. Input Process - Provide scientific papers, documentation, or process descriptions
  2. LLM Analysis - AI extracts steps, decisions, branches, and logic flow
  3. Generate Flowchart - Create Mermaid diagram encoded as JSON structure
  4. Visualize & Iterate - Interactive flowchart reveals insights and enables refinement
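
The four steps above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the Framework's actual implementation: the LLM call is replaced by a hard-coded stand-in (`fake_llm_extract`), and the record fields (`title`, `mermaid`, `steps`) are assumed for the example rather than taken from the project's schema.

```python
import json

def fake_llm_extract(text: str) -> list[dict]:
    """Stand-in for step 2 (LLM analysis); returns hand-written steps.

    A real pipeline would send `text` to an LLM and parse its reply.
    """
    return [
        {"id": "A", "label": "Formulate hypothesis", "next": ["B"]},
        {"id": "B", "label": "Run experiment", "next": ["C"]},
        {"id": "C", "label": "Results support hypothesis?", "decision": True,
         "next": ["D", "A"]},
        {"id": "D", "label": "Publish", "next": []},
    ]

def to_mermaid(steps: list[dict]) -> str:
    """Step 3: render extracted steps as Mermaid flowchart source."""
    lines = ["flowchart TD"]
    for s in steps:
        # Decisions use Mermaid's diamond shape {..}; other steps use boxes [..].
        shape = '{{"{}"}}' if s.get("decision") else '["{}"]'
        lines.append(f"    {s['id']}{shape.format(s['label'])}")
    for s in steps:
        for target in s["next"]:
            lines.append(f"    {s['id']} --> {target}")
    return "\n".join(lines)

def build_record(title: str, text: str) -> str:
    """Steps 1-3 combined: text in, JSON record (with Mermaid source) out."""
    steps = fake_llm_extract(text)
    record = {"title": title, "mermaid": to_mermaid(steps), "steps": steps}
    return json.dumps(record, indent=2)

record_json = build_record("Scientific Method", "source text goes here")
print(json.loads(record_json)["mermaid"])
```

The printed Mermaid source can be pasted into any Mermaid renderer for step 4; the loop from the decision node back to the first step is the kind of structure that is easier to spot in the diagram than in prose.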

🌍 Core Principles

Domain Agnostic

Works across any field: biology, chemistry, software engineering, business processes, legal workflows, manufacturing, and beyond.

Iterative Refinement

Start with rough analysis, visualize, identify gaps, refine with LLM, repeat until the process logic is crystal clear.
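
One way to make the "identify gaps" step concrete is to check a draft flowchart for nodes that cannot be reached from the start and for nodes that end unexpectedly. The sketch below is an assumption about how such a check could look, not part of the Framework itself; it only understands simple `A --> B` and `A -->|label| B` edge lines with bare node ids.

```python
import re
from collections import defaultdict, deque

# Matches "A --> B" and "A -->|label| B"; other Mermaid edge forms are ignored.
EDGE = re.compile(r"^\s*(\w+)\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)\s*$")

def find_gaps(mermaid: str, start: str, terminals: set[str]) -> dict[str, set[str]]:
    graph = defaultdict(set)
    nodes = set()
    for line in mermaid.splitlines():
        m = EDGE.match(line)
        if m:
            src, dst = m.groups()
            graph[src].add(dst)
            nodes.update((src, dst))
    # Breadth-first search from the start node to find what is reachable.
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # A dead end is a node with no outgoing edge that is not an intended end state.
    dead_ends = {n for n in nodes if not graph[n] and n not in terminals}
    return {"unreachable": nodes - seen, "dead_ends": dead_ends}

draft = """flowchart TD
    A --> B
    B -->|yes| C
    B -->|no| D
    E --> C
"""
gaps = find_gaps(draft, start="A", terminals={"C"})
print(gaps)
```

Here `E` is reported as unreachable and `D` as a dead end, which are natural prompts for the next refinement pass with the LLM.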

Structured Data

JSON storage enables programmatic access, version control, cross-referencing, and integration with other tools and databases.
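
As an illustration of programmatic access and cross-referencing, the following sketch loads a few process records and finds source papers shared by more than one process. The field names (`id`, `discipline`, `sources`) and the sample values are hypothetical and are not taken from the project's schema.

```python
import json
from collections import defaultdict

# Three made-up records standing in for stored process files.
raw = """[
  {"id": "dna-replication", "discipline": "biology", "sources": ["paper-1", "paper-2"]},
  {"id": "pcr", "discipline": "biology", "sources": ["paper-2"]},
  {"id": "haber-process", "discipline": "chemistry", "sources": ["paper-3"]}
]"""

processes = json.loads(raw)

# Invert the process -> sources relation to get source -> processes.
by_source = defaultdict(list)
for p in processes:
    for src in p["sources"]:
        by_source[src].append(p["id"])

# Keep only sources cited by more than one process.
shared = {s: ids for s, ids in by_source.items() if len(ids) > 1}
print(shared)  # {'paper-2': ['dna-replication', 'pcr']}
```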

🚀 Applications

🧬 GLMP - Genome Logic Modeling (Live)

First specialized application: visualizing biochemical processes like DNA replication, metabolic pathways, and cell signaling.

📚 Process Diagram Collections

The Programming Framework has been applied across multiple scientific disciplines. Explore interactive flowchart collections organized by domain:

Process Database Statistics (As of January 2025)

| Discipline | Processes | Subcategories | Status | Database Table |
|---|---|---|---|---|
| Biology | 52 | 8 | ✅ Complete | View Database |
| Chemistry | 91 | 14 | ✅ Complete | View Database |
| Physics | 21 | 7 | ✅ Complete | View Database |
| Computer Science | 21 | 7 | ✅ Complete | View Database |
| Mathematics | 20 | 7 | ✅ Complete | View Database |
| GLMP (Molecular Biology) | 108 | 10+ | ✅ Complete | View Database |
| Total | 313 | 53+ | ✅ Operational | All databases publicly accessible |

Note: All processes include Mermaid flowcharts, source citations, and comprehensive metadata. See individual database tables for detailed statistics, complexity metrics, and process details. Statistics are dynamically updated - see Public Project Interface for current counts.
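
The totals in the statistics table can be checked directly from the per-discipline rows. The snippet below simply re-adds the published figures; for GLMP the table gives "10+" subcategories, so 10 is used as a lower bound.

```python
# (processes, subcategories) per row of the statistics table.
counts = {
    "Biology": (52, 8),
    "Chemistry": (91, 14),
    "Physics": (21, 7),
    "Computer Science": (21, 7),
    "Mathematics": (20, 7),
    "GLMP (Molecular Biology)": (108, 10),  # table lists "10+"; 10 is the lower bound
}

total_processes = sum(p for p, _ in counts.values())
total_subcategories = sum(s for _, s in counts.values())
print(total_processes, total_subcategories)  # 313 53
```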

🧬 Biology

  • Biology Processes Database - Interactive database with 52 higher-level organismal processes across 8 categories (reproduction, development, behavior, defense, nutrition, sensory, transport, coordination)
  • GLMP Database Table - Genome Logic Modeling Project: Biochemical/molecular processes database (108 processes)
  • Note: Biology Processes Database focuses on organismal, developmental, behavioral, and ecological processes. GLMP focuses on molecular-level biochemical processes. Together they provide comprehensive biological process coverage.

⚗️ Chemistry

🔢 Mathematics

⚛️ Physics

💻 Computer Science

⚠️ Limitations & Future Directions

Current Limitations

  • Process Validation: Flowcharts are LLM-generated and benefit from expert validation for domain-specific accuracy (validation process ongoing)
  • Source Linking: Not all processes yet linked to specific research papers (work in progress per Quality Standards)
  • Scale: Current database (313 processes) represents proof-of-concept; target is 1,000+ processes
  • Domain Coverage: Some disciplines better represented than others; actively expanding coverage
  • LLM Dependency: Framework requires LLM access (Google Gemini 2.0 Flash); alternative models may produce different results
  • Complexity Limits: Very complex processes (>100 nodes) may require manual refinement
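
The complexity limit can be turned into an automatic flag. The sketch below counts distinct node ids in a Mermaid flowchart and marks anything above 100 nodes for manual refinement. It is an assumption about how such a check might be written, and it only recognizes simple ids joined by `-->` arrows.

```python
import re

NODE_LIMIT = 100  # threshold named in the limitation above

# Matches a node id at the start of a line or right after an arrow.
# Assumes simple ids (letters, digits, underscore) and "-->" arrows only.
NODE_ID = re.compile(r"(?:^\s*|-->\s*(?:\|[^|]*\|\s*)?)(\w+)")

# Lines beginning with these Mermaid keywords do not introduce nodes.
DIRECTIVES = {"classDef", "class", "style", "subgraph", "end", "linkStyle", "click"}

def count_nodes(mermaid: str) -> int:
    ids = set()
    for line in mermaid.splitlines()[1:]:  # skip the "flowchart TD" header
        if line.strip().split(" ", 1)[0] in DIRECTIVES:
            continue
        ids.update(NODE_ID.findall(line))
    return len(ids)

def needs_manual_review(mermaid: str, limit: int = NODE_LIMIT) -> bool:
    return count_nodes(mermaid) > limit

big = "flowchart TD\n" + "\n".join(f"    N{i} --> N{i + 1}" for i in range(120))
small = "flowchart TD\n    A[Start] --> B{Check}\n    B -->|ok| C[Done]"
print(count_nodes(big), needs_manual_review(big))      # 121 True
print(count_nodes(small), needs_manual_review(small))  # 3 False
```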

Future Work

  • Expansion: Scale to 1,000+ processes across all disciplines (see DISCIPLINE_DATABASES_PLAN.md)
  • Validation: Implement systematic peer review process for process flowcharts
  • Source Integration: Enhanced linking to research papers using vector search from 23,246+ indexed papers
  • Automation: Automated source paper suggestion and linking
  • Quality Assurance: Systematic validation framework for flowchart accuracy
  • Multi-LLM Support: Extend to support multiple LLM providers for comparison and validation
  • Interactive Refinement: User interface for iterative flowchart improvement
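
For the planned multi-LLM comparison, one simple measure is the overlap between the edge sets that two models produce for the same process. The sketch below computes a Jaccard similarity and lists the edges on which the models disagree. It assumes both outputs use the same node ids, which real model outputs would not guarantee without a normalization step; the two sample flowcharts are invented.

```python
import re

# Matches "A --> B" and "A -->|label| B"; other Mermaid edge forms are ignored.
EDGE = re.compile(r"^\s*(\w+)\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)\s*$")

def edge_set(mermaid: str) -> set[tuple[str, str]]:
    return {m.groups() for line in mermaid.splitlines() if (m := EDGE.match(line))}

def jaccard(a: set, b: set) -> float:
    # Size of the intersection over size of the union; two empty sets count as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0

model_a = "flowchart TD\n    A --> B\n    B --> C\n    C --> D"
model_b = "flowchart TD\n    A --> B\n    B --> D\n    C --> D"

ea, eb = edge_set(model_a), edge_set(model_b)
print(jaccard(ea, eb))   # 0.5
print(sorted(ea ^ eb))   # [('B', 'C'), ('B', 'D')]
```

The disagreeing edges are the places where expert review is most likely to be needed.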

Known Areas for Improvement

  • Accuracy Validation: Not all flowcharts yet validated by domain experts; systematic validation in progress
  • Source Citations: Some processes need additional source paper citations (work in progress)
  • Cross-Discipline Links: Enhanced cross-referencing between related processes across disciplines

🔧 Technical Architecture

LLM Integration

  • Primary Model: Google Gemini 2.0 Flash for process analysis
  • Deployment: Vertex AI for enterprise-scale deployment
  • Prompt Engineering: Custom prompts optimized for process extraction and structured output
  • Output Format: Structured JSON with Mermaid flowchart syntax
  • Version: Framework tested with Gemini 2.0 Flash; compatible with other LLMs
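
To show what "custom prompts" and "structured JSON with Mermaid flowchart syntax" could look like in practice, here is a minimal sketch of a prompt builder and a reply parser. The prompt wording, the `title`/`mermaid` reply shape, and the function names are all assumptions made for illustration; the project's actual prompts are not reproduced here, and no model API is called.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in

PROMPT_TEMPLATE = """You are analyzing a process description.
Extract the steps, decision points, and flow between them.
Reply with JSON only, using this shape:
{{"title": str, "mermaid": str}}
where "mermaid" is a Mermaid flowchart beginning with "flowchart TD".

Process description:
{description}
"""

def build_prompt(description: str) -> str:
    return PROMPT_TEMPLATE.format(description=description.strip())

def parse_reply(reply: str) -> dict:
    """Parse a model reply, tolerating a Markdown code fence around the JSON."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        text = text.split("\n", 1)[1].rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not data.get("mermaid", "").lstrip().startswith(("flowchart", "graph")):
        raise ValueError("reply does not contain a Mermaid flowchart")
    return data

reply = FENCE + 'json\n{"title": "Tea", "mermaid": "flowchart TD\\n    A --> B"}\n' + FENCE
print(parse_reply(reply)["mermaid"])
```

Rejecting replies that lack a flowchart header keeps malformed output from reaching storage, whichever model produced it.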

Visualization Stack

  • Rendering Engine: Mermaid.js for flowchart visualization
  • Data Validation: JSON schema for data validation and consistency
  • Output Formats: Interactive SVG output with export to PNG/PDF supported
  • Color Schemes: Discipline-based color coding following Programming Framework standards
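
Discipline-based color coding can be expressed with Mermaid's `classDef` and `class` statements. The sketch below appends both to an existing flowchart. The palette is hypothetical; the project's actual color standards are not reproduced here.

```python
# Hypothetical palette, for illustration only.
DISCIPLINE_COLORS = {
    "biology": "#2e7d32",
    "chemistry": "#ef6c00",
    "physics": "#1565c0",
}

def apply_discipline_color(mermaid: str, node_ids: list[str], discipline: str) -> str:
    """Append a classDef for the discipline and assign it to the given nodes."""
    color = DISCIPLINE_COLORS[discipline]
    return "\n".join([
        mermaid.rstrip(),
        f"    classDef {discipline} fill:{color},color:#ffffff",
        f"    class {','.join(node_ids)} {discipline}",
    ])

styled = apply_discipline_color("flowchart TD\n    A --> B\n", ["A", "B"], "biology")
print(styled)
```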

Data Storage

  • Primary Storage: Google Cloud Storage for JSON process files
  • Metadata Indexing: Firestore for metadata indexing and search
  • Version Control: Git for code and documentation versioning
  • Cross-Referencing: Integration with research papers database (23,246+ papers indexed)
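
A metadata index entry for a stored process file might be derived as in the sketch below, which serializes a record deterministically and records a checksum. The object path layout and the index fields are assumptions for illustration, and no cloud storage or Firestore API is called.

```python
import hashlib
import json

def canonical_json(record: dict) -> str:
    # Sorted keys and fixed separators give byte-identical output for equal data,
    # which keeps version-control diffs and checksums stable.
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

def index_entry(record: dict) -> dict:
    body = canonical_json(record).encode("utf-8")
    return {
        "id": record["id"],
        "discipline": record["discipline"],
        # Hypothetical object path layout, not the project's real one.
        "path": f"processes/{record['discipline']}/{record['id']}.json",
        "sha256": hashlib.sha256(body).hexdigest(),
        "bytes": len(body),
    }

a = {"id": "pcr", "discipline": "biology", "mermaid": "flowchart TD\n    A --> B"}
b = dict(reversed(list(a.items())))  # same data, different key order
print(index_entry(a)["path"])
print(index_entry(a) == index_entry(b))
```

Because key order does not affect the serialized bytes, two records with the same content always produce the same checksum.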

Integration Points

  • GLMP: Specialized biological process collections
  • CopernicusAI: Knowledge graph integration for unified exploration
  • Research Papers Database: Cross-linking with 23,246+ indexed papers
  • API Endpoints: Programmatic access for integration with other systems
  • Research Tools Dashboard: Unified interface for exploring processes alongside papers and other content

How to Cite This Work

BibTeX Format

@article{welz2025programming,
  title={The Programming Framework: A General Method for Process Analysis Using LLMs and Mermaid Visualization},
  author={Welz, Gary},
  journal={Nature Communications},
  year={2025},
  note={Submitted; preprint available upon publication},
  url={https://huggingface.co/spaces/garywelz/programming_framework}
}

Standard Citation Format

Welz, G. (2024–2025). The Programming Framework: A Universal Method for Process Analysis. Hugging Face Spaces. https://huggingface.co/spaces/garywelz/programming_framework

Welz, G. (2024). From Inspiration to AI: Biology as Visual Programming. Medium. https://medium.com/@garywelz_47126/from-inspiration-to-ai-biology-as-visual-programming-520ee523029a

Note: When published, this citation will be updated with DOI and publication details from Nature Communications.

This project serves as a foundational meta-tool for AI-assisted process analysis, enabling systematic extraction and visualization of complex logic from textual sources across diverse scientific and technical domains.

The Programming Framework is designed as infrastructure for AI-assisted science, providing a universal methodology that can be specialized for domain-specific applications.

📊 Data Availability

Research Data:

Source Code & Methodology:

  • Methodology: Fully documented in this README and the Programming Framework paper (submitted to Nature Communications).
  • Process Generation: LLM-powered extraction using Google Gemini 2.0 Flash via Vertex AI, with custom prompts for process extraction and structured JSON output formatting.
  • Visualization: Mermaid.js-based flowchart generation with JSON schema for data validation.
  • Data Format: Standardized JSON structure documented in project files (see Technical Architecture section).
  • Database Schemas: Process database schemas and metadata structures documented in project documentation.

Access:

  • Public Access: All process databases and database tables are publicly accessible (no authentication required).
  • Individual Process Viewers: Each process has a dedicated viewer accessible via links in database tables.
  • Research Tools Dashboard: Processes are integrated into the Research Tools Dashboard for unified exploration alongside research papers and other content.
  • Hugging Face Spaces: Framework documentation and examples available at Programming Framework Space.

Reproducibility:

  • Process flowcharts include source citations linking to the research papers used to create them; linking is still in progress for some processes (see Current Limitations).
  • Methodology is fully documented and can be replicated using Google Gemini 2.0 Flash or compatible LLMs.
  • JSON schema and data structures are standardized and documented.
  • Process generation workflow is transparent: input (textual process description) → LLM analysis → Mermaid flowchart generation → JSON storage.
  • All components are publicly accessible for verification, reuse, and extension to other domains.

Process Database Statistics:

  • Total Processes: 313+ processes across 6 databases, validated against the criteria listed under Validation (expert review is still in progress)
  • Disciplines Covered: Biology, Chemistry, Physics, Mathematics, Computer Science, Molecular Biology (GLMP)
  • Validation: 100% syntax accuracy, ≥85% metadata quality, all processes include source citations
  • Format: All processes stored as JSON files with Mermaid flowchart syntax, publicly accessible via Google Cloud Storage
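
The README does not define how "metadata quality" is measured. One plausible reading, shown here purely as an assumption, is the share of required metadata fields that are filled in, compared against the 85% threshold. The field list and the sample record are hypothetical.

```python
# Assumed list of required fields, for illustration only.
REQUIRED_FIELDS = ["title", "discipline", "subcategory", "description",
                   "sources", "mermaid", "created"]
THRESHOLD = 0.85  # the ">= 85%" figure quoted above

def metadata_quality(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if record.get(field))
    return filled / len(REQUIRED_FIELDS)

record = {
    "title": "DNA Replication",
    "discipline": "biology",
    "subcategory": "replication",
    "description": "Semi-conservative copying of DNA.",
    "sources": ["paper-1"],
    "mermaid": "flowchart TD\n    A --> B",
    "created": "",  # missing value lowers the score
}

score = metadata_quality(record)
print(round(score, 3), score >= THRESHOLD)  # 0.857 True
```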

🔗 Related Projects

🧬 GLMP - Genome Logic Modeling

First specialized application of the Programming Framework to biochemical processes. 100+ biological pathways visualized.

🔬 CopernicusAI

Knowledge engine integrating the Programming Framework with AI podcasts, research papers, and knowledge graph for scientific discovery.

🎨 Interactive Demo

The space includes interactive examples showing the framework applied to:

  • Scientific Method
  • Software Deployment Pipeline
  • Customer Support Workflow
  • Research Paper Publication

Each example demonstrates how LLMs extract process logic and encode it as visual flowcharts.

💻 Technology Stack

  • LLM: Google Gemini 2.0 Flash, Vertex AI
  • Visualization: Mermaid.js
  • Storage: Google Cloud Storage, Firestore
  • Format: JSON with Mermaid syntax
  • Frontend: Static HTML + Tailwind CSS

🌟 Vision

As AI systems become more capable of understanding complex processes, the Programming Framework provides the bridge between human comprehension and machine analysis. It's a tool for truth-seeking: transforming complexity into clarity.



© 2025 Gary Welz. All rights reserved.