gabrielaltay committed on
Commit
e812ccd
·
1 Parent(s): b51f751
.dockerignore ADDED
@@ -0,0 +1,15 @@
1
+ .streamlit/
2
+
3
+ .git
4
+ .gitignore
5
+ README.md
6
+ DEV.md
7
+ __pycache__
8
+ *.pyc
9
+ *.pyo
10
+ *.pyd
11
+ .pytest_cache
12
+ .coverage
13
+ .env
14
+ .venv
15
+ *.log
.gitignore ADDED
@@ -0,0 +1,12 @@
1
+ .streamlit/
2
+ .env
3
+ docker.env
4
+ local.env
5
+
6
+ __pycache__/
7
+ *.py[cod]
8
+ .venv/
9
+ *.log
10
+ .python-version
11
+ chromadb
12
+ chromadb/
Dockerfile CHANGED
@@ -1,20 +1,31 @@
1
- FROM python:3.13.5-slim
 
2
 
 
3
  WORKDIR /app
4
 
 
5
  RUN apt-get update && apt-get install -y \
6
  build-essential \
7
  curl \
8
  git \
9
  && rm -rf /var/lib/apt/lists/*
10
 
11
- COPY requirements.txt ./
 
 
 
 
12
  COPY src/ ./src/
13
 
14
- RUN pip3 install -r requirements.txt
 
15
 
 
16
  EXPOSE 8501
17
 
 
18
  HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
19
 
20
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]
 
 
1
+ # Use Python 3.13 slim image
2
+ FROM python:3.13-slim
3
 
4
+ # Set working directory
5
  WORKDIR /app
6
 
7
+ # Install system dependencies
8
  RUN apt-get update && apt-get install -y \
9
  build-essential \
10
  curl \
11
  git \
12
  && rm -rf /var/lib/apt/lists/*
13
 
14
+ # Install uv
15
+ COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
16
+
17
+ # Copy project files
18
+ COPY pyproject.toml uv.lock ./
19
  COPY src/ ./src/
20
 
21
+ # Install dependencies
22
+ RUN uv sync --frozen
23
 
24
+ # Expose port (default 8501, can be overridden)
25
  EXPOSE 8501
26
 
27
+ # Health check
28
  HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
29
 
30
+ # Run the application
31
+ CMD ["sh", "-c", "uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
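Note on the new CMD line: the `${PORT:-8501}` expansion is what lets the same image listen on 8501 by default and on another port when `PORT` is set. A minimal Python sketch of that fallback rule follows; the function names `resolve_port` and `build_command` are illustrative and do not appear in the commit.

```python
import os


def resolve_port(env=None, default="8501"):
    """Mimic the shell expansion ${PORT:-8501}.

    The ':-' form falls back to the default when PORT is unset OR empty.
    """
    env = os.environ if env is None else env
    value = env.get("PORT", "")
    return value if value else default


def build_command(env=None):
    """Return the argv equivalent of the string passed to `sh -c` in CMD."""
    port = resolve_port(env)
    return [
        "uv", "run", "streamlit", "run", "src/legisqa_local/app.py",
        f"--server.port={port}",
        "--server.address=0.0.0.0",
    ]


if __name__ == "__main__":
    print(build_command({"PORT": "8505"}))
```

The HEALTHCHECK line in the same Dockerfile still probes port 8501, so a container started with a different `PORT` would likely report as unhealthy even while the app is serving.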
README.md CHANGED
@@ -1,20 +1,64 @@
1
  ---
2
- title: Legisqa Local
3
- emoji: 🚀
4
- colorFrom: red
5
  colorTo: red
6
- sdk: docker
7
- app_port: 8501
8
- tags:
9
- - streamlit
10
  pinned: false
11
- short_description: Streamlit template space
12
  license: mit
13
  ---
14
 
15
- # Welcome to Streamlit!
16
 
17
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
18
 
19
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
20
- forums](https://discuss.streamlit.io).
1
  ---
2
+ title: LegisQA Local
3
+ emoji: 🏛️
4
+ colorFrom: blue
5
  colorTo: red
6
+ sdk: streamlit
7
+ sdk_version: 1.49.1
8
+ app_file: src/app.py
 
9
  pinned: false
 
10
  license: mit
11
  ---
12
 
13
+ # LegisQA Local
14
 
15
+ Query Congressional Bills with AI using local ChromaDB vector search.
16
 
17
+ ## Features
18
+
19
+ - 🏛️ Query US Congressional legislation from sessions 113-119
20
+ - 🔍 Semantic search powered by ChromaDB and google/embeddinggemma-300m embeddings
21
+ - 🤖 Multiple AI providers: OpenAI, Anthropic, Together AI, Google
22
+ - 📊 Side-by-side comparison of different models
23
+ - 🏠 Runs completely locally - no external vector database needed
24
+
25
+ ## Setup
26
+
27
+ The app will automatically download and set up the vector database on first run. This includes:
28
+
29
+ 1. Downloading the HuggingFace dataset `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
30
+ 2. Loading Congress 119 data (first 200 documents) into local ChromaDB
31
+ 3. Setting up the vector search index
32
+
33
+ For local development, you can also run:
34
+
35
+ ```bash
36
+ # Install dependencies
37
+ uv sync
38
+
39
+ # Load test data
40
+ uv run python load_chromadb.py
41
+
42
+ # Run the app
43
+ uv run streamlit run src/app.py
44
+ ```
45
+
46
+ ## Dataset
47
+
48
+ Uses the HuggingFace dataset containing US Congressional legislation with pre-computed embeddings:
49
+ - **Source**: `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
50
+ - **Embeddings**: google/embeddinggemma-300m (768 dimensions)
51
+ - **Coverage**: Congress sessions 113-119 (2013-2025)
52
+ - **Documents**: ~233K total (test mode uses 200 from Congress 119)
53
+
54
+ ## Architecture
55
+
56
+ - **Frontend**: Streamlit
57
+ - **Vector Store**: ChromaDB (local)
58
+ - **Embeddings**: HuggingFace Transformers (google/embeddinggemma-300m)
59
+ - **LLMs**: Multiple providers via LangChain
60
+ - **Data**: HuggingFace Datasets
61
+
62
+ ## Migration from Pinecone
63
+
64
+ This app was migrated from using Pinecone to local ChromaDB. See `MIGRATION.md` for details.
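Note on the README's "test mode": the setup steps describe loading only the first 200 Congress 119 documents. The loader script (`load_chromadb.py`) is not part of this diff, so the following is only a sketch of that selection step under stated assumptions: the helper name `take_test_subset` and the record field `congress_num` are hypothetical, and the real dataset may use a different column.

```python
from itertools import islice


def take_test_subset(records, congress=119, limit=200, key="congress_num"):
    """Return at most `limit` records whose `key` field equals `congress`.

    `records` can be any iterable of dicts; iteration stops as soon as
    `limit` matches have been collected, so a large dataset is not fully scanned.
    """
    matching = (r for r in records if r.get(key) == congress)
    return list(islice(matching, limit))


if __name__ == "__main__":
    # Synthetic records standing in for dataset rows (hypothetical schema).
    fake = [{"congress_num": 113 + (i % 7), "text": f"doc {i}"} for i in range(3000)]
    subset = take_test_subset(fake)
    print(len(subset), {r["congress_num"] for r in subset})
```

Filtering lazily and cutting off with `islice` keeps the first-run setup cheap, which matters if the full dataset really holds roughly 233K documents as the README states.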
env.example ADDED
@@ -0,0 +1,21 @@
1
+ # Example environment variables for Docker deployment
2
+ # Copy this to .env and fill in your actual values
3
+
4
+ # LangChain configuration
5
+ LANGCHAIN_API_KEY=your_langchain_api_key_here
6
+ LANGCHAIN_PROJECT=legisqa-local
7
+
8
+ # LLM Provider API Keys
9
+ OPENAI_API_KEY=your_openai_api_key_here
10
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
11
+ TOGETHER_API_KEY=your_together_api_key_here
12
+ GOOGLE_API_KEY=your_google_api_key_here
13
+
14
+ # Port configuration (optional, defaults to 8501 for local, 8505 for Docker)
15
+ PORT=8505
16
+
17
+ # ChromaDB configuration
18
+ # For local development: use ./chromadb (relative to project root)
19
+ # For Docker: use /app/chroma_data (container path, will be mounted as volume)
20
+ CHROMA_PERSIST_DIRECTORY=./chromadb
21
+ CHROMA_COLLECTION_NAME=usc
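Note on env.example: the app's settings module (`legisqa_local.config.settings`) is not shown in this diff, so how these variables are consumed is unknown. As a rough sketch, assuming they are read from the process environment with the template values as defaults, the lookup could look like this. `chroma_settings` and `missing_provider_keys` are hypothetical helper names.

```python
import os

# Provider key names taken from env.example.
PROVIDER_KEYS = (
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "TOGETHER_API_KEY",
    "GOOGLE_API_KEY",
)


def chroma_settings(env=None):
    """Read the ChromaDB settings, falling back to the env.example values."""
    env = os.environ if env is None else env
    return {
        "persist_directory": env.get("CHROMA_PERSIST_DIRECTORY", "./chromadb"),
        "collection_name": env.get("CHROMA_COLLECTION_NAME", "usc"),
    }


def missing_provider_keys(env=None):
    """List provider keys that are unset, empty, or still the template placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in PROVIDER_KEYS:
        value = env.get(name, "")
        if not value or (value.startswith("your_") and value.endswith("_here")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(chroma_settings({}))
    print(missing_provider_keys({"OPENAI_API_KEY": "sk-example"}))
```

Treating the `your_..._here` placeholders as missing catches the common mistake of copying the template to `.env` without filling it in.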
pyproject.toml ADDED
@@ -0,0 +1,30 @@
1
+ [project]
2
+ name = "legisqa-local"
3
+ version = "0.1.0"
4
+ description = "Congressional Legislation Query and Analysis Tool"
5
+ readme = "README.md"
6
+ requires-python = ">=3.13"
7
+ dependencies = [
8
+ "chromadb>=1.1.0",
9
+ "datasets>=3.0.0",
10
+ "langchain>=0.3.27",
11
+ "langchain-anthropic>=0.3.19",
12
+ "langchain-chroma>=0.1.4",
13
+ "langchain-community>=0.3.29",
14
+ "langchain-core>=0.3.75",
15
+ "langchain-google-genai>=2.1.10",
16
+ "langchain-huggingface>=0.3.1",
17
+ "langchain-openai>=0.3.32",
18
+ "langchain-together>=0.3.1",
19
+ "sentence-transformers>=5.1.0",
20
+ "streamlit>=1.49.1",
21
+ "tqdm>=4.66.0",
22
+ ]
23
+
24
+ [build-system]
25
+ requires = ["setuptools>=61.0", "wheel"]
26
+ build-backend = "setuptools.build_meta"
27
+
28
+ [tool.setuptools.packages.find]
29
+ where = ["src"]
30
+ include = ["legisqa_local*"]
requirements.txt DELETED
@@ -1,3 +0,0 @@
1
- altair
2
- pandas
3
- streamlit
run-docker.sh ADDED
@@ -0,0 +1,42 @@
1
+ #!/bin/bash
2
+
3
+ # Script to run LegisQA with Docker, mounting the ChromaDB volume
4
+
5
+ # Port configuration (default to 8505 to avoid conflicts)
6
+ PORT=${PORT:-8505}
7
+
8
+ # ChromaDB host path (local chromadb directory)
9
+ CHROMA_HOST_PATH="$(pwd)/chromadb"
10
+
11
+ # Container path where ChromaDB will be mounted
12
+ CHROMA_CONTAINER_PATH="/app/chroma_data"
13
+
14
+ echo "Starting LegisQA Docker container..."
15
+ echo "Port: $PORT"
16
+ echo "ChromaDB Host Path: $CHROMA_HOST_PATH"
17
+ echo "ChromaDB Container Path: $CHROMA_CONTAINER_PATH"
18
+ echo "Environment File: docker.env"
19
+ echo ""
20
+
21
+ # Check if ChromaDB path exists
22
+ if [ ! -d "$CHROMA_HOST_PATH" ]; then
23
+ echo "ERROR: ChromaDB path does not exist: $CHROMA_HOST_PATH"
24
+ exit 1
25
+ fi
26
+
27
+ # Check if env file exists
28
+ if [ ! -f "docker.env" ]; then
29
+ echo "ERROR: Environment file does not exist: docker.env"
30
+ exit 1
31
+ fi
32
+
33
+ # Run Docker container
34
+ docker run -p $PORT:$PORT \
35
+ --env-file docker.env \
36
+ -e PORT=$PORT \
37
+ -v "$CHROMA_HOST_PATH:$CHROMA_CONTAINER_PATH" \
38
+ --name legisqa-container \
39
+ --rm \
40
+ legisqa-local
41
+
42
+ echo "Docker container stopped."
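Note on run-docker.sh: the script checks that the ChromaDB directory and `docker.env` exist, then maps the same port on host and container and bind-mounts the database. The sketch below restates those steps in Python as an argument list, which avoids the word-splitting risk of the unquoted `$PORT` in the shell version. `build_docker_args` is an illustrative name, not something in the repository, and the sketch only builds the command without running Docker.

```python
import tempfile
from pathlib import Path


def build_docker_args(port, chroma_host_path, env_file="docker.env",
                      container_path="/app/chroma_data", image="legisqa-local"):
    """Run the script's preflight checks, then return the `docker run` argv."""
    chroma_host_path = Path(chroma_host_path)
    env_file = Path(env_file)
    if not chroma_host_path.is_dir():
        raise FileNotFoundError(f"ChromaDB path does not exist: {chroma_host_path}")
    if not env_file.is_file():
        raise FileNotFoundError(f"Environment file does not exist: {env_file}")
    return [
        "docker", "run",
        "-p", f"{port}:{port}",            # same port on host and container
        "--env-file", str(env_file),
        "-e", f"PORT={port}",              # consumed by ${PORT:-8501} in the image CMD
        "-v", f"{chroma_host_path}:{container_path}",
        "--name", "legisqa-container",
        "--rm",
        image,
    ]


if __name__ == "__main__":
    # Demonstrate with throwaway paths; nothing is executed.
    with tempfile.TemporaryDirectory() as tmp:
        db_dir = Path(tmp) / "chromadb"
        db_dir.mkdir()
        env_path = Path(tmp) / "docker.env"
        env_path.write_text("CHROMA_PERSIST_DIRECTORY=/app/chroma_data\n")
        print(build_docker_args(8505, db_dir, env_path))
```

The returned list could be handed to `subprocess.run(args, check=True)`. Per the comments in env.example, `docker.env` is expected to point `CHROMA_PERSIST_DIRECTORY` at the container path `/app/chroma_data`.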
src/legisqa_local/__init__.py ADDED
@@ -0,0 +1,5 @@
1
+ """
2
+ LegisQA: Congressional Legislation Query and Analysis Tool
3
+ """
4
+
5
+ __version__ = "0.1.0"
src/legisqa_local/app.py ADDED
@@ -0,0 +1,51 @@
1
+ """Main Streamlit application for LegisQA"""
2
+
3
+ import streamlit as st
4
+ from legisqa_local.config.settings import STREAMLIT_CONFIG, setup_environment
5
+ from legisqa_local.components.sidebar import render_sidebar
6
+ from legisqa_local.tabs.rag_tab import RAGTab
7
+ from legisqa_local.tabs.rag_sbs_tab import RAGSideBySideTab
8
+ from legisqa_local.tabs.guide_tab import GuideTab
9
+
10
+
11
+ def main():
12
+ """Main application function"""
13
+ # Configure Streamlit
14
+ st.set_page_config(**STREAMLIT_CONFIG)
15
+
16
+ # Setup environment
17
+ setup_environment()
18
+
19
+ # Main content
20
+ st.title(":classical_building: LegisQA :classical_building:")
21
+ st.header("Query Congressional Bills")
22
+
23
+ # Sidebar
24
+ with st.sidebar:
25
+ render_sidebar()
26
+
27
+ # Create tab instances
28
+ rag_tab = RAGTab()
29
+ rag_sbs_tab = RAGSideBySideTab()
30
+ guide_tab = GuideTab()
31
+
32
+ # Create tabs
33
+ query_rag_tab, query_rag_sbs_tab, guide_tab_ui = st.tabs([
34
+ rag_tab.name,
35
+ rag_sbs_tab.name,
36
+ guide_tab.name,
37
+ ])
38
+
39
+ # Render tab content
40
+ with query_rag_tab:
41
+ rag_tab.render()
42
+
43
+ with query_rag_sbs_tab:
44
+ rag_sbs_tab.render()
45
+
46
+ with guide_tab_ui:
47
+ guide_tab.render()
48
+
49
+
50
+ if __name__ == "__main__":
51
+ main()
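Note on app.py: `main()` relies on each tab object exposing a `name` attribute (used as the `st.tabs` label) and a `render()` method (called inside the tab's context). The tab classes themselves are not in this diff, so the following is only a sketch of that implied interface, written without Streamlit so it runs anywhere. `Tab`, `RecordingTab`, and `render_all` are hypothetical names, as are the tab labels in the demo.

```python
from typing import Protocol


class Tab(Protocol):
    """Interface that main() appears to assume for every tab object."""

    name: str

    def render(self) -> None: ...


class RecordingTab:
    """Hypothetical stand-in that records render calls instead of drawing UI."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def render(self):
        self._log.append(self.name)


def render_all(tabs):
    """Collect labels in order, then render each tab, mirroring the flow in main()."""
    labels = [tab.name for tab in tabs]
    for tab in tabs:
        tab.render()
    return labels


if __name__ == "__main__":
    calls = []
    tabs = [RecordingTab("RAG", calls), RecordingTab("Guide", calls)]
    print(render_all(tabs), calls)
```

Keeping labels and rendering behind one small interface is what lets `main()` add or reorder tabs by editing a single list.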
src/streamlit_app.py DELETED
@@ -1,40 +0,0 @@
1
- import altair as alt
2
- import numpy as np
3
- import pandas as pd
4
- import streamlit as st
5
-
6
- """
7
- # Welcome to Streamlit!
8
-
9
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
10
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
11
- forums](https://discuss.streamlit.io).
12
-
13
- In the meantime, below is an example of what you can do with just a few lines of code:
14
- """
15
-
16
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
17
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
18
-
19
- indices = np.linspace(0, 1, num_points)
20
- theta = 2 * np.pi * num_turns * indices
21
- radius = indices
22
-
23
- x = radius * np.cos(theta)
24
- y = radius * np.sin(theta)
25
-
26
- df = pd.DataFrame({
27
- "x": x,
28
- "y": y,
29
- "idx": indices,
30
- "rand": np.random.randn(num_points),
31
- })
32
-
33
- st.altair_chart(alt.Chart(df, height=700, width=700)
34
- .mark_point(filled=True)
35
- .encode(
36
- x=alt.X("x", axis=None),
37
- y=alt.Y("y", axis=None),
38
- color=alt.Color("idx", legend=None, scale=alt.Scale()),
39
- size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
40
- ))
uv.lock ADDED
The diff for this file is too large to render. See raw diff