gabrielaltay committed
Commit 9f9da04 · 1 Parent(s): 2c38d10
Files changed (2)
  1. Dockerfile +3 -3
  2. README.md +0 -52
Dockerfile CHANGED
@@ -18,8 +18,8 @@ COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
  COPY pyproject.toml uv.lock ./
  COPY src/ ./src/

- # Install dependencies
- RUN uv sync --frozen
+ # Install dependencies and the package
+ RUN uv sync --frozen && uv pip install -e .

  # Expose port (default 8501, can be overridden)
  EXPOSE 8501
@@ -28,4 +28,4 @@ EXPOSE 8501
  HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health

  # Run the application
- CMD ["sh", "-c", "uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
+ CMD ["sh", "-c", "cd /app && uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
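
Both versions of the CMD line rely on the shell expansion `${PORT:-8501}` to choose the Streamlit port. As a minimal sketch of that behavior outside the container, the snippet below isolates the expansion in a helper. `port_arg` is a hypothetical name introduced here for illustration and is not part of the commit.

```shell
# Hypothetical helper (not in the commit): prints the port flag the CMD would build.
port_arg() {
  # ${PORT:-8501} expands to $PORT when it is set and non-empty, otherwise to 8501.
  printf '%s\n' "--server.port=${PORT:-8501}"
}

unset PORT
port_arg    # PORT unset: prints --server.port=8501

PORT=7860
port_arg    # PORT set: prints --server.port=7860

PORT=""
port_arg    # PORT empty: ":-" treats empty like unset, prints --server.port=8501
```

This is also why the CMD is wrapped in `sh -c`: an exec-form CMD does not invoke a shell by itself, so without the wrapper `${PORT:-8501}` would reach Streamlit unexpanded.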
README.md CHANGED
@@ -10,55 +10,3 @@ pinned: false
  license: mit
  ---

- # LegisQA Local
-
- Query Congressional Bills with AI using local ChromaDB vector search.
-
- ## Features
-
- - 🏛️ Query US Congressional legislation from sessions 113-119
- - 🔍 Semantic search powered by ChromaDB and google/embeddinggemma-300m embeddings
- - 🤖 Multiple AI providers: OpenAI, Anthropic, Together AI, Google
- - 📊 Side-by-side comparison of different models
- - 🏠 Runs completely locally - no external vector database needed
-
- ## Setup
-
- The app will automatically download and set up the vector database on first run. This includes:
-
- 1. Downloading the HuggingFace dataset `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
- 2. Loading Congress 119 data (first 200 documents) into local ChromaDB
- 3. Setting up the vector search index
-
- For local development, you can also run:
-
- ```bash
- # Install dependencies
- uv sync
-
- # Load test data
- uv run python load_chromadb.py
-
- # Run the app
- uv run streamlit run src/app.py
- ```
-
- ## Dataset
-
- Uses the HuggingFace dataset containing US Congressional legislation with pre-computed embeddings:
- - **Source**: `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
- - **Embeddings**: google/embeddinggemma-300m (768 dimensions)
- - **Coverage**: Congress sessions 113-119 (2013-2025)
- - **Documents**: ~233K total (test mode uses 200 from Congress 119)
-
- ## Architecture
-
- - **Frontend**: Streamlit
- - **Vector Store**: ChromaDB (local)
- - **Embeddings**: HuggingFace Transformers (google/embeddinggemma-300m)
- - **LLMs**: Multiple providers via LangChain
- - **Data**: HuggingFace Datasets
-
- ## Migration from Pinecone
-
- This app was migrated from using Pinecone to local ChromaDB. See `MIGRATION.md` for details.