Spaces: hyperdemocracy/legisqa-local


gabrielaltay committed on Sep 28

Commit 9f9da04 (1 parent: 2c38d10)

update


Files changed (2)

Dockerfile +3 -3
README.md +0 -52

Dockerfile CHANGED

@@ -18,8 +18,8 @@ COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
 COPY pyproject.toml uv.lock ./
 COPY src/ ./src/
-# Install dependencies
-RUN uv sync --frozen
+# Install dependencies and the package
+RUN uv sync --frozen && uv pip install -e .
 # Expose port (default 8501, can be overridden)
 EXPOSE 8501
@@ -28,4 +28,4 @@ EXPOSE 8501
 HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
 # Run the application
-CMD ["sh", "-c", "uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
+CMD ["sh", "-c", "cd /app && uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
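
Note on the `CMD` line: `${PORT:-8501}` is standard POSIX shell default-value expansion, so the app listens on `$PORT` when the hosting platform sets it and falls back to 8501 otherwise. This is also why the command is wrapped in `sh -c`, since exec-form `CMD` does no variable expansion on its own. A minimal sketch that can be run outside the container; the `resolve_port` helper is hypothetical and not part of this repo:

```shell
#!/bin/sh
# Hypothetical helper (not in the repo): mirrors the ${PORT:-8501}
# expansion used in the Dockerfile's CMD line.
resolve_port() {
    # Expands to $PORT if it is set and non-empty, otherwise to 8501.
    echo "${PORT:-8501}"
}

unset PORT
resolve_port    # prints 8501 (default)

PORT=7860
resolve_port    # prints 7860 (override)
```

An empty `PORT=""` also falls back to 8501, because the `:-` form treats empty the same as unset.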

README.md CHANGED

@@ -10,55 +10,3 @@ pinned: false
 license: mit
 ---
-# LegisQA Local
-Query Congressional Bills with AI using local ChromaDB vector search.
-## Features
-- 🏛️ Query US Congressional legislation from sessions 113-119
-- 🔍 Semantic search powered by ChromaDB and google/embeddinggemma-300m embeddings
-- 🤖 Multiple AI providers: OpenAI, Anthropic, Together AI, Google
-- 📊 Side-by-side comparison of different models
-- 🏠 Runs completely locally - no external vector database needed
-## Setup
-The app will automatically download and set up the vector database on first run. This includes:
-1. Downloading the HuggingFace dataset `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
-2. Loading Congress 119 data (first 200 documents) into local ChromaDB
-3. Setting up the vector search index
-For local development, you can also run:
-```bash
-# Install dependencies
-uv sync
-# Load test data
-uv run python load_chromadb.py
-# Run the app
-uv run streamlit run src/app.py
-```
-## Dataset
-Uses the HuggingFace dataset containing US Congressional legislation with pre-computed embeddings:
-- **Source**: `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
-- **Embeddings**: google/embeddinggemma-300m (768 dimensions)
-- **Coverage**: Congress sessions 113-119 (2013-2025)
-- **Documents**: ~233K total (test mode uses 200 from Congress 119)
-## Architecture
-- **Frontend**: Streamlit
-- **Vector Store**: ChromaDB (local)
-- **Embeddings**: HuggingFace Transformers (google/embeddinggemma-300m)
-- **LLMs**: Multiple providers via LangChain
-- **Data**: HuggingFace Datasets
-## Migration from Pinecone
-This app was migrated from using Pinecone to local ChromaDB. See `MIGRATION.md` for details.

