Commit · 9f9da04
1 Parent(s): 2c38d10
update
- Dockerfile +3 -3
- README.md +0 -52
Dockerfile CHANGED
@@ -18,8 +18,8 @@ COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
 COPY pyproject.toml uv.lock ./
 COPY src/ ./src/
 
-# Install dependencies
-RUN uv sync --frozen
+# Install dependencies and the package
+RUN uv sync --frozen && uv pip install -e .
 
 # Expose port (default 8501, can be overridden)
 EXPOSE 8501
@@ -28,4 +28,4 @@ EXPOSE 8501
 HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
 
 # Run the application
-CMD ["sh", "-c", "uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
+CMD ["sh", "-c", "cd /app && uv run streamlit run src/legisqa_local/app.py --server.port=${PORT:-8501} --server.address=0.0.0.0"]
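Both versions of the CMD line pass `--server.port=${PORT:-8501}`, which is why the command is wrapped in `sh -c`: the shell has to perform the parameter expansion before Streamlit starts. The following is a minimal sketch of that fallback behavior, not code from this repository; the helper name `resolve_port` is hypothetical and exists only for illustration.

```shell
# Hypothetical helper (not part of the repository) that mirrors the
# ${PORT:-8501} expansion used in the Dockerfile CMD line.
resolve_port() {
  # ${PORT:-8501} expands to $PORT when PORT is set and non-empty,
  # and to the literal 8501 otherwise.
  echo "${PORT:-8501}"
}

# Each call runs in a subshell so the PORT changes do not leak out.
(unset PORT; resolve_port)   # PORT unset: falls back to the default
(PORT=; resolve_port)        # PORT empty: also falls back to the default
(PORT=7860; resolve_port)    # PORT set: the supplied value is used
```

Note that the HEALTHCHECK line still probes port 8501 directly, so under this reading a container started with a different `PORT` would serve on one port while the health check polls another.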
README.md CHANGED
@@ -10,55 +10,3 @@ pinned: false
 license: mit
 ---
 
-# LegisQA Local
-
-Query Congressional Bills with AI using local ChromaDB vector search.
-
-## Features
-
-- Query US Congressional legislation from sessions 113-119
-- Semantic search powered by ChromaDB and google/embeddinggemma-300m embeddings
-- Multiple AI providers: OpenAI, Anthropic, Together AI, Google
-- Side-by-side comparison of different models
-- Runs completely locally - no external vector database needed
-
-## Setup
-
-The app will automatically download and set up the vector database on first run. This includes:
-
-1. Downloading the HuggingFace dataset `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
-2. Loading Congress 119 data (first 200 documents) into local ChromaDB
-3. Setting up the vector search index
-
-For local development, you can also run:
-
-```bash
-# Install dependencies
-uv sync
-
-# Load test data
-uv run python load_chromadb.py
-
-# Run the app
-uv run streamlit run src/app.py
-```
-
-## Dataset
-
-Uses the HuggingFace dataset containing US Congressional legislation with pre-computed embeddings:
-- **Source**: `hyperdemocracy/usc-vecs-s8192-o512-google-embeddinggemma-300m`
-- **Embeddings**: google/embeddinggemma-300m (768 dimensions)
-- **Coverage**: Congress sessions 113-119 (2013-2025)
-- **Documents**: ~233K total (test mode uses 200 from Congress 119)
-
-## Architecture
-
-- **Frontend**: Streamlit
-- **Vector Store**: ChromaDB (local)
-- **Embeddings**: HuggingFace Transformers (google/embeddinggemma-300m)
-- **LLMs**: Multiple providers via LangChain
-- **Data**: HuggingFace Datasets
-
-## Migration from Pinecone
-
-This app was migrated from using Pinecone to local ChromaDB. See `MIGRATION.md` for details.