---
language: en
library_name: splade-index
tags:
- splade
- splade-index
- retrieval
- search
- sparse
---

# Splade-Index

This is a SPLADE index created with the [splade-index](https://github.com/rasyosef/splade-index) library (version `0.1.1`).

## Installation

You can install the `splade-index` library with `pip`:

```bash
pip install "splade-index==0.1.1"

# Include extra dependencies such as a stemmer
pip install "splade-index[full]==0.1.1"

# For Hugging Face Hub usage
pip install huggingface_hub
```

## Load this Index

You can use the following code to load this SPLADE index from the Hugging Face Hub:

```python
from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Download the SPLADE model that was used to create the index from the Hugging Face Hub
model_id = "rasyosef/splade-tiny"  # The SPLADE model ID
model = SparseEncoder(model_id)

repo_id = "yosefw/natural_questions_3m_splade_index"

# Load the SPLADE index from the Hugging Face Hub
retriever = SPLADE.load_from_hub(repo_id, model=model)
```
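
As a rough mental model only (an assumption about how learned sparse retrieval generally works, not a description of the `splade-index` internals or its API), each document and each query can be viewed as a sparse map from vocabulary terms to weights, and relevance is the dot product of the two maps, computed efficiently through an inverted index. The sketch below uses hypothetical, hand-written weights in place of real `SparseEncoder` output, and the helper names `build_inverted_index` and `search` are illustrative only:

```python
from collections import defaultdict
import heapq

# Hypothetical term weights standing in for SPLADE encoder output.
docs = {
    0: {"capital": 1.2, "france": 2.0, "paris": 2.5},
    1: {"capital": 1.0, "germany": 2.1, "berlin": 2.4},
    2: {"river": 1.5, "seine": 2.2, "paris": 0.8},
}


def build_inverted_index(doc_vectors):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_vec, k=2):
    """Score documents by their sparse dot product with the query; return the top k."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only posting lists for terms present in the query are visited.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


index = build_inverted_index(docs)
query = {"capital": 1.0, "france": 1.5, "paris": 0.5}  # hypothetical query weights
print(search(index, query, k=2))
```

With real SPLADE vectors, the terms come from the encoder's token vocabulary and most weights are zero, so each query only touches a small number of posting lists.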

## Stats

This index was created from the following data:

| Statistic | Value |
| --- | --- |
| Number of documents | 2,681,468 |
| Number of tokens | 464,573,223 |
| Average tokens per document | 173.25 |
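
The average in the table is the total token count divided by the number of documents. A quick check in Python, using the figures from the table above:

```python
# Figures copied from the Stats table of this index card.
num_documents = 2_681_468
num_tokens = 464_573_223

# Average tokens per document = total tokens / number of documents.
avg_tokens = num_tokens / num_documents
print(f"{avg_tokens:.2f}")  # prints 173.25
```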