---
language: en
library_name: splade-index
tags:
- splade
- splade-index
- retrieval
- search
- sparse
---

# Splade-Index

This is an index created with the [splade-index](https://github.com/rasyosef/splade-index) library (version `0.1.1`).

## Installation

You can install the `splade-index` library with `pip`:

```bash
pip install "splade-index==0.1.1"

# Include extra dependencies such as the stemmer
pip install "splade-index[full]==0.1.1"

# For Hugging Face Hub usage
pip install huggingface_hub
```

## Load this Index

Use the following code to load this SPLADE index from the Hugging Face Hub:

```python
from sentence_transformers import SparseEncoder
from splade_index import SPLADE

# Download the SPLADE model that was used to create the index from the Hugging Face Hub
model_id = "rasyosef/splade-tiny"  # ID of the SPLADE model on the Hub
model = SparseEncoder(model_id)

repo_id = "yosefw/natural_questions_3m_splade_index"

# Load a SPLADE index from the Hugging Face model hub
retriever = SPLADE.load_from_hub(repo_id, model=model)
```
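
Once the index is loaded, it can be queried. The helper below is a minimal sketch, not part of the library: `format_top_k` is a hypothetical name, and it assumes that `retriever.retrieve(queries, k=...)` returns an object with `doc_ids`, `documents`, and `scores` fields, each shaped `(n_queries, k)`, as the splade-index README describes. It also assumes the index was stored together with its documents; if not, `documents` may not hold text.

```python
def format_top_k(retriever, query, k=3):
    """Return one formatted line per hit for a single query string.

    Assumption: `retriever.retrieve` accepts a list of queries and returns
    an object exposing `doc_ids`, `documents`, and `scores`, where row 0
    holds the top-k hits for the first (and only) query.
    """
    results = retriever.retrieve([query], k=k)

    lines = []
    hits = zip(results.doc_ids[0], results.documents[0], results.scores[0])
    for rank, (doc_id, doc, score) in enumerate(hits, start=1):
        lines.append(f"Rank {rank} (score: {score:.2f}) (doc_id: {doc_id}): {doc}")
    return lines
```

With the `retriever` loaded above, `print("\n".join(format_top_k(retriever, "who wrote the declaration of independence")))` would print the top three hits, one per line.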

## Stats

This index was built from a corpus with the following statistics:

| Statistic | Value |
| --- | --- |
| Number of documents | 2,681,468 |
| Number of tokens | 464,573,223 |
| Average tokens per document | 173.25 |
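
The average in the last row follows from the two counts above it. As a quick check, using only the figures from the table:

```python
# Figures copied from the stats table
num_documents = 2_681_468
num_tokens = 464_573_223

# Average tokens per document is the total token count divided by the document count
avg_tokens = num_tokens / num_documents
print(f"{avg_tokens:.2f}")  # 173.25
```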