---
license: mit
datasets:
- mteb/mtop_intent
language:
- en
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: CSR
  results:
    - dataset:
        name: MTEB MTOPIntentClassification (en)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: en
        split: test
        languages:
          - eng-Latn
      metrics:
        - type: accuracy
          value: 0.906407
        - type: f1
          value: 0.694457
        - type: f1_weighted
          value: 0.917326
        - type: main_score
          value: 0.906407
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (de)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: de
        split: test
        languages:
          - deu-Latn
      metrics:
        - type: accuracy
          value: 0.851
        - type: f1
          value: 0.601279
        - type: f1_weighted
          value: 0.863969
        - type: main_score
          value: 0.851
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (es)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: es
        split: test
        languages:
          - spa-Latn
      metrics:
        - type: accuracy
          value: 0.906738
        - type: f1
          value: 0.642295
        - type: f1_weighted
          value: 0.910882
        - type: main_score
          value: 0.906738
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (fr)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: fr
        split: test
        languages:
          - fra-Latn
      metrics:
        - type: accuracy
          value: 0.849045
        - type: f1
          value: 0.59923
        - type: f1_weighted
          value: 0.863301
        - type: main_score
          value: 0.849045
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (hi)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: hi
        split: test
        languages:
          - hin-Deva
      metrics:
        - type: accuracy
          value: 0.751094
        - type: f1
          value: 0.44095
        - type: f1_weighted
          value: 0.762567
        - type: main_score
          value: 0.751094
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (th)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: th
        split: test
        languages:
          - tha-Thai
      metrics:
        - type: accuracy
          value: 0.75566
        - type: f1
          value: 0.498529
        - type: f1_weighted
          value: 0.76994
        - type: main_score
          value: 0.75566
      task:
        type: Classification
base_model:
  - nvidia/NV-Embed-v2
---


For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).


## Usage
📌 **Tip**: For NV-Embed-v2, Transformers releases **newer** than 4.47.0 may degrade performance, because the ``model_type=bidir_mistral`` entry in ``config.json`` is no longer supported.

We therefore recommend pinning the dependency, e.g. ``pip install transformers==4.47.0``.
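If you want to fail fast rather than silently run on an unsupported release, a small guard like the one below can help. This is an illustration, not part of the official usage: the `transformers_version_ok` helper is hypothetical, and it assumes the `packaging` library (a common transitive dependency) is available.

```python
from packaging.version import parse

# 4.47.0 is the release recommended above; newer releases dropped
# support for the `bidir_mistral` model type used by NV-Embed-v2.
PINNED = parse("4.47.0")

def transformers_version_ok(installed: str) -> bool:
    """Return True when `installed` is no newer than the pinned release."""
    return parse(installed) <= PINNED

print(transformers_version_ok("4.47.0"))  # True
print(transformers_version_ok("4.48.1"))  # False
```

In practice you would call it with `transformers.__version__` before loading the model and raise an error (or print a warning) when it returns `False`.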

### Sentence Transformers Usage
You can load this model with Sentence Transformers and evaluate it on MTEB with the following snippet:
```python
import mteb
from sentence_transformers import SparseEncoder
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Classification-MTOPIntent",
    trust_remote_code=True
)
model.prompts = {
  "MTOPIntentClassification": "Instruct: Classify the intent of the given utterance in task-oriented conversation\nQuery:"
}
tasks = mteb.get_tasks(tasks=["MTOPIntentClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/MTOPIntentClassification",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so convert to dense here.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
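To make the `convert_to_sparse_tensor` flag concrete: a sparse (COO-style) embedding stores only `(row, dim) -> value` triplets, while the dense form MTEB expects materializes every dimension. The toy values and the `coo_to_dense` helper below are made up for illustration; they are not output of this model.

```python
def coo_to_dense(indices, values, shape):
    """Expand (row, col) -> value triplets into a dense nested list of floats."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense

# Two 8-dimensional embeddings with only three non-zero activations total,
# mimicking the highly sparse representations a CSR-style encoder produces.
indices = [(0, 2), (0, 5), (1, 1)]
values = [0.7, 0.2, 0.9]
dense = coo_to_dense(indices, values, (2, 8))
print(dense[0])  # [0.0, 0.0, 0.7, 0.0, 0.0, 0.2, 0.0, 0.0]
```

Passing `convert_to_sparse_tensor=False` asks the encoder for the dense form directly, which is why the evaluation snippet above sets it in `encode_kwargs`.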

## Citation
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```