---
license: mit
datasets:
- mteb/mtop_intent
language:
- en
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: CSR
  results:
    - dataset:
        name: MTEB MTOPIntentClassification (en)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: en
        split: test
        languages:
          - eng-Latn
      metrics:
        - type: accuracy
          value: 0.906407
        - type: f1
          value: 0.694457
        - type: f1_weighted
          value: 0.917326
        - type: main_score
          value: 0.906407
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (de)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: de
        split: test
        languages:
          - deu-Latn
      metrics:
        - type: accuracy
          value: 0.851
        - type: f1
          value: 0.601279
        - type: f1_weighted
          value: 0.863969
        - type: main_score
          value: 0.851
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (es)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: es
        split: test
        languages:
          - spa-Latn
      metrics:
        - type: accuracy
          value: 0.906738
        - type: f1
          value: 0.642295
        - type: f1_weighted
          value: 0.910882
        - type: main_score
          value: 0.906738
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (fr)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: fr
        split: test
        languages:
          - fra-Latn
      metrics:
        - type: accuracy
          value: 0.849045
        - type: f1
          value: 0.59923
        - type: f1_weighted
          value: 0.863301
        - type: main_score
          value: 0.849045
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (hi)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: hi
        split: test
        languages:
          - hin-Deva
      metrics:
        - type: accuracy
          value: 0.751094
        - type: f1
          value: 0.44095
        - type: f1_weighted
          value: 0.762567
        - type: main_score
          value: 0.751094
      task:
        type: Classification

    - dataset:
        name: MTEB MTOPIntentClassification (th)
        type: mteb/mtop_intent
        revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
        config: th
        split: test
        languages:
          - tha-Thai
      metrics:
        - type: accuracy
          value: 0.75566
        - type: f1
          value: 0.498529
        - type: f1_weighted
          value: 0.76994
        - type: main_score
          value: 0.75566
      task:
        type: Classification
base_model:
  - nvidia/NV-Embed-v2
---


For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).


## Usage
📌 **Tip**: For NV-Embed-v2, Transformers releases **newer** than 4.47.0 may degrade performance, because the ``model_type=bidir_mistral`` entry in ``config.json`` is no longer supported.

We therefore recommend pinning the dependency, e.g. ``pip install transformers==4.47.0``.
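If you want to fail fast rather than silently run on an unsupported release, a small guard like the one below can help. This is an illustration, not part of the official usage: the `transformers_version_ok` helper is hypothetical, and it assumes the `packaging` library (a common transitive dependency) is available.

```python
from packaging.version import parse

# 4.47.0 is the release recommended above; newer releases dropped
# support for the `bidir_mistral` model type used by NV-Embed-v2.
PINNED = parse("4.47.0")

def transformers_version_ok(installed: str) -> bool:
    """Return True when `installed` is no newer than the pinned release."""
    return parse(installed) <= PINNED

print(transformers_version_ok("4.47.0"))  # True
print(transformers_version_ok("4.48.1"))  # False
```

In practice you would call it with `transformers.__version__` before loading the model and raise an error (or print a warning) when it returns `False`.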

### Sentence Transformers Usage
You can load this model with Sentence Transformers and evaluate it on MTEB with the following snippet:
```python
import mteb
from sentence_transformers import SparseEncoder
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Classification-MTOPIntent",
    trust_remote_code=True
)
model.prompts = {
  "MTOPIntentClassification": "Instruct: Classify the intent of the given utterance in task-oriented conversation\nQuery:"
}
tasks = mteb.get_tasks(tasks=["MTOPIntentClassification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/MTOPIntentClassification",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so convert to dense here.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
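To make the `convert_to_sparse_tensor` flag concrete: a sparse (COO-style) embedding stores only `(row, dim) -> value` triplets, while the dense form MTEB expects materializes every dimension. The toy values and the `coo_to_dense` helper below are made up for illustration; they are not output of this model.

```python
def coo_to_dense(indices, values, shape):
    """Expand (row, col) -> value triplets into a dense nested list of floats."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        dense[r][c] = v
    return dense

# Two 8-dimensional embeddings with only three non-zero activations total,
# mimicking the highly sparse representations a CSR-style encoder produces.
indices = [(0, 2), (0, 5), (1, 1)]
values = [0.7, 0.2, 0.9]
dense = coo_to_dense(indices, values, (2, 8))
print(dense[0])  # [0.0, 0.0, 0.7, 0.0, 0.0, 0.2, 0.0, 0.0]
```

Passing `convert_to_sparse_tensor=False` asks the encoder for the dense form directly, which is why the evaluation snippet above sets it in `encode_kwargs`.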

## Citation
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```