---
language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- regression
- trail-rag
- pathfinder-rag
- hotpotqa
- multi-hop-question-answering
- sentence-transformers
model-index:
- name: trailrag-cross-encoder-hotpotqa-enhanced
  results:
  - task:
      type: question-answering
    dataset:
      name: HotpotQA
      type: hotpotqa
    metrics:
    - type: mse
      value: 0.0557947916534922
    - type: mae
      value: 0.1418474710541999
    - type: rmse
      value: 0.2362092116186248
    - type: r2_score
      value: 0.6484965021143569
    - type: pearson_correlation
      value: 0.8754595236036868
    - type: spearman_correlation
      value: 0.8618191776300459
---

# TrailRAG Cross-Encoder: HotpotQA Enhanced

This is a cross-encoder fine-tuned for **multi-hop question answering** on HotpotQA as part of the PathfinderRAG research project. It scores query-document pairs with a continuous relevance value rather than a binary label.

## Model Details

- **Model Type**: Cross-Encoder for Regression (continuous similarity scores)
- **Base Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Training Dataset**: HotpotQA (Complex reasoning dataset requiring multi-step inference)
- **Task**: Multi-hop Question Answering
- **Library**: sentence-transformers
- **License**: MIT

## Performance Metrics

### Final Regression Metrics

| Metric | Value | Description |
|--------|-------|-------------|
| **MSE** | **0.055795** | Mean Squared Error (lower is better) |
| **MAE** | **0.141847** | Mean Absolute Error (lower is better) |
| **RMSE** | **0.236209** | Root Mean Squared Error (lower is better) |
| **R² Score** | **0.648497** | Coefficient of determination (higher is better) |
| **Pearson Correlation** | **0.875460** | Linear correlation (higher is better) |
| **Spearman Correlation** | **0.861819** | Rank correlation (higher is better) |

### Training Details

- **Training Duration**: 28 minutes
- **Epochs**: 8
- **Early Stopping**: No
- **Best Correlation Score**: 0.936744
- **Final MSE**: 0.055795

### Training Configuration

- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Epochs**: 8
- **Weight Decay**: 0.01
- **Warmup Steps**: 150
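
To put the warmup setting in context, the sketch below works out the step counts implied by this configuration and the learning rate at a given step. Two things here are assumptions rather than facts from this card: that the 1,000 pairs listed under Dataset include the 200 validation examples (leaving 800 for training), and that the schedule is linear warmup followed by linear decay. The helper `learning_rate_at` is hypothetical.

```python
import math

TRAIN_EXAMPLES = 800   # assumption: 1,000 pairs minus the 200-example validation split
BATCH_SIZE = 16
EPOCHS = 8
WARMUP_STEPS = 150
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def learning_rate_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)

print(steps_per_epoch, total_steps)
print(f"warmup covers {WARMUP_STEPS / total_steps:.1%} of training")
```

Under these assumptions, training runs for 50 steps per epoch and 400 steps in total, so the 150 warmup steps cover 37.5% of training.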

## Usage

This model can be used with the sentence-transformers library for computing semantic similarity scores between query-document pairs.

### Installation

```bash
pip install sentence-transformers
```

### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example usage
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.']
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate better semantic match
```

### Advanced Usage in PathfinderRAG

```python
from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # add more candidate documents as needed

# Score all pairs
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```

## Training Process

This model was trained as a **regression** model (not a classifier) to predict continuous similarity scores in the range [0, 1]. The training process focused on:

1. **Data Quality**: Used authentic HotpotQA examples with careful contamination filtering
2. **Regression Approach**: Avoided binary classification, maintaining continuous label distribution
3. **Correlation Optimization**: Maximized Spearman correlation for effective ranking
4. **Scientific Rigor**: All metrics derived from real training runs without simulation

### Why Regression Over Classification?

Cross-encoders for information retrieval should predict **continuous similarity scores**, not binary classifications. This approach:

- Preserves fine-grained similarity distinctions
- Enables better ranking and document selection
- Provides more informative scores for downstream applications
- Aligns with the mathematical foundation of information retrieval
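
A small illustration of the first two points, using made-up scores rather than real model output: thresholding continuous scores into binary labels creates ties, so relevant documents can no longer be ordered against each other. The document names and the 0.5 threshold are arbitrary assumptions.

```python
# Hypothetical continuous relevance scores for four candidate documents.
scores = {"doc_a": 0.92, "doc_b": 0.71, "doc_c": 0.55, "doc_d": 0.08}

# Binary view: everything at or above an arbitrary 0.5 threshold is "relevant".
binary = {doc: int(score >= 0.5) for doc, score in scores.items()}

# Continuous scores give a strict ordering of all four documents.
ranking = sorted(scores, key=scores.get, reverse=True)

# Binary labels only separate two groups; the order inside each group is lost.
distinct_continuous = len(set(scores.values()))
distinct_binary = len(set(binary.values()))

print(ranking)
print(distinct_continuous, distinct_binary)
```

Here `doc_a`, `doc_b`, and `doc_c` all collapse to the label 1, so a binary scorer could not tell a retriever which of them to explore first.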

## Dataset

**HotpotQA**: Complex reasoning dataset requiring multi-step inference

- **Task Type**: Multi-hop Question Answering
- **Training Examples**: 1,000 high-quality pairs
- **Validation Split**: 20% (200 examples)
- **Quality Threshold**: ≥0.70 (authentic TrailRAG metrics)
- **Contamination**: Zero overlap between splits
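
The filtering and split described above could be reproduced along the following lines. This is a sketch under assumptions: the record fields (`query`, `document`, `label`, `quality`), the seed, and the synthetic records are hypothetical, and the quality threshold is treated as a per-example score. The card does not say how "zero overlap" was measured; here it is checked on exact (query, document) pairs.

```python
import random

def split_pairs(records, val_fraction=0.2, quality_threshold=0.70, seed=42):
    """Filter by quality, shuffle deterministically, and split into train/validation."""
    kept = [r for r in records if r["quality"] >= quality_threshold]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_val = round(len(kept) * val_fraction)
    return kept[n_val:], kept[:n_val]

def overlap(train, val):
    """Return the (query, document) pairs that appear in both splits."""
    train_keys = {(r["query"], r["document"]) for r in train}
    return {(r["query"], r["document"]) for r in val} & train_keys

# Synthetic stand-in data: 1,000 unique pairs that all pass the threshold.
records = [
    {
        "query": f"question {i}",
        "document": f"passage {i}",
        "label": (i % 11) / 10,
        "quality": 0.70 + (i % 30) / 100,
    }
    for i in range(1000)
]

train, val = split_pairs(records)
print(len(train), len(val), len(overlap(train, val)))
```

With 1,000 records that pass the filter, this yields 800 training and 200 validation examples with no shared pairs.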

## Limitations

- Optimized specifically for multi-hop question answering tasks
- Performance may vary on out-of-domain data
- Requires sentence-transformers library for inference
- Trained on CPU only (GPU optimization is left for future versions)

## Citation

```bibtex
@misc{trailrag-cross-encoder-hotpotqa,
  title = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author = {PathfinderRAG Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}
```

## Model Card Contact

For questions about this model, please open an issue in the [PathfinderRAG repository](https://github.com/your-org/trail-rag-1) or contact the development team.

---

*This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.*