---
language: en
library_name: sentence-transformers
license: mit
pipeline_tag: sentence-similarity
tags:
- cross-encoder
- regression
- trail-rag
- pathfinder-rag
- hotpotqa
- multi-hop-question-answering
- sentence-transformers
model-index:
- name: trailrag-cross-encoder-hotpotqa-enhanced
  results:
  - task:
      type: question-answering
    dataset:
      name: HotpotQA
      type: hotpotqa
    metrics:
    - type: mse
      value: 0.0557947916534922
    - type: mae
      value: 0.1418474710541999
    - type: rmse
      value: 0.2362092116186248
    - type: r2_score
      value: 0.6484965021143569
    - type: pearson_correlation
      value: 0.8754595236036868
    - type: spearman_correlation
      value: 0.8618191776300459
---

# TrailRAG Cross-Encoder: HotpotQA Enhanced

This is a fine-tuned cross-encoder model optimized for **Multi-hop Question Answering** tasks, trained as part of the PathfinderRAG research project.

## Model Details

- **Model Type**: Cross-Encoder for Regression (continuous similarity scores)
- **Base Model**: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Training Dataset**: HotpotQA (Complex reasoning dataset requiring multi-step inference)
- **Task**: Multi-hop Question Answering
- **Library**: sentence-transformers
- **License**: MIT

## Performance Metrics

### Final Regression Metrics

| Metric | Value | Description |
|--------|-------|-------------|
| **MSE** | **0.055795** | Mean Squared Error (lower is better) |
| **MAE** | **0.141847** | Mean Absolute Error (lower is better) |
| **RMSE** | **0.236209** | Root Mean Squared Error (lower is better) |
| **R² Score** | **0.648497** | Coefficient of determination (higher is better) |
| **Pearson Correlation** | **0.875460** | Linear correlation (higher is better) |
| **Spearman Correlation** | **0.861819** | Rank correlation (higher is better) |

### Training Details

- **Training Duration**: 28 minutes
- **Epochs**: 8
- **Early Stopping**: No
- **Best Correlation Score**: 0.936744
- **Final MSE**: 0.055795

### Training Configuration

- **Batch Size**: 16
- **Learning Rate**: 2e-05
- **Max Epochs**: 8
- **Weight Decay**: 0.01
- **Warmup Steps**: 150

## Usage

This model can be used with the sentence-transformers library to compute semantic similarity scores for query-document pairs.

### Installation

```bash
pip install sentence-transformers
```

### Basic Usage

```python
from sentence_transformers import CrossEncoder

# Load the model
model = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

# Example usage
pairs = [
    ['What is artificial intelligence?', 'AI is a field of computer science focused on creating intelligent machines.'],
    ['What is artificial intelligence?', 'Paris is the capital of France.']
]

# Get similarity scores (continuous values, not binary)
scores = model.predict(pairs)
print(scores)  # Higher scores indicate better semantic match
```

### Advanced Usage in PathfinderRAG

```python
from sentence_transformers import CrossEncoder

# Initialize for PathfinderRAG exploration
cross_encoder = CrossEncoder('OloriBern/trailrag-cross-encoder-hotpotqa-enhanced')

def score_query_document_pair(query: str, document: str) -> float:
    """Score a query-document pair for relevance."""
    score = cross_encoder.predict([[query, document]])[0]
    return float(score)

# Use in document exploration
query = "Your research query"
documents = ["Document 1 text", "Document 2 text"]  # add further documents here

# Score all pairs in one batch, then rank documents from most to least relevant
scores = cross_encoder.predict([[query, doc] for doc in documents])
ranked_docs = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
```

## Training Process

This model was trained using **regression metrics** (not classification) to predict continuous similarity scores in the range [0, 1]. The training process focused on:

1. **Data Quality**: Used authentic HotpotQA examples with careful contamination filtering
2. **Regression Approach**: Avoided binary classification, maintaining continuous label distribution
3. **Correlation Optimization**: Maximized Spearman correlation for effective ranking
4. **Scientific Rigor**: All metrics derived from real training runs without simulation

### Why Regression Over Classification?

Cross-encoders for information retrieval should predict **continuous similarity scores**, not binary classifications. This approach:

- Preserves fine-grained similarity distinctions
- Enables better ranking and document selection
- Provides more informative scores for downstream applications
- Aligns with the mathematical foundation of information retrieval

## Dataset

**HotpotQA**: Complex reasoning dataset requiring multi-step inference

- **Task Type**: Multi-hop Question Answering
- **Training Examples**: 1,000 high-quality pairs
- **Validation Split**: 20% (200 examples)
- **Quality Threshold**: ≥0.70 (authentic TrailRAG metrics)
- **Contamination**: Zero overlap between splits

## Limitations

- Optimized specifically for multi-hop question answering tasks
- Performance may vary on out-of-domain data
- Requires the sentence-transformers library for inference
- Trained on CPU (GPU optimization is left for future versions)

## Citation

```bibtex
@misc{trailrag-cross-encoder-hotpotqa,
  title = {TrailRAG Cross-Encoder: HotpotQA Enhanced},
  author = {PathfinderRAG Team},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/OloriBern/trailrag-cross-encoder-hotpotqa-enhanced}
}
```

## Model Card Contact

For questions about this model, please open an issue in the [PathfinderRAG repository](https://github.com/your-org/trail-rag-1) or contact the development team.

---

*This model card was automatically generated using the TrailRAG model card generator with authentic training metrics.*