FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
Abstract
FinTrust is a benchmark designed to evaluate the trustworthiness of LLMs in finance applications, focusing on alignment issues and revealing gaps in legal awareness.
Recent LLMs have demonstrated promising ability in solving finance-related problems. However, applying LLMs in real-world finance applications remains challenging because of the domain's high-risk, high-stakes nature. This paper introduces FinTrust, a comprehensive benchmark specifically designed for evaluating the trustworthiness of LLMs in finance applications. Our benchmark focuses on a wide range of alignment issues grounded in practical contexts and features fine-grained tasks for each dimension of trustworthiness evaluation. We assess eleven LLMs on FinTrust and find that proprietary models like o4-mini perform best on most tasks, such as safety, while open-source models like DeepSeek-V3 have an advantage in specific areas like industry-level fairness. On challenging tasks like fiduciary alignment and disclosure, all LLMs fall short, showing a significant gap in legal awareness. We believe that FinTrust can be a valuable benchmark for evaluating the trustworthiness of LLMs in the finance domain.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance (2025)
- TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models (2025)
- HALF: Harm-Aware LLM Fairness Evaluation Aligned with Deployment (2025)
- FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs (2025)
- LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling (2025)
- Synthesizing Behaviorally-Grounded Reasoning Chains: A Data-Generation Framework for Personal Finance LLMs (2025)
- StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? (2025)