FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
Abstract
A new benchmark called FinChain evaluates multi-step symbolic reasoning in financial tasks with a focus on intermediate reasoning steps, introducing ChainEval as a metric for assessing both final answers and reasoning processes.
Multi-step symbolic reasoning is critical for advancing downstream performance on financial tasks, yet benchmarks for systematically evaluating this capability are lacking. Existing datasets such as FinQA and ConvFinQA supervise only final numerical answers, without assessing intermediate reasoning steps. To address this, we introduce FinChain, the first symbolic benchmark designed for verifiable Chain-of-Thought (CoT) financial reasoning. Spanning 54 topics across 12 financial domains, FinChain offers five parameterized templates per topic, each varying in reasoning complexity and the domain expertise required. Each dataset instance includes an executable Python trace, enabling automatic generation of extensive training data and easy adaptation to other domains. We also introduce ChainEval, a new metric for automatic evaluation of both final answers and intermediate reasoning. Benchmarking 30 LLMs on our dataset, we find that even state-of-the-art models have considerable room for improvement in multi-step financial reasoning. All templates and evaluation metrics for FinChain are available at https://github.com/mbzuai-nlp/finchain.
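To make the template-plus-trace design concrete, the sketch below shows what one parameterized topic template with an executable Python trace could look like. The topic, parameter ranges, and trace format here are illustrative assumptions, not FinChain's actual schema:

```python
# A minimal sketch of a FinChain-style parameterized template. The topic,
# parameter ranges, and trace format are hypothetical illustrations,
# not the benchmark's actual schema.
import random

def compound_interest_template(seed: int):
    """Sample parameters and emit (question, executable trace, final answer)."""
    rng = random.Random(seed)
    principal = rng.randint(1_000, 50_000)       # initial deposit in USD
    rate = rng.choice([0.02, 0.03, 0.05, 0.07])  # annual interest rate
    years = rng.randint(2, 10)                   # holding period in years

    question = (
        f"An investor deposits ${principal} at an annual rate of {rate:.0%}, "
        f"compounded yearly. What is the balance after {years} years?"
    )

    # Symbolic chain of thought: every step is an executable Python line,
    # so each intermediate value can be re-derived and verified automatically.
    trace = [
        f"principal = {principal}",
        f"rate = {rate}",
        f"years = {years}",
        "balance = principal * (1 + rate) ** years",
    ]

    env = {}
    for step in trace:
        exec(step, env)  # run the reasoning steps in order
    return question, trace, round(env["balance"], 2)

question, trace, answer = compound_interest_template(seed=42)
print(question)
print("\n".join(trace))
print("answer:", answer)
```

Because the trace is plain executable code, re-sampling seeds yields fresh instances with verifiable ground truth at every intermediate step, which is what enables the automatic training-data generation described above.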
Community
FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning
We introduce FinChain, a benchmark designed to evaluate and improve the reasoning abilities of LLMs in financial tasks. Unlike prior work that relies on final-answer supervision, FinChain provides symbolic, executable chain-of-thought traces across 54 financial topics.
It enables fine-grained, verifiable reasoning supervision and allows for the development of interpretable, trustworthy financial agents.
Dataset & code: github.com/mbzuai-nlp/finchain
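For a concrete sense of what this fine-grained supervision enables, here is an illustrative scoring sketch in the spirit of ChainEval. The function name, alignment rule, and tolerances are assumptions for illustration, not the paper's published metric:

```python
# Illustrative step-level scoring in the spirit of ChainEval; the greedy
# in-order alignment and tolerances below are assumptions, not the actual metric.
def step_match_score(gold_values, pred_values, rel_tol=1e-4):
    """Fraction of gold intermediate values matched, in order, by predictions."""
    matched, j = 0, 0
    for g in gold_values:
        while j < len(pred_values):
            p = pred_values[j]
            j += 1
            if abs(p - g) <= rel_tol * max(1.0, abs(g)):
                matched += 1
                break
    return matched / len(gold_values) if gold_values else 1.0

# Example: values from a gold executable trace vs. numbers parsed from a
# model's chain of thought (hypothetical data).
gold = [10_000, 0.05, 7, 14_071.00]   # principal, rate, years, balance
pred = [10_000, 0.05, 7, 14_071.004]  # model's extracted step values
final_ok = abs(pred[-1] - gold[-1]) <= 1e-2 * max(1.0, abs(gold[-1]))
print(f"step score = {step_match_score(gold, pred):.2f}, final ok = {final_ok}")
```

Scoring both the step sequence and the final answer, as sketched here, is what distinguishes this style of evaluation from final-answer-only benchmarks such as FinQA and ConvFinQA.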
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns (2025)
- FlashThink: An Early Exit Method For Efficient Reasoning (2025)
- BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs (2025)
- RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library (2025)
- One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL (2025)
- Learning to Reason Over Time: Timeline Self-Reflection for Improved Temporal Reasoning in Language Models (2025)
- RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability (2025)
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend