FinR1-llama-8b-multi-language-thinking
Overview
FinR1-llama-8b-multi-language-thinking is an 8-billion-parameter model fine-tuned for financial reasoning, multilingual analysis, and structured thinking.
It is built on top of meta-llama/Llama-3.1-8B-Instruct and extends prior Phi-series work by introducing deeper multilingual support, reasoning-trace integration, and improved numerical reliability in quantitative tasks.
This release focuses on optional thought-process modeling using <think> reasoning tags and on multi-turn financial dialogues across 60+ languages.
Sponsored with the generous support of Cherry Republic.
Core Objective
The FinR1 (Finance Reasoning 1) line targets:
- Reasoned Financial Analysis: multi-step logic across accounting, markets, and macroeconomics
- Cross-lingual Finance QA: trained in Arabic, Uzbek, Chinese, Spanish, and more
- Data Interpretation Tasks: understands and restructures tables, reports, and datasets
- Quantitative Precision: improved calculation reliability and explanation clarity
Training Phases
1. Base Adaptation
- Model: meta-llama/Llama-3.1-8B-Instruct
- Dataset: Finance-Instruct-500k (a loading sketch follows below)
- Goal: establish a strong instruction-following and financial-domain foundation.
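A minimal sketch of pulling the Phase 1 dataset with the Hugging Face `datasets` library. The repo id `Josephgflowers/Finance-Instruct-500k` and the `train` split are assumptions inferred from the dataset name above; confirm them on the dataset card before relying on this.

```python
from datasets import load_dataset

# Assumed repo id and split; verify on the dataset card before use.
finance_instruct = load_dataset("Josephgflowers/Finance-Instruct-500k", split="train")

print(finance_instruct.column_names)  # inspect the instruction/response schema
print(finance_instruct[0])            # peek at a single training example
```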
2. Reasoning Trace Integration
- Added reasoning traces filtered from Gemini-based synthetic outputs (finance-related subset).
- Each entry follows a <think>…</think> structure to promote transparent reasoning (format sketch below).
- Result: more interpretable reasoning patterns with lower hallucination rates.
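A minimal sketch of what a <think>…</think> training record and the well-formedness filter implied by this phase might look like. The field names and the filter are illustrative assumptions, not the released schema or the actual filtering code.

```python
# Hypothetical shape of one reasoning-trace record; field names are illustrative.
record = {
    "instruction": "Compare POS vs. Online Store total sales.",
    "response": (
        "<think>\n"
        "2,075,743.54 - 254,277.27 = 1,821,466.27\n"
        "</think>\n"
        "POS generated $1,821,466.27 more than the Online Store."
    ),
}

def has_valid_think_block(text: str) -> bool:
    """Keep only entries with exactly one well-ordered <think>...</think> block."""
    return (
        text.count("<think>") == 1
        and text.count("</think>") == 1
        and text.index("<think>") < text.index("</think>")
    )

assert has_valid_think_block(record["response"])
```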
3. Multilingual Finance QA Expansion
- Datasets: unreleased multilingual finance QA sets (planned for release; see Notes). Coverage extended to 60+ languages.
- Full list of languages used: "Arabic", "Amharic", "Azerbaijani", "Bengali", "Burmese", "Chinese (Simplified)", "Chinese (Traditional)", "Czech", "Danish", "Dutch", "English", "Finnish", "French", "Georgian", "German", "Greek", "Gujarati", "Haitian Creole", "Hausa", "Hebrew", "Hindi", "Hungarian", "Igbo", "Indonesian", "Italian", "Japanese", "Javanese", "Kazakh", "Khmer", "Korean", "Lao", "Malay", "Marathi", "Persian", "Polish", "Portuguese", "Punjabi", "Quechua", "Romanian", "Russian", "Serbian/Croatian/Bosnian", "Sinhala", "Somali", "Spanish", "Swahili", "Swedish", "Tagalog", "Tamil", "Telugu", "Thai", "Turkish", "Turkmen", "Ukrainian", "Urdu", "Uzbek", "Vietnamese", "Yoruba", "Zulu"
- Seed data: https://huggingface.co/datasets/Josephgflowers/finance_curriculum_topics, covering 7,794 financial topics (loading sketch below).
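A minimal sketch of loading the seed-topic dataset linked above; the `train` split and the column layout are assumptions, so inspect the dataset card to confirm.

```python
from datasets import load_dataset

# Repo id taken from the URL above; split and column names are assumptions.
topics = load_dataset("Josephgflowers/finance_curriculum_topics", split="train")

print(len(topics))          # expected to be on the order of 7,794 topics
print(topics.column_names)  # identify which field holds the topic text
```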
4. Quantitative & Analytical Calibration
- Secondary fine-tune on tabular financial reasoning (FinQA, LIMO, Cortex-1).
- Reinforced structured arithmetic steps and explanation fidelity.
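An illustrative check of the kind this phase reinforces: recompute a tabular difference and compare it with the figure the model reports. This is a conceptual sketch (figures taken from the worked example later in this card), not a released evaluation script.

```python
def numeric_match(expected: float, reported: float, tol: float = 0.005) -> bool:
    """True if the model's reported value matches the recomputed one within tol."""
    return abs(expected - reported) <= tol

# Channel totals from the worked example below.
pos_total, online_total = 2_075_743.54, 254_277.27
expected_delta = round(pos_total - online_total, 2)  # 1,821,466.27

assert numeric_match(expected_delta, 1_821_466.27)
```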
5. Evaluation & Bench Testing
- Benchmarked against prior Phi-mini reasoning models and base Llama-3.1-8B. Example benchmark:

| Task | Metric | Base 8B | FinR1-8B |
|---|---|---|---|
| Spreadsheet conversion | Structural accuracy | 0.74 | 0.98 |
| Financial difference calc | Numerical correctness | 0.67 | 1.00 |
| Instruction following | Pass rate | 0.81 | 0.96 |
| Multilingual finance QA | F1 (avg 10 langs) | 0.61 | 0.89 |

- Prompt: "Compare POS vs. Online Store total sales."
- Output: precise arithmetic (Δ = $1,821,466.27) with clear step-by-step reasoning.
- Confidence: high, no rounding drift.
Model Capabilities
- Financial Data Interpretation: extracts and summarizes structured tables, spreadsheets, and ledgers.
- Analytical Reasoning: performs step-wise quantitative comparisons and explains calculations.
- Instruction Following: adheres strictly to user/system directives in chain-of-thought or tagged format.
- Multilingual QA: responds natively in 60+ languages with localized financial terminology.
- Structured Outputs: supports JSON, CSV, or XML reasoning output for integration with RAG and pipelines (see the sketch after this list).
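A minimal sketch of prompting for a structured (JSON) answer. The key names in the schema are illustrative assumptions, not a contract the model enforces.

```python
# Illustrative prompt requesting JSON output; the key names are assumptions.
structured_prompt = """You are a financial reasoning assistant.
Answer as JSON with keys "reasoning" and "delta_usd".

POS total: $2,075,743.54
Online total: $254,277.27
How much more is POS doing?"""

# A well-formed reply would look like:
# {"reasoning": "2,075,743.54 - 254,277.27 = 1,821,466.27", "delta_usd": 1821466.27}
```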
Example Usage
System Prompt
You are a multilingual financial reasoning assistant.
Explain your reasoning step by step using <think>...</think> tags.
Input
Sales channel data:
POS total: $2,075,743.54
Online total: $254,277.27
How much more is POS doing?
Output
<think>
To find the difference, subtract Online total from POS total:
2,075,743.54 - 254,277.27 = 1,821,466.27
</think>
The Point of Sale channel generated $1,821,466.27 more than the Online Store.
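For downstream pipelines, the optional <think> block can be separated from the final answer. A minimal sketch assuming at most one reasoning block per reply:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer), stripping the optional <think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>2,075,743.54 - 254,277.27 = 1,821,466.27</think>\n"
    "The Point of Sale channel generated $1,821,466.27 more than the Online Store."
)
print(answer)
```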
Testing Summary
Recent limited evaluation tests scored by GPT-5 showed:
- 99% structural accuracy on table reconstruction tasks
- Error rate <1% on numerical difference queries
- High cross-lingual consistency: identical reasoning structure reproduced in Arabic, French, and Uzbek
- No instruction degradation after long-context (8-10k token) sequences
The model's reasoning outputs mirror the structure defined in the Pollinations dataset generation scripts used during development.
Technical Details
| Parameter | Value |
|---|---|
| Base | Llama-3.1-8B-Instruct |
| Parameters | 8 B |
| Context Length | 16k tokens |
| Precision | bfloat16 / 4-bit LoRA compatible |
| License | llama3.1 |
| Author | Joseph G. Flowers |
| Framework | Hugging Face Transformers + TRL / Unsloth |
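Since the table notes 4-bit compatibility, here is a hedged sketch of loading the model with 4-bit quantized weights via bitsandbytes (requires the bitsandbytes and accelerate packages). The quantization settings are assumptions, not an officially published configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# 4-bit quantization settings are illustrative; adjust for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```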
Usage Example (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# Load the tokenizer and model (device_map="auto" requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Ask the model to expose its reasoning inside <think>...</think> tags.
prompt = """You are a financial assistant. Use <think>...</think> to explain your steps.
Sales increased from $500K to $650K. What is the percentage growth?"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
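Because the base model is Llama-3.1-8B-Instruct, the tokenizer's chat template can also be used instead of a raw string prompt. This variant reuses the tokenizer and model loaded above; the generation settings are illustrative.

```python
# Build a chat-formatted prompt with an explicit system instruction.
messages = [
    {
        "role": "system",
        "content": "You are a multilingual financial reasoning assistant. "
                   "Explain your reasoning step by step using <think>...</think> tags.",
    },
    {"role": "user", "content": "Sales increased from $500K to $650K. What is the percentage growth?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```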
Citation
@model{josephgflowers2025finr1llama8b,
title={FinR1-llama-8b-multi-language-thinking},
author={Joseph G. Flowers},
year={2025},
url={https://huggingface.co/Josephgflowers/FinR1-llama-8b-multi-language-thinking}
}
Notes
Additional multilingual finance datasets (v2 + v3) and extended Gemini-filtered reasoning traces will be uploaded soon to support reproducibility and expansion for FinR2. Future plans include dataset release under the Finance-Reasoning-Hub collection for structured evaluation across reasoning, translation, and quantitative accuracy.