FinR1-llama-8b-multi-language-thinking
Overview
FinR1-llama-8b-multi-language-thinking is an 8-billion-parameter model fine-tuned for financial reasoning, multilingual analysis, and structured thinking.
It is built on top of meta-llama/Llama-3.1-8B-Instruct and extends prior Phi-series work by introducing deeper multilingual support, reasoning-trace integration, and improved numerical reliability in quantitative tasks.
This release focuses on optional thought-process modeling using <think> reasoning tags and on multi-turn financial dialogues across 60+ languages.
Sponsored with the generous support of Cherry Republic.
Core Objective
The FinR1 (Finance Reasoning 1) line targets:
- Reasoned Financial Analysis: multi-step logic across accounting, markets, and macroeconomics
- Cross-lingual Finance QA: trained in Arabic, Uzbek, Chinese, Spanish, and more
- Data Interpretation Tasks: understands and restructures tables, reports, and datasets
- Quantitative Precision: improved calculation reliability and explanation clarity
Training Phases
1. Base Adaptation
- Model: meta-llama/Llama-3.1-8B-Instruct
- Dataset: Finance-Instruct-500k (a loading sketch follows below)
- Goal: establish a strong instruction-following and financial-domain foundation.
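A minimal sketch of pulling the Phase 1 dataset with the Hugging Face `datasets` library. The repo id `Josephgflowers/Finance-Instruct-500k` and the `train` split are assumptions inferred from the dataset name above; confirm them on the dataset card before relying on this.

```python
from datasets import load_dataset

# Assumed repo id and split; verify on the dataset card before use.
finance_instruct = load_dataset("Josephgflowers/Finance-Instruct-500k", split="train")

print(finance_instruct.column_names)  # inspect the instruction/response schema
print(finance_instruct[0])            # peek at a single training example
```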
2. Reasoning Trace Integration
- Added reasoning traces filtered from Gemini-based synthetic outputs (finance-related subset).
- Each entry follows a <think>…</think> structure to promote transparent reasoning (format sketch below).
- Result: more interpretable reasoning patterns with lower hallucination rates.
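A minimal sketch of what a <think>…</think> training record and the well-formedness filter implied by this phase might look like. The field names and the filter are illustrative assumptions, not the released schema or the actual filtering code.

```python
# Hypothetical shape of one reasoning-trace record; field names are illustrative.
record = {
    "instruction": "Compare POS vs. Online Store total sales.",
    "response": (
        "<think>\n"
        "2,075,743.54 - 254,277.27 = 1,821,466.27\n"
        "</think>\n"
        "POS generated $1,821,466.27 more than the Online Store."
    ),
}

def has_valid_think_block(text: str) -> bool:
    """Keep only entries with exactly one well-ordered <think>...</think> block."""
    return (
        text.count("<think>") == 1
        and text.count("</think>") == 1
        and text.index("<think>") < text.index("</think>")
    )

assert has_valid_think_block(record["response"])
```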
3. Multilingual Finance QA Expansion
- Datasets: unreleased multilingual finance QA sets (planned for release; see Notes). Coverage extended to 60+ languages.
- Full list of languages used: "Arabic", "Amharic", "Azerbaijani", "Bengali", "Burmese", "Chinese (Simplified)", "Chinese (Traditional)", "Czech", "Danish", "Dutch", "English", "Finnish", "French", "Georgian", "German", "Greek", "Gujarati", "Haitian Creole", "Hausa", "Hebrew", "Hindi", "Hungarian", "Igbo", "Indonesian", "Italian", "Japanese", "Javanese", "Kazakh", "Khmer", "Korean", "Lao", "Malay", "Marathi", "Persian", "Polish", "Portuguese", "Punjabi", "Quechua", "Romanian", "Russian", "Serbian/Croatian/Bosnian", "Sinhala", "Somali", "Spanish", "Swahili", "Swedish", "Tagalog", "Tamil", "Telugu", "Thai", "Turkish", "Turkmen", "Ukrainian", "Urdu", "Uzbek", "Vietnamese", "Yoruba", "Zulu"
- Seed data: https://huggingface.co/datasets/Josephgflowers/finance_curriculum_topics, covering 7,794 financial topics (loading sketch below).
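A minimal sketch of loading the seed-topic dataset linked above; the `train` split and the column layout are assumptions, so inspect the dataset card to confirm.

```python
from datasets import load_dataset

# Repo id taken from the URL above; split and column names are assumptions.
topics = load_dataset("Josephgflowers/finance_curriculum_topics", split="train")

print(len(topics))          # expected to be on the order of 7,794 topics
print(topics.column_names)  # identify which field holds the topic text
```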
4. Quantitative & Analytical Calibration
- Secondary fine-tune on tabular financial reasoning (FinQA, LIMO, Cortex-1).
- Reinforced structured arithmetic steps and explanation fidelity.
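An illustrative check of the kind this phase reinforces: recompute a tabular difference and compare it with the figure the model reports. This is a conceptual sketch (figures taken from the worked example later in this card), not a released evaluation script.

```python
def numeric_match(expected: float, reported: float, tol: float = 0.005) -> bool:
    """True if the model's reported value matches the recomputed one within tol."""
    return abs(expected - reported) <= tol

# Channel totals from the worked example below.
pos_total, online_total = 2_075_743.54, 254_277.27
expected_delta = round(pos_total - online_total, 2)  # 1,821,466.27

assert numeric_match(expected_delta, 1_821_466.27)
```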
5. Evaluation & Bench Testing
- Benchmarked against prior Phi-mini reasoning models and base Llama-3.1-8B. Example benchmark:

| Task | Metric | Base 8B | FinR1-8B |
|---|---|---|---|
| Spreadsheet conversion | Structural accuracy | 0.74 | 0.98 |
| Financial difference calc | Numerical correctness | 0.67 | 1.00 |
| Instruction following | Pass rate | 0.81 | 0.96 |
| Multilingual finance QA | F1 (avg 10 langs) | 0.61 | 0.89 |

- Prompt: "Compare POS vs. Online Store total sales."
- Output: precise arithmetic (Δ = $1,821,466.27) with clear step-by-step reasoning.
- Confidence: high, no rounding drift.
Model Capabilities
- Financial Data Interpretation: extracts and summarizes structured tables, spreadsheets, and ledgers.
- Analytical Reasoning: performs step-wise quantitative comparisons and explains calculations.
- Instruction Following: adheres strictly to user/system directives in chain-of-thought or tagged format.
- Multilingual QA: responds natively in 60+ languages with localized financial terminology.
- Structured Outputs: supports JSON, CSV, or XML reasoning output for integration with RAG and pipelines (see the sketch after this list).
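A minimal sketch of prompting for a structured (JSON) answer. The key names in the schema are illustrative assumptions, not a contract the model enforces.

```python
# Illustrative prompt requesting JSON output; the key names are assumptions.
structured_prompt = """You are a financial reasoning assistant.
Answer as JSON with keys "reasoning" and "delta_usd".

POS total: $2,075,743.54
Online total: $254,277.27
How much more is POS doing?"""

# A well-formed reply would look like:
# {"reasoning": "2,075,743.54 - 254,277.27 = 1,821,466.27", "delta_usd": 1821466.27}
```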
Example Usage
System Prompt
You are a multilingual financial reasoning assistant.
Explain your reasoning step by step using <think>...</think> tags.
Input
Sales channel data:
POS total: $2,075,743.54
Online total: $254,277.27
How much more is POS doing?
Output
<think>
To find the difference, subtract Online total from POS total:
2,075,743.54 - 254,277.27 = 1,821,466.27
</think>
The Point of Sale channel generated $1,821,466.27 more than the Online Store.
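For downstream pipelines, the optional <think> block can be separated from the final answer. A minimal sketch assuming at most one reasoning block per reply:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer), stripping the optional <think> block."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>2,075,743.54 - 254,277.27 = 1,821,466.27</think>\n"
    "The Point of Sale channel generated $1,821,466.27 more than the Online Store."
)
print(answer)
```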
Testing Summary
Recent limited evaluation tests scored by GPT-5 showed:
- 99% structural accuracy on table reconstruction tasks
- Error rate <1% on numerical difference queries
- High cross-lingual consistency: identical reasoning structure reproduced in Arabic, French, and Uzbek
- No instruction degradation after long-context (8-10k token) sequences
The model's reasoning outputs mirror the structure defined in the Pollinations dataset generation scripts used during development.
Technical Details
| Parameter | Value |
|---|---|
| Base | Llama-3.1-8B-Instruct |
| Parameters | 8 B |
| Context Length | 16k tokens |
| Precision | bfloat16 / 4-bit LoRA compatible |
| License | llama3.1 |
| Author | Joseph G. Flowers |
| Framework | Hugging Face Transformers + TRL / Unsloth |
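Since the table notes 4-bit compatibility, here is a hedged sketch of loading the model with 4-bit quantized weights via bitsandbytes (requires the bitsandbytes and accelerate packages). The quantization settings are assumptions, not an officially published configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# 4-bit quantization settings are illustrative; adjust for your hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
```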
Usage Example (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# Load the tokenizer and model (device_map="auto" requires the accelerate package).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Ask the model to expose its reasoning inside <think>...</think> tags.
prompt = """You are a financial assistant. Use <think>...</think> to explain your steps.
Sales increased from $500K to $650K. What is the percentage growth?"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
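Because the base model is Llama-3.1-8B-Instruct, the tokenizer's chat template can also be used instead of a raw string prompt. This variant reuses the tokenizer and model loaded above; the generation settings are illustrative.

```python
# Build a chat-formatted prompt with an explicit system instruction.
messages = [
    {
        "role": "system",
        "content": "You are a multilingual financial reasoning assistant. "
                   "Explain your reasoning step by step using <think>...</think> tags.",
    },
    {"role": "user", "content": "Sales increased from $500K to $650K. What is the percentage growth?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```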
Citation
@model{josephgflowers2025finr1llama8b,
title={FinR1-llama-8b-multi-language-thinking},
author={Joseph G. Flowers},
year={2025},
url={https://huggingface.co/Josephgflowers/FinR1-llama-8b-multi-language-thinking}
}
Notes
Additional multilingual finance datasets (v2 + v3) and extended Gemini-filtered reasoning traces will be uploaded soon to support reproducibility and expansion for FinR2. Future plans include dataset release under the Finance-Reasoning-Hub collection for structured evaluation across reasoning, translation, and quantitative accuracy.