
FinR1-llama-8b-multi-language-thinking

Overview

FinR1-llama-8b-multi-language-thinking is an 8-billion-parameter model fine-tuned for financial reasoning, multilingual analysis, and structured thinking.
It is built on top of meta-llama/Llama-3.1-8B-Instruct and extends prior Phi-series work by introducing deeper multilingual support, reasoning-trace integration, and improved numerical reliability in quantitative tasks.

This release focuses on optional thought-process modeling using <think> reasoning tags and on multi-turn financial dialogues across 60+ languages.

Sponsored with the generous support of Cherry Republic.

🧠 Core Objective

The FinR1 (Finance Reasoning 1) line targets:

  • Reasoned Financial Analysis: multi-step logic across accounting, markets, and macroeconomics
  • Cross-lingual Finance QA: trained in Arabic, Uzbek, Chinese, Spanish, and more
  • Data Interpretation Tasks: understands and restructures tables, reports, and datasets
  • Quantitative Precision: improved calculation reliability and explanation clarity

🔄 Training Phases

1. Base Adaptation

  • Model: meta-llama/Llama-3.1-8B-Instruct
  • Dataset: Finance-Instruct-500k
  • Goal: establish strong instruction-following and financial domain foundation.
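
A minimal sketch of inspecting the base-adaptation corpus with the Hugging Face datasets library (the repo path "Josephgflowers/Finance-Instruct-500k" and the train split are assumptions based on the dataset name above, not confirmed here):

from datasets import load_dataset

# Load the instruction-tuning corpus used for base adaptation.
# Repo path and split are assumed; adjust to the published dataset card.
ds = load_dataset("Josephgflowers/Finance-Instruct-500k", split="train")

# Inspect one record to confirm the prompt/response schema before fine-tuning.
print(ds[0])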

2. Reasoning Trace Integration

  • Added reasoning traces filtered from Gemini-based synthetic outputs (finance-related subset).
  • Each entry follows a <think>…</think> structure to promote transparent reasoning.
  • Result: more interpretable reasoning patterns with lower hallucination rates.
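
For illustration, a hypothetical training entry in this phase might look like the snippet below (the "system"/"user"/"assistant" field names are illustrative, not the actual training schema):

# Hypothetical record: the <think> block carries the intermediate reasoning,
# and only the text after the closing tag is the user-facing answer.
entry = {
    "system": "You are a financial reasoning assistant. Use <think>...</think> tags.",
    "user": "Sales rose from $500K to $650K. What is the percentage growth?",
    "assistant": (
        "<think>\n"
        "Growth = (650,000 - 500,000) / 500,000 = 0.30\n"
        "</think>\n"
        "Sales grew by 30%."
    ),
}

print(entry["assistant"])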

3. Multilingual Finance QA Expansion

  • Datasets:

  • And more unreleased multilingual datasets; coverage extended to 60+ languages.

  • Full Languages List Used: "Arabic", "Amharic", "Azerbaijani", "Bengali", "Burmese", "Chinese (Simplified)", "Chinese (Traditional)", "Czech", "Danish", "Dutch", "English", "Finnish", "French", "Georgian", "German", "Greek", "Gujarati", "Haitian Creole", "Hausa", "Hebrew", "Hindi", "Hungarian", "Igbo", "Indonesian", "Italian", "Japanese", "Javanese", "Kazakh", "Khmer", "Korean", "Lao", "Malay", "Marathi", "Persian", "Polish", "Portuguese", "Punjabi", "Quechua", "Romanian", "Russian", "Serbian/Croatian/Bosnian", "Sinhala", "Somali", "Spanish", "Swahili", "Swedish", "Tagalog", "Tamil", "Telugu", "Thai", "Turkish", "Turkmen", "Ukrainian", "Urdu", "Uzbek", "Vietnamese", "Yoruba", "Zulu"

    Using comprehensive seed data from: https://huggingface.co/datasets/Josephgflowers/finance_curriculum_topics

    Covering 7,794 financial topics.
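
A hedged sketch of loading those seed topics with the datasets library (split name and column layout are assumptions; inspect the dataset card to confirm):

from datasets import load_dataset

# Seed topics used to drive multilingual finance QA generation.
topics = load_dataset("Josephgflowers/finance_curriculum_topics", split="train")

print(len(topics))  # expected on the order of 7,794 topics
print(topics[0])    # check the schema before generating QA pairs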

4. Quantitative & Analytical Calibration

  • Secondary fine-tune on tabular financial reasoning (FinQA, LIMO, Cortex-1).
  • Reinforced structured arithmetic steps and explanation fidelity.

5. Evaluation & Bench Testing

  • Benchmarked against prior Phi-mini reasoning models and base Llama-3.1-8B:
    Task                         Metric                  Base 8B   FinR1-8B
    Spreadsheet conversion       Structural accuracy     0.74      0.98
    Financial difference calc    Numerical correctness   0.67      1.00
    Instruction following        Pass rate               0.81      0.96
    Multilingual finance QA      F1 (avg 10 langs)       0.61      0.89

    Example benchmark:
    • Prompt: "Compare POS vs. Online Store total sales."
    • Output: precise arithmetic (Δ = $1,821,466.27) with clear step-by-step reasoning.
    • Confidence: high, no rounding drift.
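
The example benchmark's difference can be reproduced exactly with decimal arithmetic, which is one way to check the "no rounding drift" claim:

from decimal import Decimal

pos_total = Decimal("2075743.54")
online_total = Decimal("254277.27")

# Exact subtraction with no binary floating-point rounding.
delta = pos_total - online_total
print(delta)  # 1821466.27, matching the reported Δ of $1,821,466.27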

⚙️ Model Capabilities

  • Financial Data Interpretation
    Extracts and summarizes structured tables, spreadsheets, and ledgers.
  • Analytical Reasoning
    Performs step-wise quantitative comparisons and explains calculations.
  • Instruction Following
    Adheres strictly to user/system directives in chain-of-thought or tagged format.
  • Multilingual QA
    Responds natively in 60+ languages with localized financial terminology.
  • Structured Outputs
    Supports JSON, CSV, or XML reasoning output for integration with RAG and pipelines.
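
For the structured-output capability, a minimal sketch of requesting JSON and validating it downstream (the prompt wording and keys are illustrative, not a schema the model guarantees):

import json

prompt = (
    "You are a financial reasoning assistant. "
    "Reply with a JSON object containing the keys 'answer' and 'reasoning'.\n"
    "POS total is $2,075,743.54 and Online total is $254,277.27. "
    "How much more did POS generate?"
)

def parse_structured(reply: str) -> dict:
    """Parse the model's reply as JSON; raises a ValueError subclass if malformed."""
    return json.loads(reply)

# A well-formed reply the pipeline would accept:
print(parse_structured('{"answer": "$1,821,466.27", "reasoning": "2,075,743.54 - 254,277.27"}'))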

🧩 Example Usage

System Prompt


You are a multilingual financial reasoning assistant.
Explain your reasoning step by step using <think>...</think> tags.

Input


Sales channel data:
POS total: $2,075,743.54
Online total: $254,277.27
How much more is POS doing?

Output


<think>
To find the difference, subtract Online total from POS total:
2,075,743.54 - 254,277.27 = 1,821,466.27
</think>
The Point of Sale channel generated $1,821,466.27 more than the Online Store.
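
When consuming tagged output programmatically, the reasoning block can be separated from the final answer with a simple regular expression (a small sketch using the output shown above):

import re

raw = (
    "<think>\n"
    "To find the difference, subtract Online total from POS total:\n"
    "2,075,743.54 - 254,277.27 = 1,821,466.27\n"
    "</think>\n"
    "The Point of Sale channel generated $1,821,466.27 more than the Online Store."
)

# Capture the reasoning trace and strip it to leave only the final answer.
match = re.search(r"<think>(.*?)</think>", raw, re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print(reasoning)
print(answer)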

🧮 Testing Summary

Recent limited evaluation tests by GPT-5 showed:

  • 99% structural accuracy on table reconstruction tasks
  • Error rate <1% on numerical difference queries
  • High cross-lingual consistency: identical reasoning structure reproduced in Arabic, French, and Uzbek
  • No instruction degradation after long-context (8–10k token) sequences

The model's reasoning outputs mirror the structure defined in the Pollinations dataset generation scripts used during development.


🔧 Technical Details

Parameter        Value
Base             Llama-3.1-8B-Instruct
Architecture     8B parameters
Context Length   16k
Precision        bfloat16 / 4-bit LoRA compatible
License          llama3.1
Author           Joseph G. Flowers
Framework        Hugging Face Transformers + TRL / Unsloth
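
Since the card lists 4-bit LoRA compatibility, here is a hedged sketch of loading the weights in 4-bit with bitsandbytes (the quantization settings below are common defaults, not values confirmed by the author):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# NF4 4-bit quantization with bfloat16 compute, a typical low-memory setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)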

🚀 Usage Example (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Josephgflowers/FinR1-llama-8b-multi-language-thinking"

# Load the tokenizer and model; device_map="auto" places weights on available devices.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Plain-text prompt that asks for tagged step-by-step reasoning.
prompt = """You are a financial assistant. Use <think>...</think> to explain your steps.
Sales increased from $500K to $650K. What is the percentage growth?"""

# Tokenize, generate, and decode the response.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
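
Llama-3.1-Instruct models are normally prompted through the tokenizer's chat template; below is a hedged variant of the example above that reuses the model and tokenizer already loaded:

messages = [
    {
        "role": "system",
        "content": "You are a multilingual financial reasoning assistant. "
                   "Explain your reasoning step by step using <think>...</think> tags.",
    },
    {"role": "user", "content": "Sales increased from $500K to $650K. What is the percentage growth?"},
]

# Build the prompt with the chat template and decode only the newly generated tokens.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=150)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))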

Citation

@misc{josephgflowers2025finr1llama8b,
  title={FinR1-llama-8b-multi-language-thinking},
  author={Joseph G. Flowers},
  year={2025},
  url={https://huggingface.co/Josephgflowers/FinR1-llama-8b-multi-language-thinking}
}

🧭 Notes

Additional multilingual finance datasets (v2 + v3) and extended Gemini-filtered reasoning traces will be uploaded soon to support reproducibility and expansion for FinR2. Future plans include dataset release under the Finance-Reasoning-Hub collection for structured evaluation across reasoning, translation, and quantitative accuracy.

