|
--- |
|
language: en |
|
license: other |
|
tags: |
|
- finance |
|
- risk-relation |
|
- retrieval |
|
- encoder |
|
- feature-extraction |
|
- stock-prediction |
|
pipeline_tag: feature-extraction |
|
--- |
|
|
|
# Financial Risk Identification through Dual-view Adaptation — Encoder |
|
|
|
This repository hosts the pretrained encoder from the work **“Financial Risk Identification through Dual-view Adaptation.”** |
|
The model is designed to uncover **inter-firm risk relations** from financial text, supporting downstream tasks such as **retrieval**, **relation mining**, and **stock-signal experiments** where relation strength acts as a feature. |
|
|
|
> **Files** |
|
> - `pytorch_model.safetensors` — model weights |
|
> - `config.json` — model configuration |
|
> - `README.md` (this file) |
|
|
|
--- |
|
|
|
## ✨ What’s special (Dual-view Adaptation) |
|
|
|
The model aligns two complementary “views” of firm relations and adapts them during training: |
|
|
|
- **Lexical view (`lex`)** — focuses on token- and phrase-level signals and the domain terminology common in 10-K filings and financial news.

- **Temporal view (`time`)** — encourages relations to remain stable and consistent across reporting periods and evolving events.
|
|
|
A **two-view combination (“Best”)** integrates both signals and yields stronger retrieval quality and more stable risk-relation estimates. Ablations (`lex`, `time`) are also supported for analysis. |
|
|
|
--- |
|
|
|
## 🔧 Intended Use |
|
|
|
- **Feature extraction / sentence embeddings** for paragraphs, sections, or documents in financial filings. |
|
- **Retrieval & ranking**: compute similarities between queries (e.g., “supply chain risk for X”) and candidate passages. |
|
- **Risk-relation estimation**: aggregate cross-document similarities into pairwise firm relation scores for downstream analytics (see the sketch after the Quickstart below).
|
|
|
> ⚠️ Not a generative LLM. Use it as an **encoder** (feature extractor). |
|
|
|
--- |
|
|
|
## 🚀 Quickstart (Transformers) |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "william0816/Dual_View_Financial_Encoder"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
model = AutoModel.from_pretrained(MODEL_ID)

def mean_pool(last_hidden_state, attention_mask):
    # Mean-pool w.r.t. the attention mask
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)
    return summed / counts

texts = [
    "The company faces supplier concentration risk due to a single-source vendor.",
    "Management reported foreign exchange exposure impacting Q4 margins.",
]

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**enc)
embeddings = mean_pool(outputs.last_hidden_state, enc["attention_mask"])

# Cosine similarity for retrieval
emb_norm = torch.nn.functional.normalize(embeddings, p=2, dim=1)
similarity = emb_norm @ emb_norm.T
print(similarity)
```
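
---

## 📈 Pairwise relation scores (illustrative sketch)

Building on the Quickstart above (and reusing `tokenizer`, `model`, and `mean_pool`), the sketch below shows one way to aggregate cross-document similarities into a pairwise firm relation score, as mentioned under Intended Use. The `embed` helper, the example firm passages, and the mean aggregation are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only: turn cross-document similarities into a single
# firm-pair relation score. Reuses tokenizer, model, and mean_pool from the
# Quickstart above; the aggregation choice (mean) is an assumption.
def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    emb = mean_pool(out.last_hidden_state, enc["attention_mask"])
    return torch.nn.functional.normalize(emb, p=2, dim=1)

# Hypothetical risk passages for two firms (e.g., from 10-K risk-factor sections)
firm_a = [
    "Firm A depends on a single overseas supplier for key components.",
    "Firm A reports currency exposure tied to its export revenue.",
]
firm_b = [
    "Firm B sources critical parts from the same overseas region.",
    "Firm B hedges most of its foreign-exchange exposure.",
]

emb_a, emb_b = embed(firm_a), embed(firm_b)

# Cross-document cosine similarities, mean-aggregated into one pairwise score
relation_score = (emb_a @ emb_b.T).mean().item()
print(f"Firm A <-> Firm B relation score: {relation_score:.3f}")
```

Other aggregations (max, top-k mean) are equally plausible depending on the downstream analysis; the mean is used here only to keep the example short.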
|
|
|
## 🖇️ Citation |
|
If you use this model or the dual-view methodology, please cite: |
|
```bibtex
@misc{financial_risk_dualview_2025,
  title        = {Financial Risk Identification through Dual-view Adaptation},
  author       = {Chiu, Wei-Ning and collaborators},
  year         = {2025},
  note         = {Preprint/Project},
  howpublished = {\url{https://huggingface.co/william0816/Dual_View_Financial_Encoder}}
}
```