Chinese Crypto News Importance Scoring Model (v1.1)

Model Description

This model is LoRA fine-tuned from LocalOptimum/chinese-crypto-sentiment to score the market importance of Chinese cryptocurrency news, rather than to classify sentiment polarity.

It uses a dual-head architecture and outputs both:

importance_score: a continuous 0-100 score estimating the item's potential market impact
importance_bin: a 4-way classification into noise / low / medium / high

The question it answers is whether a news item deserves priority attention from traders, researchers, or automated news pipelines, not just whether the text is bullish or bearish. The score and the bin are intended for ranking and filtering workflows.

Training Data

Size: 20286 Chinese crypto news samples
Source: 19729 news articles + 557 tweets collected via EventAlpha / WatchTower
Labeling: 4-axis automatic scoring pipeline with rule-based cleanup
Split: random split, 17243 train / 3043 validation samples (about 85% / 15%)
Average score: 41.7 (on the 0-100 scale)
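A minimal sketch of a reproducible random split with the same sizes as above. The seed used for the published split is not documented, so `split_samples` and `seed=42` are hypothetical choices for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T], n_val: int, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `samples` with a fixed seed and hold out `n_val` items.

    Hypothetical helper: the seed behind the published split is not documented.
    """
    items = list(samples)
    random.Random(seed).shuffle(items)  # deterministic for a given seed
    return items[n_val:], items[:n_val]  # (train, validation)


train, val = split_samples(range(20286), n_val=3043)
print(len(train), len(val))  # 17243 3043
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.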

Scoring Axes

| Axis | Range (points) | Description |
|---|---|---|
| Market Reaction | 0-40 | Post-news price move, volume expansion, and volatility reaction |
| Novelty | 0-30 | Whether the item is first-hand, repeated, or part of a digest |
| Content Quality | 0-20 | Information density, numeric detail, token relevance, and noise penalties |
| Source Authority | 0-10 | Credibility of the outlet or platform, and whether the source is official |
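The four axis ranges add up to 100 (40 + 30 + 20 + 10), which suggests that the 0-100 importance score is the sum of the axis scores before rule-based cleanup. The card does not state this explicitly, so the sketch below is an assumption. `combine_axes`, `AXIS_MAX`, and the snake_case axis keys are hypothetical names; the helper checks each axis against its range and sums them.

```python
from typing import Dict

# Maximum points per axis, taken from the Scoring Axes table
AXIS_MAX: Dict[str, float] = {
    "market_reaction": 40,
    "novelty": 30,
    "content_quality": 20,
    "source_authority": 10,
}


def combine_axes(axes: Dict[str, float]) -> float:
    """Sum per-axis scores into a 0-100 total.

    Assumption: the total is a plain sum; the real labeling pipeline also
    applies rule-based corrections that are not reproduced here.
    """
    if set(axes) != set(AXIS_MAX):
        raise ValueError(f"expected axes {sorted(AXIS_MAX)}, got {sorted(axes)}")
    for name, value in axes.items():
        if not 0 <= value <= AXIS_MAX[name]:
            raise ValueError(f"{name}={value} outside 0-{AXIS_MAX[name]}")
    return float(sum(axes.values()))


print(combine_axes({"market_reaction": 22, "novelty": 12,
                    "content_quality": 9, "source_authority": 7}))  # 50.0
```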

Label Distribution

| Bin | Score Range | Count | Share | Interpretation |
|---|---|---|---|---|
| `noise` | 0-25 | 1626 | 8.0% | Low-signal, duplicate, digest, or weakly relevant content |
| `low` | 25-50 | 14773 | 72.8% | Routine updates that rarely move the market on their own |
| `medium` | 50-75 | 3840 | 18.9% | Tradeable developments with meaningful but limited impact |
| `high` | 75-100 | 47 | 0.2% | Major events that may materially change price or risk appetite |
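Adjacent bins in the table share their boundary values (25, 50, 75), and the card does not say which bin a boundary score belongs to. The sketch below assumes half-open intervals, so 25 maps to `low`, 50 to `medium`, and 75 to `high`, with 100 included in `high`. `score_to_bin` is a hypothetical helper, not part of the released code.

```python
import bisect

BIN_LABELS = ["noise", "low", "medium", "high"]
# Lower edges of low / medium / high.
# Assumed intervals: [0, 25), [25, 50), [50, 75), [75, 100]
BIN_EDGES = [25, 50, 75]


def score_to_bin(score: float) -> str:
    """Map a 0-100 importance score to its bin label (boundary convention assumed)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} outside 0-100")
    # bisect_right counts how many edges are <= score, which is the bin index
    return BIN_LABELS[bisect.bisect_right(BIN_EDGES, score)]


print([score_to_bin(s) for s in (10, 25, 41.7, 60, 75, 100)])
# ['noise', 'low', 'low', 'medium', 'high', 'high']
```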

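Performance Metrics

The current public release achieves the following on the validation set:

| Metric | Value |
|---|---|
| MAE (score points, 0-100 scale) | 6.87 |
| Bin Accuracy | 61.8% |
| Pearson r | 0.532 |
| Best Epoch | 4 |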
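The three metrics follow their standard definitions: MAE is the mean absolute difference between predicted and labeled scores, bin accuracy is the fraction of items whose predicted bin matches the labeled bin, and Pearson r is the linear correlation between predicted and labeled scores. A plain-Python sketch, assuming predictions and labels are already collected as lists. `evaluate` is a hypothetical name and this is not the author's evaluation script; the card also does not say whether the predicted bin comes from the classification head or from binning the predicted score, so the function takes predicted bins as given.

```python
import math
from typing import Dict, Sequence


def evaluate(
    pred_scores: Sequence[float],
    true_scores: Sequence[float],
    pred_bins: Sequence[str],
    true_bins: Sequence[str],
) -> Dict[str, float]:
    """Compute MAE, bin accuracy, and Pearson r over a validation set."""
    n = len(true_scores)
    if not (n == len(pred_scores) == len(pred_bins) == len(true_bins)) or n < 2:
        raise ValueError("inputs must be equal-length with at least 2 items")

    # Mean absolute error, in score points
    mae = sum(abs(p - t) for p, t in zip(pred_scores, true_scores)) / n

    # Fraction of items whose predicted bin equals the labeled bin
    bin_acc = sum(p == t for p, t in zip(pred_bins, true_bins)) / n

    # Pearson r: covariance divided by the product of standard deviations
    mean_p = sum(pred_scores) / n
    mean_t = sum(true_scores) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred_scores, true_scores))
    var_p = sum((p - mean_p) ** 2 for p in pred_scores)
    var_t = sum((t - mean_t) ** 2 for t in true_scores)
    if var_p == 0 or var_t == 0:
        raise ValueError("Pearson r is undefined when either input is constant")
    pearson = cov / math.sqrt(var_p * var_t)

    return {"mae": mae, "bin_accuracy": bin_acc, "pearson_r": pearson}


print(evaluate([10, 20, 30], [20, 40, 60],
               ["low", "low", "high"], ["low", "medium", "high"]))
```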

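Score Interpretation

| Bin | Score Range | Typical Content |
|---|---|---|
| `noise` | 0-25 | Digests and roundups, weakly relevant information, duplicated flash news, low-signal content |
| `low` | 25-50 | Routine updates, ordinary operational announcements, subjective commentary, limited catalysts |
| `medium` | 50-75 | Significant tradeable developments that may not be enough to change the broader trend |
| `high` | 75-100 | Hacks, ETF approvals, major regulatory changes, systemic-risk events |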

Usage

Option 1: load the full dual-head model (recommended)

This option returns both importance_score and importance_bin.

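```python
import __main__
import sys

import torch
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

repo_id = "LocalOptimum/chinese-crypto-importance"
# Download the whole repo (weights, tokenizer files, and model.py) to a local cache dir
local_dir = snapshot_download(repo_id)
# Make the repo's model.py importable
sys.path.insert(0, local_dir)

from model import NewsImportanceModel

# model.pt stores a pickled model object that refers to __main__.NewsImportanceModel,
# so the class must be registered on __main__ before unpickling
__main__.NewsImportanceModel = NewsImportanceModel

tokenizer = AutoTokenizer.from_pretrained(local_dir)
# weights_only=False is required to unpickle a full model object (not a state_dict);
# only use it for checkpoints from a source you trust
model = torch.load(f"{local_dir}/model.pt", map_location="cpu", weights_only=False)
model.eval()

text = "美国现货以太坊 ETF 获批"  # "US spot Ethereum ETF approved"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    # The forward pass returns (bin logits, normalized score)
    logits, score = model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        token_type_ids=inputs.get("token_type_ids"),
    )
    probs = torch.softmax(logits, dim=-1)[0]
    labels = ["noise", "low", "medium", "high"]  # class index order
    importance_bin = labels[probs.argmax().item()]
    # The score head outputs a 0-1 value; rescale it to the 0-100 scale
    importance_score = score.item() * 100

print(importance_bin)
print(round(importance_score, 1))
```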

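Option 2: use the classification head only

This option is compatible with pipeline("text-classification"), but it returns only the 4-way bin, not the continuous score. The `score` field in the pipeline output is the predicted class probability, not importance_score.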

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

repo_id = "LocalOptimum/chinese-crypto-importance"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
# "Bitcoin breaks through a key resistance level and hits a new local high"
print(pipe("比特币突破关键阻力位并创下阶段新高"))
```

Training Configuration

Base model: LocalOptimum/chinese-crypto-sentiment
Architecture: BERT backbone + classification head + regression head
Max length: 256 tokens
Epochs: up to 10 (early stopping with patience=3; best epoch = 4)
Batch size: 16
Learning rate: 2e-5
LoRA: r=16, alpha=32, dropout=0.05
Loss: 0.6 * cross_entropy (bin head) + 0.4 * mse (score head)
Mixed precision: FP16
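The joint objective weights the classification head's cross-entropy at 0.6 and the regression head's mean squared error at 0.4. A plain-Python sketch for one batch, assuming the regression target is the importance score divided by 100 (consistent with the usage example rescaling the score head's output by 100). `joint_loss` and `cross_entropy` are hypothetical names; a real training loop would normally use framework functions such as `torch.nn.functional.cross_entropy` and `torch.nn.functional.mse_loss` instead.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def joint_loss(
    batch_logits: Sequence[Sequence[float]],
    target_bins: Sequence[int],
    pred_scores: Sequence[float],
    target_scores: Sequence[float],
    ce_weight: float = 0.6,
    mse_weight: float = 0.4,
) -> float:
    """0.6 * mean cross-entropy (bin head) + 0.4 * mean squared error (score head).

    Assumption: scores are on a 0-1 scale (importance score / 100).
    """
    n = len(target_bins)
    ce = sum(cross_entropy(row, t) for row, t in zip(batch_logits, target_bins)) / n
    mse = sum((p - t) ** 2 for p, t in zip(pred_scores, target_scores)) / n
    return ce_weight * ce + mse_weight * mse


# Uniform logits give cross-entropy ln(4); a perfect score prediction gives MSE 0
print(round(joint_loss([[0.0, 0.0, 0.0, 0.0]], [1], [0.42], [0.42]), 4))  # 0.8318
```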

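Use Cases

Prioritizing crypto news items
Filtering real-time flash news and reducing alert noise
Pre-screening news feeds for researchers and traders
Building event-weight features for backtesting and research
Retrospective analysis of major market events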
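For prioritization and alert de-noising, one simple pattern is to drop `noise` items, rank the rest by importance_score, and raise alerts only above a threshold. A minimal sketch, assuming scored items are already available as dicts with `text`, `importance_score`, and `importance_bin` keys. The dict keys, the `triage` helper, the alert threshold of 75, and the example items are all hypothetical choices, not part of the model.

```python
from typing import Dict, List, Tuple


def triage(
    items: List[Dict], alert_threshold: float = 75.0
) -> Tuple[List[Dict], List[Dict]]:
    """Return (ranked feed, alerts) from already-scored news items.

    - items binned as "noise" are dropped from the feed
    - the remaining items are sorted by importance_score, highest first
    - items at or above alert_threshold are also returned as alerts
    """
    feed = [it for it in items if it["importance_bin"] != "noise"]
    feed.sort(key=lambda it: it["importance_score"], reverse=True)
    alerts = [it for it in feed if it["importance_score"] >= alert_threshold]
    return feed, alerts


# Made-up example items for illustration
scored = [
    {"text": "daily digest", "importance_score": 18.0, "importance_bin": "noise"},
    {"text": "exchange listing", "importance_score": 56.3, "importance_bin": "medium"},
    {"text": "spot ETF approved", "importance_score": 81.2, "importance_bin": "high"},
    {"text": "project AMA recap", "importance_score": 33.5, "importance_bin": "low"},
]
feed, alerts = triage(scored)
print([it["text"] for it in feed])    # ['spot ETF approved', 'exchange listing', 'project AMA recap']
print([it["text"] for it in alerts])  # ['spot ETF approved']
```

Filtering on the bin and thresholding on the score are kept separate so each can be tuned independently.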

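Annotation Principles

Importance is not sentiment: both bullish and bearish news can be highly important
Market reaction comes first, then novelty, content quality, and source credibility
Duplicate flash news, digests and roundups, and weakly relevant macro noise are systematically scored down
Official announcements, major security incidents, and ETF / regulatory breakthroughs usually score higher
Subjective opinions and routine operational updates usually fall into low or noise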

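Limitations

The label distribution is heavily skewed toward low, so the current version is still conservative on high-importance events
High-bin samples are scarce (47, or 0.2%), so the model's ability to separate extremely high-scoring events still has room to improve
The model is intended mainly for Chinese cryptocurrency news; cross-domain generalization is limited
The native HuggingFace pipeline exposes only the classification head; the continuous score requires loading model.pt
Labels come from an automatic scoring pipeline with rule-based corrections, not from large-scale human financial annotation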

License

Apache-2.0

Citation

If you use this model in research or a product, you can cite it as:

```bibtex
@misc{onefly_crypto_importance_2026,
  title={Chinese Crypto News Importance Scoring Model},
  author={Onefly},
  year={2026},
  howpublished={\url{https://huggingface.co/LocalOptimum/chinese-crypto-importance}},
  note={LoRA fine-tuned from LocalOptimum/chinese-crypto-sentiment, 20286 samples, MAE=6.87, BinAcc=61.8\%}
}
```

Base Model

This model is further trained from:

LocalOptimum/chinese-crypto-sentiment

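Changelog

Current Public Version

First public release of the importance scoring model
Dual-head output: continuous importance score + 4-way importance bin
Trained on 20286 Chinese crypto news samples
Current validation metrics: MAE=6.87, Bin Accuracy=61.8%, Pearson r=0.532

Questions and suggestions are welcome; please open an issue or PR.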
