silashundhausen committed on
Commit ada3adb · verified · 1 Parent(s): 131a90d

Update README.md

Files changed (1)
  1. README.md +9 -0
README.md CHANGED
@@ -144,3 +144,12 @@ The model classifies documents into this hierarchy:
  | **M&A and Legal** | **Debt Information** | **Investment Vehicle** |
  | :--- | :--- | :--- |
  | • M&A Activity<br>• Legal Proceedings Report | • Capital/Financing Update<br>• Interest Rate Notice | • Net Asset Value (NAV)<br>• Fund Factsheet |
+
+ ## 📚 Training Data
+
+ The model was trained on a proprietary **Golden Dataset of 27,671 financial filings**, manually curated to represent the diverse landscape of global corporate reporting.
+
+ * **Source:** Real-world filings from listed companies across **Europe (primary focus)**, North America, and Asia.
+ * **Multilingual:** Includes documents in English, French, German, and other major European languages (leveraging the multilingual capabilities of Jina-V3).
+ * **Diversity:** The dataset preserves the natural "long-tail" distribution of financial data, ranging from massive 500+ page **Annual Reports** to single-page **Press Releases** and complex **ESG Disclosures**.
+ * **Quality Control:** Mapped to a strict 2-level hierarchy to resolve semantic ambiguities common in regulatory filings (e.g., distinguishing a *Share Buyback* announcement from a *Director's Dealing* notification).
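
The strict 2-level hierarchy described under Quality Control can be pictured as a mapping in which every level-2 label belongs to exactly one level-1 category. The following is a minimal sketch, not the project's actual code: the names `HIERARCHY`, `PARENT_OF`, and `resolve` are hypothetical, and only the three categories visible in the table context of this diff are included.

```python
# Hypothetical sketch of a strict 2-level label hierarchy.
# Only the three level-1 categories visible in this diff's table context are
# listed; the complete hierarchy in the README is assumed to contain more.
HIERARCHY = {
    "M&A and Legal": ["M&A Activity", "Legal Proceedings Report"],
    "Debt Information": ["Capital/Financing Update", "Interest Rate Notice"],
    "Investment Vehicle": ["Net Asset Value (NAV)", "Fund Factsheet"],
}

# Invert the mapping so each level-2 label resolves to a single parent.
# "Strict" here means a level-2 label may not appear under two parents.
PARENT_OF = {}
for parent, children in HIERARCHY.items():
    for child in children:
        if child in PARENT_OF:
            raise ValueError(f"ambiguous level-2 label: {child!r}")
        PARENT_OF[child] = parent


def resolve(label: str) -> tuple[str, str]:
    """Return (level-1, level-2) for a level-2 label; KeyError if unknown."""
    return PARENT_OF[label], label


if __name__ == "__main__":
    print(resolve("Fund Factsheet"))
```

Rejecting duplicate level-2 labels while building the inverse mapping is one way to keep the hierarchy unambiguous, so that a predicted fine-grained label always implies its coarse category.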