Update README.md
README.md
CHANGED
@@ -144,3 +144,12 @@ The model classifies documents into this hierarchy:
 | **M&A and Legal** | **Debt Information** | **Investment Vehicle** |
 | :--- | :--- | :--- |
 | • M&A Activity<br>• Legal Proceedings Report | • Capital/Financing Update<br>• Interest Rate Notice | • Net Asset Value (NAV)<br>• Fund Factsheet |
+
+## 📚 Training Data
+
+The model was trained on a proprietary **Golden Dataset of 27,671 financial filings**, manually curated to represent the diverse landscape of global corporate reporting.
+
+* **Source:** Real-world filings from listed companies across **Europe (primary focus)**, North America, and Asia.
+* **Multilingual:** Includes documents in English, French, German, and other major European languages (leveraging the multilingual capabilities of Jina-V3).
+* **Diversity:** The dataset preserves the natural "long-tail" distribution of financial data, ranging from massive 500+ page **Annual Reports** to single-page **Press Releases** and complex **ESG Disclosures**.
+* **Quality Control:** Mapped to a strict 2-level hierarchy to resolve semantic ambiguities common in regulatory filings (e.g., distinguishing a *Share Buyback* announcement from a *Director's Dealing* notification).
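
The "strict 2-level hierarchy" named in the Quality Control bullet can be pictured as a mapping from each top-level category to its leaf labels. The sketch below is only an illustration: it uses the three categories visible in the context rows of this diff (the full label set is not shown here), and the names `HIERARCHY`, `LEAF_TO_PARENT`, and `resolve` are hypothetical, not part of the model's code.

```python
# Illustrative sketch only. The categories and leaves below are copied from the
# table rows visible in this diff; the model's full hierarchy is larger and is
# not reproduced here. All identifiers are hypothetical.
HIERARCHY = {
    "M&A and Legal": ["M&A Activity", "Legal Proceedings Report"],
    "Debt Information": ["Capital/Financing Update", "Interest Rate Notice"],
    "Investment Vehicle": ["Net Asset Value (NAV)", "Fund Factsheet"],
}

# Invert the mapping so that every leaf label points to exactly one parent.
# A leaf that appears under two parents would make the hierarchy ambiguous,
# so that case is rejected up front.
LEAF_TO_PARENT = {}
for parent, leaves in HIERARCHY.items():
    for leaf in leaves:
        if leaf in LEAF_TO_PARENT:
            raise ValueError(f"leaf {leaf!r} is listed under two categories")
        LEAF_TO_PARENT[leaf] = parent


def resolve(leaf_label):
    """Return (level-1 category, level-2 label) for a known leaf label."""
    if leaf_label not in LEAF_TO_PARENT:
        raise KeyError(f"unknown label: {leaf_label!r}")
    return LEAF_TO_PARENT[leaf_label], leaf_label
```

For example, `resolve("Fund Factsheet")` yields the pair `("Investment Vehicle", "Fund Factsheet")`, while a label outside the mapping raises `KeyError` instead of being silently assigned to a category.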