FinancialReports/hierarchical-filing-classifier

@@ -145,6 +145,15 @@ The model classifies documents into this hierarchy:
@@ -153,3 +162,28 @@ The model was trained on a proprietary **Golden Dataset of 27,671 financial fili
 | :--- | :--- | :--- |
 | • M&A Activity<br>• Legal Proceedings Report | • Capital/Financing Update<br>• Interest Rate Notice | • Net Asset Value (NAV)<br>• Fund Factsheet |
+## 📜 The Standard: Financial Reporting Classification Framework (FRCF)
+The taxonomy used by this model is based on the **[Financial Reporting Classification Framework (FRCF)](https://financialreports.eu/financial-reporting-classification-framework/)**, an open-source standard designed to organize corporate disclosures in a consistent, cross-jurisdictional format.
+Unlike fragmented regulatory schemes, the FRCF organizes disclosures by **functional purpose**, ensuring comparability across markets (e.g., mapping a US *10-K* and a European *Annual Financial Report* to the same standardized `Annual Report` category).
+* **[Explore the Framework](https://financialreports.eu/financial-reporting-classification-framework/)**
+* **[Download Methodology (PDF)](https://financialreports.eu/download/frcf-methodology.pdf)**
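The functional mapping described above can be pictured as a lookup from jurisdiction-specific form names to one shared category. The sketch below is only an illustration of that idea: the table, the jurisdiction codes, and the function name are hypothetical, and only the 10-K / Annual Financial Report pairing comes from the text. The classifier itself works from document content, not from a table like this.

```python
# Hypothetical lookup: (jurisdiction, regulatory form) -> functional category.
# Only the two entries below are taken from the README text; nothing else is implied.
FUNCTIONAL_CATEGORY = {
    ("US", "10-K"): "Annual Report",
    ("EU", "Annual Financial Report"): "Annual Report",
}


def to_functional_category(jurisdiction: str, form_type: str) -> str:
    """Return the standardized category, or "Unmapped" when the pair is unknown."""
    return FUNCTIONAL_CATEGORY.get((jurisdiction, form_type), "Unmapped")


us = to_functional_category("US", "10-K")
eu = to_functional_category("EU", "Annual Financial Report")
print(us == eu)  # True: both filings land in the same category
```

Keying on the pair rather than the form name alone keeps identically named forms from different markets distinct.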
 ## 📚 Training Data
 The model was trained on a proprietary **Golden Dataset of 27,671 financial filings**, manually curated to represent the diverse landscape of global corporate reporting.
 * **Multilingual:** Includes documents in English, French, German, and other major European languages (leveraging the multilingual capabilities of Jina-V3).
 * **Diversity:** The dataset preserves the natural "long-tail" distribution of financial data, ranging from massive 500+ page **Annual Reports** to single-page **Press Releases** and complex **ESG Disclosures**.
 * **Quality Control:** Mapped to a strict 2-level hierarchy to resolve semantic ambiguities common in regulatory filings (e.g., distinguishing a *Share Buyback* announcement from a *Director's Dealing* notification).
+## ⚙️ Deployment & Hardware
+This model is best served on a **GPU**: the Jina encoder's 8192-token context window makes embedding long documents compute-intensive. CPU inference works but is significantly slower.
+### Recommended Configuration
+| Component | Recommendation | Notes |
+| :--- | :--- | :--- |
+| **GPU** | **NVIDIA T4 (16GB)** | The "sweet spot" for cost/performance. Roughly 40-50 docs/sec in batch mode (batch size 64). |
+| **Alternative** | NVIDIA L4 / A10 | Recommended for high-concurrency production APIs. |
+| **VRAM** | 16 GB Minimum | Required to embed long documents without OOM errors. |
+| **System RAM** | 16 GB+ | Standard requirement for PyTorch + XGBoost overhead. |
+### Critical Environment Settings
+To load the underlying Jina-V3 model, you **must** allow remote code execution by setting the following environment variable in your deployment (Docker, Kubernetes, or Hugging Face Endpoints):
+```bash
+HF_TRUST_REMOTE_CODE=True
+```
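When loading the encoder directly from Python rather than through a hosted endpoint, the equivalent opt-in is the `trust_remote_code=True` argument of `from_pretrained`. The sketch below is a minimal example under assumptions: the model id `jinaai/jina-embeddings-v3` and the helper name `load_encoder` are not taken from this repository, and the `transformers` package must be installed for the helper to run.

```python
import os

# Mirror the deployment setting for local runs; keeps any value already configured.
os.environ.setdefault("HF_TRUST_REMOTE_CODE", "True")


def load_encoder(model_id: str = "jinaai/jina-embeddings-v3"):
    """Load the encoder with custom model code enabled.

    The default model id is an assumption; substitute the encoder this
    classifier actually ships with.
    """
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_id, trust_remote_code=True)
```

Enabling remote code runs Python shipped with the model repository, so it is worth pinning a known revision before using this in production.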
+### Throughput Benchmarks (T4 GPU)
+* **Live API Latency:** ~200-500 ms per document.
+* **Batch Processing:** ~40-50 documents per second (batch size: 64).
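These figures translate into rough capacity estimates with simple arithmetic. The sketch below assumes steady throughput at the quoted rates and ignores model load time and I/O; the function names are illustrative, and the 27,671-document corpus size is borrowed from the training data section purely as an example workload.

```python
import math


def batch_count(n_docs: int, batch_size: int = 64) -> int:
    """Number of batches needed to cover n_docs (the last batch may be partial)."""
    return math.ceil(n_docs / batch_size)


def estimated_seconds(n_docs: int, docs_per_sec: float) -> float:
    """Wall-clock estimate at a steady throughput."""
    return n_docs / docs_per_sec


n = 27_671  # example workload only
print(batch_count(n))                   # 433
print(round(estimated_seconds(n, 50)))  # 553
print(round(estimated_seconds(n, 40)))  # 692
```

At the quoted rates, a corpus of this size would take roughly 9 to 12 minutes on a single T4.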