---
language:
- en
license: apache-2.0
tags:
- financial-analysis
- transformer
- classification
- finbert
- financial-statements
base_model: yiyanghkust/finbert-pretrain
model-index:
- name: tiger-transformer
  results: []
---

# Tiger Transformer (Standardizing Financial Statements)

This model is a fine-tuned version of [yiyanghkust/finbert-pretrain](https://huggingface.co/yiyanghkust/finbert-pretrain) designed to standardize financial statement line items from Balance Sheets and Income Statements into a unified schema.

## Model Description

The **Tiger Transformer** serves as a specialized classification engine for financial analysis AI agents. It addresses the inconsistency found in broad-purpose LLMs when mapping diverse, raw line items (e.g., "Cash & Equivalents", "Cash and due from banks") to standardized accounting categories.

### Key Features

- **Context-Aware Classification**: Unlike simple keyword matching, this model uses a context window of two lines before and two lines after the target line to refine predictions.
- **Architecture**: Fine-tuned `BertForSequenceClassification` using the FinBERT base.
- **Quantization Support**: A quantized version (`pytorch_model_quantized.pt`) is available for low-latency CPU inference.

## Intended Uses & Limitations

### Intended Use

Standardizing raw line items extracted from 10-K, 10-Q, and other financial reports into a consistent format for downstream financial modeling (DCF, ROIC analysis, etc.).

### Training Data Strategy

The model was trained on a carefully curated dataset of manually cleaned financial statement labels. To maximize performance on a niche dataset, all available high-quality labels were used for training, with validation performed iteratively against new, unseen batches.

### Performance

- **Accuracy**: 90-95% on modern financial reports.
- **Robustness**: High accuracy on critical fields (subtotals and totals), which are essential for structural validation.
- **Limitations**: Accuracy may decrease for companies in highly specialized industries or niche regions whose non-standard terminology is absent from the training set.

## Training Procedure

### Input Format

The model expects input strings formatted with surrounding context:

`[PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]`

* `[SECTION]`: Balance Sheet or Income Statement.
* `[RAW_NAME]`: The line item name to be classified.
* `[PREV/NEXT]`: Surrounding line items providing structural context.

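Assembling this input string from a parsed statement can be sketched as follows. The `build_input` helper and the use of a literal `[SEP]` separator are inferred from the usage example in this card, not part of any released code:

```python
def build_input(lines, idx, section):
    """Build the context string for the line item at position idx.

    Joins [PREV_2] [PREV_1] [SECTION] [RAW_NAME] [NEXT_1] [NEXT_2]
    with " [SEP] ", skipping neighbors that fall outside the statement.
    """
    prev = lines[max(0, idx - 2):idx]   # up to two lines before
    nxt = lines[idx + 1:idx + 3]        # up to two lines after
    return " [SEP] ".join(prev + [section, lines[idx]] + nxt)


rows = ["Cash and Short-term Investments", "Cash and Equivalents",
        "Accounts Receivable", "Inventory", "Prepaid Expenses"]
print(build_input(rows, 2, "Balance Sheet"))
# Cash and Short-term Investments [SEP] Cash and Equivalents [SEP] Balance Sheet [SEP] Accounts Receivable [SEP] Inventory [SEP] Prepaid Expenses
```

Lines near the top or bottom of a statement simply get fewer context neighbors, so the same helper works at statement boundaries.
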
### Hyperparameters

- **Base Model**: FinBERT
- **Quantization**: Dynamic quantization (int8) applied to Linear layers for optimized CPU performance.

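Dynamic int8 quantization of the Linear layers follows PyTorch's standard workflow. A minimal sketch on a stand-in classifier head — the layer sizes and class count below are illustrative, not the model's actual dimensions:

```python
import torch
import torch.nn as nn

# Stand-in for the classifier; the real model's Linear layers
# (attention projections, FFN, classification head) are handled the same way.
model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 40))
model.eval()

# Dynamic quantization: Linear weights are stored as int8, and
# activations are quantized on the fly at inference time (CPU only).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 40])
```

Only the weights are pre-quantized; inputs stay float, which is why no calibration dataset is needed for this mode.
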
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Ruinius/tiger-transformer")
model = AutoModelForSequenceClassification.from_pretrained("Ruinius/tiger-transformer")

# Example input with context
text = "Cash and Short-term Investments [SEP] Cash and Equivalents [SEP] Balance Sheet [SEP] Accounts Receivable [SEP] Inventory [SEP] Prepaid Expenses"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()

# Map the ID back to its label
predicted_label = model.config.id2label[predicted_class_id]
```

## Acknowledgments & Licensing

This project is a fine-tuned version of the FinBERT-Pretrain model developed by Yang et al. (HKUST).

Licensed under the **Apache License 2.0**, the same license as the base FinBERT model.