---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---

# Finance NLP Toolkit

**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Transformers.
It supports two core tasks:

1) **Sentiment Analysis**: positive / neutral / negative market tone
2) **Named Entity Recognition (NER)**: companies, tickers, money, dates, etc.

This repository includes:
- Ready-to-run **inference snippets**
- **Training scripts** for fine-tuning on your datasets
- Label mapping examples and utilities

> **Note:** The initial release ships training and inference scaffolding.
> Plug in your dataset and fine-tune, or point the snippets at an existing finance model.

---

## Quickstart (inference)

Install deps:
```bash
pip install -r requirements.txt
```

Sentiment:

```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="Proooof/Finance-NLP-Toolkit",  # after you push your fine-tuned weights
    tokenizer="Proooof/Finance-NLP-Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```

NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load tokenizer and model from the "ner" branch; replace YOUR-USERNAME with the repo owner.
tok = AutoTokenizer.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
# aggregation_strategy="simple" merges word pieces into whole entity spans.
ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```

Tip: Use branches to host multiple checkpoints in one repo:

- `main`: sentiment model
- `ner`: NER model

Push each set of weights to its respective branch.

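As a rough sketch of how this branch layout could be used from code, the snippet below maps each task to a branch and builds the keyword arguments for `pipeline`. The `CHECKPOINTS` table and both helper names are hypothetical and not part of this repository, and the repo id is a placeholder. `revision` is the standard argument for selecting a Hub branch.

```python
# Hypothetical mapping from task name to the branch that hosts its weights.
REPO_ID = "YOUR-USERNAME/Finance-NLP-Toolkit"  # placeholder repo id
CHECKPOINTS = {
    "sentiment": {"task": "text-classification", "revision": "main"},
    "ner": {"task": "token-classification", "revision": "ner"},
}


def pipeline_kwargs(name: str, repo_id: str = REPO_ID) -> dict:
    """Return keyword arguments for transformers.pipeline for one task."""
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown task {name!r}; expected one of {sorted(CHECKPOINTS)}")
    spec = CHECKPOINTS[name]
    kwargs = {"task": spec["task"], "model": repo_id, "revision": spec["revision"]}
    if name == "ner":
        # Merge word pieces into whole entity spans.
        kwargs["aggregation_strategy"] = "simple"
    return kwargs


def load(name: str):
    """Build the pipeline; imports transformers lazily and downloads weights."""
    from transformers import pipeline

    return pipeline(**pipeline_kwargs(name))
```

Keeping the mapping in one place means that adding a third checkpoint only needs a new branch and a new entry in the table.
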
## Training

Sentiment (3-class):

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```

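The input format that `train_sentiment.py` expects is not documented here. Going by the `--text_col` and `--label_col` flags, a plausible layout is a CSV with a `text` column and a `label` column. The sketch below builds a tiny file in that shape and checks it before training. The label strings, the sample rows, and the `validate_rows` helper are assumptions for illustration, not part of the repository.

```python
import csv
import io

# Assumed label set, based on the three tones this README lists.
LABELS = ("negative", "neutral", "positive")

# Made-up sample rows in the assumed text,label layout.
SAMPLE_CSV = """text,label
"The company reported record profits and raised guidance.",positive
"Shares were flat after the announcement.",neutral
"The firm cut its dividend and warned on margins.",negative
"""


def validate_rows(handle, text_col="text", label_col="label"):
    """Return the parsed rows, raising ValueError on the first bad row."""
    reader = csv.DictReader(handle)
    missing = {text_col, label_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        if not (row[text_col] or "").strip():
            raise ValueError(f"line {reader.line_num}: empty text")
        if row[label_col] not in LABELS:
            raise ValueError(f"line {reader.line_num}: unknown label {row[label_col]!r}")
        rows.append(row)
    return rows


rows = validate_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows OK")
```

To check a real file, pass `open("/path/train.csv", newline="", encoding="utf-8")` instead of the in-memory sample.
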
NER (BIO tags):

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```

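The layout of the JSONL records and of `training/labels_ner.json` is not shown in this README. The sketch below is one plausible reading of the flags above: each line is a JSON object with a `tokens` list and an equally long `ner_tags` list of BIO tag strings. The entity types (`ORG`, `MONEY`, `DATE`), the string-valued tags, and the `check_record` helper are assumptions for illustration only.

```python
import json

# Assumed entity types; the real list would come from training/labels_ner.json.
ENTITY_TYPES = ["ORG", "MONEY", "DATE"]

# BIO scheme: "O" for outside, then a B- and an I- tag per entity type.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

record = {
    "tokens": ["Apple", "Inc.", "reported", "a", "$10", "billion", "revenue",
               "increase", "in", "Q2", "2025", "."],
    "ner_tags": ["B-ORG", "I-ORG", "O", "O", "B-MONEY", "I-MONEY", "O",
                 "O", "O", "B-DATE", "I-DATE", "O"],
}


def check_record(rec):
    """Validate one example and return its tags as integer ids."""
    tokens, tags = rec["tokens"], rec["ner_tags"]
    if len(tokens) != len(tags):
        raise ValueError("tokens and ner_tags differ in length")
    prev = "O"
    for tag in tags:
        if tag not in label2id:
            raise ValueError(f"unknown tag {tag!r}")
        # An I- tag must continue a B- or I- tag of the same entity type.
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            raise ValueError(f"{tag} does not follow a matching B-/I- tag")
        prev = tag
    return [label2id[t] for t in tags]


line = json.dumps(record)  # one line of a JSONL file in the assumed layout
ids = check_record(json.loads(line))
print(ids)
```

The `id2label` and `label2id` dictionaries have the same shape that Transformers model configs use for label names, so a mapping like this can be reused when configuring a token-classification model.
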
After training, push weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).

## Expected outputs

These are illustrative shapes only. The exact labels and scores depend on the checkpoint you fine-tune, and the NER pipeline also returns `start` and `end` character offsets for each entity.

Sentiment:

```python
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER:

```python
[
    {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
    {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
    {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```

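Downstream code usually wants something flatter than these lists of dicts. The sketch below shows one possible way to reduce them, using the example values shown in this section as input. The helper names and the `min_score` threshold are hypothetical choices, not part of the toolkit.

```python
def top_sentiment(result):
    """Return (label, score) for the highest-scoring entry."""
    best = max(result, key=lambda item: item["score"])
    return best["label"], best["score"]


def entities_by_type(entities, min_score=0.9):
    """Group entity words by type, dropping spans below min_score."""
    grouped = {}
    for ent in entities:
        if ent["score"] >= min_score:
            grouped.setdefault(ent["entity_group"], []).append(ent["word"])
    return grouped


# Example values copied from the illustrative outputs in this README.
sentiment_out = [{"label": "POSITIVE", "score": 0.98}]
ner_out = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "MONEY", "word": "$10 billion", "score": 0.99},
    {"entity_group": "DATE", "word": "Q2 2025", "score": 0.98},
]

print(top_sentiment(sentiment_out))
print(entities_by_type(ner_out))
```

Raising `min_score` trades recall for precision, so a suitable value depends on how costly a spurious entity is in your application.
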
## Limitations

- English focus; domain shift may reduce accuracy
- Sarcasm/idioms can confound sentiment
- NER needs domain labels for best performance

## License

Apache-2.0