---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---
# 💹 Finance NLP Toolkit

**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Transformers.
It supports two core tasks:

1) **Sentiment Analysis**: positive / neutral / negative market tone
2) **Named Entity Recognition (NER)**: companies, tickers, money, dates, etc.

This repository includes:

- Ready-to-run **inference snippets**
- **Training scripts** for fine-tuning on your datasets
- Label mapping examples and utilities

> **Note:** The initial release ships training and inference scaffolding.
> Plug in your dataset and fine-tune, or point the snippets at an existing finance model.

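
The label mappings mentioned above can be sketched roughly as follows. The label names and entity types here are assumptions chosen to match the examples in this README, not the contents of the repository's own utilities; swap in the labels from your dataset. Transformers model configs carry `id2label` and `label2id` fields, so dictionaries of this shape can be supplied when a model is created for fine-tuning.

```python
# Hypothetical label sets; replace them with the labels in your own dataset.
SENTIMENT_LABELS = ["negative", "neutral", "positive"]
NER_ENTITY_TYPES = ["ORG", "MONEY", "DATE"]


def build_label_maps(labels):
    """Return (label2id, id2label) dictionaries for a list of label names."""
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def bio_labels(entity_types):
    """Expand entity types into BIO tags: O, B-ORG, I-ORG, ..."""
    tags = ["O"]
    for entity in entity_types:
        tags.extend([f"B-{entity}", f"I-{entity}"])
    return tags


sentiment_label2id, sentiment_id2label = build_label_maps(SENTIMENT_LABELS)
ner_label2id, ner_id2label = build_label_maps(bio_labels(NER_ENTITY_TYPES))

print(sentiment_id2label)
print(ner_label2id)
```
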
---
## Quickstart (inference)

Install the dependencies:

```bash
pip install -r requirements.txt
```

Sentiment:

```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="Proooof/Finance-NLP-Toolkit",  # after you push your fine-tuned weights
    tokenizer="Proooof/Finance-NLP-Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```

NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Replace YOUR-USERNAME with the Hub namespace that hosts your weights.
tok = AutoTokenizer.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```

**Tip:** Use branches to host multiple checkpoints in one repo:

- `main`: sentiment model
- `ner`: NER model

Push each set of weights to its respective branch.

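
One way to keep the branch layout from the tip in a single place is a small lookup table, sketched below. `REPO_ID`, `TASK_CHECKPOINTS`, `checkpoint_for`, and `load_pipeline` are hypothetical names introduced for illustration; only `pipeline(...)` and its `revision` and `aggregation_strategy` arguments come from Transformers. The import is deferred so the lookup can be used without Transformers installed.

```python
# Assumption: one Hub repo with one branch per task, as described in the tip.
REPO_ID = "YOUR-USERNAME/Finance-NLP-Toolkit"  # placeholder namespace
TASK_CHECKPOINTS = {
    "sentiment": {"pipeline_task": "text-classification", "revision": "main"},
    "ner": {"pipeline_task": "token-classification", "revision": "ner"},
}


def checkpoint_for(task):
    """Look up the pipeline task name and branch for a toolkit task."""
    try:
        return TASK_CHECKPOINTS[task]
    except KeyError:
        known = sorted(TASK_CHECKPOINTS)
        raise ValueError(f"Unknown task {task!r}; expected one of {known}") from None


def load_pipeline(task):
    """Build a Transformers pipeline from the branch that hosts `task`."""
    from transformers import pipeline  # lazy import; requires transformers

    spec = checkpoint_for(task)
    # Merge word pieces into whole entities for NER only.
    extra = {"aggregation_strategy": "simple"} if task == "ner" else {}
    return pipeline(
        spec["pipeline_task"],
        model=REPO_ID,
        revision=spec["revision"],
        **extra,
    )
```

Calling `load_pipeline("ner")` would then download from the `ner` branch, provided the weights have been pushed there.
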
## Training

### Sentiment (3-class)

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```

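
The command above reads CSV files with a `text` column and a `label` column. The sketch below writes a tiny file in that shape and checks it before training. The three rows are illustrative, and the string label names are an assumption; the training script may expect a different encoding (for example integer ids), so adjust `ALLOWED_LABELS` to match.

```python
import csv
import os
import tempfile

ALLOWED_LABELS = {"negative", "neutral", "positive"}  # assumed label names

# Illustrative rows only; use your own annotated data.
rows = [
    {"text": "The company reported record profits and raised guidance.", "label": "positive"},
    {"text": "Revenue was flat compared with the prior quarter.", "label": "neutral"},
    {"text": "The firm cut its dividend after a steep loss.", "label": "negative"},
]


def write_csv(path, rows):
    """Write rows with the two columns passed via --text_col and --label_col."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text", "label"])
        writer.writeheader()
        writer.writerows(rows)


def validate_csv(path, text_col="text", label_col="label"):
    """Return the row count; raise ValueError on empty text or unknown labels."""
    count = 0
    with open(path, newline="", encoding="utf-8") as handle:
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(csv.DictReader(handle), start=2):
            if not (row.get(text_col) or "").strip():
                raise ValueError(f"line {line_no}: empty {text_col!r}")
            if row.get(label_col) not in ALLOWED_LABELS:
                raise ValueError(f"line {line_no}: unexpected label {row.get(label_col)!r}")
            count += 1
    return count


train_path = os.path.join(tempfile.mkdtemp(), "train.csv")
write_csv(train_path, rows)
print(validate_csv(train_path))
```
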
### NER (BIO tags)

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```

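
The NER command expects JSON Lines input with a `tokens` field and an `ner_tags` field. A minimal sketch of one record and a sanity check follows. It assumes the tags are stored as BIO strings; if your data or `labels_ner.json` uses integer ids instead, map them to strings first. `check_record` is a hypothetical helper, not part of the repository.

```python
import json

# Illustrative record; field names mirror --text_col tokens --label_col ner_tags.
record = {
    "tokens": ["Apple", "Inc.", "reported", "a", "$10", "billion",
               "revenue", "increase", "in", "Q2", "2025", "."],
    "ner_tags": ["B-ORG", "I-ORG", "O", "O", "B-MONEY", "I-MONEY",
                 "O", "O", "O", "B-DATE", "I-DATE", "O"],
}


def check_record(rec):
    """Return a list of problems found in one tokens/ner_tags record."""
    tokens, tags = rec["tokens"], rec["ner_tags"]
    problems = []
    if len(tokens) != len(tags):
        problems.append(f"{len(tokens)} tokens but {len(tags)} tags")
    previous = "O"
    for index, tag in enumerate(tags):
        # An I-X tag must follow B-X or I-X of the same entity type.
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            problems.append(f"tag {index}: {tag} does not continue a {tag[2:]} span")
        previous = tag
    return problems


jsonl_line = json.dumps(record)  # one JSON object per line in train.jsonl
print(jsonl_line)
print(check_record(json.loads(jsonl_line)))
```
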
After training, push the weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).

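## Expected outputs

The values below are illustrative. Exact scores and label names depend on the fine-tuned model, and aggregated NER results also carry `start` and `end` character offsets.

Sentiment:

```python
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER:

```python
[
    {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
    {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
    {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```
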
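
Downstream code often wants the NER result grouped by entity type rather than as a flat list. The sketch below does that for the example output above and drops low-confidence entities. `group_entities` and the `min_score` threshold are hypothetical names introduced here, and the input reuses the illustrative scores from this section rather than real model output.

```python
from collections import defaultdict

# Entities in the shape shown above (start/end offsets omitted for brevity).
entities = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "MONEY", "word": "$10 billion", "score": 0.99},
    {"entity_group": "DATE", "word": "Q2 2025", "score": 0.98},
]


def group_entities(entities, min_score=0.5):
    """Group entity strings by type, keeping those scored at or above min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


print(group_entities(entities))
```

Raising `min_score` trades recall for precision, which can help when domain shift makes the model less reliable.
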
## ⚠️ Limitations

- English focus; domain shift may reduce accuracy
- Sarcasm and idioms can confound sentiment
- NER needs domain-specific labels for best performance

## License

Apache-2.0