Proooof/Finance_NLP_Toolkit

+---
+language: en
+license: apache-2.0
+library_name: transformers
+tags:
+- finance
+- nlp
+- sentiment-analysis
+- token-classification
+- ner
+- transformers
+pipeline_tag: text-classification
+task_categories:
+- text-classification
+- token-classification
+---
+# 💹 Finance NLP Toolkit
+**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Transformers.
+It supports two core tasks:
+1) **Sentiment Analysis**: positive / neutral / negative market tone
+2) **Named Entity Recognition (NER)**: companies, tickers, monetary amounts, dates, etc.
+
+This repository includes:
+- Ready-to-run **inference snippets**
+- **Training scripts** for fine-tuning on your own datasets
+- Label mapping examples and utilities
+
+> **Note:** The initial release ships training and inference scaffolding only.
+> Plug in your dataset and fine-tune, or point the snippets at an existing finance model.
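The label mapping examples listed above are not reproduced in this README. As a minimal sketch of what such a mapping might look like, the snippet below builds `label2id`/`id2label` dictionaries for the three sentiment classes and for BIO-style NER tags. The label names, the `TICKER` entity type, and the helpers `build_maps` and `bio_labels` are assumptions made for illustration, not the repository's actual utilities.

```python
# Hypothetical label sets; the repository's actual mapping files may differ.
SENTIMENT_LABELS = ["negative", "neutral", "positive"]
ENTITY_TYPES = ["ORG", "TICKER", "MONEY", "DATE"]


def build_maps(labels):
    """Return (label2id, id2label) dictionaries for a list of label names."""
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def bio_labels(entity_types):
    """Expand entity types into BIO tags: 'O' plus a B-/I- pair per type."""
    tags = ["O"]
    for entity in entity_types:
        tags.extend([f"B-{entity}", f"I-{entity}"])
    return tags


sentiment_label2id, sentiment_id2label = build_maps(SENTIMENT_LABELS)
ner_label2id, ner_id2label = build_maps(bio_labels(ENTITY_TYPES))

print(sentiment_id2label)  # {0: 'negative', 1: 'neutral', 2: 'positive'}
print(len(ner_label2id))   # 9 tags: 'O' plus B-/I- for four entity types
```

Dictionaries of this shape can typically be passed as `id2label` and `label2id` when loading a Transformers model for fine-tuning, so that the pipeline reports readable label names instead of `LABEL_0`, `LABEL_1`, and so on.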
+---
+## 🚀 Quickstart (inference)
+Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+Sentiment:
+```python
+from transformers import pipeline
+# Loads the sentiment checkpoint from the default (main) branch
+sentiment = pipeline(
+    "sentiment-analysis",
+    model="YOUR-USERNAME/Finance-NLP-Toolkit",   # after you push your fine-tuned weights
+    tokenizer="YOUR-USERNAME/Finance-NLP-Toolkit"
+)
+print(sentiment("The company reported record profits and raised guidance."))
+```
+NER:
+```python
+from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
+# Loads the NER checkpoint from the `ner` branch of the same repo
+tok = AutoTokenizer.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
+ner_model = AutoModelForTokenClassification.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
+# aggregation_strategy="simple" merges sub-word tokens into whole entities
+ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
+print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
+```
+**Tip:** Use branches to host multiple checkpoints in one repo:
+- `main` → sentiment model
+- `ner` → NER model
+
+Push each set of weights to its respective branch.
+## 🧠 Training
+### Sentiment (3-class)
+```bash
+python training/train_sentiment.py \
+  --model_name distilbert-base-uncased \
+  --train_csv /path/train.csv \
+  --eval_csv /path/valid.csv \
+  --text_col text --label_col label \
+  --output_dir ./outputs/sentiment \
+  --epochs 3 --batch_size 16 --lr 5e-5
+```
+### NER (BIO tags)
+```bash
+python training/train_ner.py \
+  --model_name bert-base-cased \
+  --train_json /path/train.jsonl \
+  --eval_json /path/valid.jsonl \
+  --text_col tokens --label_col ner_tags \
+  --labels_file training/labels_ner.json \
+  --output_dir ./outputs/ner \
+  --epochs 5 --batch_size 8 --lr 3e-5
+```
+After training, push the weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).
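The training commands above name the input columns (`text`/`label` for the CSV, `tokens`/`ner_tags` for the JSONL) but do not show the files themselves. The sketch below writes and re-reads tiny example files in one plausible layout. Everything here is an assumption inferred from the flag names: string sentiment labels, one JSON object per line, and BIO tags stored as strings. The actual scripts may expect something different, such as integer label ids.

```python
import csv
import json
import tempfile
from pathlib import Path

# Assumed layouts, inferred from the --text_col/--label_col flags;
# the training scripts may expect a different schema.
sentiment_rows = [
    {"text": "The company reported record profits.", "label": "positive"},
    {"text": "Shares were flat after the announcement.", "label": "neutral"},
]
ner_rows = [
    {
        "tokens": ["Apple", "Inc.", "reported", "results", "in", "Q2", "2025", "."],
        "ner_tags": ["B-ORG", "I-ORG", "O", "O", "O", "B-DATE", "I-DATE", "O"],
    },
]

out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "train.csv"
jsonl_path = out_dir / "train.jsonl"

# Sentiment data: a CSV with a header row and one example per line.
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text", "label"])
    writer.writeheader()
    writer.writerows(sentiment_rows)

# NER data: JSON Lines, one pre-tokenized sentence per line.
with jsonl_path.open("w", encoding="utf-8") as f:
    for row in ner_rows:
        f.write(json.dumps(row) + "\n")

# Read both files back and check that tokens and tags align one-to-one.
with csv_path.open(newline="", encoding="utf-8") as f:
    loaded_sentiment = list(csv.DictReader(f))
with jsonl_path.open(encoding="utf-8") as f:
    loaded_ner = [json.loads(line) for line in f]

for row in loaded_ner:
    assert len(row["tokens"]) == len(row["ner_tags"])

print(len(loaded_sentiment), len(loaded_ner))
```

Checking that each sentence has exactly one tag per token before training is cheap and catches a common source of silent label misalignment.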
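+## 📊 Expected outputs
+The outputs below are illustrative and abbreviated; exact labels and scores depend on your fine-tuned weights, and the NER pipeline also returns `start`/`end` character offsets for each entity.
+
+Sentiment:
+```python
+[{'label': 'POSITIVE', 'score': 0.98}]
+```
+NER:
+```python
+[
+  {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
+  {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
+  {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
+]
+```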
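The aggregated NER output above is a flat list of records, which is often easier to consume once grouped by entity type. The following is a small sketch of one way to do that. The helper `group_entities` and its `min_score` threshold are hypothetical and not part of this toolkit, and the sample records simply copy the illustrative output shown above.

```python
from collections import defaultdict

# Sample records shaped like the aggregated NER output shown above.
entities = [
    {"entity_group": "ORG", "word": "Apple Inc.", "score": 0.99},
    {"entity_group": "MONEY", "word": "$10 billion", "score": 0.99},
    {"entity_group": "DATE", "word": "Q2 2025", "score": 0.98},
]


def group_entities(records, min_score=0.5):
    """Group entity text by type, dropping records scored below min_score."""
    grouped = defaultdict(list)
    for record in records:
        if record["score"] >= min_score:
            grouped[record["entity_group"]].append(record["word"])
    return dict(grouped)


print(group_entities(entities))
# {'ORG': ['Apple Inc.'], 'MONEY': ['$10 billion'], 'DATE': ['Q2 2025']}
```

Raising `min_score` trades recall for precision; a suitable value would need to be chosen against your own validation data.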
+## ⚠️ Limitations
+- English-focused; domain shift (text that differs from your training data) may reduce accuracy
+- Sarcasm and idioms can confound sentiment predictions
+- NER needs domain-specific labels for best performance
+
+## 📜 License
+Apache-2.0