---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---

# 💹 Finance NLP Toolkit

**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Transformers. It supports two core tasks:

1) **Sentiment Analysis**: positive / neutral / negative market tone
2) **Named Entity Recognition (NER)**: companies, tickers, money, dates, etc.

This repository includes:

- Ready-to-run **inference snippets**
- **Training scripts** for fine-tuning on your datasets
- Label mapping examples and utilities

> **Note:** Initial release ships training + inference scaffolding.
> Plug in your dataset and fine-tune, or point to an existing finance model.

---

## 🚀 Quickstart (inference)

Install deps:

```bash
pip install -r requirements.txt
```

Sentiment:

```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="Proooof/Finance-NLP-Toolkit",  # after you push your fine-tuned weights
    tokenizer="Proooof/Finance-NLP-Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```

NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the NER checkpoint from the `ner` branch of the repo.
tok = AutoTokenizer.from_pretrained("YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained(
    "YOUR-USERNAME/Finance-NLP-Toolkit", revision="ner"
)

ner = pipeline(
    "token-classification",
    model=ner_model,
    tokenizer=tok,
    aggregation_strategy="simple",
)
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```

**Tip:** Use branches to host multiple checkpoints in one repo:

- `main` → sentiment
- `ner` → NER model

Push each set of weights to its respective branch.
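If you go with the one-branch-per-checkpoint layout, it can help to keep the branch names in one place instead of repeating `revision=` strings across scripts. The block below is a minimal sketch under that assumption: `REPO_ID`, `CHECKPOINTS`, `pipeline_kwargs`, and `load_pipeline` are hypothetical names for illustration and are not shipped with this repo. It relies only on the `revision` and `aggregation_strategy` arguments already used in the snippets above.

```python
# Hypothetical helper (not part of this repo): maps each checkpoint name to
# the Hub branch that hosts it, following the main/ner layout described above.
REPO_ID = "YOUR-USERNAME/Finance-NLP-Toolkit"  # placeholder, as in the NER snippet

CHECKPOINTS = {
    "sentiment": {"task": "text-classification", "revision": "main"},
    "ner": {
        "task": "token-classification",
        "revision": "ner",
        "aggregation_strategy": "simple",
    },
}


def pipeline_kwargs(name, repo_id=REPO_ID):
    """Return the keyword arguments for transformers.pipeline for one checkpoint."""
    if name not in CHECKPOINTS:
        raise KeyError(
            f"unknown checkpoint {name!r}; expected one of {sorted(CHECKPOINTS)}"
        )
    return {"model": repo_id, **CHECKPOINTS[name]}


def load_pipeline(name, repo_id=REPO_ID):
    """Build the pipeline; this downloads weights from the Hub on first use."""
    # Imported lazily so the mapping itself can be inspected without transformers.
    from transformers import pipeline

    return pipeline(**pipeline_kwargs(name, repo_id))
```

With weights pushed to both branches, `load_pipeline("ner")` should behave like the NER snippet above, and `load_pipeline("sentiment")` like the sentiment one.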

## 🧠 Training

### Sentiment (3-class)

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```

### NER (BIO tags)

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```

After training, push weights to the repo (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).

## 📊 Expected outputs

Sentiment:

```python
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER (abridged; with `aggregation_strategy="simple"` each entry also carries `start` and `end` character offsets):

```python
[
  {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
  {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
  {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```

## ⚠️ Limitations

- English focus; domain shift may reduce accuracy
- Sarcasm/idioms can confound sentiment
- NER needs domain labels for best performance

## 📜 License

Apache-2.0