Proooof committed on
Commit 189d315 · verified · 1 parent: effe413

Update README.md

Files changed (1): README.md (+115 -3)
```diff
@@ -1,3 +1,115 @@
- ---
- license: apache-2.0
- ---
```
```yaml
---
language: en
license: apache-2.0
library_name: transformers
tags:
- finance
- nlp
- sentiment-analysis
- token-classification
- ner
- transformers
pipeline_tag: text-classification
task_categories:
- text-classification
- token-classification
---
```
# 💹 Finance NLP Toolkit

**Finance NLP Toolkit** is a practical starter pack for analyzing financial text with Hugging Face Transformers. It supports two core tasks:

1) **Sentiment Analysis**: positive / neutral / negative market tone
2) **Named Entity Recognition (NER)**: companies, tickers, money amounts, dates, and similar entities

This repository includes:
- Ready-to-run **inference snippets**
- **Training scripts** for fine-tuning on your own datasets
- Label-mapping examples and utilities

> **Note:** This initial release ships training and inference scaffolding.
> Plug in your dataset and fine-tune, or point the snippets at an existing finance model.
---

## 🚀 Quickstart (inference)

Install dependencies:
```bash
pip install -r requirements.txt
```
Sentiment:

```python
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="YOUR-USERNAME/Finance-NLP-Toolkit",  # available after you push your fine-tuned weights
    tokenizer="YOUR-USERNAME/Finance-NLP-Toolkit",
)
print(sentiment("The company reported record profits and raised guidance."))
```
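For a multi-class model such as the 3-class sentiment classifier, the `score` the pipeline returns is the softmax probability of the highest-scoring class. The sketch below reproduces that post-processing step in plain Python to make the number easier to interpret. `softmax`, `top_prediction`, the sample logits, and the `NEGATIVE`/`NEUTRAL`/`POSITIVE` label order are illustrative assumptions, not code from this repo.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return a pipeline-style {'label', 'score'} dict for the top class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Assumed label order for a 3-class model; the real order comes from config.id2label.
id2label = {0: "NEGATIVE", 1: "NEUTRAL", 2: "POSITIVE"}
print(top_prediction([-1.2, 0.3, 3.1], id2label))  # hypothetical logits
```

In a real checkpoint the label names come from the model's saved `id2label`, so the printed label reflects whatever mapping was used at training time.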
NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

repo_id = "YOUR-USERNAME/Finance-NLP-Toolkit"
# Load the NER checkpoint from the repo's "ner" branch via `revision`.
tok = AutoTokenizer.from_pretrained(repo_id, revision="ner")
ner_model = AutoModelForTokenClassification.from_pretrained(repo_id, revision="ner")
# aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
ner = pipeline("token-classification", model=ner_model, tokenizer=tok, aggregation_strategy="simple")
print(ner("Apple Inc. reported a $10 billion revenue increase in Q2 2025."))
```
**Tip:** use branches to host multiple checkpoints in one repo:

- `main` → sentiment model
- `ner` → NER model

Push each set of weights to its respective branch.
## 🧠 Training

### Sentiment (3-class)

```bash
python training/train_sentiment.py \
  --model_name distilbert-base-uncased \
  --train_csv /path/train.csv \
  --eval_csv /path/valid.csv \
  --text_col text --label_col label \
  --output_dir ./outputs/sentiment \
  --epochs 3 --batch_size 16 --lr 5e-5
```
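The sentiment command reads a CSV with a `text` column and a `label` column. The label format `train_sentiment.py` expects is not documented here, so this sketch assumes string labels (`negative`, `neutral`, `positive`, case-insensitive) and shows how a fixed `label2id`/`id2label` mapping could be built and checked before training. `build_label_maps`, `encode_rows`, and `SENTIMENT_LABELS` are hypothetical helpers, not part of the repo.

```python
import csv
import io

# Assumption: three sentiment classes in a fixed order so label ids stay stable.
SENTIMENT_LABELS = ["NEGATIVE", "NEUTRAL", "POSITIVE"]


def build_label_maps(labels):
    """Return (label2id, id2label) dicts for an ordered list of label names."""
    label2id = {name: idx for idx, name in enumerate(labels)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def encode_rows(csv_text, label2id, text_col="text", label_col="label"):
    """Parse CSV text into (text, label_id) pairs, rejecting unknown labels."""
    pairs = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 (line 1 is the header); assumes no multi-line fields.
    for line_no, row in enumerate(reader, start=2):
        label = row[label_col].strip().upper()  # case-insensitive match
        if label not in label2id:
            raise ValueError(f"line {line_no}: unknown label {label!r}")
        pairs.append((row[text_col], label2id[label]))
    return pairs


label2id, id2label = build_label_maps(SENTIMENT_LABELS)
sample = "text,label\nRecord profits and raised guidance.,positive\nShares were flat.,neutral\n"
print(encode_rows(sample, label2id))
# -> [('Record profits and raised guidance.', 2), ('Shares were flat.', 1)]
```

To have the pipeline print these names instead of the default `LABEL_0`, `LABEL_1`, ..., the maps can be passed when loading the base model, e.g. `AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3, id2label=id2label, label2id=label2id)`, which stores them in the saved `config.json`.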
### NER (BIO tags)

```bash
python training/train_ner.py \
  --model_name bert-base-cased \
  --train_json /path/train.jsonl \
  --eval_json /path/valid.jsonl \
  --text_col tokens --label_col ner_tags \
  --labels_file training/labels_ner.json \
  --output_dir ./outputs/ner \
  --epochs 5 --batch_size 8 --lr 3e-5
```
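Because `bert-base-cased` splits words into WordPiece sub-tokens, word-level `ner_tags` have to be re-aligned to tokens before training. Whether `train_ner.py` does it exactly this way is not stated; the sketch shows the common convention of labeling only the first sub-token of each word and masking the rest (and special tokens) with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. It takes the per-token word indices a fast tokenizer reports, so it runs without downloading a model. `align_labels`, the sample tokenization, and the tag ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by torch.nn.CrossEntropyLoss (default ignore_index)


def align_labels(word_ids, word_labels):
    """Map word-level label ids onto sub-word tokens.

    word_ids: per-token word index from a fast tokenizer's `word_ids()`,
              with None for special tokens such as [CLS] and [SEP].
    word_labels: one label id per word (e.g. a row's `ner_tags`).
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)        # special token
        elif wid != previous:
            aligned.append(word_labels[wid])    # first sub-token of a word
        else:
            aligned.append(IGNORE_INDEX)        # later sub-tokens of the same word
        previous = wid
    return aligned


# Hypothetical tokenization of ["Apple", "Inc.", "beat", "estimates"], where "Inc." -> "Inc", ".":
# [CLS] Apple Inc . beat estimates [SEP]
word_ids = [None, 0, 1, 1, 2, 3, None]
ner_tags = [1, 2, 0, 0]  # assumed ids: 0 = O, 1 = B-ORG, 2 = I-ORG
print(align_labels(word_ids, ner_tags))
# -> [-100, 1, 2, -100, 0, 0, -100]
```

With a real fast tokenizer, `word_ids` would come from `tok(tokens, is_split_into_words=True).word_ids()`.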
After training, commit each output directory to its branch and push it (e.g., `git push origin main` for sentiment and `git push origin ner` for NER).
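As an alternative to raw `git push`, the `huggingface_hub` library can upload an output folder directly to a branch. A minimal sketch, assuming you are already authenticated (`huggingface-cli login` or an `HF_TOKEN` environment variable) and that the training scripts wrote to the output directories used above. `push_checkpoint` is a hypothetical helper, and skipping `checkpoint-*` folders is an assumption about what the scripts leave in `--output_dir`.

```python
def push_checkpoint(folder, repo_id, branch):
    """Upload a trained output folder to `branch` of a Hub model repo.

    huggingface_hub is imported lazily so this helper can be defined
    without the library installed.
    """
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, exist_ok=True)
    if branch != "main":
        # A non-default branch must exist before files can be uploaded to it.
        api.create_branch(repo_id, branch=branch, exist_ok=True)
    return api.upload_folder(
        repo_id=repo_id,
        folder_path=folder,
        revision=branch,
        ignore_patterns=["checkpoint-*"],  # assumption: skip intermediate Trainer checkpoints
        commit_message=f"Upload {branch} checkpoint",
    )
```

For example, `push_checkpoint("./outputs/ner", "YOUR-USERNAME/Finance-NLP-Toolkit", "ner")` would place the NER weights on the `ner` branch that the inference snippet loads with `revision="ner"`.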
## 📊 Expected outputs

Illustrative outputs only: actual labels depend on your model's label mapping, and scores will vary.

Sentiment:

```text
[{'label': 'POSITIVE', 'score': 0.98}]
```

NER (abridged; the pipeline also returns `start` and `end` character offsets for each entity):

```text
[
  {'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99},
  {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99},
  {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}
]
```
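With `aggregation_strategy="simple"`, the pipeline merges token-level BIO predictions into entity groups like those above: a new group starts at a `B-` tag or when the entity type changes, `O` predictions are dropped, and the group score is the mean of its token scores. The sketch below is a simplified, stdlib-only illustration of that idea over whole words; it is not the pipeline's actual implementation, which works on sub-word tokens and character offsets. `group_bio` and the sample tags and scores are hypothetical.

```python
from statistics import mean


def group_bio(words, tags, scores):
    """Merge word-level BIO predictions into entity groups.

    A new group starts at a B- tag or when the entity type changes;
    "O" words close the current group and are dropped.
    """
    groups, current = [], None
    for word, tag, score in zip(words, tags, scores):
        if tag == "O":
            current = None
            continue
        prefix, _, etype = tag.partition("-")
        if current is None or prefix == "B" or etype != current["entity_group"]:
            current = {"entity_group": etype, "words": [], "scores": []}
            groups.append(current)
        current["words"].append(word)
        current["scores"].append(score)
    return [
        {
            "entity_group": g["entity_group"],
            "word": " ".join(g["words"]),
            "score": round(mean(g["scores"]), 2),  # rounded for display
        }
        for g in groups
    ]


# Hypothetical word-level predictions for the quickstart sentence.
words = ["Apple", "Inc.", "reported", "a", "$10", "billion", "revenue", "increase", "in", "Q2", "2025", "."]
tags = ["B-ORG", "I-ORG", "O", "O", "B-MONEY", "I-MONEY", "O", "O", "O", "B-DATE", "I-DATE", "O"]
scores = [0.99, 0.99, 0.9, 0.9, 0.99, 0.99, 0.9, 0.9, 0.9, 0.98, 0.98, 0.9]
print(group_bio(words, tags, scores))
# -> [{'entity_group': 'ORG', 'word': 'Apple Inc.', 'score': 0.99}, {'entity_group': 'MONEY', 'word': '$10 billion', 'score': 0.99}, {'entity_group': 'DATE', 'word': 'Q2 2025', 'score': 0.98}]
```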
## ⚠️ Limitations

- English focus; domain shift may reduce accuracy.
- Sarcasm and idioms can confound sentiment.
- NER performs best when fine-tuned on domain-specific labeled data.
## 📜 License

Apache-2.0