Ayansk11 committed on
Commit
f49e498
·
verified ·
1 Parent(s): 07bd760

Update model card

Files changed (1)
  1. README.md +226 -0
README.md ADDED
@@ -0,0 +1,226 @@
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model: facebook/MobileLLM-R1-950M
6
+ datasets:
7
+ - Ayansk11/FinSenti-Dataset
8
+ pipeline_tag: text-generation
9
+ library_name: transformers
10
+ tags:
11
+ - finance
12
+ - financial-sentiment
13
+ - sentiment-analysis
14
+ - chain-of-thought
15
+ - reasoning
16
+ - grpo
17
+ - sft
18
+ - lora
19
+ - finsenti
20
+ ---
21
+ # FinSenti-MobileLLM-R1-950M
22
+
23
+ FinSenti-MobileLLM-R1-950M is a 0.9B-parameter model fine-tuned to
24
+ read short financial text (headlines, earnings snippets, market commentary)
25
+ and explain its reading before settling on positive, negative, or
26
+ neutral. The base model is Meta's MobileLLM-R1-950M, a purpose-built mobile model: its architecture is shaped for on-device inference (compact embeddings, untied lm_head, shared attention layers), and FinSenti's recipe improves financial-sentiment quality without changing that footprint.
27
+
28
+ The model is part of the [FinSenti
29
+ collection](https://huggingface.co/collections/Ayansk11/finsenti), a
30
+ scaling study of small models trained on the same data with the same recipe.
31
+
32
+ ## What it's good at
33
+
34
+ - Classifying short financial text (1-3 sentences) into positive / negative
35
+ / neutral
36
+ - Producing a short reasoning chain you can read or log
37
+ - Following a strict `<reasoning>...</reasoning><answer>...</answer>` output
38
+ format that's easy to parse downstream
39
+
40
+ It was trained on news-style headlines and earnings snippets in English, so
41
+ that's where it shines. Outside that domain you'll see the format hold up
42
+ but the labels get noisier.
43
+
44
+ ## How it was trained
45
+
46
+ Two-stage recipe, same across the whole FinSenti family:
47
+
48
+ 1. **SFT** on the SFT train slice from the [FinSenti
49
+ dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset)
50
+ (~15.2K balanced training samples, drawn from a
51
+ 50.8K-sample pool with held-out val/test splits, chain-of-thought
52
+ targets generated by a teacher model and filtered for label agreement).
53
+ This stage took about 1.6 hours on a single A100 80GB
54
+ for this model.
55
+ 2. **GRPO** with four reward functions (sentiment correctness, format
56
+ compliance, reasoning quality, output consistency), each weighted equally
57
+ for a maximum reward of 4.0. The training budget was 3000
58
+ steps with early stopping; the best checkpoint landed near step
59
+ 200 with a mean reward of approximately
60
+ **3.29 / 4.0** on the validation slice.
61
+
62
+ Trainer stack: PEFT + bitsandbytes. Unsloth was not used because the
63
+ llama4_text architecture was unsupported there, so the weights were loaded
64
+ directly from the upstream
65
+ [`facebook/MobileLLM-R1-950M`](https://huggingface.co/facebook/MobileLLM-R1-950M)
66
+ repo. LoRA adapters (r=16, alpha=32) were
67
+ trained on the attention and MLP projection layers, then merged into the
68
+ base weights before export, so this repo is a self-contained model and
69
+ doesn't need PEFT to load.
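The adapter settings described above could be written down roughly as follows. This is a sketch rather than the actual training script: `r` and `lora_alpha` come from this card, while the module names (`q_proj`, `gate_proj`, and so on) are the conventional Llama-style names and are assumed here. `LORA_SETTINGS` and `build_lora_config` are hypothetical names used only for illustration.

```python
# Hypothetical reconstruction of the adapter settings described above.
# r and alpha come from the model card; the module names are an assumption
# (conventional Llama-style projection names), not read from the training code.
LORA_SETTINGS = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    "task_type": "CAUSAL_LM",
}


def build_lora_config():
    """Build a peft.LoraConfig from the settings above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the dict works without peft

    return LoraConfig(**LORA_SETTINGS)
```

With PEFT, folding trained adapters back into the base weights is usually done with `merge_and_unload()` on the wrapped model, which is consistent with the "merged before export" description, though the exact export steps are not shown in this card.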
70
+
71
+ ## Quick start
72
+
73
+ Standard `transformers` usage:
74
+
75
+ ```python
76
+ from transformers import AutoModelForCausalLM, AutoTokenizer
77
+ import torch
78
+
79
+ model_id = "Ayansk11/FinSenti-MobileLLM-R1-950M"
80
+ tok = AutoTokenizer.from_pretrained(model_id)
81
+ model = AutoModelForCausalLM.from_pretrained(
82
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
83
+ )
84
+
85
+ system = (
86
+     "You are a financial sentiment analyst. For each headline you receive, "
87
+     "write a short reasoning chain inside <reasoning>...</reasoning> tags, "
88
+     "then give a single label inside <answer>...</answer> tags. The label "
89
+     "must be exactly one of: positive, negative, neutral."
90
+ )
91
+ user = "Apple beats Q4 estimates as iPhone sales jump 12% year over year."
92
+
93
+ messages = [
94
+     {"role": "system", "content": system},
95
+     {"role": "user", "content": user},
96
+ ]
97
+ prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
98
+
99
+ inputs = tok(prompt, return_tensors="pt").to(model.device)
100
+ out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
101
+ print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
102
+ ```
103
+
104
+ Expected output (your reasoning text will vary; the label should match):
105
+
106
+ ```
107
+ <reasoning>
108
+ Beating estimates is a positive earnings surprise. A 12% YoY iPhone sales jump in the company's biggest product line points to demand strength. Both signals push the read positive.
109
+ </reasoning>
110
+ <answer>positive</answer>
111
+ ```
112
+
113
+ ## Prompt format
114
+
115
+ The model expects the system prompt above; using it verbatim works best. The user turn
116
+ is the headline or short snippet you want classified. Output is two XML-ish
117
+ blocks in this order: `<reasoning>...</reasoning>` then
118
+ `<answer>...</answer>`. The `<answer>` content is one of `positive`,
119
+ `negative`, or `neutral` (lowercase, no punctuation).
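Because the output format is fixed, a small parser is usually enough downstream. The sketch below is one way to do it with the standard library; `parse_output`, `Parsed`, and `VALID_LABELS` are illustrative names, not part of this repo. It returns `None` for anything that does not match the format, so malformed generations can be counted or retried.

```python
import re
from typing import NamedTuple, Optional

VALID_LABELS = {"positive", "negative", "neutral"}

# Reasoning block first, then the answer block, as the format requires.
_OUTPUT_RE = re.compile(
    r"<reasoning>(?P<reasoning>.*?)</reasoning>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


class Parsed(NamedTuple):
    reasoning: str
    label: str


def parse_output(text: str) -> Optional[Parsed]:
    """Return (reasoning, label), or None if the text is malformed."""
    match = _OUTPUT_RE.search(text)
    if match is None:
        return None
    label = match.group("answer").strip()
    if label not in VALID_LABELS:
        return None  # e.g. wrong casing or an unexpected label
    return Parsed(match.group("reasoning").strip(), label)
```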
120
+
121
+ If you want labels only and don't care about the reasoning, you can stop
122
+ generation as soon as you see `</answer>` to save tokens.
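One way to act on this is to cut the text at the first `</answer>` while reading a stream of text chunks. The helper below is a minimal, backend-agnostic sketch; `collect_until_answer` is a hypothetical name, and `chunks` is assumed to be any lazy iterable of decoded text pieces. Whether stopping the read also halts generation depends on the backend.

```python
from typing import Iterable

END_TAG = "</answer>"


def collect_until_answer(chunks: Iterable[str], end_tag: str = END_TAG) -> str:
    """Accumulate streamed text and stop consuming once end_tag has appeared."""
    text = ""
    for chunk in chunks:
        text += chunk
        cut = text.find(end_tag)
        if cut != -1:
            # Drop anything generated after the closing tag.
            return text[: cut + len(end_tag)]
    return text  # stream ended without a closing tag
```

With `transformers` itself, recent releases accept `stop_strings=["</answer>"]` together with `tokenizer=tok` in `model.generate(...)`, which stops decoding at the tag rather than only trimming afterwards; check that your installed version supports it.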
123
+
124
+ ## Performance notes
125
+
126
+ The mean GRPO reward (max 4.0) reached **3.29** on the
127
+ held-out validation slice. No per-function figures are reported; qualitatively,
128
+ the four reward functions behaved as follows:
129
+
130
+ - Sentiment correctness: dominant contributor; the model gets the label
131
+ right on the validation split most of the time
132
+ - Format compliance: near-saturated by the end of GRPO; the model almost
133
+ always produces well-formed `<reasoning>` and `<answer>` tags
134
+ - Reasoning quality: judged on length and presence of finance-relevant
135
+ signal words; this one's the noisiest of the four
136
+ - Consistency: rewards stable labels across paraphrases of the same headline
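To make the four-way reward concrete, here is a hypothetical sketch of what equally weighted reward functions of this kind could look like, each returning a value in [0, 1] so that the total tops out at 4.0. None of this is the actual training code: the signal-word list, the length bounds, and all function names are assumptions made for illustration.

```python
import re
from typing import List

LABELS = {"positive", "negative", "neutral"}
# Illustrative word list only; the real signal-word set is not given in the card.
SIGNAL_WORDS = {"earnings", "revenue", "guidance", "estimates", "demand", "margin"}
FORMAT_RE = re.compile(
    r"^\s*<reasoning>(.*?)</reasoning>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def _parts(completion: str):
    """Split a completion into (reasoning, label); (None, None) if malformed."""
    m = FORMAT_RE.match(completion)
    return (m.group(1).strip(), m.group(2).strip()) if m else (None, None)


def correctness_reward(completion: str, gold: str) -> float:
    return 1.0 if _parts(completion)[1] == gold else 0.0


def format_reward(completion: str) -> float:
    _, label = _parts(completion)
    return 1.0 if label in LABELS else 0.0


def reasoning_reward(completion: str) -> float:
    reasoning, _ = _parts(completion)
    if reasoning is None:
        return 0.0
    words = re.findall(r"[a-z]+", reasoning.lower())
    length_ok = 0.5 if 10 <= len(words) <= 120 else 0.0  # assumed bounds
    signal = 0.5 if SIGNAL_WORDS & set(words) else 0.0
    return length_ok + signal


def consistency_reward(paraphrase_completions: List[str]) -> float:
    """1.0 when every paraphrase of the headline got the same valid label."""
    labels = {_parts(c)[1] for c in paraphrase_completions}
    return 1.0 if len(labels) == 1 and None not in labels else 0.0


def total_reward(completion: str, gold: str, paraphrase_completions: List[str]) -> float:
    return (
        correctness_reward(completion, gold)
        + format_reward(completion)
        + reasoning_reward(completion)
        + consistency_reward(paraphrase_completions)
    )
```

A heuristic like `reasoning_reward` rewards surface features (length, keywords) rather than sound reasoning, which is one plausible reason that component would be the noisiest of the four.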
137
+
138
+ Numbers on standard finance benchmarks (FPB, FiQA, Twitter Financial News)
139
+ are forthcoming and will be added once the eval pipeline lands.
140
+
141
+ ## Hardware
142
+
143
+ At bf16 the weights are about 1.8 GB on disk and need ~3 GB of GPU memory for batch=1 inference. CPU inference is fine too: on a modern laptop you'll get a few tokens per second with the bf16 weights, and 15-30 tok/s with the GGUF Q4_K_M build.
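The on-disk figure follows from simple arithmetic on the parameter count. The snippet below is a back-of-the-envelope check, assuming the nominal 950M parameters from the model name and 2 bytes per bf16 weight; it ignores file headers and tokenizer files.

```python
PARAMS = 950_000_000       # nominal parameter count from the model name
BYTES_PER_PARAM_BF16 = 2   # bfloat16 stores each weight in 16 bits


def weight_size_gib(n_params: int, bytes_per_param: float) -> float:
    """Raw weight storage in GiB, ignoring file headers and metadata."""
    return n_params * bytes_per_param / 2**30


bf16_gib = weight_size_gib(PARAMS, BYTES_PER_PARAM_BF16)
print(f"bf16 weights: ~{bf16_gib:.2f} GiB")  # prints "bf16 weights: ~1.77 GiB"
```

The gap between this and the ~3 GB inference figure is plausibly activations, the KV cache, and runtime overhead, though the card does not break it down.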
144
+
145
+ ## Limitations
146
+
147
+ A few things this model isn't built for:
148
+
149
+ - **Long documents.** Training context was capped at 2048
150
+ tokens. Anything much longer than a few paragraphs is out of distribution.
151
+ - **Multi-asset reasoning.** It classifies the sentiment of a single piece
152
+ of text. It won't aggregate across multiple headlines or weigh sources.
153
+ - **Numerical reasoning.** It can read "beats by 12%" and call that
154
+ positive, but it isn't doing math. Don't ask it to forecast.
155
+ - **Languages other than English.** Training data was English only.
156
+ - **Background knowledge.** If the headline needs you to know what a
157
+ company does, the model only has whatever was in its base pretraining.
158
+ It can't look anything up.
159
+ - **Three labels, hard cutoffs.** The output space is positive / negative /
160
+ neutral. If you need a 5-class scale or a continuous score, you'll need
161
+ to retrain or post-process.
162
+
163
+ ## Training details
164
+
165
+ | | |
166
+ |---|---|
167
+ | Upstream base model | [facebook/MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) |
168
+ | Loading source | [facebook/MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) (loaded directly from upstream; no Unsloth mirror) |
169
+ | Dataset | [Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset) (~15.2K train per stage, 50.8K total across splits) |
170
+ | SFT length | ~1.6 hours on A100 80GB |
171
+ | GRPO budget | 3000 steps with early stopping (best checkpoint near step 200) |
172
+ | Best GRPO reward | ~3.29 / 4.0 |
173
+ | Adapter | LoRA (r=16, alpha=32) on q/k/v/o/gate/up/down projections |
174
+ | Sequence length | 2048 |
175
+ | Optimizer | AdamW (8-bit), cosine LR schedule |
176
+ | Hardware | NVIDIA A100 80GB (Indiana University BigRed200 cluster) |
177
+ | Frameworks | PEFT + bitsandbytes (no Unsloth - llama4_text arch unsupported) |
178
+
179
+ ## Related FinSenti models
180
+
181
+ Other sizes and bases trained with the same recipe:
182
+
183
+ - **Qwen3**: [Qwen3-0.6B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-0.6B), [Qwen3-1.7B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-1.7B), [Qwen3-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-4B), [Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B)
184
+ - **Qwen3.5**: [Qwen3.5-0.8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-0.8B), [Qwen3.5-2B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-2B), [Qwen3.5-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-4B), [Qwen3.5-9B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-9B)
185
+ - **DeepSeek**: [DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B)
186
+ - **Tiny-LLM**: [Tiny-LLM-10M](https://huggingface.co/Ayansk11/FinSenti-Tiny-LLM-10M)
187
+ - **Llama-3**: [Llama-3.2-1B](https://huggingface.co/Ayansk11/FinSenti-Llama-3.2-1B)
188
+ - **SmolLM**: [SmolLM-1.7B](https://huggingface.co/Ayansk11/FinSenti-SmolLM-1.7B)
189
+
190
+ There's a GGUF build of this same model at
191
+ [Ayansk11/FinSenti-MobileLLM-R1-950M-GGUF](https://huggingface.co/Ayansk11/FinSenti-MobileLLM-R1-950M-GGUF) for Ollama and
192
+ llama.cpp, and the dataset itself is at
193
+ [Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset).
194
+
195
+ If you're picking a size, a rough guide:
196
+
197
+ - **Need it on a phone or browser?** Look at the smallest Qwen3 model
198
+ (Qwen3-0.6B) or its GGUF.
199
+ - **Laptop with no GPU?** Any model up to ~2B as Q4_K_M GGUF works.
200
+ - **Single 8-12 GB GPU?** The 1.5B-4B sizes are the sweet spot.
201
+ - **Server or workstation?** The 8B / 9B variants give the best reasoning
202
+ but need the memory.
203
+
204
+ ## Citation
205
+
206
+ If you use this model in research, please cite:
207
+
208
+ ```bibtex
209
+ @misc{shaikh2026finsenti,
210
+ title = {FinSenti: Small Language Models for Financial Sentiment with Chain-of-Thought Reasoning},
211
+ author = {Shaikh, Ayan},
212
+ year = {2026},
213
+ url = {https://huggingface.co/collections/Ayansk11/finsenti},
214
+ note = {Indiana University}
215
+ }
216
+ ```
217
+
218
+ ## License
219
+
220
+ Apache 2.0, same as the base model.
221
+
222
+ ## Acknowledgements
223
+
224
+ Trained on the Indiana University BigRed200 cluster (account `r01510`).
225
+ Thanks to the Unsloth and TRL teams for the trainer stack, and to the
226
+ Qwen / DeepSeek teams for the base models.