---
license: apache-2.0
language:
  - en
base_model: facebook/MobileLLM-R1-950M
datasets:
  - Ayansk11/FinSenti-Dataset
pipeline_tag: text-generation
library_name: transformers
tags:
  - finance
  - financial-sentiment
  - sentiment-analysis
  - chain-of-thought
  - reasoning
  - grpo
  - sft
  - lora
  - finsenti
---
# FinSenti-MobileLLM-R1-950M

FinSenti-MobileLLM-R1-950M is a 950M-parameter model fine-tuned to
read short financial text (headlines, earnings snippets, market commentary)
and explain its reasoning before settling on positive, negative, or
neutral. The base, MobileLLM-R1-950M, is Meta's purpose-built mobile model: its architecture is shaped for on-device inference (compact embeddings, untied lm_head, shared attention layers), and FinSenti's recipe improves financial-sentiment quality without changing that footprint.

The model is part of the [FinSenti
collection](https://huggingface.co/collections/Ayansk11/finsenti), a
scaling study of small models trained on the same data with the same recipe.

## What it's good at

- Classifying short financial text (1-3 sentences) into positive / negative
  / neutral
- Producing a short reasoning chain you can read or log
- Following a strict `<reasoning>...</reasoning><answer>...</answer>` output
  format that's easy to parse downstream

It was trained on news-style headlines and earnings snippets in English, so
that's where it shines. Outside that domain you'll see the format hold up
but the labels get noisier.

## How it was trained

Two-stage recipe, same across the whole FinSenti family:

1. **SFT** on the SFT train slice from the [FinSenti
   dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset)
   (~15.2K balanced training samples, drawn from a
   50.8K-sample pool with held-out val/test splits, chain-of-thought
   targets generated by a teacher model and filtered for label agreement).
   This stage took about 1.6 hours on a single A100 80GB
   for this model.
2. **GRPO** with four reward functions (sentiment correctness, format
   compliance, reasoning quality, output consistency), each weighted equally
   for a maximum reward of 4.0. The training budget was 3000
   steps with early stopping; the best checkpoint landed near step
   ~200 with a mean reward of approximately
   **3.29 / 4.0** on the validation slice.
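
The equal-weight sum described in step 2 can be sketched as below. This is a minimal illustration, not the training code: the function names, the regex, and the assumption that each component is scored in [0, 1] are all hypothetical, and the reasoning-quality and consistency scores are passed in as plain numbers because their exact definitions are not given here.

```python
import re

LABELS = {"positive", "negative", "neutral"}

# Strict layout: one reasoning block followed by one answer block.
PATTERN = re.compile(
    r"^\s*<reasoning>(.+?)</reasoning>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)


def extract_label(completion):
    """Return the label if the completion is well-formed, else None."""
    m = PATTERN.match(completion)
    if not m:
        return None
    label = m.group(2).strip()
    return label if label in LABELS else None


def format_reward(completion):
    """1.0 for a well-formed completion with a valid label, else 0.0."""
    return 1.0 if extract_label(completion) is not None else 0.0


def correctness_reward(completion, gold):
    """1.0 if the extracted label equals the gold label, else 0.0."""
    return 1.0 if extract_label(completion) == gold else 0.0


def total_reward(completion, gold, reasoning_score, consistency_score):
    """Equal-weight sum of four components (assumed in [0, 1]); max 4.0."""
    return (
        correctness_reward(completion, gold)
        + format_reward(completion)
        + reasoning_score
        + consistency_score
    )


sample = "<reasoning>Earnings beat estimates.</reasoning><answer>positive</answer>"
print(total_reward(sample, "positive", reasoning_score=0.5, consistency_score=1.0))  # 3.5
```

With equal weights, a mean reward near 3.29 means the four components average roughly 0.82 each, though the per-component split is not reported.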

Trainer stack: PEFT + bitsandbytes. Unsloth was not used because it does not
support the `llama4_text` architecture, so the weights were loaded directly
from the upstream
[`facebook/MobileLLM-R1-950M`](https://huggingface.co/facebook/MobileLLM-R1-950M)
repo. LoRA adapters (r=16, alpha=32) were
trained on the attention and MLP projection layers, then merged into the
base weights before export, so this repo is a self-contained model and
doesn't need PEFT to load.
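
For readers who want to reproduce a similar adapter setup, the stated hyperparameters can be written as keyword arguments for PEFT's `LoraConfig`. This is a sketch under assumptions: the module names below follow the common Llama-style naming and are not confirmed for this architecture, so check them against `model.named_modules()` before use.

```python
# Hyperparameters stated in this card: r=16, alpha=32.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    # Assumed Llama-style names for the attention and MLP projections.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}

# Standard LoRA scales the adapter update by lora_alpha / r.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(scaling)  # 2.0
```

With `peft` installed, `LoraConfig(**lora_kwargs)` should build the corresponding config object.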

## Quick start

Standard `transformers` usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Ayansk11/FinSenti-MobileLLM-R1-950M"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system = (
    "You are a financial sentiment analyst. For each headline you receive, "
    "write a short reasoning chain inside <reasoning>...</reasoning> tags, "
    "then give a single label inside <answer>...</answer> tags. The label "
    "must be exactly one of: positive, negative, neutral."
)
user = "Apple beats Q4 estimates as iPhone sales jump 12% year over year."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

Expected output (your reasoning text will vary; the label should match):

```
<reasoning>
Beating estimates is a positive earnings surprise. A 12% YoY iPhone sales jump in the company's biggest product line points to demand strength. Both signals push the read positive.
</reasoning>
<answer>positive</answer>
```

## Prompt format

The model expects the system prompt above; using it verbatim works best. The user turn
is the headline or short snippet you want classified. Output is two XML-ish
blocks in this order: `<reasoning>...</reasoning>` then
`<answer>...</answer>`. The `<answer>` content is one of `positive`,
`negative`, or `neutral` (lowercase, no punctuation).

If you want labels only and don't care about the reasoning, you can stop
generation as soon as you see `</answer>` to save tokens.
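
A small parser for this format might look like the sketch below. The helper names are illustrative and not part of any library; the code only assumes the two-block layout described above and returns `None` for anything it cannot find.

```python
import re

ANSWER_RE = re.compile(r"<answer>\s*(positive|negative|neutral)\s*</answer>")
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def truncate_at_answer(text):
    """Cut the text right after the first closing answer tag, if present."""
    end = text.find("</answer>")
    return text if end == -1 else text[: end + len("</answer>")]


def parse_output(text):
    """Extract the reasoning text and label; missing parts come back as None."""
    text = truncate_at_answer(text)
    reasoning = REASONING_RE.search(text)
    answer = ANSWER_RE.search(text)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "label": answer.group(1) if answer else None,
    }


raw = (
    "<reasoning>\nBeating estimates is a positive surprise.\n</reasoning>\n"
    "<answer>positive</answer> extra tokens"
)
print(parse_output(raw)["label"])  # positive
```

Treating a `None` label as a parse failure, rather than defaulting to `neutral`, keeps malformed generations visible in downstream logs.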

## Performance notes

The training reward (max 4.0) hit **3.29** on the
held-out validation slice. That breaks down across the four reward
functions roughly as:

- Sentiment correctness: dominant contributor; the model gets the label
  right on the validation split most of the time
- Format compliance: near-saturated by the end of GRPO; the model almost
  always produces well-formed `<reasoning>` and `<answer>` tags
- Reasoning quality: judged on length and presence of finance-relevant
  signal words; this one's the noisiest of the four
- Consistency: rewards stable labels across paraphrases of the same headline

Numbers on standard finance benchmarks (FPB, FiQA, Twitter Financial News)
are forthcoming and will be added once the eval pipeline lands.

## Hardware

At bf16 the weights are about 1.8 GB on disk and need ~3 GB of GPU memory for batch=1 inference. CPU inference is fine too: on a modern laptop you'll get a few tokens per second with the bf16 weights, and 15-30 tok/s with the GGUF Q4_K_M build.
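
The on-disk figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below assumes a round 950M parameters (taken from the model name, not measured) and 2 bytes per bf16 weight.

```python
N_PARAMS = 950e6        # assumed from the model name
BYTES_PER_BF16 = 2      # bf16 stores each weight in 16 bits


def weight_size_gib(n_params, bytes_per_param):
    """Raw weight size in GiB (2**30 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 2**30


print(round(weight_size_gib(N_PARAMS, BYTES_PER_BF16), 2))  # 1.77
```

The gap between that figure and the ~3 GB inference footprint is the usual runtime overhead (activations, KV cache, framework context), which this estimate does not model.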

## Limitations

A few things this model isn't built for:

- **Long documents.** Training context was capped at 2048
  tokens. Anything much longer than a few paragraphs is out of distribution.
- **Multi-asset reasoning.** It classifies the sentiment of a single piece
  of text. It won't aggregate across multiple headlines or weigh sources.
- **Numerical reasoning.** It can read "beats by 12%" and call that
  positive, but it isn't doing math. Don't ask it to forecast.
- **Languages other than English.** Training data was English only.
- **Background knowledge.** If the headline needs you to know what a
  company does, the model only has whatever was in its base pretraining.
  It can't look anything up.
- **Three labels, hard cutoffs.** The output space is positive / negative /
  neutral. If you need a 5-class scale or a continuous score, you'll need
  to retrain or post-process.
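
As one example of the post-processing route, labels from several headlines can be mapped to numbers and averaged into a rough continuous score. This is a naive, unvalidated sketch: the -1 / 0 / +1 mapping and the unweighted mean are assumptions for illustration, not something the model was trained or evaluated for.

```python
from statistics import mean

# Hypothetical mapping from label to numeric score.
SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def aggregate(labels):
    """Unweighted mean score in [-1, 1] over a non-empty list of labels."""
    return mean(SCORE[label] for label in labels)


print(aggregate(["positive", "positive", "negative", "neutral"]))  # 0.25
```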

## Training details

| Setting | Value |
|---|---|
| Upstream base model | [facebook/MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) |
| Loaded from | [facebook/MobileLLM-R1-950M](https://huggingface.co/facebook/MobileLLM-R1-950M) (upstream repo; no Unsloth mirror used) |
| Dataset | [Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset) (~15.2K train per stage, 50.8K total across splits) |
| SFT length | ~1.6 hours on A100 80GB |
| GRPO budget | 3000 steps with early stopping (best near step ~200) |
| Best GRPO reward | ~3.29 / 4.0 |
| Adapter | LoRA (r=16, alpha=32) on q/k/v/o/gate/up/down projections |
| Sequence length | 2048 |
| Optimizer | AdamW (8-bit), cosine LR schedule |
| Hardware | NVIDIA A100 80GB (Indiana University BigRed200 cluster) |
| Frameworks | PEFT + bitsandbytes (no Unsloth - llama4_text arch unsupported) |

## Related FinSenti models

Other sizes and bases trained with the same recipe:

- **Qwen3**: [Qwen3-0.6B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-0.6B), [Qwen3-1.7B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-1.7B), [Qwen3-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-4B), [Qwen3-8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3-8B)
- **Qwen3.5**: [Qwen3.5-0.8B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-0.8B), [Qwen3.5-2B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-2B), [Qwen3.5-4B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-4B), [Qwen3.5-9B](https://huggingface.co/Ayansk11/FinSenti-Qwen3.5-9B)
- **DeepSeek**: [DeepSeek-R1-1.5B](https://huggingface.co/Ayansk11/FinSenti-DeepSeek-R1-1.5B)
- **Tiny-LLM**: [Tiny-LLM-10M](https://huggingface.co/Ayansk11/FinSenti-Tiny-LLM-10M)
- **Llama-3**: [Llama-3.2-1B](https://huggingface.co/Ayansk11/FinSenti-Llama-3.2-1B)
- **SmolLM**: [SmolLM-1.7B](https://huggingface.co/Ayansk11/FinSenti-SmolLM-1.7B)

There's a GGUF build of this same model at
[Ayansk11/FinSenti-MobileLLM-R1-950M-GGUF](https://huggingface.co/Ayansk11/FinSenti-MobileLLM-R1-950M-GGUF) for Ollama and
llama.cpp, and the dataset itself is at
[Ayansk11/FinSenti-Dataset](https://huggingface.co/datasets/Ayansk11/FinSenti-Dataset).

If you're picking a size, a rough guide:

- **Need it on a phone or browser?** Look at one of the smallest models in
  the group (for example Qwen3-0.6B) or its GGUF.
- **Laptop with no GPU?** Any model up to ~2B as Q4_K_M GGUF works.
- **Single 8-12 GB GPU?** The 1.5B-4B sizes are the sweet spot.
- **Server or workstation?** The 8B / 9B variants give the best reasoning
  but need the memory.

## Citation

If you use this model in research, please cite:

```bibtex
@misc{shaikh2026finsenti,
  title  = {FinSenti: Small Language Models for Financial Sentiment with Chain-of-Thought Reasoning},
  author = {Shaikh, Ayan},
  year   = {2026},
  url    = {https://huggingface.co/collections/Ayansk11/finsenti},
  note   = {Indiana University}
}
```

## License

Apache 2.0, same as the base model.

## Acknowledgements

Trained on the Indiana University BigRed200 cluster.
Thanks to the Unsloth and TRL teams for the trainer stack, and to the
Qwen / DeepSeek teams for the base models.