AryanNsc committed · Commit 93358de · verified · 1 parent: 66ba121

Update README.md

Files changed (1): README.md (+163 −3)
---
license: apache-2.0
base_model: Qwen/Qwen3-4B
library_name: transformers
tags:
- qwen3
- lora
- fine-tuned
- chain-of-thought
- structured-output
- json
- reasoning
- pos
- order-parsing
pipeline_tag: text-generation
language:
- en
---

# STRM-4B-v1: Stateful Reasoning Model

A LoRA fine-tune of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) trained on distilled chain-of-thought data for parsing unstructured, spoken-language input into structured JSON.

The model maintains running state while processing sequential instructions -- handling corrections ("scratch that"), cancellations ("actually nevermind"), quantity changes ("bump that to 3"), and modifier adjustments in a single forward pass.

**~94% exact-match accuracy** averaged across evaluation domains.

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AryanNsc/strm-4b-v1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert POS agent. Parse the order step by step, then output JSON."},
    {"role": "user", "content": "uh can I get a venti latte with oat milk and a blueberry muffin actually scratch that muffin make it an avocado toast"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Expected Output

The model reasons inside `<think>` tags, then outputs structured JSON:

```
<think>
Processing the order step by step:

1. "venti latte with oat milk" -> Latte, Venti, Oat Milk
   - Base: $4.50 + Venti: $1.00 + Oat Milk: $0.80 = $6.30

2. "a blueberry muffin" -> Blueberry Muffin
   - $3.75

3. "actually scratch that muffin" -> Cancel the Blueberry Muffin

4. "make it an avocado toast" -> Avocado Toast
   - $7.00

Running total: $6.30 + $7.00 = $13.30
</think>
{"items": [{"name": "Latte", "size": "Venti", "quantity": 1, "modifiers": ["Oat Milk"]}, {"name": "Avocado Toast", "size": null, "quantity": 1, "modifiers": []}], "total_price": 13.30}
```

## Intended Use

STRM is designed for tasks that require **stateful sequential reasoning** -- processing a stream of instructions where later instructions modify earlier state. Primary use cases:

- **Point-of-sale order parsing** -- spoken coffee shop, restaurant, or retail orders with corrections and modifications
- **Grocery checkout / inventory** -- item additions, removals, quantity changes with running totals
- **Banking transactions** -- sequential operations with balance tracking
- **Bill splitting** -- multi-party calculations with adjustments
- **Any domain** where input arrives sequentially and includes corrections to prior state

## How It Works

The model is trained with **distilled thinking** -- each training example includes explicit step-by-step reasoning inside `<think>` tags before the final JSON output. This teaches the model to:

1. **Parse sequentially** -- process input phrase by phrase, not all at once
2. **Track mutable state** -- maintain a running list of items/entities that gets updated with each action
3. **Handle corrections** -- "scratch that", "remove that", "actually nevermind" modify tracked state rather than restarting
4. **Show arithmetic** -- every price calculation is written out step by step, reducing computation errors
5. **Output valid JSON** -- clean structured output after reasoning is complete

Training data spans multiple domains with weighted sampling, so the model learns the general skill of stateful reasoning rather than memorizing domain-specific patterns.

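The stateful loop described above can be sketched in plain Python -- a fold over (action, payload) events into a running order. This is an illustration of the behavior the model learns, not code from the training pipeline; the action verbs, item names, and prices are hypothetical:

```python
# Illustrative sketch of stateful order tracking; verbs, items, and prices are hypothetical.

def apply_actions(actions):
    """Fold a sequence of (verb, payload) events into a running order state."""
    items = []
    for verb, payload in actions:
        if verb == "add":
            items.append(dict(payload))                          # new line item
        elif verb == "cancel":                                   # "scratch that"
            items = [i for i in items if i["name"] != payload]
        elif verb == "set_qty":                                  # "bump that to 3"
            name, qty = payload
            for i in items:
                if i["name"] == name:
                    i["quantity"] = qty
    total = round(sum(i["price"] * i["quantity"] for i in items), 2)
    return {"items": items, "total_price": total}

# The Quick Start order, expressed as events:
order = apply_actions([
    ("add", {"name": "Latte", "price": 6.30, "quantity": 1}),
    ("add", {"name": "Blueberry Muffin", "price": 3.75, "quantity": 1}),
    ("cancel", "Blueberry Muffin"),
    ("add", {"name": "Avocado Toast", "price": 7.00, "quantity": 1}),
])
```

The key point is that a correction mutates tracked state instead of restarting the parse -- the cancelled muffin never reaches the final total.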
## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) |
| Method | LoRA |
| LoRA rank (r) | 64 |
| LoRA alpha | 64 |
| LoRA dropout | 0.0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (training only; weights are merged to 16-bit) |
| Max sequence length | 4096 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine with 5% warmup |
| Weight decay | 0.01 |
| Epochs | 3 |
| Per-device batch size | 2 |
| Gradient accumulation | 4 (effective batch size: 8) |
| Precision | bf16 |
| Seed | 42 |

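In `peft`/`bitsandbytes` terms, the table above corresponds roughly to the configuration below. This is a sketch reconstructed from the table, not the actual training script -- see the linked repository for the real code:

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# Sketch reconstructed from the hyperparameter table; not the actual training script.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# 4-bit NF4 quantization was used during training only; final weights are merged to 16-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)
```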
### Training Data

The model was trained on multi-domain distilled chain-of-thought data. Each example consists of a system prompt, a user input, and an assistant response containing `<think>...</think>` reasoning followed by structured JSON. Domains include coffee shop orders, restaurant orders, grocery checkout, banking, inventory, bill splitting, recipe scaling, scheduling, budget tracking, and unit conversion -- with coffee-domain examples upsampled for the primary use case.

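A single record following that description might look like the example below. This is a hypothetical illustration in chat format -- the menu, prices, and field names are invented, and the real dataset layout may differ:

```python
# Hypothetical training record in chat format; menu, prices, and fields are illustrative.
example = {
    "messages": [
        {"role": "system", "content": "You are an expert POS agent. Parse the order step by step, then output JSON."},
        {"role": "user", "content": "two small coffees uh actually make that three"},
        {"role": "assistant", "content": (
            "<think>\n"
            '1. "two small coffees" -> Coffee, Small, qty 2\n'
            '2. "make that three" -> update qty to 3\n'
            "Total: 3 x $2.50 = $7.50\n"
            "</think>\n"
            '{"items": [{"name": "Coffee", "size": "Small", "quantity": 3, "modifiers": []}], "total_price": 7.50}'
        )},
    ]
}
```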
## Evaluation

Benchmarked on held-out labeled data across difficulty tiers:

| Difficulty | Description |
|------------|-------------|
| Easy | 1-2 items, no corrections |
| Medium | 2-3 items, some modifiers |
| Hard | Multiple items with cancellations or quantity bumps |
| Nightmare | 4+ items with mixed corrections, modifier removals, and re-additions |

The model achieves **~94% exact-match accuracy** averaged across domains, where exact match requires both the item list (names, sizes, quantities, modifiers) and total price to be completely correct.

### Metrics Reported

- **Exact match** -- items + price both fully correct
- **Items match** -- all items correct regardless of price
- **Price match** -- total within $0.01 tolerance
- **Per-field** -- names, sizes, quantities, modifiers evaluated independently

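The exact-match criterion can be sketched as follows. This is a minimal illustration of the definition above, not the evaluation harness itself (which lives in the linked repository); the helper names are our own:

```python
import json

# Minimal sketch of the exact-match criterion; helper names are illustrative.
def exact_match(pred_json, gold):
    """True iff the item list and total price are both fully correct."""
    try:
        pred = json.loads(pred_json)
    except json.JSONDecodeError:
        return False

    def normalize(items):
        # Order-insensitive view: (name, size, quantity, sorted modifiers)
        return sorted(
            (i["name"], str(i.get("size")), i["quantity"], tuple(sorted(i.get("modifiers", []))))
            for i in items
        )

    items_ok = normalize(pred.get("items", [])) == normalize(gold["items"])
    price_ok = abs(pred.get("total_price", 0) - gold["total_price"]) <= 0.01  # $0.01 tolerance
    return items_ok and price_ok
```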
## Usage Tips

- **Use `enable_thinking=True`** in `apply_chat_template` -- the model was trained to reason inside `<think>` tags before outputting JSON
- **Temperature 0.6** works well for most inputs; use **temperature 0** (greedy) for maximum consistency
- **Max tokens 2048** is sufficient for most orders; nightmare-level inputs with 5+ items may need more
- The JSON output appears **after** the `</think>` closing tag -- parse everything after that delimiter
- The model handles filler words (uh, um, like, literally) natively -- no need to preprocess

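Splitting the reasoning from the JSON payload is a one-liner. A sketch, assuming `decoded` holds the decoded generation from the Quick Start snippet (shown here with a stand-in string):

```python
import json

# Stand-in for the decoded model output from the Quick Start snippet.
decoded = (
    '<think>\n1. "small coffee" -> Coffee, Small, $2.50\n</think>\n'
    '{"items": [{"name": "Coffee", "size": "Small", "quantity": 1, "modifiers": []}], "total_price": 2.50}'
)

# The JSON payload is everything after the closing </think> tag.
_, _, payload = decoded.partition("</think>")
order = json.loads(payload.strip())
```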
## Limitations

- Trained primarily on English-language input
- Price arithmetic can occasionally drift on very long orders (6+ items with many modifiers)
- The model expects a system prompt describing the menu/domain; without it, output format may be inconsistent
- Not designed for multi-turn conversation -- each inference is a single order

## Training Code

The full training and evaluation code is open source:

[github.com/Guney-olu/strm-model](https://github.com/Guney-olu/strm-model)

## License

Apache 2.0