Commit dcc338f (verified), committed by tphage · 1 Parent(s): 1046353

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +79 -0
README.md ADDED
@@ -0,0 +1,79 @@
+ ---
+ license: apache-2.0
+ base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ tags:
+ - reinforcement-learning
+ - grpo
+ - peft
+ - lora
+ - beam-mechanics
+ - structural-engineering
+ - math
+ - reasoning
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+
+ # BeamPERL — DeepSeek-R1-Distill-Qwen-1.5B
+
+ **BeamPERL** is a language model fine-tuned with parameter-efficient reinforcement learning to solve beam mechanics problems. It builds on [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), adding LoRA adapters trained with Group Relative Policy Optimization (GRPO) against verifiable reward signals.
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Base model | `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` |
+ | Fine-tuning method | GRPO (RL) + LoRA (PEFT) |
+ | LoRA rank / alpha | 32 / 128 |
+ | LoRA dropout | 0.05 |
+ | LoRA target modules | q, k, v, o, gate, up, down projections |
+ | Training precision | bfloat16 |
+ | Max sequence length | 2048 tokens (256 prompt + 1792 completion) |
+ | Training dataset | `beamrl_train` (synthetic beam mechanics QA) |
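To put the LoRA settings in the table in perspective, the sketch below estimates the adapter's scaling factor and its trainable-parameter count. The layer dimensions (hidden size 1536, MLP intermediate size 8960, 2 key/value heads of dimension 128, 28 layers) are assumptions about the base architecture that this README does not state; check them against the base model's `config.json`. The sketch also assumes standard LoRA scaling of alpha / rank.

```python
# Rough estimate of LoRA adapter size for the settings in the table above.
# All architecture dimensions below are ASSUMED, not taken from this README.
RANK, ALPHA = 32, 128      # from the Model Details table

HIDDEN = 1536              # assumed hidden size
INTERMEDIATE = 8960        # assumed MLP intermediate size
KV_DIM = 2 * 128           # assumed: 2 key/value heads x head dimension 128
NUM_LAYERS = 28            # assumed number of transformer layers

# (input dim, output dim) of each targeted projection in one layer
PROJECTIONS = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, INTERMEDIATE),
    "up": (HIDDEN, INTERMEDIATE),
    "down": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, rank=RANK):
    """A LoRA pair A (rank x d_in) and B (d_out x rank) adds rank * (d_in + d_out) weights."""
    return rank * (d_in + d_out)


scaling = ALPHA / RANK
per_layer = sum(lora_params(d_in, d_out) for d_in, d_out in PROJECTIONS.values())
total = per_layer * NUM_LAYERS

print(f"LoRA scaling (alpha / rank): {scaling}")
print(f"Trainable LoRA parameters:   {total:,}")
```

Under these assumed dimensions the adapters add roughly 37 million trainable weights, a small fraction of the base model's approximately 1.5 billion parameters.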
+
+ ### Reward Functions
+
+ | Reward | Weight | Description |
+ |---|---|---|
+ | Accuracy | 0.667 | Correctness of predicted reaction forces / coefficients |
+ | Format | 0.333 | Requires reasoning in `<think>` tags and answer in `\boxed{}` |
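The two rewards combine into a single weighted scalar per completion. The following is a minimal sketch of how such a reward could be computed, not the BeamPERL implementation: the regular expressions, the comma-separated numeric answer format, the 1% tolerance, and all function names are hypothetical. Only the weights come from the table.

```python
import re

ACCURACY_WEIGHT, FORMAT_WEIGHT = 0.667, 0.333  # weights from the table above

# Hypothetical format rule: a <think> block followed by a \boxed{...} answer.
FORMAT_RE = re.compile(r"<think>.+?</think>.*\\boxed\{.+\}", re.DOTALL)
# Simplification: does not handle nested braces inside \boxed{...}.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")
NUMBER_RE = re.compile(r"-?\d+(?:\.\d+)?")


def format_reward(completion: str) -> float:
    """1.0 if the completion has reasoning tags and a boxed answer, else 0.0."""
    return 1.0 if FORMAT_RE.search(completion) else 0.0


def accuracy_reward(completion: str, reference: list, rel_tol: float = 1e-2) -> float:
    """1.0 if every number in the last boxed answer matches the reference, else 0.0."""
    boxed = BOXED_RE.findall(completion)
    if not boxed:
        return 0.0
    predicted = [float(x) for x in NUMBER_RE.findall(boxed[-1])]
    if len(predicted) != len(reference):
        return 0.0
    close = all(abs(p - r) <= rel_tol * max(1.0, abs(r))
                for p, r in zip(predicted, reference))
    return 1.0 if close else 0.0


def total_reward(completion: str, reference: list) -> float:
    """Weighted sum of the accuracy and format rewards."""
    return (ACCURACY_WEIGHT * accuracy_reward(completion, reference)
            + FORMAT_WEIGHT * format_reward(completion))


sample = "<think>Moments about A give R_B = 5.</think> \\boxed{5, 5}"
print(round(total_reward(sample, [5.0, 5.0]), 3))
```

Because both components are checked programmatically, a completion can earn the format share without the accuracy share and vice versa, which is what makes the signal verifiable without a learned reward model.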
+
+ ## Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("tphage/BeamPERL", torch_dtype="bfloat16", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("tphage/BeamPERL")
+
+ prompt = "A simply supported beam of length 6 m carries a point load of 10 kN at its midspan. What are the reaction forces at the supports?"
+
+ messages = [{"role": "user", "content": prompt}]
+ # add_generation_prompt=True appends the assistant turn marker so the model answers the question
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
+ ).to(model.device)
+
+ # do_sample=True is needed for temperature to take effect
+ outputs = model.generate(**inputs, max_new_tokens=1792, do_sample=True, temperature=0.6)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
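The example prompt has a closed-form answer that is handy for checking the model's output. A minimal sketch using static equilibrium for a simply supported beam with a single point load; the function name is illustrative and not part of BeamPERL or SymBeam.

```python
def simply_supported_reactions(length_m, load_kn, load_position_m):
    """Support reactions (R_A, R_B) in kN for one point load on a simply supported beam.

    Moments about support A give R_B * L = P * a; vertical equilibrium gives R_A + R_B = P.
    """
    if not 0 <= load_position_m <= length_m:
        raise ValueError("the load must lie between the supports")
    r_b = load_kn * load_position_m / length_m
    r_a = load_kn - r_b
    return r_a, r_b


# The prompt above: 6 m span, 10 kN at midspan (3 m from support A)
r_a, r_b = simply_supported_reactions(6.0, 10.0, 3.0)
print(f"R_A = {r_a} kN, R_B = {r_b} kN")  # R_A = 5.0 kN, R_B = 5.0 kN
```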
+
+ The model reasons step by step inside `<think>...</think>` tags and gives its final answer in `\boxed{...}` format.
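For downstream use it is often convenient to separate the reasoning trace from the final answer. A small sketch of such a parser follows; the helper name is hypothetical, and it assumes both `<think>` tags appear in the decoded text and that the boxed answer contains no nested braces.

```python
import re


def split_response(text):
    """Return (reasoning, answer); either is None when the corresponding markup is missing."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
    reasoning = think.group(1).strip() if think else None
    answer = boxed[-1].strip() if boxed else None  # last boxed expression wins
    return reasoning, answer


reasoning, answer = split_response(
    "<think>Sum moments about A.</think>\nThe reactions are \\boxed{5, 5} kN."
)
print(answer)
```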
+
+ ## Training
+
+ LoRA adapters were trained using GRPO via the [BeamPERL framework](https://github.com/tphage/BeamPERL) on a synthetic dataset of beam mechanics questions generated with a custom version of the SymBeam library. The base model weights were kept frozen throughout training.
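GRPO scores each sampled completion relative to the other completions drawn for the same prompt, so no learned value model is needed. The sketch below shows that group-relative advantage in its commonly described form, reward minus group mean divided by group standard deviation. Whether the BeamPERL framework uses exactly this normalization (population standard deviation, this epsilon) is an assumption, and the reward values are made up for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions sampled from one prompt:
# correct and well formatted, format only (twice), and neither.
rewards = [1.0, 0.333, 0.333, 0.0]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f}  advantage={a:+.3f}")
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no gradient signal.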
+
+ ## Citation
+
+ ```bibtex
+ @misc{hage2025beamperl,
+   title={BeamPERL: Parameter-Efficient Reinforcement Learning for Verifiable Beam Mechanics Problem-Solving},
+   author={Tarjei P. Hage and Markus J. Buehler},
+   year={2025},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
+
+ ## Acknowledgements
+
+ Built upon [Tina](https://arxiv.org/abs/2504.15777) and [Open R1](https://github.com/huggingface/open-r1). Dataset generation uses a custom version of [SymBeam](https://github.com/amcc1996/symbeam).