Add model card
README.md
| Property | Value |
|---|---|
| Base model | `tphage/DeepSeek-R1-Distill-Qwen-1.5B` |
| Fine-tuning method | GRPO (RL) + LoRA (PEFT) |
| LoRA rank / alpha | 32 / 128 |
| LoRA dropout | 0.05 |
| LoRA target modules | q, k, v, o, gate, up, down projections |
| Training precision | bfloat16 |
| Max sequence length | 2048 tokens (256 prompt + 1792 completion) |
| Training dataset | `tphage/BeamRL-TrainData` (synthetic beam mechanics QA) |
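The LoRA settings in the table map onto a PEFT `LoraConfig` roughly as follows. This is a sketch, not the actual training code: the `*_proj` module names are an assumption based on the standard Qwen2-style architecture of the base model.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config implied by the table above.
lora_config = LoraConfig(
    r=32,             # LoRA rank
    lora_alpha=128,   # scaling = alpha / r = 4
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```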
### Reward Functions

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tphage/BeamPERL", torch_dtype="bfloat16", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tphage/BeamPERL")

prompt = "Determine the reaction forces at the pin support (x=0.0*L) and the roller support (x=9.0*L) for a statically loaded beam with a length of 9*L, a point load of -13*P at x=3.0*L, and supports at x=0.0*L (pin) and x=9.0*L (roller)."

messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
```
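As a quick sanity check on the example prompt above (my own sketch, not part of the card): for a simply supported beam with a single point load, the two support reactions follow directly from vertical-force and moment equilibrium. The `beam_reactions` helper is hypothetical, evaluated here with P = L = 1.

```python
def beam_reactions(span, load, load_pos):
    """Return (R_pin, R_roller) for a pin at x=0 and a roller at x=span.

    `load` is signed (negative = downward); reactions come out with the
    opposite sign so that the equilibrium sums are zero.
    """
    # Moment balance about the pin: R_roller * span + load * load_pos = 0
    r_roller = -load * load_pos / span
    # Vertical force balance: R_pin + R_roller + load = 0
    r_pin = -load - r_roller
    return r_pin, r_roller

# The prompt's numbers with P = L = 1: span 9L, point load -13P at x = 3L.
r_pin, r_roller = beam_reactions(span=9.0, load=-13.0, load_pos=3.0)
print(r_pin, r_roller)  # R_pin = 26P/3 ≈ 8.667, R_roller = 13P/3 ≈ 4.333
```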