---
license: apache-2.0
base_model: Qwen/Qwen2.5-Math-7B
tags:
- operations-research
- optimization
- linear-programming
- integer-programming
- lora
- dpo
- peft
language:
- en
library_name: peft
pipeline_tag: text-generation
---
# OptimAI
A 7B model fine-tuned to formulate and solve operations research problems (LP, IP, network flow, queueing, stochastic optimization).
Built on Qwen2.5-Math-7B with a two-stage training pipeline: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). Distributed as a LoRA adapter (~646 MB).
## Intended use
- Formulating optimization problems from natural-language descriptions
- Solving small LPs, IPs, shortest-path, max-flow, knapsack, and similar OR problems end-to-end
- Studying optimality conditions (KKT, dual formulations, sensitivity analysis)
- Educational support for OR / optimization students
OptimAI is not a replacement for a real solver. Use it to set up the problem, then verify the result with Gurobi, CPLEX, OR-Tools, or SciPy.
## Evaluation
Evaluated on a held-out set of 60 OR problems across three categories: closed_form (n=20, auto-graded), open_conceptual (n=20, qualitative), long_form (n=20, qualitative).
Closed-form numeric accuracy (final-answer grader, last 5 lines of completion only):
| Model | Closed-form score |
|---|---|
| SFT-only | 48.3% |
| SFT + DPO (this model) | 55.4% |
DPO yields a +7.1 percentage-point absolute improvement (~15% relative). Across the 20 closed-form items, DPO improved on 7, regressed on 5, and tied on 8.
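The absolute and relative figures can be reproduced from the two table entries. The snippet below is only a sanity check of that arithmetic; the variable names are ours.

```python
# Closed-form scores from the table above, in percent.
sft_score = 48.3
dpo_score = 55.4

absolute_gain = dpo_score - sft_score            # percentage points
relative_gain = absolute_gain / sft_score * 100  # percent of the SFT score

# Per-item outcomes reported for the closed-form set.
improved, regressed, tied = 7, 5, 8
total_items = improved + regressed + tied

print(f"{absolute_gain:.1f} pp absolute, {relative_gain:.1f}% relative, {total_items} items")
# prints "7.1 pp absolute, 14.7% relative, 20 items"
```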
### Example: A01 (LP formulation)
Prompt: "A factory produces A and B. A needs 2h labor, 3 units material, profit $5. B needs 1h labor, 2 units material, profit $4. Available: 100h labor, 200 units material. Maximize profit."
Expected: x_A=40, x_B=20, profit=280
SFT (incorrect): "the optimal solution is x_A = 20 and x_B = 60, which gives a total profit of $340"
DPO (correct): "the maximum profit occurs at the point (x_A, x_B) = (40, 20), yielding a total profit of $280"
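To make the grading rule concrete, here is a minimal sketch of a final-answer grader that looks only at the last five lines of a completion. It is not the grader used to produce the scores above: the function name `grade_final_answer`, the number regex, the tolerance, and the fraction-of-values scoring are all assumptions made for illustration.

```python
import re

# Matches integers and simple decimals, e.g. "40", "-3", "2.5".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def grade_final_answer(completion, expected, tail_lines=5, tol=1e-6):
    """Return the fraction of expected values found in the completion's tail."""
    tail = "\n".join(completion.strip().splitlines()[-tail_lines:])
    found = [float(tok) for tok in NUMBER.findall(tail)]
    hits = sum(any(abs(f - e) <= tol for f in found) for e in expected)
    return hits / len(expected)

# Expected values for A01: x_A, x_B, profit.
expected_a01 = [40, 20, 280]
sft_text = "the optimal solution is x_A = 20 and x_B = 60, which gives a total profit of $340"
dpo_text = "the maximum profit occurs at the point (x_A, x_B) = (40, 20), yielding a total profit of $280"

sft_grade = grade_final_answer(sft_text, expected_a01)
dpo_grade = grade_final_answer(dpo_text, expected_a01)
print(f"SFT: {sft_grade:.2f}, DPO: {dpo_grade:.2f}")
```

In this sketch the SFT completion still earns partial credit, because the value 20 appears in its text even though it is assigned to the wrong variable. A grader that matches numbers without checking which quantity they belong to can reward answers for the wrong reasons.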
## How to use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model first; the adapter only contains the LoRA weights.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Math-7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the OptimAI LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, "billwang37/WWang-Lab-OptimAI")
tokenizer = AutoTokenizer.from_pretrained("billwang37/WWang-Lab-OptimAI", trust_remote_code=True)

prompt = "Maximize 3x + 5y subject to x + 2y <= 14, 3x - y >= 0, x - y <= 2, x,y >= 0."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the output deterministic.
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
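Because the model's answer should be verified with a solver, the sketch below solves the same LP as the prompt above with SciPy's `linprog`. It assumes SciPy is installed. The rewrite into `A_ub @ x <= b_ub` form is ours: `linprog` minimizes, so the objective is negated, and `3x - y >= 0` becomes `-3x + y <= 0`.

```python
from scipy.optimize import linprog

# Maximize 3x + 5y  ->  minimize -3x - 5y.
c = [-3, -5]

# Constraints in A_ub @ [x, y] <= b_ub form.
A_ub = [
    [1, 2],    # x + 2y <= 14
    [-3, 1],   # 3x - y >= 0, rewritten as -3x + y <= 0
    [1, -1],   # x - y <= 2
]
b_ub = [14, 0, 2]

# x, y >= 0
bounds = [(0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
x_opt, y_opt = res.x
objective = -res.fun  # undo the sign flip

print(f"x={x_opt:.2f}, y={y_opt:.2f}, objective={objective:.2f}")
# prints "x=6.00, y=4.00, objective=38.00"
```

Compare these values with the final answer in the model's completion before relying on it.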
## Training details
- Base model: Qwen/Qwen2.5-Math-7B
- SFT data: ~6k OR problems with worked solutions
- DPO data: ~1.2k preference pairs (chosen vs rejected)
- Method: LoRA adapter via PEFT + TRL
- Hardware: Single A100 (40 GB) on OU OSCER cluster
- SFT runtime: ~4 hours (3 epochs)
- DPO runtime: ~30 minutes
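For readers unfamiliar with the second training stage, the snippet below sketches the standard DPO objective for a single chosen/rejected pair, computed from sequence log-probabilities. It is not the training code for this model, which used TRL. The function name `dpo_loss`, the example log-probabilities, and `beta=0.1` are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # Log-ratios of the policy against the frozen reference (here, the SFT model).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) = log(1 + exp(-m)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Policy identical to the reference: zero margin, so the loss equals log(2).
loss_at_init = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy now favors the chosen answer more than the reference does: lower loss.
loss_after = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(f"{loss_at_init:.4f} -> {loss_after:.4f}")
```

The loss falls as the policy raises the likelihood of the chosen completion relative to the rejected one, measured against the reference model.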
## Limitations
- 60-problem eval set is small; numbers are a directional signal, not a definitive benchmark
- Grader checks numbers, not reasoning
- Can produce confidently wrong answers, especially on integer programming and combinatorial graph problems
- LoRA adapter format; you must load Qwen2.5-Math-7B base separately
- English-only
## License
Apache-2.0, inherited from the base model.