---
license: apache-2.0
base_model: Qwen/Qwen2.5-Math-7B
tags:
  - operations-research
  - optimization
  - linear-programming
  - integer-programming
  - lora
  - dpo
  - peft
language:
  - en
library_name: peft
pipeline_tag: text-generation
---

# OptimAI

A 7B model fine-tuned to formulate and solve operations research (OR) problems: linear programming (LP), integer programming (IP), network flow, queueing, and stochastic optimization.

Built on Qwen2.5-Math-7B with a two-stage training pipeline: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). Distributed as a LoRA adapter (~646 MB).

## Intended use

- Formulating optimization problems from natural-language descriptions
- Solving small LPs, IPs, shortest-path, max-flow, knapsack, and similar OR problems end-to-end
- Studying optimality conditions (KKT, dual formulations, sensitivity analysis)
- Educational support for OR / optimization students

OptimAI is not a replacement for a real solver. Use it to set up the problem, then verify the result with Gurobi, CPLEX, OR-Tools, or SciPy.

## Evaluation

Evaluated on a held-out set of 60 OR problems across three categories: closed_form (n=20, auto-graded), open_conceptual (n=20, qualitative), long_form (n=20, qualitative).

Closed-form numeric accuracy (the grader checks only the final answer, reading the last 5 lines of each completion):

| Model | Closed-form score |
|---|---|
| SFT-only | 48.3% |
| SFT + DPO (this model) | 55.4% |

DPO gives a +7.1 percentage point absolute improvement (~15% relative). Across 20 closed-form items, DPO improved 7, regressed 5, tied 8.
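
The headline numbers can be reproduced from the table with a few lines of arithmetic. The variable names below are illustrative only.

```python
# Closed-form scores from the table above, in percent.
sft_score = 48.3
dpo_score = 55.4

# Absolute gain is measured in percentage points; relative gain is
# the absolute gain divided by the SFT-only baseline.
absolute_gain = dpo_score - sft_score
relative_gain = absolute_gain / sft_score * 100

# Per-item comparison reported for the 20 closed-form problems.
improved, regressed, tied = 7, 5, 8
total_items = improved + regressed + tied

print(f"absolute: +{absolute_gain:.1f} pp")
print(f"relative: +{relative_gain:.1f}%")
print(f"items compared: {total_items}")
```

With 7 improvements against 5 regressions on 20 items, the net difference rests on a handful of problems, so the gain is best read as a direction rather than a precise estimate.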

### Example: A01 (LP formulation)

Prompt: "A factory produces A and B. A needs 2h labor, 3 units material, profit $5. B needs 1h labor, 2 units material, profit $4. Available: 100h labor, 200 units material. Maximize profit."

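Expected (reference answer): x_A=40, x_B=20, profit=280. Under the two constraints as quoted, x_A=0, x_B=100 is also feasible (100h labor, 200 units material) and gives profit 400, so the quoted prompt and reference answer should be re-checked against the original eval item.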

SFT (incorrect): "the optimal solution is x_A = 20 and x_B = 60, which gives a total profit of $340"

DPO (correct): "the maximum profit occurs at the point (x_A, x_B) = (40, 20), yielding a total profit of $280"
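
The grader itself is not included in this card. As a rough illustration of a final-answer check in the same spirit, the sketch below scans only the last few lines of a completion for numbers and passes if every reference value appears. The function name `grade_final_answer`, the `tail_lines` and `tol` parameters, and the all-values-must-match rule are assumptions made for this example, not the actual grading code.

```python
import re

# Matches integers and simple decimals; thousands separators are not handled.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def grade_final_answer(completion, expected, tail_lines=5, tol=1e-6):
    """Hypothetical grader: True if every expected number appears
    (within tol) among the numbers in the last `tail_lines` lines."""
    tail = completion.strip().splitlines()[-tail_lines:]
    found = [float(m) for line in tail for m in NUMBER.findall(line)]
    return all(any(abs(f - e) <= tol for f in found) for e in expected)


reference = [40, 20, 280]  # x_A, x_B, profit as listed for A01

sft_text = ("the optimal solution is x_A = 20 and x_B = 60, "
            "which gives a total profit of $340")
dpo_text = ("the maximum profit occurs at the point (x_A, x_B) = (40, 20), "
            "yielding a total profit of $280")

print("SFT passes:", grade_final_answer(sft_text, reference))
print("DPO passes:", grade_final_answer(dpo_text, reference))
```

A check of this kind matches numbers only, so it cannot tell whether the reasoning that led to them is sound, and it depends entirely on the reference values being right.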

## How to use

    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model first; the adapter repo does not include the base weights.
    base = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen2.5-Math-7B",
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    # Attach the OptimAI LoRA adapter on top of the base model.
    model = PeftModel.from_pretrained(base, "billwang37/WWang-Lab-OptimAI")
    tokenizer = AutoTokenizer.from_pretrained("billwang37/WWang-Lab-OptimAI", trust_remote_code=True)

    prompt = "Maximize 3x + 5y subject to x + 2y <= 14, 3x - y >= 0, x - y <= 2, x,y >= 0."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    # The decoded text contains the prompt followed by the completion.
    print(tokenizer.decode(out[0], skip_special_tokens=True))
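
Because the model can be confidently wrong, it is worth checking its answer to the prompt above independently. The following is a minimal, dependency-free sketch that brute-forces a two-variable LP by enumerating constraint intersections; `solve_lp_2d` is a hypothetical helper written for this example and assumes a bounded, non-empty feasible region. For anything larger, use one of the solvers named under Intended use.

```python
from itertools import combinations


def solve_lp_2d(c, constraints, tol=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Enumerates intersections of constraint pairs and keeps the best
    feasible one. Assumes the feasible region is bounded and non-empty.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundaries never intersect in a single point
        # Cramer's rule for the 2x2 system of the two boundary lines.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + tol for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best


# Prompt: maximize 3x + 5y s.t. x + 2y <= 14, 3x - y >= 0, x - y <= 2, x, y >= 0.
# Every constraint is rewritten in "<=" form.
constraints = [
    (1, 2, 14),   # x + 2y <= 14
    (-3, 1, 0),   # 3x - y >= 0  ->  -3x + y <= 0
    (1, -1, 2),   # x - y <= 2
    (-1, 0, 0),   # x >= 0
    (0, -1, 0),   # y >= 0
]

value, x, y = solve_lp_2d((3, 5), constraints)
print(f"x = {x:g}, y = {y:g}, objective = {value:g}")  # x = 6, y = 4, objective = 38
```

Comparing the model's final numbers against this result (x = 6, y = 4, objective 38) is a quick way to catch a wrong completion before trusting it.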

## Training details

- Base model: Qwen/Qwen2.5-Math-7B
- SFT data: ~6k OR problems with worked solutions
- DPO data: ~1.2k preference pairs (chosen vs rejected)
- Method: LoRA adapter via PEFT + TRL
- Hardware: Single A100 (40 GB) on OU OSCER cluster
- SFT runtime: ~4 hours (3 epochs)
- DPO runtime: ~30 minutes
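
For readers unfamiliar with the DPO stage listed above, the per-pair objective can be sketched in plain Python. This illustrates the standard DPO loss, not the training code used here: the log-probability inputs are hypothetical summed sequence log-probs, and `beta=0.1` is an assumed value, not a setting reported for this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    loss = -log(sigmoid(beta * margin)), where margin is how much more the
    policy prefers the chosen answer over the rejected one, relative to the
    frozen reference (SFT) model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form is numerically stable.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities for one (chosen, rejected) pair.
no_preference = dpo_loss(-12.0, -12.0, -12.0, -12.0)
prefers_chosen = dpo_loss(-10.0, -15.0, -12.0, -12.0)
print(f"no preference shift: {no_preference:.4f}")
print(f"policy prefers chosen: {prefers_chosen:.4f}")
```

The loss starts at log 2 when the policy matches the reference and falls as the policy shifts probability toward the chosen answer, which is the signal the preference pairs provide.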

## Limitations

- 60-problem eval set is small; numbers are a directional signal, not a definitive benchmark
- Grader checks numbers, not reasoning
- Can produce confidently wrong answers, especially on integer programming and combinatorial graph problems
- LoRA adapter format; you must load Qwen2.5-Math-7B base separately
- English-only

## License

Apache-2.0, inherited from the base model.