---
library_name: transformers
license: mit
language:
- en
metrics:
- rouge
base_model:
- microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
---

# Model Card for **Phi3-Lab-Report-Coder (LoRA on Phi-3 Mini 4k Instruct)**

A lightweight LoRA adapter fine-tuned from `microsoft/Phi-3-mini-4k-instruct` for **turning structured lab contexts and observations into executable Python code** that performs the target calculations (e.g., mechanics, fluids, vibrations, basic circuits, titrations). Trained with QLoRA in 4-bit, this model is intended as an **assistive code generator** for STEM lab writeups and teaching demos, not as a certified calculator for safety-critical engineering.

---

## Model Details

### Model Description

- **Developed by:** Barghav777
- **Model type:** Causal decoder LM (instruction-tuned) + **LoRA adapter**
- **Languages:** English
- **License:** MIT
- **Finetuned from:** `microsoft/Phi-3-mini-4k-instruct`
- **Intended input format:** A structured prompt with:
  - `### CONTEXT:` (natural-language description of the experiment)
  - `### OBSERVATIONS:` (JSON-like dict with units, readings)
  - `### CODE:` (the model is trained to generate the Python solution after this tag)

### Model Sources

- **Base model:** `microsoft/Phi-3-mini-4k-instruct`
- **Training data files:** `train.jsonl` (37 items), `eval.jsonl` (6 items)
- **Demo/Colab basis:** Training notebook available at: https://github.com/Barghav777/AI-Lab-Report-Agent

---

## Uses

### Direct Use
- Generate **readable Python code** to compute derived quantities from lab observations (e.g., average \(g\) via pendulum, Coriolis acceleration, Ohm’s law resistances, radius of gyration, Reynolds number).
- Produce calculation pipelines with minimal plotting/printing that are easy to copy-paste and run in a notebook.

### Downstream Use
- Course assistants or lab-prep tools that auto-draft calculation code for **intro undergrad physics/mech/fluids/EE labs**.
- Auto-checkers that compare student code vs. a reference implementation (with appropriate guardrails).

### Out-of-Scope Use
- Any **safety-critical** design decisions (structural, medical, chemical process control).
- High-stakes computation without human verification.
- Domains far outside the training distribution (e.g., NLP preprocessing pipelines, advanced control systems, large-scale simulation frameworks).

---

## Bias, Risks, and Limitations

- **Small dataset (37 train / 6 eval)** → plausible overfitting; brittle generalization to unseen experiment formats.
- **Formula misuse risk:** The model may pick incorrect constants/units or silently use wrong equations.
- **Overconfidence:** Generated code may “look right” while being numerically off or unit-inconsistent.
- **JSON brittleness:** If `OBSERVATIONS` keys/units differ from training patterns, the code may break.

### Recommendations
- Always **review formulas and units**; add assertions/unit conversions in downstream systems.
- Run generated code with **test observations** and compare against hand calculations.
- For deployment, wrap outputs with **explanations and references** to the formulas used.

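As one way to act on the second recommendation, the sketch below recomputes \(g\) for the pendulum readings used in this card's example prompt with the textbook relation \(g = 4\pi^2 L / T^2\), then asserts that the mean is physically plausible. The helper name `pendulum_g`, the 9.81 m/s² reference value, and the 0.1 m/s² tolerance are illustrative assumptions, not part of the training setup.

```python
import math

def pendulum_g(length_m: float, period_s: float) -> float:
    """Hand-calculation reference: g = 4*pi^2*L / T^2 for a simple pendulum."""
    return 4 * math.pi**2 * length_m / period_s**2

# Test observations in the format shown in this card (lengths in m, periods in s).
observations = {
    "readings": [{"L": 0.50, "T": 1.42}, {"L": 0.60, "T": 1.55}],
    "unit_L": "m",
    "unit_T": "s",
}

# Guard against silent unit mismatches before computing anything.
assert observations["unit_L"] == "m" and observations["unit_T"] == "s"

g_values = [pendulum_g(r["L"], r["T"]) for r in observations["readings"]]
g_mean = sum(g_values) / len(g_values)

# Assumed tolerance: flag results more than 0.1 m/s^2 away from 9.81 m/s^2.
assert abs(g_mean - 9.81) < 0.1, f"suspicious g: {g_mean:.3f} m/s^2"
print(f"mean g = {g_mean:.2f} m/s^2")
```
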
---

## How to Get Started

**Prompt template used in training**
```text
### CONTEXT:
{context}

### OBSERVATIONS:
{observations}

### CODE:
```

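The template can be filled programmatically. The helper below is a minimal sketch: `build_prompt` is a hypothetical name, and rendering the observations with Python's default dict formatting is an assumption based on the single-quoted example prompt in this card, not a confirmed detail of the training notebook.

```python
def build_prompt(context: str, observations: dict) -> str:
    """Fill the training prompt template; generation continues after '### CODE:'."""
    return (
        "### CONTEXT:\n"
        f"{context}\n\n"
        "### OBSERVATIONS:\n"
        f"{observations}\n\n"  # assumption: plain dict repr, as in the card's example
        "### CODE:\n"
    )

example = build_prompt(
    "Experiment to verify Ohm's law with a fixed resistor.",
    {"readings": [{"V": 2.0, "I": 0.10}], "unit_V": "V", "unit_I": "A"},
)
print(example)
```
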
**Load base + LoRA adapter (recommended)**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextStreamer
from peft import PeftModel
import torch

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "YOUR_ADAPTER_REPO_OR_LOCAL_PATH"  # e.g., ./phi3-lab-report-coder-final

# 4-bit NF4 quantization, matching the training setup
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

tok = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
tok.pad_token = tok.eos_token  # same padding choice as in training

base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb,
    trust_remote_code=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights
model.eval()

prompt = """### CONTEXT:
Experiment to determine acceleration due to gravity using a simple pendulum...

### OBSERVATIONS:
{'readings': [{'L':0.50,'T':1.42}, {'L':0.60,'T':1.55}], 'unit_L':'m', 'unit_T':'s'}

### CODE:
"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tok, skip_prompt=True, skip_special_tokens=True)
# Greedy decoding: `temperature` only takes effect when do_sample=True, so it is omitted.
_ = model.generate(**inputs, max_new_tokens=400, do_sample=False, streamer=streamer)
```

---

## Training Details

### Data
- **Files:** `train.jsonl` (list of objects), `eval.jsonl` (list of objects)
- **Schema per example:**
  - `context` *(str)*: experiment description
  - `observations` *(dict)*: units + numeric readings (lists of dicts)
  - `code` *(str)*: reference Python solution
- **Topical spread (non-exhaustive):** pendulum \(g\), Ohm’s law, titration, density via displacement, Coriolis accel., gyroscopic effect, Hartnell governor, rotating mass balancing, helical spring vibration, bi-filar suspension, etc.

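The schema above can be checked before training. The sketch below assumes the usual JSONL layout of one JSON object per line; if the files instead hold a single JSON array, a single `json.loads` over the whole file would replace the line loop. `parse_examples` and `REQUIRED_FIELDS` are hypothetical names, and the sample record is made up for illustration.

```python
import json

REQUIRED_FIELDS = {"context": str, "observations": dict, "code": str}

def parse_examples(jsonl_text: str) -> list:
    """Parse JSONL text and check each record against the three-field schema."""
    examples = []
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected_type):
                raise ValueError(
                    f"line {line_no}: '{field}' must be {expected_type.__name__}"
                )
        examples.append(record)
    return examples

# Illustrative record, not taken from the actual dataset.
sample = json.dumps({
    "context": "Determine g with a simple pendulum.",
    "observations": {"readings": [{"L": 0.50, "T": 1.42}], "unit_L": "m", "unit_T": "s"},
    "code": "print('placeholder')",
})
records = parse_examples(sample + "\n")
print(len(records), sorted(records[0]))
```
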
**Size & basic stats**
- Train: **37** items; Eval: **6** items
- Formatted prompt (context+observations+code) length (train):
  - mean ≈ **222** words (≈ **1,739** chars); 95th pct ≈ **311** words
- Reference code length (train):
  - mean ≈ **34** lines (min **9**, max **71**)

### Training Procedure (from notebook)
- **Approach:** QLoRA (4-bit) SFT using `trl.SFTTrainer`
- **Quantization:** `bitsandbytes` 4-bit `nf4`, compute dtype `bfloat16`
- **LoRA config:** `r=16`, `alpha=32`, `dropout=0.05`, `bias="none"`, targets = `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj`
- **Tokenizer:** right padding; `eos_token` as `pad_token`
- **Hyperparameters (TrainingArguments):**
  - epochs: **10**
  - per-device train batch size: **1**
  - gradient_accumulation_steps: **4**
  - optimizer: **paged_adamw_32bit**
  - learning rate: **2e-4**, weight decay: **1e-3**
  - warmup_ratio: **0.03**, scheduler: **constant**
  - bf16: **True** (fp16: False), group_by_length: True
  - logging_steps: 10, save/eval every 50 steps
  - report_to: tensorboard
- **Saving:** `trainer.save_model("./phi3-lab-report-coder-final")` (adapter folder)

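To put these hyperparameters in perspective, the arithmetic below estimates how many optimizer steps the run performs. It assumes a single GPU and gives a low/high bound, because whether the final partial accumulation window of an epoch counts as a step may depend on the trainer version; the variable names are illustrative. Under these assumptions, the 50-step save/eval interval fires only once or twice over the whole run.

```python
import math

train_examples = 37
per_device_batch = 1
grad_accum = 4
epochs = 10
warmup_ratio = 0.03
save_eval_every = 50

effective_batch = per_device_batch * grad_accum  # examples per optimizer step
micro_batches = math.ceil(train_examples / per_device_batch)

# Bounds on optimizer steps per epoch (assumption: the trailing partial
# accumulation window may or may not be counted as a step).
steps_low = micro_batches // grad_accum
steps_high = math.ceil(micro_batches / grad_accum)

total_low, total_high = steps_low * epochs, steps_high * epochs
print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {total_low} to {total_high}")
print(f"warmup steps: about {round(warmup_ratio * total_high)}")
print(f"save/eval events: {total_low // save_eval_every} to {total_high // save_eval_every}")
```
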
### Speeds, Sizes, Times
- **Hardware:** Google Colab **T4 GPU** (per notebook metadata)
- **Adapter artifact:** LoRA weights only (load with the base model).
- **Wall-clock time:** not logged in the notebook.

---

## Evaluation

### Testing Data, Factors & Metrics
- **Eval set:** `eval.jsonl` (**6** items) with same schema.
- **Primary metric (planned):** ROUGE-L / ROUGE-1 against reference `code` (proxy for surface similarity).
- **Recommended additional checks:** unit tests on numeric outputs; pyflakes/ruff for syntax; run-time assertions.

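For intuition about what ROUGE-L measures on code, here is a minimal, dependency-free sketch based on the longest common subsequence (LCS) of whitespace-separated tokens. Splitting on whitespace is a simplification, so scores will not match an established ROUGE package exactly; `lcs_length` and `rouge_l_f1` are hypothetical helper names, and the two code strings are made up.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on whitespace tokens (simplified tokenization)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "g = 4 * math.pi**2 * L / T**2"
candidate = "g = (4 * math.pi**2 * L) / T**2"
print(round(rouge_l_f1(candidate, reference), 3))
```

The two strings compute the same value, yet the added parentheses lower the score, which illustrates why surface similarity is only a proxy for correctness.
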
### Results
- No automated score recorded in the notebook.
- **Suggested protocol:**
  1) Generate code for each eval item using the same prompt template.
  2) Execute the generated code in a sandbox with the provided observations.
  3) Compare computed scalars (e.g., average \(g\), \(R\), Reynolds number) to ground-truth values within stated tolerances.
  4) Report the pass rate, plus ROUGE as a readability/similarity measure.

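Steps 2 to 4 of the suggested protocol could be sketched as follows. This is not the author's harness: the helper names, the convention that the last printed line holds the scalar answer, and the 1% relative tolerance are all assumptions, and the two `eval_items` are hand-written stand-ins for model outputs. A subprocess with a timeout limits runaway code but is not a real sandbox, so untrusted output should run in a container or similar isolation.

```python
import math
import subprocess
import sys

def run_generated_code(code: str, timeout_s: float = 10.0) -> str:
    """Run code in a separate Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout

def last_number(stdout: str) -> float:
    """Assumed convention: the final printed line holds the scalar answer."""
    return float(stdout.strip().splitlines()[-1])

def passes(code: str, expected: float, rel_tol: float = 0.01) -> bool:
    """True if the code runs and its answer is within rel_tol of expected."""
    try:
        value = last_number(run_generated_code(code))
    except (RuntimeError, ValueError, IndexError, subprocess.TimeoutExpired):
        return False  # crashes, timeouts, and unparsable output count as failures
    return math.isclose(value, expected, rel_tol=rel_tol)

# Stand-ins for model outputs paired with hand-computed ground truth.
eval_items = [
    ("V, I = 2.0, 0.10\nprint(V / I)", 20.0),  # Ohm's law: R = V / I
    ("print(undefined_name)", 1.0),            # broken code should fail
]
pass_rate = sum(passes(code, truth) for code, truth in eval_items) / len(eval_items)
print(f"pass rate: {pass_rate:.2f}")
```
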
---

## Model Examination (optional)
- Inspect token-by-token attention to `OBSERVATIONS` keys (ablation: shuffle keys to test robustness).
- Add **unit-check helpers** (e.g., `pint`) in prompts to encourage explicit conversions.

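The key-shuffling ablation could start from a helper like the one below, which reorders only the top-level keys of an observations dict with a seeded generator. `shuffle_keys` is a hypothetical name; comparing the code generated for the original and shuffled prompts is left to the evaluation setup.

```python
import random

def shuffle_keys(observations: dict, seed: int = 0) -> dict:
    """Return a copy with the same key/value pairs in a seeded random order."""
    keys = list(observations)
    random.Random(seed).shuffle(keys)
    return {k: observations[k] for k in keys}

obs = {"readings": [{"L": 0.50, "T": 1.42}], "unit_L": "m", "unit_T": "s"}
variant = shuffle_keys(obs, seed=1)
print(variant)
```
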
---

## Environmental Impact
- **Hardware Type:** NVIDIA T4 (Colab)
- **Precision:** 4-bit QLoRA with `bfloat16` compute
- **Hours used:** Not recorded (dataset is small; expected low)
- **Cloud Provider/Region:** Colab (unspecified)
- **Carbon Emitted:** Not estimated (see [ML CO2 Impact calculator](https://mlco2.github.io/impact#compute))

---

## Technical Specifications

### Architecture & Objective
- **Backbone:** `Phi-3-mini-4k-instruct` (decoder-only causal LM)
- **Objective:** Supervised fine-tuning to continue from `### CODE:` with correct, executable Python.

### Compute Infrastructure
- **Hardware:** Colab GPU (T4) + CPU RAM
- **Software:**
  - `transformers`, `trl`, `peft`, `bitsandbytes`, `datasets`, `accelerate`, `torch`

---

## Citation
@article{abdin2024phi3,
  title = {Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author = {Abdin, Marah and others},
  journal = {arXiv preprint arXiv:2404.14219},
  year = {2024},
  doi = {10.48550/arXiv.2404.14219},
  url = {https://arxiv.org/abs/2404.14219}
}

---

## Glossary
- **QLoRA:** Fine-tuning with low-rank adapters on a quantized base model (saves memory/compute).
- **LoRA (r, α):** Rank and scaling of low-rank update matrices.

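For reference, the standard LoRA update that \(r\) and \(\alpha\) parameterize can be written as below. This is the textbook formulation, not something restated in the training notebook, and it assumes the default (non rank-stabilized) scaling; \(W\) is a frozen weight matrix of shape \(d \times k\), and \(B\), \(A\) are the trained low-rank factors.

\[
W' = W + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}
\]

With the values used here (\(r = 16\), \(\alpha = 32\)), the scaling factor is \(\alpha / r = 2\).
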
---

## More Information
- For better robustness, consider augmenting data with **unit-perturbation** and **noise-in-readings** variants, and add examples across more domains (materials, thermo, optics).
- Add **eval harness** with numeric tolerances and syntax checks.

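The two augmentation ideas could look roughly like this. It is a sketch under assumptions: the observations follow the `readings` / `unit_<key>` layout of this card's example prompt, the 1% noise level is arbitrary, and both function names are hypothetical. For unit-perturbed variants, the paired reference `code` would also need to convert back to SI so that the expected answer stays the same.

```python
import copy
import random

def add_reading_noise(observations: dict, rel_sigma: float = 0.01, seed: int = 0) -> dict:
    """Perturb every numeric reading with Gaussian noise of relative size rel_sigma."""
    rng = random.Random(seed)
    noisy = copy.deepcopy(observations)
    for reading in noisy["readings"]:
        for key, value in reading.items():
            reading[key] = value * (1 + rng.gauss(0, rel_sigma))
    return noisy

def metres_to_centimetres(observations: dict, key: str = "L") -> dict:
    """Unit perturbation: express one length column in cm and update its unit tag."""
    converted = copy.deepcopy(observations)
    assert converted[f"unit_{key}"] == "m", "expected metres"
    for reading in converted["readings"]:
        reading[key] = reading[key] * 100
    converted[f"unit_{key}"] = "cm"
    return converted

obs = {"readings": [{"L": 0.50, "T": 1.42}], "unit_L": "m", "unit_T": "s"}
noisy_obs = add_reading_noise(obs, rel_sigma=0.01, seed=42)
cm_obs = metres_to_centimetres(obs)
print(noisy_obs)
print(cm_obs)
```
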
---

## Model Card Authors
- Barghav777

---