# Model Card for LFT_Final_FineTuned_Increased_Metrics
Merged structure-aware LoRA deltas for the LfT math-tutoring student on top of Llama-3.1-8B-Instruct. This is the canonical LfT student for downstream math-tutor and IDC workflows.
## Model Details

### Model Description
- **Developed by:** YRS Aakanksha
- **Shared by:** YRS Aakanksha
- **Model type:** Instruction-tuned causal LM with merged LoRA deltas (global LfT)
- **Language(s):** English (math tutoring focus)
- **License:** Same as base model (Llama-3.1-8B-Instruct)
- **Finetuned from:** meta-llama/Llama-3.1-8B-Instruct
## Uses

### Direct Use
- Math tutoring / reasoning with structure-aware prompts (chapter/difficulty/LO tags).
- Base student for two-stage LfT + IDC flows.
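The exact tag syntax used during structure-aware fine-tuning is not documented in this card; the sketch below shows one plausible way to prefix a question with chapter/difficulty/LO tags (the bracketed tag names and `build_prompt` helper are assumptions for illustration, not the documented training format):

```python
# Hypothetical structure-aware prompt builder; the tag names below are
# assumptions -- the actual tags used during LfT training are not published.
def build_prompt(question: str, chapter: str, difficulty: str, lo: str) -> str:
    """Prefix a math question with chapter/difficulty/learning-objective tags."""
    header = f"[CHAPTER: {chapter}] [DIFFICULTY: {difficulty}] [LO: {lo}]"
    return f"{header}\n{question}"

prompt = build_prompt(
    "Find the projection of the vector (3, 4) onto (1, 0).",
    chapter="Vectors",
    difficulty="medium",
    lo="vector projections",
)
print(prompt)
```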
### Downstream Use
- Further task-specific fine-tuning for math reasoning or instructional tutoring.
### Out-of-Scope Use
- Non-math domains; safety-critical decisions; any deployment without alignment/safety layers.
## Bias, Risks, and Limitations
- Inherits biases and limitations of the base Llama-3.1-8B-Instruct model and the curated math datasets.
- Not safety-tuned; avoid use in safety-critical settings.
### Recommendations
- Keep human oversight; add safety/filters for production.
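As a purely illustrative sketch of what a minimal output guard could look like (the keyword blocklist here is a placeholder assumption; a production deployment should use a real moderation model or service instead):

```python
# Illustrative output guard only -- a real deployment should sit behind a
# proper moderation layer; this keyword blocklist is a placeholder assumption.
BLOCKLIST = {"legal advice", "medical dosage"}

def is_in_scope(text: str) -> bool:
    """Reject responses that drift into higher-risk, out-of-scope domains."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)

reply = "Here is some legal advice about contracts."
if not is_in_scope(reply):
    reply = "I can only help with math tutoring questions."
print(reply)
```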
## How to Get Started

### Load with transformers (merged weights)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sashank-810/LFT_Final_FineTuned_Increased_Metrics"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain the concept of vector projections with an example."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```
### Serve with vLLM

```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model Sashank-810/LFT_Final_FineTuned_Increased_Metrics \
    --tensor-parallel-size 2 \
    --dtype auto
```
Then query via the OpenAI-compatible endpoint (replace URL/key as needed; shown with the `openai>=1.0` client):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="Sashank-810/LFT_Final_FineTuned_Increased_Metrics",
    messages=[
        {"role": "user", "content": "Outline key steps in solving a probability problem involving Bayes' theorem."}
    ],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```
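The endpoint also accepts plain HTTP. A sketch of the raw request payload using only the standard library (the server is assumed to be running at `localhost:8000` as started above, so the actual call is left commented out):

```python
import json
import urllib.request

payload = {
    "model": "Sashank-810/LFT_Final_FineTuned_Increased_Metrics",
    "messages": [{"role": "user", "content": "State Bayes' theorem."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the vLLM server from the previous step is running:
# with urllib.request.urlopen(req) as r:
#     print(json.load(r)["choices"][0]["message"]["content"])
```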
## Training Details
- Fine-tuned with structure-aware SFT across all chapters; LoRA deltas merged into base. Specific hyperparameters and dataset splits are kept private.
## Evaluation

The model was evaluated on a comprehensive test set of 2,617 math tutoring questions, comparing performance against the base Llama-3.1-8B-Instruct model.

### Accuracy Results
| Metric | Base Model | Fine-tuned Model | Improvement |
|---|---|---|---|
| **Correct Answers** | 625 / 2617 | 843 / 2617 | +218 |
| **Accuracy** | 23.88% | 32.21% | **+8.33 pp** |
| **Questions Improved** | - | 421 | - |
| **Questions Regressed** | - | 203 | - |
The fine-tuned model shows a **34.9% relative improvement** in accuracy over the base model, with more than twice as many questions improved (421) as regressed (203).
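The headline numbers above follow directly from the raw counts in the table and can be recomputed in a few lines:

```python
# Recompute the accuracy figures from the raw counts in the table above.
base_correct, ft_correct, total = 625, 843, 2617

base_acc = base_correct / total                   # base accuracy
ft_acc = ft_correct / total                       # fine-tuned accuracy
abs_gain_pp = (ft_acc - base_acc) * 100           # absolute gain, percentage points
rel_gain = (ft_acc - base_acc) / base_acc * 100   # relative gain over the base

print(f"base {base_acc:.2%}, fine-tuned {ft_acc:.2%}, "
      f"+{abs_gain_pp:.2f} pp ({rel_gain:.1f}% relative)")
```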
### Generation Quality Metrics

#### BLEU Score
| Model | BLEU Score | Precision (1/2/3/4-gram) | BP | Sys Len | Ref Len |
|---|---|---|---|---|---|
| Base | 38.24 | 58.8 / 67.8 / 63.8 / 59.8 | 0.612 | 3,765 | 5,612 |
| Fine-tuned | **58.56** | 57.1 / 65.4 / 60.0 / 53.9 | **0.993** | 5,573 | 5,612 |
The fine-tuned model achieves a **53.1% relative improvement** in BLEU score (38.24 → 58.56), with significantly better length matching (BP: 0.612 → 0.993).
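The brevity-penalty (BP) values in the table are the standard BLEU brevity penalty, exp(1 − ref_len/sys_len) when the system output is shorter than the reference and 1 otherwise, and they can be reproduced from the system/reference lengths shown:

```python
import math

def brevity_penalty(sys_len: int, ref_len: int) -> float:
    """Standard BLEU brevity penalty: penalizes outputs shorter than the reference."""
    return 1.0 if sys_len >= ref_len else math.exp(1 - ref_len / sys_len)

print(round(brevity_penalty(3765, 5612), 3))  # base model -> 0.612
print(round(brevity_penalty(5573, 5612), 3))  # fine-tuned model -> 0.993
```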
#### ROUGE Scores

| Metric | Base Model | Fine-tuned Model | Improvement |
|---|---|---|---|
| **ROUGE-1** | 0.2948 | **0.4188** | +42.1% |
| **ROUGE-2** | 0.0931 | **0.1184** | +27.2% |
| **ROUGE-L** | 0.2936 | **0.4181** | +42.4% |
| **ROUGE-Lsum** | 0.2938 | **0.4185** | +42.4% |
All ROUGE metrics show substantial improvements, indicating better recall and overlap with reference answers.
#### METEOR Score

| Model | METEOR Score | Improvement |
|---|---|---|
| Base | 0.1633 | - |
| Fine-tuned | **0.2327** | **+42.5%** |
The METEOR score improvement demonstrates better semantic alignment and synonym matching in generated responses.
### Key Findings

- **Substantial Accuracy Gains**: The model demonstrates a clear improvement in mathematical correctness, with accuracy rising from 23.88% to 32.21%.
- **Improved Response Quality**: Across all automated metrics (BLEU, ROUGE, METEOR), the fine-tuned model shows 27-53% relative improvements, indicating more coherent and relevant responses.
- **Better Length Calibration**: The brevity-penalty improvement (0.612 → 0.993) shows the model generates appropriately sized responses that better match expected answer lengths.
- **Positive Net Impact**: With 421 improved questions versus 203 regressed, the model shows a strong positive impact ratio of approximately 2:1.
## Technical Specifications
- Architecture: Llama-3.1-8B-Instruct with merged LoRA deltas (Phase 2 global LfT).
- Compute: Not disclosed; intended for GPU inference; vLLM compatible.
## Model Card Contact
- Sashank-810 on Hugging Face