Model Card for LFT_Final_FineTuned_Increased_Metrics

Merged structure-aware LoRA deltas for the LfT math-tutoring student on top of Llama-3.1-8B-Instruct. This is the canonical LfT student for downstream math-tutor and IDC workflows.

Model Details

Model Description

  • Developed by: YRS Aakanksha
  • Shared by: YRS Aakanksha
  • Model type: Instruction-tuned causal LM with merged LoRA deltas (global LfT)
  • Language(s): English (math tutoring focus)
  • License: Same as base model (Llama-3.1-8B-Instruct)
  • Finetuned from: meta-llama/Llama-3.1-8B-Instruct

Uses

Direct Use

  • Math tutoring / reasoning with structure-aware prompts (chapter/difficulty/LO tags); a prompt sketch follows this list.
  • Base student for two-stage LfT + IDC flows.
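
A minimal sketch of a structure-aware prompt. The exact tag schema used in training is not published, so the [CHAPTER]/[DIFFICULTY]/[LO] field names below are illustrative assumptions, not the documented format:

def build_prompt(question: str, chapter: str, difficulty: str, lo: str) -> str:
    # Prepend structure tags (hypothetical names) to the question text.
    return (
        f"[CHAPTER] {chapter}\n"
        f"[DIFFICULTY] {difficulty}\n"
        f"[LO] {lo}\n\n"
        f"{question}"
    )

prompt = build_prompt(
    question="Find the projection of u = (3, 4) onto v = (1, 0).",
    chapter="Vectors",
    difficulty="medium",
    lo="Compute vector projections",
)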

Downstream Use

  • Further task-specific fine-tuning for math reasoning or instructional tutoring.

Out-of-Scope Use

  • Non-math domains; safety-critical decisions; any deployment without alignment/safety layers.

Bias, Risks, and Limitations

  • Inherits biases and limitations of the base Llama-3.1-8B-Instruct model and the curated math datasets.
  • Not safety-tuned; avoid use in safety-critical settings.

Recommendations

  • Keep human oversight; add safety/filters for production.

How to Get Started

Load with transformers (merged weights)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sashank-810/LFT_Final_FineTuned_Increased_Metrics"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain the concept of vector projections with an example."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
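
Because the base model is instruction-tuned, responses are often better when the request goes through the tokenizer's chat template rather than raw text. A minimal sketch reusing the tok and model objects above:

# Format the request as a chat turn via the tokenizer's built-in template.
messages = [{"role": "user", "content": prompt}]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))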

Serve with vLLM

pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model Sashank-810/LFT_Final_FineTuned_Increased_Metrics \
  --tensor-parallel-size 2 \
  --dtype auto

Then query via OpenAI-compatible endpoint (replace URL/key as needed):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Sashank-810/LFT_Final_FineTuned_Increased_Metrics",
    messages=[{"role": "user", "content": "Outline key steps in solving a probability problem involving Bayes' theorem."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)

Training Details

  • Fine-tuned with structure-aware SFT across all chapters; LoRA deltas merged into base. Specific hyperparameters and dataset splits are kept private.
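
The exact recipe is private, but merging LoRA deltas into a base checkpoint is commonly done with PEFT's merge_and_unload. A hedged sketch, assuming a trained adapter directory (the adapter path below is hypothetical):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# "path/to/lora-adapter" is a placeholder for a locally trained LoRA adapter.
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("merged-model")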

Evaluation

The model was evaluated on a comprehensive test set of 2,617 math tutoring questions, comparing performance against the base Llama-3.1-8B-Instruct model.

Accuracy Results

| Metric              | Base Model | Fine-tuned Model | Improvement |
|---------------------|------------|------------------|-------------|
| Correct Answers     | 625 / 2617 | 843 / 2617       | +218        |
| Accuracy            | 23.88%     | 32.21%           | +8.33 pp    |
| Questions Improved  | –          | 421              | –           |
| Questions Regressed | –          | 203              | –           |

The fine-tuned model shows a 34.9% relative improvement in accuracy over the base model, with more than twice as many questions improved (421) as regressed (203).
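
The improved/regressed counts can be reproduced from per-question correctness flags for the two models. A minimal sketch of that bookkeeping (the boolean lists are assumed inputs; the underlying per-question data is not released):

def compare(base_ok: list[bool], ft_ok: list[bool]) -> dict:
    # base_ok / ft_ok: one boolean per question, True if answered correctly.
    improved = sum(ft and not b for b, ft in zip(base_ok, ft_ok))
    regressed = sum(b and not ft for b, ft in zip(base_ok, ft_ok))
    return {
        "base_acc": sum(base_ok) / len(base_ok),
        "ft_acc": sum(ft_ok) / len(ft_ok),
        "improved": improved,
        "regressed": regressed,
    }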

Generation Quality Metrics

BLEU Score

| Model      | BLEU Score | Precision (1/2/3/4-gram)  | BP    | Sys Len | Ref Len |
|------------|------------|---------------------------|-------|---------|---------|
| Base       | 38.24      | 58.8 / 67.8 / 63.8 / 59.8 | 0.612 | 3,765   | 5,612   |
| Fine-tuned | 58.56      | 57.1 / 65.4 / 60.0 / 53.9 | 0.993 | 5,573   | 5,612   |

The fine-tuned model achieves a 53.1% relative improvement in BLEU score (38.24 → 58.56), with significantly better length matching (BP: 0.612 → 0.993).
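
The BP / Sys Len / Ref Len columns match sacrebleu-style corpus BLEU. A hedged sketch of how such numbers are typically computed; the one-item lists below are placeholders for the full sets of model outputs and reference answers:

import sacrebleu

hyps = ["the projection of u onto v is (3, 0)"]  # model outputs
refs = ["the projection of u onto v is (3, 0)"]  # reference answers
bleu = sacrebleu.corpus_bleu(hyps, [refs])
# bleu.bp is the brevity penalty, min(1, exp(1 - ref_len / sys_len)).
print(bleu.score, bleu.precisions, bleu.bp, bleu.sys_len, bleu.ref_len)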

ROUGE Scores

| Metric     | Base Model | Fine-tuned Model | Improvement |
|------------|------------|------------------|-------------|
| ROUGE-1    | 0.2948     | 0.4188           | +42.1%      |
| ROUGE-2    | 0.0931     | 0.1184           | +27.2%      |
| ROUGE-L    | 0.2936     | 0.4181           | +42.4%      |
| ROUGE-Lsum | 0.2938     | 0.4185           | +42.4%      |

All ROUGE metrics show substantial improvements, indicating better recall and overlap with reference answers.
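
The four reported variants correspond to the keys returned by the evaluate library's rouge metric (which wraps rouge_score). A minimal sketch with placeholder strings:

import evaluate  # pip install evaluate rouge_score

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["a sample model answer"],
    references=["a sample reference answer"],
)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}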

METEOR Score

| Model      | METEOR Score | Improvement |
|------------|--------------|-------------|
| Base       | 0.1633       | –           |
| Fine-tuned | 0.2327       | +42.5%      |

The METEOR score improvement demonstrates better semantic alignment and synonym matching in generated responses.
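
METEOR can be computed the same way; the evaluate wrapper pulls in nltk under the hood. A minimal sketch with placeholder strings:

import evaluate  # pip install evaluate nltk

meteor = evaluate.load("meteor")
result = meteor.compute(
    predictions=["a sample model answer"],
    references=["a sample reference answer"],
)
print(result["meteor"])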

Key Findings

  1. Substantial Accuracy Gains: The model demonstrates a clear improvement in mathematical correctness, with accuracy rising from 23.88% to 32.21%.

  2. Improved Response Quality: Across all automated metrics (BLEU, ROUGE, METEOR), the fine-tuned model shows 27-53% relative improvements, indicating more coherent and relevant responses.

  3. Better Length Calibration: The brevity-penalty improvement (0.612 → 0.993) shows the model generates appropriately sized responses that better match expected answer lengths.

  4. Positive Net Impact: With 421 improved questions versus 203 regressed, the model shows a strong positive impact ratio of approximately 2:1.

Technical Specifications

  • Architecture: Llama-3.1-8B-Instruct with merged LoRA deltas (Phase 2 global LfT).
  • Compute: Not disclosed; intended for GPU inference; vLLM compatible.

Model Card Contact

  • Sashank-810 on Hugging Face