---
base_model: unsloth/phi-3.5-mini-instruct-bnb-4bit
library_name: peft
---
# Model Card for Fine-tuned Phi-3.5-mini-instruct for MCQ Generation
## Model Details
**Model Description**
This model is a fine-tuned version of `unsloth/Phi-3.5-mini-instruct` (an optimized 4-bit version of `microsoft/Phi-3-mini-4k-instruct`). It has been fine-tuned using Low-Rank Adaptation (LoRA) specifically for the task of generating multiple-choice questions (MCQs) in JSON format based on provided context text. The fine-tuning was performed using the script provided in the context.
* **Developed by:** Fine-tuned using the provided training script. Base model by Microsoft; 4-bit optimization by Unsloth AI.
* **Funded by [optional]:** [More Information Needed]
* **Shared by [optional]:** [More Information Needed]
* **Model type:** Language Model (Phi-3 architecture) fine-tuned with QLoRA.
* **Language(s) (NLP):** English
* **License:** The base model `microsoft/Phi-3-mini-4k-instruct` is licensed under the MIT License. The fine-tuned adapters are subject to the base model's license and potentially the license of the training data (`asanchez75/medical_textbooks_mcq`). Unsloth code is typically Apache 2.0. Please check the specific licenses for compliance.
* **Finetuned from model:** `unsloth/Phi-3.5-mini-instruct` (4-bit quantized version).
**Model Sources [optional]**
* **Repository:** [More Information Needed - Link to where the fine-tuned adapters are hosted, if applicable]
* **Paper [optional]:** [Link to Phi-3 Paper, e.g., https://arxiv.org/abs/2404.14219]
* **Demo [optional]:** [More Information Needed]
## Uses
**Direct Use**
This model is intended for generating multiple-choice questions (MCQs) in a specific JSON format, given a piece of context text. It requires using the specific prompt structure employed during training (see Preprocessing section). The primary use case involves loading the base `unsloth/Phi-3.5-mini-instruct` model (in 4-bit) and then applying the saved LoRA adapters using the PEFT library.
**Downstream Use [optional]**
Could be integrated into educational tools, content creation pipelines for medical training materials, or automated assessment generation systems within the medical domain.
**Out-of-Scope Use**
* Generating text in formats other than the targeted MCQ JSON structure.
* Answering general knowledge questions or performing tasks unrelated to MCQ generation from context.
* Use in domains significantly different from the medical textbook context used for training (performance may degrade).
* Use without the specific prompt format defined during training.
* Generating harmful, biased, or inaccurate content.
* Any use violating the terms of the base model license or the dataset license.
## Bias, Risks, and Limitations
* **Inherited Bias:** The model inherits biases present in the base Phi-3 model and the `asanchez75/medical_textbooks_mcq` training dataset, which is derived from medical literature.
* **Accuracy:** Generated MCQs may be factually incorrect, nonsensical, or poorly formulated. The correctness of the identified "correct\_option" is not guaranteed.
* **Format Adherence:** While trained to output JSON, the model might occasionally fail to produce perfectly valid JSON or might include extraneous text.
* **Domain Specificity:** Performance is likely best on medical contexts similar to the training data. Performance on other domains or highly dissimilar medical texts is unknown.
* **Quantization:** The use of 4-bit quantization (QLoRA) may slightly impact performance compared to a full-precision model, although Unsloth optimizations aim to minimize this.
* **Context Dependence:** Output quality is highly dependent on the clarity and information content of the provided input context.
* **Limited Evaluation:** The model was only evaluated qualitatively on one example from the training set within the script. Rigorous evaluation across a dedicated test set was not performed.
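Given the limited evaluation, a simple first step before deployment is to measure how often the model's raw outputs parse as JSON at all. A minimal sketch (the `json_validity_rate` helper is hypothetical, not part of the training script; the output strings would come from your own inference calls):

```python
import json

def json_validity_rate(outputs: list[str]) -> float:
    """Fraction of raw model outputs that parse as JSON objects."""
    valid = 0
    for text in outputs:
        try:
            parsed = json.loads(text.strip())
            if isinstance(parsed, dict):
                valid += 1
        except json.JSONDecodeError:
            pass
    return valid / len(outputs) if outputs else 0.0

# Example: two valid outputs, one invalid -> 2 of 3 parse
samples = ['{"question": "Q?"}', 'not json', '{"question": "Q2?"}']
print(json_validity_rate(samples))
```

Running this over a held-out set of contexts gives a quick format-adherence score to complement manual spot checks of factual accuracy.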
## Recommendations
* **Verification:** Always verify the factual accuracy, grammatical correctness, and appropriateness of generated MCQs before use.
* **Prompting:** Use the specific prompt structure detailed in the "Preprocessing" section for optimal results.
* **Testing:** Thoroughly test the model's performance on your specific use case and data distribution.
* **Bias Awareness:** Be mindful of potential biases inherited from the base model and training data.
* **JSON Parsing:** Implement robust JSON parsing with error handling for the model's output.
## How to Get Started with the Model
Use the code below to load the 4-bit base model, apply the fine-tuned LoRA adapters, and run inference. Replace `"path/to/your/saved/adapters/"` with the actual path where you saved the adapter files (`adapter_model.safetensors`, `adapter_config.json`, etc.) and the tokenizer (`tokenizer.json`, etc.).
```python
import torch
from transformers import AutoTokenizer
from unsloth import FastLanguageModel
from peft import PeftModel
import json # For parsing output
# --- Configuration ---
base_model_name = "unsloth/Phi-3.5-mini-instruct"
adapter_path = "path/to/your/saved/adapters/" # <--- CHANGE THIS
max_seq_length = 4096
# --- 1. Load Base Model and Tokenizer (4-bit) ---
print("Loading base model and tokenizer...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = base_model_name,
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True, # Load base in 4-bit
    device_map = "auto",
)
print("Base model loaded in 4-bit.")
# Set padding token if necessary
if tokenizer.pad_token is None:
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token
    else:
        tokenizer.pad_token = tokenizer.convert_ids_to_tokens(tokenizer.pad_token_id)
tokenizer.padding_side = 'right'
print(f"Tokenizer pad token: {tokenizer.pad_token}, ID: {tokenizer.pad_token_id}")
# --- 2. Load LoRA Adapters ---
print(f"Loading LoRA adapters from {adapter_path}...")
# Load adapters onto the base model
model = PeftModel.from_pretrained(model, adapter_path)
print("LoRA adapters loaded.")
# --- 3. Prepare for Inference ---
print("Preparing combined model for inference...")
FastLanguageModel.for_inference(model)
print("Model ready for inference.")
# --- 4. Prepare Inference Prompt ---
test_context = "Human beings are fallible and it is in their nature to make mistakes. An error of omission occurs when a necessary action has not been taken." # Example context
inference_prompt = f"<|user|>\nContext:\n{test_context}\n\nGenerate ONE valid multiple-choice question based strictly on the context above. Output ONLY the valid JSON object representing the question.\nMCQ JSON:<|end|>\n<|assistant|>\n"
inputs = tokenizer(inference_prompt, return_tensors="pt", truncation=True, max_length=max_seq_length).to("cuda")
# --- 5. Generate Output ---
print("Generating MCQ JSON...")
with torch.no_grad():
    outputs = model.generate(
        input_ids = inputs["input_ids"],
        max_new_tokens=512, # Max length for the generated JSON
        temperature=0.1,    # Low temperature for more deterministic output
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id,
    )
# Decode the generated part
output_ids = outputs[0][inputs["input_ids"].shape[1]:]
generated_json_part = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
print("\n--- Generated Output ---")
print(generated_json_part)
# --- 6. (Optional) Validate JSON ---
try:
    # Clean up potential markdown fences
    if generated_json_part.startswith("```json"):
        generated_json_part = generated_json_part[len("```json"):].strip()
    if generated_json_part.endswith("```"):
        generated_json_part = generated_json_part[:-len("```")].strip()
    parsed_json = json.loads(generated_json_part)
    print("\nGenerated JSON Parsed Successfully:")
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError as e:
    print(f"\nGenerated output IS NOT valid JSON. Error: {e}")
```
## Example Output
The model aims to generate a valid JSON object structured like the example below. Note that while the training prompt focused on specific keys (`question`, `options`, `correct_option`), the model might also generate related fields like `explanation` based on patterns learned from the training data.
```json
{
"question": "What is the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession?",
"option_a": "1 year",
"option_b": "2 years",
"option_c": "3 years",
"option_d": "5 years",
"correct_option": "C",
"explanation": "The correct answer is C, which states that the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession is 3 years. This information is explicitly stated in the text, which mentions that a temporary ban from practising may be imposed for a maximum of three years. The other options are incorrect because they either underestimate or overestimate the maximum duration of the ban."
}
```
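Since the exact set of keys can vary between generations, a downstream consumer may want to validate the structure before using a question. A minimal sketch, assuming the key names shown in the example above (`validate_mcq` is a hypothetical helper):

```python
REQUIRED_KEYS = {
    "question", "option_a", "option_b",
    "option_c", "option_d", "correct_option",
}

def validate_mcq(obj: dict) -> list[str]:
    """Return a list of problems found in a generated MCQ dict (empty if OK)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - obj.keys())]
    correct = obj.get("correct_option", "")
    # The example output uses a single letter A-D for the correct option
    if correct and correct.upper() not in {"A", "B", "C", "D"}:
        problems.append(f"correct_option not in A-D: {correct!r}")
    return problems
```

An empty returned list means the object carries at least the expected fields; extra keys such as `explanation` are tolerated rather than rejected.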