---
base_model: unsloth/phi-3.5-mini-instruct-bnb-4bit
library_name: peft
---
# Model Card for Fine-tuned Phi-3.5-mini-instruct for MCQ Generation

## Model Details

**Model Description**

This model is a fine-tuned version of `unsloth/Phi-3.5-mini-instruct` (Unsloth's optimized release of `microsoft/Phi-3.5-mini-instruct`, loaded here in 4-bit). It was fine-tuned with Low-Rank Adaptation (LoRA) to generate multiple-choice questions (MCQs) in JSON format from a provided context text. Fine-tuning was performed with a separate training script that is not reproduced in this card.

* **Developed by:** Fine-tuned based on the provided script. Base model by Microsoft. Optimization by Unsloth AI.
* **Funded by [optional]:** [More Information Needed]
* **Shared by [optional]:** [More Information Needed]
* **Model type:** Language Model (Phi-3 architecture) fine-tuned with QLoRA.
* **Language(s) (NLP):** English
* **License:** The base model `microsoft/Phi-3.5-mini-instruct` is licensed under the MIT License. The fine-tuned adapters are subject to the base model's license and potentially the license of the training data (`asanchez75/medical_textbooks_mcq`). Unsloth code is typically Apache 2.0. Please check the specific licenses for compliance.
* **Finetuned from model:** `unsloth/Phi-3.5-mini-instruct` (4-bit quantized version).

**Model Sources [optional]**

* **Repository:** [More Information Needed - Link to where the fine-tuned adapters are hosted, if applicable]
* **Paper [optional]:** Phi-3 Technical Report, https://arxiv.org/abs/2404.14219
* **Demo [optional]:** [More Information Needed]

## Uses

**Direct Use**

This model is intended for generating multiple-choice questions (MCQs) in a specific JSON format, given a piece of context text. It requires the specific prompt structure used during training (see the inference prompt in the "How to Get Started with the Model" section). The primary use case involves loading the base `unsloth/Phi-3.5-mini-instruct` model in 4-bit and then applying the saved LoRA adapters with the PEFT library.
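Because output quality depends on reproducing the prompt exactly, it can help to keep the template in one place. The sketch below wraps the prompt string from the inference example in this card in a small helper; the names `PROMPT_TEMPLATE` and `build_mcq_prompt` are hypothetical and not part of any library.

```python
# Hypothetical helper: the template text is copied from the inference
# example in this card; the names used here are illustrative only.
PROMPT_TEMPLATE = (
    "<|user|>\n"
    "Context:\n{context}\n\n"
    "Generate ONE valid multiple-choice question based strictly on the "
    "context above. Output ONLY the valid JSON object representing the "
    "question.\n"
    "MCQ JSON:<|end|>\n"
    "<|assistant|>\n"
)


def build_mcq_prompt(context: str) -> str:
    """Wrap a context passage in the chat-style MCQ generation prompt."""
    context = context.strip()
    if not context:
        raise ValueError("context must be a non-empty string")
    return PROMPT_TEMPLATE.format(context=context)


prompt = build_mcq_prompt(
    "An error of omission occurs when a necessary action has not been taken."
)
```

The resulting string can be passed to the tokenizer in place of a hand-written f-string, which reduces the risk of a stray newline or missing special token changing the model's behaviour.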

**Downstream Use [optional]**

Could be integrated into educational tools, content creation pipelines for medical training materials, or automated assessment generation systems within the medical domain.

**Out-of-Scope Use**

* Generating text in formats other than the targeted MCQ JSON structure.
* Answering general knowledge questions or performing tasks unrelated to MCQ generation from context.
* Use in domains significantly different from the medical textbook context used for training (performance may degrade).
* Use without the specific prompt format defined during training.
* Generating harmful, biased, or inaccurate content.
* Any use violating the terms of the base model license or the dataset license.

## Bias, Risks, and Limitations

* **Inherited Bias:** The model inherits biases present in the base Phi-3.5 model and the `asanchez75/medical_textbooks_mcq` training dataset, which is derived from medical literature.
* **Accuracy:** Generated MCQs may be factually incorrect, nonsensical, or poorly formulated. The correctness of the identified `correct_option` is not guaranteed.
* **Format Adherence:** While trained to output JSON, the model might occasionally fail to produce perfectly valid JSON or might include extraneous text.
* **Domain Specificity:** Performance is likely best on medical contexts similar to the training data. Performance on other domains or highly dissimilar medical texts is unknown.
* **Quantization:** The use of 4-bit quantization (QLoRA) may slightly impact performance compared to a full-precision model, although Unsloth optimizations aim to minimize this.
* **Context Dependence:** Output quality is highly dependent on the clarity and information content of the provided input context.
* **Limited Evaluation:** The model was only evaluated qualitatively on one example from the training set within the script. Rigorous evaluation across a dedicated test set was not performed.

## Recommendations

* **Verification:** Always verify the factual accuracy, grammatical correctness, and appropriateness of generated MCQs before use.
* **Prompting:** Use the specific prompt structure shown in the "How to Get Started with the Model" section for optimal results.
* **Testing:** Thoroughly test the model's performance on your specific use case and data distribution.
* **Bias Awareness:** Be mindful of potential biases inherited from the base model and training data.
* **JSON Parsing:** Implement robust JSON parsing with error handling for the model's output.
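As one way to approach the JSON parsing recommendation, the sketch below scans the raw model output for the first decodable JSON object, so that markdown fences or extra text around the object do not cause a hard failure. The function name `extract_first_json_object` is hypothetical; it relies only on the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if none parses.

    Tries every '{' as a possible start, so surrounding prose or
    markdown fences are skipped rather than treated as errors.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None


raw_output = 'Here is the MCQ:\n```json\n{"question": "Q?", "correct_option": "A"}\n```'
mcq = extract_first_json_object(raw_output)
if mcq is None:
    print("No valid JSON object found; consider regenerating.")
```

Returning `None` instead of raising leaves the retry or fallback policy to the caller, which is usually where that decision belongs.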

## How to Get Started with the Model

Use the code below to load the 4-bit base model, apply the fine-tuned LoRA adapters, and run inference. Replace `"path/to/your/saved/adapters/"` with the actual path where you saved the adapter files (`adapter_model.safetensors`, `adapter_config.json`, etc.) and the tokenizer (`tokenizer.json`, etc.).

```python
import json  # For parsing the generated output

import torch
from unsloth import FastLanguageModel  # Import Unsloth before PEFT so its patches are applied first
from peft import PeftModel

# --- Configuration ---
base_model_name = "unsloth/Phi-3.5-mini-instruct"
adapter_path = "path/to/your/saved/adapters/" # <--- CHANGE THIS
max_seq_length = 4096

# --- 1. Load Base Model and Tokenizer (4-bit) ---
print("Loading base model and tokenizer...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = base_model_name,
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True, # Load base in 4-bit
    device_map = "auto",
)
print("Base model loaded in 4-bit.")

# Set a padding token if the tokenizer does not define one
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'
print(f"Tokenizer pad token: {tokenizer.pad_token}, ID: {tokenizer.pad_token_id}")

# --- 2. Load LoRA Adapters ---
print(f"Loading LoRA adapters from {adapter_path}...")
# Load adapters onto the base model
model = PeftModel.from_pretrained(model, adapter_path)
print("LoRA adapters loaded.")

# --- 3. Prepare for Inference ---
print("Preparing combined model for inference...")
FastLanguageModel.for_inference(model)
print("Model ready for inference.")

# --- 4. Prepare Inference Prompt ---
test_context = "Human beings are fallible and it is in their nature to make mistakes. An error of omission occurs when a necessary action has not been taken." # Example context
inference_prompt = f"<|user|>\nContext:\n{test_context}\n\nGenerate ONE valid multiple-choice question based strictly on the context above. Output ONLY the valid JSON object representing the question.\nMCQ JSON:<|end|>\n<|assistant|>\n"

# Move the inputs to whichever device the model was placed on
inputs = tokenizer(inference_prompt, return_tensors="pt", truncation=True, max_length=max_seq_length).to(model.device)

# --- 5. Generate Output ---
print("Generating MCQ JSON...")
with torch.no_grad():
    outputs = model.generate(
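        input_ids = inputs["input_ids"],
        attention_mask = inputs["attention_mask"],  # Pass the mask explicitly so padding is never attended to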
        max_new_tokens=512,        # Max length for the generated JSON
        temperature=0.1,           # Low temperature for more deterministic output
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id
    )

# Decode the generated part
output_ids = outputs[0][inputs["input_ids"].shape[1]:]
generated_json_part = tokenizer.decode(output_ids, skip_special_tokens=True).strip()

print("\n--- Generated Output ---")
print(generated_json_part)

# --- 6. (Optional) Validate JSON ---
try:
    # Clean up potential markdown fences
    if generated_json_part.startswith("```json"):
        generated_json_part = generated_json_part[len("```json"):].strip()
    if generated_json_part.endswith("```"):
        generated_json_part = generated_json_part[:-len("```")].strip()

    parsed_json = json.loads(generated_json_part)
    print("\nGenerated JSON Parsed Successfully:")
    print(json.dumps(parsed_json, indent=2))
except json.JSONDecodeError as e:
    print(f"\nGenerated output IS NOT valid JSON. Error: {e}")

```

## Example Output

The model aims to generate a valid JSON object structured like the example below. Note that while the training prompt focused on specific keys (`question`, the options, `correct_option`), the model might also generate related fields such as `explanation`, based on patterns learned from the training data.

```json
{
  "question": "What is the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession?",
  "option_a": "1 year",
  "option_b": "2 years",
  "option_c": "3 years",
  "option_d": "5 years",
  "correct_option": "C",
  "explanation": "The correct answer is C, which states that the maximum duration of a temporary ban from practising as a disciplinary sanction in the medical profession is 3 years. This information is explicitly stated in the text, which mentions that a temporary ban from practising may be imposed for a maximum of three years. The other options are incorrect because they either underestimate or overestimate the maximum duration of the ban."
}
```
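A parsed object can still be structurally incomplete, so a light schema check is worth running before an MCQ is used. The sketch below assumes the key set seen in the example above (`question`, `option_a` to `option_d`, `correct_option`, with `explanation` treated as optional); the function `validate_mcq` and the constants are hypothetical, and the check says nothing about factual correctness.

```python
from typing import List

# Assumed schema, inferred from the example output in this card.
OPTION_KEYS = ("option_a", "option_b", "option_c", "option_d")
REQUIRED_KEYS = ("question",) + OPTION_KEYS + ("correct_option",)
VALID_ANSWERS = {"A", "B", "C", "D"}


def validate_mcq(mcq: dict) -> List[str]:
    """Return a list of structural problems; an empty list means the MCQ passed."""
    problems = []
    for key in REQUIRED_KEYS:
        value = mcq.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")

    answer = mcq.get("correct_option")
    if isinstance(answer, str) and answer.strip():
        if answer.strip().upper() not in VALID_ANSWERS:
            problems.append(f"correct_option must be one of A-D, got: {answer!r}")

    # Duplicate options make a question ambiguous even if the JSON is valid.
    options = [mcq[k].strip().lower() for k in OPTION_KEYS
               if isinstance(mcq.get(k), str) and mcq[k].strip()]
    if len(set(options)) < len(options):
        problems.append("duplicate answer options")
    return problems


example = {
    "question": "What is the maximum duration of a temporary ban from practising?",
    "option_a": "1 year",
    "option_b": "2 years",
    "option_c": "3 years",
    "option_d": "5 years",
    "correct_option": "C",
}
issues = validate_mcq(example)
```

An MCQ that passes this check still needs the human review described under Recommendations, since the check cannot tell whether the marked option is actually supported by the context.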