Commit 77fc812 by adhafajp · verified · 1 Parent(s): d1a4821

About and Licence

  1. README.md +211 -3
README.md CHANGED
@@ -1,3 +1,211 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model: microsoft/phi-2
+ pipeline_tag: text-generation
+ datasets:
+ - yahma/alpaca-cleaned
+ - rajpurkar/squad_v2
+ language:
+ - en
+ tags:
+ - phi-2
+ - qlora
+ - chat
+ - chatml
+ - conversational
+ - english
+ - instruction-following
+ - nlp
+ - text-generation
+ - alpaca
+ - squad
+ - bitsandbytes
+ - fastapi
+ - peft
+ - transformers
+ - adhafajp
+ - zero-ai
+ ---
+
+ # 💬 Chat Model "Zero" (Phi-2 2.7B + QLoRA Adapter)
+
+ This repository contains the **QLoRA adapter** for creating **"Zero"**, a specialized instruction-following AI assistant fine-tuned from [`microsoft/phi-2`](https://huggingface.co/microsoft/phi-2).
+
+ This model is the core component of a **full-stack MLOps portfolio project**, demonstrating capabilities from **fine-tuning** to **production-ready deployment**.
+
+ - 🧩 **Model Adapter:** [adhafajp/phi2-qlora-zero-chat](https://huggingface.co/adhafajp/phi2-qlora-zero-chat)
+ - ⚙️ **Full FastAPI Project (Main Portfolio):** [GitHub – ZeroChat](https://github.com/adhafajp/ZeroChat)
+
+ ---
+
+ ## 🚀 Project Overview
+
+ **Zero** is a fast, memory-efficient conversational model tuned for reasoning, instruction following, and question answering.
+
+ ### Key Features
+ - 🧠 **Fine-tuned using QLoRA:** efficient, low-resource adaptation of Phi-2.
+ - ⚡ **Backend:** Asynchronous **FastAPI** server with streaming responses.
+ - 💬 **Frontend:** Interactive chat interface built with **HTML**, **TailwindCSS**, and **JavaScript** (via Server-Sent Events).
+ - **Deployment-ready:** Lightweight and easy to containerize.
+
+ ---
+
+ ## 🧩 Training Details
+
+ | Component | Description |
+ |------------|-------------|
+ | **Base Model** | `microsoft/phi-2` |
+ | **Method** | QLoRA (Quantized LoRA Fine-Tuning) |
+ | **Language** | English only |
+ | **Precision** | 4-bit (NF4) |
+ | **Frameworks** | `transformers`, `peft`, `bitsandbytes`, `fastapi` |
+
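As a rough sanity check on the 4-bit (NF4) precision listed in the table above, the weight footprint can be estimated with simple arithmetic. The sketch below assumes roughly 2.7 billion parameters (taken from the "Phi-2 2.7B" name) and counts raw weight storage only; quantization constants, layers left unquantized, activations, and the KV cache are ignored, so real memory use will be higher. The function name is illustrative.

```python
def weight_footprint_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate raw weight storage in GiB (ignores all overheads)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


PHI2_PARAMS = 2.7e9  # assumption: approximate size taken from the model name

fp16_gib = weight_footprint_gib(PHI2_PARAMS, 16)  # half-precision baseline
nf4_gib = weight_footprint_gib(PHI2_PARAMS, 4)    # 4-bit quantized weights

print(f"fp16 weights:  ~{fp16_gib:.2f} GiB")
print(f"4-bit weights: ~{nf4_gib:.2f} GiB")
```

The 4x ratio between the two figures is exact for raw storage; the absolute numbers are only as good as the parameter-count assumption.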
+ ### Dataset Composition
+ The adapter was trained on a curated blend of English datasets:
+ - **alpaca_cleaned** → general-purpose instruction-following
+ - **squad_v2** → question answering and reading comprehension
+ - **custom_persona (283 samples)** → gives *Zero* its distinct assistant identity
+
+ ---
+
+ ## 🔧 Integration Example
+
+ A complete **local deployment example** (FastAPI backend + chat frontend) is available in the main project repository:
+ 👉 [**GitHub – ZeroChat**](https://github.com/adhafajp/ZeroChat)
+
+ That repository shows how to integrate this adapter with:
+ - 🔹 A FastAPI inference server (supports streaming responses)
+ - 🔹 A lightweight HTML/Tailwind chat UI
+ - 🔹 Simple local setup and environment configuration for experimentation or portfolio demonstration
+
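Streaming over Server-Sent Events comes down to framing each generated text chunk as a `data:` event terminated by a blank line. The following is a minimal, framework-free sketch of that framing only. The function names, the `{"token": ...}` payload shape, and the `[DONE]` end marker are illustrative assumptions and are not taken from the ZeroChat code.

```python
import json
from typing import Iterable, Iterator


def sse_event(payload: str) -> str:
    """Frame one payload as a Server-Sent Events message."""
    # Every payload line needs its own "data:" field; a blank line ends the event.
    lines = payload.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def stream_as_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap text chunks as JSON-encoded SSE events, then emit an end marker."""
    for chunk in chunks:
        # JSON encoding keeps newlines inside a chunk from breaking the framing.
        yield sse_event(json.dumps({"token": chunk}))
    yield sse_event("[DONE]")  # hypothetical end-of-stream marker


events = list(stream_as_sse(["Hello", ", world"]))
for event in events:
    print(repr(event))
```

In a FastAPI app, a generator like `stream_as_sse` would typically be handed to `StreamingResponse` with `media_type="text/event-stream"`.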
+ ---
+
+ ## 📈 Training Phases Summary
+
+ The fine-tuning process consisted of multiple experimental stages:
+
+ | Phase | Summary | Runtime |
+ |--------|----------|----------|
+ | **1A** | Initial fine-tune, canceled after signs of overfitting on the validation set | 11h 50m |
+ | **1B** | Full 2-epoch fine-tune on Alpaca + SQuADv2 + persona (main baseline) | 5d 11h 50m |
+ | **1C** | Short re-train on a reduced subset (underfit) | 19h |
+ | **1D / 1D-A / 1E** | Refinement attempts with packing and oversampling | ~3d total |
+ | **1F** | Final adapter, re-trained from **1B** (expanded persona dataset, balanced oversampling) | 1d 5h |
+
+ The released adapter corresponds to **Phase 1F**, which achieved balanced performance across **instruction-following**, **reasoning**, and **identity consistency**.
+
+ 📊 W&B Log (Phase 1F):
+ [wandb.ai/VoidNova/phi-2-2.7B_qlora_alpaca-51.8k_identity-model-232_squadv2-15k/runs/bpju3d09](https://wandb.ai/VoidNova/phi-2-2.7B_qlora_alpaca-51.8k_identity-model-232_squadv2-15k/runs/bpju3d09?nw=nwuseradhafajp)
+
+ ---
+
+ ## 🧠 How to Use
+
+ > ⚠️ This is a **LoRA adapter**, not a full model.
+ > You must load the base model (`microsoft/phi-2`) and apply this adapter on top of it.
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ adapter_path = "adhafajp/phi2-qlora-zero-chat"
+ base_model_path = "microsoft/phi-2"
+
+ # Quantization configuration: 4-bit NF4 weights with float16 compute
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+     bnb_4bit_use_double_quant=True,
+ )
+
+ print(f"Loading base model from: {base_model_path}")
+ base_model = AutoModelForCausalLM.from_pretrained(
+     base_model_path,
+     quantization_config=bnb_config,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ # Load the tokenizer saved alongside the adapter
+ print(f"Loading tokenizer from: {adapter_path}")
+ tokenizer = AutoTokenizer.from_pretrained(
+     adapter_path,
+     trust_remote_code=True,
+ )
+ if tokenizer.pad_token is None:
+     tokenizer.pad_token = tokenizer.eos_token
+     tokenizer.pad_token_id = tokenizer.eos_token_id
+
+ # Match the embedding size to the tokenizer before applying the adapter
+ base_model.resize_token_embeddings(len(tokenizer))
+
+ print(f"Applying QLoRA adapter from: {adapter_path}...")
+ model = PeftModel.from_pretrained(base_model, adapter_path)
+ model.eval()
+
+ print("Model is ready to use!")
+
+ # --- INFERENCE EXAMPLE ---
+
+ DEFAULT_SYSTEM = "You are Zero, a helpful assistant."
+ PROMPT_FORMAT = """<|im_start|>system
+ {system_prompt}<|im_end|>
+ <|im_start|>user
+ {instruction}<|im_end|>
+ <|im_start|>assistant
+ """
+
+ instruction = "What is QLoRA and how does it work?"
+ prompt_text = PROMPT_FORMAT.format(
+     system_prompt=DEFAULT_SYSTEM,
+     instruction=instruction,
+ )
+
+ inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)
+ prompt_token_count = inputs["input_ids"].shape[1]
+
+ print(f"\nGenerating response for: '{instruction}'")
+
+ # Greedy decoding with a mild repetition penalty
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=768,
+         repetition_penalty=1.1,
+         do_sample=False,
+         eos_token_id=tokenizer.convert_tokens_to_ids("<|endoftext|>"),
+         pad_token_id=tokenizer.pad_token_id,
+     )
+
+ # Keep only the newly generated tokens (drop the prompt)
+ generated_tokens = outputs[0][prompt_token_count:]
+ generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=False)
+
+ # Cut the text at the earliest stop token, if one is present
+ cut_index = len(generated_text)
+ for stop_token in ["<|endoftext|>", "<|im_end|>"]:
+     if stop_token in generated_text:
+         cut_index = min(cut_index, generated_text.index(stop_token))
+
+ final_answer = generated_text[:cut_index].strip()
+
+ print(f"Model response:\n{final_answer}")
+ ```
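
The single-turn `PROMPT_FORMAT` above follows the ChatML layout, so a conversation history could be rendered the same way. The helper below is a sketch under the assumption that the adapter handles multi-turn ChatML prompts, which this README does not confirm. `build_chatml_prompt`, the message dictionary shape, and the sample history are illustrative.

```python
from typing import Dict, List

DEFAULT_SYSTEM = "You are Zero, a helpful assistant."


def build_chatml_prompt(messages: List[Dict[str, str]],
                        system_prompt: str = DEFAULT_SYSTEM) -> str:
    """Render a system prompt plus chat turns as ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system_prompt}<|im_end|>\n"]
    for message in messages:
        role = message["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        content = message["content"]
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the reply
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Illustrative history; the assistant text is made up for the example
history = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am Zero."},
    {"role": "user", "content": "What is QLoRA?"},
]
prompt_text = build_chatml_prompt(history)
print(prompt_text)
```

With a single user message, the output is identical to what `PROMPT_FORMAT.format(...)` produces in the usage example.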
+ ---
+
+ ### 🪶 Example Prompts
+ - "Who are you?"
+ - "How can I be successful?"
+
+ ---
+
+ ### 🧠 Example with RAG Context
+ "CONTEXT:---Zinc is an essential mineral perceived by the public today as being of ''exceptional biologic and public health importance'', especially regarding prenatal and postnatal development. Zinc deficiency affects about two billion people in the developing world and is associated with many diseases. In children it causes growth retardation, delayed sexual maturation, infection susceptibility, and diarrhea. Enzymes with a zinc atom in the reactive center are widespread in biochemistry, such as alcohol dehydrogenase in humans. Consumption of excess zinc can cause ataxia, lethargy and copper deficiency.---QUESTION:How many people are affected by zinc deficiency?"
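
The example above packs the passage and the question into one user message using `CONTEXT:---` and `---QUESTION:` delimiters. A small helper can build that string from separate inputs. This sketch only mirrors the layout shown in the example; the function name and the whitespace handling are assumptions, not part of the project.

```python
def build_rag_instruction(context: str, question: str) -> str:
    """Combine a retrieved passage and a question in the CONTEXT/QUESTION layout."""
    # Collapse internal whitespace so the passage stays on one line, as in the example
    passage = " ".join(context.split())
    return f"CONTEXT:---{passage}---QUESTION:{question.strip()}"


instruction = build_rag_instruction(
    context="Zinc deficiency affects about two billion people in the developing world.",
    question="How many people are affected by zinc deficiency?",
)
print(instruction)
```

The resulting string would be used as the `instruction` value in the ChatML prompt template.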
+
+
+ ## Acknowledgements & Licenses
+
+ This project builds upon several outstanding open-source contributions:
+
+ * **Base Model:** This adapter is fine-tuned from `microsoft/phi-2`, which is licensed under the **MIT License**.
+   * `Copyright (c) 2023 Microsoft`
+ * **Libraries:** This project relies on `transformers` and `peft` by Hugging Face 🤗, on `bitsandbytes`, and on PyTorch (`torch`). These libraries are available under Apache 2.0 or similar permissive licenses.
+ * **This Adapter & Code:** The original code for this repository (including the adapter weights) is licensed under the **Apache 2.0 License**.