Commit e541155 (verified), committed by sourize · parent: b4a7a77

Update README.md

Files changed (1): README.md (+129 −22)
---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. This adapter lets your assistant recall and use the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.

<p align="center">
<a href="https://huggingface.co/spaces/sourize/DeepTalks">
🔗 Live Demo on Hugging Face Spaces (responses can be slow because the Space runs on the free CPU tier)
</a>
</p>

---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.

- **Size:** ~6M trainable parameters (≈0.2% of the base model)
- **Base:** Phi-2 (2.7B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
  - MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
- **Trainable params:** ~5.9M
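As a sanity check on the trainable-parameter figure, the LoRA size can be estimated from the wrapped layer shapes. A minimal sketch, assuming Phi-2's published dimensions (32 layers, hidden size 2560, MLP intermediate size 10240, which are not stated in this card); each LoRA pair adds r·(d_in + d_out) parameters per wrapped projection:

```python
# Estimate LoRA trainable parameters for the configuration above.
# Assumed Phi-2 dimensions: 32 layers, hidden 2560, MLP intermediate 10240.
R = 4                                   # LoRA rank
HIDDEN, INTERMEDIATE, LAYERS = 2560, 10240, 32

def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    # A LoRA pair is A (r x d_in) plus B (d_out x r).
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)        # q_proj, k_proj, v_proj, dense
    + lora_params(HIDDEN, INTERMEDIATE)    # fc1
    + lora_params(INTERMEDIATE, HIDDEN)    # fc2
)
total = per_layer * LAYERS
print(f"{total:,}")  # 5,898,240
```

With r = 4 this lands on 5,898,240, consistent with the ~5.9M quoted above.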

### Training Data & Preprocessing

- **Dataset:** NuclearAi/HyperThink-Mini-50K (~7% of which was used)
- **Prompt format:**

  ```text
  ### Human:
  <user message>

  ### Assistant:
  <assistant response>
  ```

- **Tokenization:** truncated/padded to 256 tokens, with `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` with `gradient_accumulation_steps=8` (effective batch size 8)
- **Epochs:** 3
- **Checkpointing:** saved every 500 steps; final adapter weights in `adapter_model.safetensors`
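The prompt template above can be applied with a small helper; a minimal sketch (the function name `format_example` is illustrative, not taken from the actual training script):

```python
def format_example(user: str, assistant: str) -> str:
    """Render one training pair in the '### Human / ### Assistant' template."""
    return f"### Human:\n{user}\n\n### Assistant:\n{assistant}"

print(format_example("Hello, how are you?", "I'm doing well, thank you!"))
```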

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
  - Improved recall of the last 2–4 turns in dialogue
  - Maintains base Phi-2 fluency on general language
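For intuition, a mean token-level cross-entropy loss converts to perplexity via exp(loss); a quick check, assuming the reported losses are mean cross-entropy in nats:

```python
import math

val_loss = 1.10                    # validation loss reported above
perplexity = math.exp(val_loss)    # roughly 3 plausible tokens per step
print(round(perplexity, 2))
```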

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter from the Hub
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings if you added tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
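To actually exercise the adapter's short-term memory, recent turns have to be carried in the prompt; a minimal sketch of a rolling context window (the helper names are illustrative, not part of this repo):

```python
from collections import deque

MAX_TURNS = 4  # keep only the last few user/assistant exchanges

def build_prompt(history, user_msg: str) -> str:
    """Render recent exchanges plus the new message in the training template."""
    parts = [f"### Human:\n{u}\n\n### Assistant:\n{a}" for u, a in history]
    parts.append(f"### Human:\n{user_msg}\n\n### Assistant:")
    return "\n\n".join(parts)

history = deque(maxlen=MAX_TURNS)  # old turns fall off automatically
history.append(("My name is Ada.", "Nice to meet you, Ada!"))
prompt = build_prompt(history, "What is my name?")
# `prompt` now carries the earlier exchange, so generation can recall "Ada".
```

After each model reply, append the `(user, assistant)` pair to `history`; the `deque` keeps the window at the 2–4 turns the adapter was tuned for.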

---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**

  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```
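The same request can be issued from Python; a stdlib-only sketch (it reads the token from the `HF_TOKEN` environment variable, as the curl example assumes):

```python
import json
import os
import urllib.request

API_URL = ("https://api-inference.huggingface.co/pipeline/"
           "text-generation/sourize/phi2-memory-deeptalks")

def make_request(prompt: str) -> urllib.request.Request:
    """Build a POST request mirroring the curl example."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 64,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.9,
            "return_full_text": False,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

req = make_request("Hello, how are you?")
# To actually send it (needs a valid HF_TOKEN and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```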

---

## 💡 Use Cases & Limitations

- **Ideal for:**
  - Short back-and-forth chats (2–4 turns)
  - Chatbots that need to “remember” very recent context
- **Not suited for:**
  - Long-term memory or document-level retrieval
  - High-volume production on CPU (too slow)

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post (coming soon):** _Add link here_
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA paper](https://arxiv.org/abs/2106.09685)

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_lora,
  title        = {phi2-memory-lora: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*