Commit e541155 (verified), committed by sourize · parent: b4a7a77

Update README.md

Files changed (1): README.md (+129 −22)
---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. This adapter lets your assistant recall and use the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.

<p align="center">
<a href="https://huggingface.co/spaces/sourize/DeepTalks">
🔗 Live Demo on Hugging Face Spaces (responses can be slow because the Space runs on the free CPU tier)
</a>
</p>

---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.

- **Size:** ~6M trainable parameters (≈0.2% of the base model)
- **Base:** Phi-2 (2.7B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
  - MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
- **Trainable params:** ~5.9M
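As a sanity check on the trainable-parameter figure, the LoRA size can be estimated from the wrapped layer shapes. A minimal sketch, assuming Phi-2's published dimensions (32 layers, hidden size 2560, MLP intermediate size 10240, which are not stated in this card); each LoRA pair adds r·(d_in + d_out) parameters per wrapped projection:

```python
# Estimate LoRA trainable parameters for the configuration above.
# Assumed Phi-2 dimensions: 32 layers, hidden 2560, MLP intermediate 10240.
R = 4                                   # LoRA rank
HIDDEN, INTERMEDIATE, LAYERS = 2560, 10240, 32

def lora_params(d_in: int, d_out: int, r: int = R) -> int:
    # A LoRA pair is A (r x d_in) plus B (d_out x r).
    return r * (d_in + d_out)

per_layer = (
    4 * lora_params(HIDDEN, HIDDEN)        # q_proj, k_proj, v_proj, dense
    + lora_params(HIDDEN, INTERMEDIATE)    # fc1
    + lora_params(INTERMEDIATE, HIDDEN)    # fc2
)
total = per_layer * LAYERS
print(f"{total:,}")  # 5,898,240
```

With r = 4 this lands on 5,898,240, consistent with the ~5.9M quoted above.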

### Training Data & Preprocessing

- **Dataset:** NuclearAi/HyperThink-Mini-50K (~7% of which was used)
- **Prompt format:**

  ```text
  ### Human:
  <user message>

  ### Assistant:
  <assistant response>
  ```

- **Tokenization:** truncated/padded to 256 tokens, with `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` with `gradient_accumulation_steps=8` (effective batch size 8)
- **Epochs:** 3
- **Checkpointing:** saved every 500 steps; final adapter weights in `adapter_model.safetensors`
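The prompt template above can be applied with a small helper; a minimal sketch (the function name `format_example` is illustrative, not taken from the actual training script):

```python
def format_example(user: str, assistant: str) -> str:
    """Render one training pair in the '### Human / ### Assistant' template."""
    return f"### Human:\n{user}\n\n### Assistant:\n{assistant}"

print(format_example("Hello, how are you?", "I'm doing well, thank you!"))
```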

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
  - Improved recall of the last 2–4 turns in dialogue
  - Maintains base Phi-2 fluency on general language
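For intuition, a mean token-level cross-entropy loss converts to perplexity via exp(loss); a quick check, assuming the reported losses are mean cross-entropy in nats:

```python
import math

val_loss = 1.10                    # validation loss reported above
perplexity = math.exp(val_loss)    # roughly 3 plausible tokens per step
print(round(perplexity, 2))
```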

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter from the Hub
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings if you added tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
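To actually exercise the adapter's short-term memory, recent turns have to be carried in the prompt; a minimal sketch of a rolling context window (the helper names are illustrative, not part of this repo):

```python
from collections import deque

MAX_TURNS = 4  # keep only the last few user/assistant exchanges

def build_prompt(history, user_msg: str) -> str:
    """Render recent exchanges plus the new message in the training template."""
    parts = [f"### Human:\n{u}\n\n### Assistant:\n{a}" for u, a in history]
    parts.append(f"### Human:\n{user_msg}\n\n### Assistant:")
    return "\n\n".join(parts)

history = deque(maxlen=MAX_TURNS)  # old turns fall off automatically
history.append(("My name is Ada.", "Nice to meet you, Ada!"))
prompt = build_prompt(history, "What is my name?")
# `prompt` now carries the earlier exchange, so generation can recall "Ada".
```

After each model reply, append the `(user, assistant)` pair to `history`; the `deque` keeps the window at the 2–4 turns the adapter was tuned for.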

---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**

  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```
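The same request can be issued from Python; a stdlib-only sketch (it reads the token from the `HF_TOKEN` environment variable, as the curl example assumes):

```python
import json
import os
import urllib.request

API_URL = ("https://api-inference.huggingface.co/pipeline/"
           "text-generation/sourize/phi2-memory-deeptalks")

def make_request(prompt: str) -> urllib.request.Request:
    """Build a POST request mirroring the curl example."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 64,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.9,
            "return_full_text": False,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('HF_TOKEN', '')}",
            "Content-Type": "application/json",
        },
    )

req = make_request("Hello, how are you?")
# To actually send it (needs a valid HF_TOKEN and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```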

---

## 💡 Use Cases & Limitations

- **Ideal for:**
  - Short back-and-forth chats (2–4 turns)
  - Chatbots that need to “remember” very recent context
- **Not suited for:**
  - Long-term memory or document-level retrieval
  - High-volume production on CPU (too slow)

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post (coming soon):** _Add link here_
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA paper](https://arxiv.org/abs/2106.09685)

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_lora,
  title        = {phi2-memory-lora: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*