---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
  - text-generation
pipeline_tag: text-generation
datasets:
  - NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. This adapter enables your assistant to recall and leverage the last few user/assistant turns, without full fine-tuning of the 2.7 B-parameter base model.

🔗 [Live Demo on Hugging Face Spaces](https://huggingface.co/spaces/sourize/DeepTalks)

โณ It takes time to generate responses since it's running on the CPU free tier

---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.

- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)
- **Base:** Phi-2 (2.7 B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
  - MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
- **Trainable params:** ~5.9 M

### Training Data & Preprocessing

- **Dataset:** `NuclearAi/HyperThink-Mini-50K` (7 % of the 50 K examples used)
- **Prompt format:**

  ```text
  ### Human:
  {user message}

  ### Assistant:
  {assistant reply}
  ```

- **Tokenization:** truncated/padded to 256 tokens, with `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` with `gradient_accumulation_steps=8` (effective batch size 8)
- **Epochs:** 3
- **Checkpointing:** save every 500 steps; final adapter weights in `adapter_model.safetensors`

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
  - Improved recall of the last 2–4 turns in dialogue
  - Maintains base Phi-2 fluency on general language

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter (the adapter config is loaded from the repo)
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings, only needed if you added tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
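Because the adapter targets recall of only the most recent turns, inference prompts should include that recent history in the same `### Human:` / `### Assistant:` template used during training. Below is a minimal sketch of multi-turn prompting, reusing `model` and `tokenizer` from the snippet above; `build_prompt` is an illustrative helper, not part of this repo:

```python
def build_prompt(history, user_message, max_turns=4):
    """Format the last `max_turns` exchanges plus the new message
    into the ### Human / ### Assistant training template."""
    parts = [
        f"### Human:\n{human}\n\n### Assistant:\n{assistant}\n"
        for human, assistant in history[-max_turns:]
    ]
    parts.append(f"### Human:\n{user_message}\n\n### Assistant:")
    return "\n".join(parts)

history = [("Hi, I'm Alex.", "Nice to meet you, Alex! How can I help?")]
prompt = build_prompt(history, "Do you remember my name?")

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=True,
                        temperature=0.7, top_p=0.9)
# Decode only the newly generated tokens, skipping the prompt
reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                         skip_special_tokens=True)
print(reply)
```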
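If you want to train a similar adapter yourself, the configuration listed under Model Details maps roughly onto the PEFT/Transformers setup below. This is an approximate sketch, not the exact training script; the output path and any argument not listed above are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2 has no pad token by default

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# LoRA configuration as listed under Model Details
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # ~5.9 M trainable parameters

def tokenize(example):
    # Truncate/pad to 256 tokens; labels mirror input_ids for causal-LM loss
    enc = tokenizer(example["text"], truncation=True,
                    padding="max_length", max_length=256)
    enc["labels"] = enc["input_ids"].copy()
    return enc

training_args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",  # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,       # FP16 on GPU
    save_steps=500,  # checkpoint every 500 steps
)
```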
---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**

  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```

---

## 💡 Use Cases & Limitations

- **Ideal for:**
  - Short back-and-forth chats (2–4 turns)
  - Chatbots that need to "remember" very recent context
- **Not suited for:**
  - Long-term memory or document-level retrieval
  - High-volume production on CPU (too slow)

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion)
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*