---
datasets:
- ofir408/MedConceptsQA
language:
- en
metrics:
- accuracy
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
pipeline_tag: text-generation
library_name: transformers
tags:
- tinyllama
- lora
- instruction-tuned
- peft
- merged
- medical
- healthcare
---

# 🩺 TinyLlama Medical Assistant (Merged LoRA)

**Author:** Nabil Faieaz
**Base model:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
**Fine-tuning method:** LoRA (Low-Rank Adaptation) via PEFT, merged into the base weights
**Intended use:** Concise, factual, general medical information

---

## 📌 Overview

This model is a **fine-tuned version of TinyLlama 1.1B-Chat** adapted for **medical question answering**. It was trained to give **brief, accurate** answers to medical queries, following a consistent Q/A style.

Key features:

- ✅ LoRA fine-tuning for efficient adaptation on limited compute (a single T4 GPU)
- ✅ LoRA weights merged into the base model, yielding a **single standalone model** (no separate adapter needed)
- ✅ Optimized for short, factual answers; avoids overly verbose outputs
- ✅ Context-aware: advises users to seek professional medical help for urgent or personal issues

---

## ⚠️ Disclaimer

> **This model is for educational and informational purposes only.**
> It is **not** a substitute for professional medical advice, diagnosis, or treatment.
> Always consult a qualified healthcare provider for medical concerns.

---

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "nabilfaieaz/tinyllama-med-full"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Example prompt
system_prompt = (
    "You are a helpful, concise medical assistant. Provide general information only, "
    "not a diagnosis. If urgent or personal issues are mentioned, advise seeing a clinician."
)
question = "What is hypertension?"
prompt = f"{system_prompt}\n\nQuestion: {question}\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps answers short and deterministic. Sampling parameters
# (temperature, top_p) are ignored when do_sample=False, so they are omitted.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 🧠 Training Details

- **Base model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0
- **Fine-tuning method:** LoRA (via `peft`)
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`
- **LoRA config:** r = 16, alpha = 16, dropout = 0.0
- **Max sequence length:** 512 tokens
- **Batch size:** 2 per device, with gradient accumulation to increase the effective batch size
- **Learning rate:** 2e-4
- **Precision:** fp16
- **Evaluation:** run every 200 steps
- **Checkpoints:** saved every 500 steps; final merge from checkpoint-17000

A rough reconstruction of this `peft` configuration appears at the end of this card.

---

## 📊 Intended Use

Intended:

- Educational explanations of medical terms and concepts
- Study aid for medical students and healthcare professionals
- Healthcare-related chatbot demos

Not intended:

- Real-time clinical decision making
- Emergency medical guidance
- Handling sensitive personal medical data (PHI)

---

## ⚙️ Technical Notes

- The model is **merged**: you do not need to load LoRA adapters separately.
- Works with Hugging Face `transformers` ≥ 4.38.
- Can be quantized to 4-bit (e.g., NF4 via `bitsandbytes`, the format used by QLoRA) for local inference; see the sketch below.
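As a minimal sketch of the 4-bit option, the snippet below loads the merged model with `bitsandbytes` NF4 quantization. It assumes the `bitsandbytes` package is installed; the quantization settings shown are common defaults, not values verified against this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nabilfaieaz/tinyllama-med-full"

# Illustrative 4-bit settings (not verified against this model):
# NF4 weight storage with fp16 compute, as popularized by QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Quantized loading trades a small amount of output quality for a much smaller memory footprint, which is usually an acceptable trade-off when running a 1.1B-parameter model on consumer hardware.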
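For readers who want to reproduce the fine-tuning setup, the following is a rough reconstruction of the `peft` configuration from the hyperparameters listed under Training Details. It is a sketch inferred from this card, not the author's original training script; dataset preparation and the training loop are omitted.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Reconstructed from the card's Training Details; not the original script.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=16,            # LoRA rank, as listed above
    lora_alpha=16,   # scaling factor
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# After training, merging produces the standalone model published here:
# merged = model.merge_and_unload()
```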