# LLaVA-Phi Model

LLaVA-Phi is a vision-language model that pairs Microsoft's Phi-1.5 language model with a CLIP vision encoder, allowing it to answer prompts about images as well as plain text.

## Model Description

- **Base Model**: Microsoft Phi-1.5
- **Vision Encoder**: CLIP ViT-B/32
- **Training**: QLoRA fine-tuning
- **Dataset**: LLaVA Instruct 150K

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoProcessor
import torch
from PIL import Image

# Load the model and tokenizer, plus the CLIP processor for images
model = AutoModelForCausalLM.from_pretrained("sagar007/Lava_phi")
tokenizer = AutoTokenizer.from_pretrained("sagar007/Lava_phi")
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text-only generation
def generate_text(prompt):
    inputs = tokenizer(f"human: {prompt}\ngpt:", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Image + text generation
def process_image_and_prompt(image_path, prompt):
    image = Image.open(image_path).convert("RGB")
    image_tensor = processor(images=image, return_tensors="pt").pixel_values

    inputs = tokenizer(f"human: <image>\n{prompt}\ngpt:", return_tensors="pt")
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        images=image_tensor,
        max_new_tokens=128,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```
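
For example, the two helpers might be called like this (the image path and prompts are placeholders):

```python
print(generate_text("What is QLoRA fine-tuning?"))
print(process_image_and_prompt("example.jpg", "Describe this image."))
```

Note that both helpers wrap the prompt in the `human: ... gpt:` turn format shown above, matching the chat format the snippets use.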

## Training Details

- Trained using QLoRA (Quantized Low-Rank Adaptation); a configuration sketch follows below
- 4-bit quantization for memory efficiency
- Gradient checkpointing enabled
- Mixed-precision training (bfloat16)
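
The training code isn't included in this card, but a minimal sketch of such a QLoRA setup with `transformers`, `peft`, and `bitsandbytes` could look like the following. The LoRA rank, alpha, and target module names are illustrative assumptions, not the exact values used for this model:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with bfloat16 compute (matches the bullets above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    quantization_config=bnb_config,
)
base.gradient_checkpointing_enable()          # trade compute for memory
base = prepare_model_for_kbit_training(base)  # prepare the quantized model for training

# Attach low-rank adapters (r, alpha, and target modules are illustrative)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```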

## License

MIT License

## Citation

```bibtex
@software{llava_phi_2024,
  author    = {sagar007},
  title     = {LLaVA-Phi: Vision-Language Model},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/sagar007/Lava_phi}
}
```