# VLM Distillation (LLaVA)
Small toolkit for training and serving a custom vision-language model (VLM) built from a vision encoder, a LoRA-tuned language model, and a projector.
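In a LLaVA-style setup, the projector maps patch features from the vision encoder into the language model's token-embedding space, and those projected tokens are fed to the LM alongside the text prompt. Below is a minimal sketch of that bridging module, assuming a PyTorch implementation; the class name, dimensions, and two-layer MLP design are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps vision-encoder patch features into the LM embedding space.

    Hypothetical module for illustration; vlm_distill_LLaVA.py may use a
    different name or architecture.
    """
    def __init__(self, vision_dim: int = 1024, lm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.proj(image_features)  # (batch, num_patches, lm_dim)
```

During training, typically only the projector and the LoRA adapters are updated, which is why the saved checkpoints consist of LoRA adapter and projector weights rather than a full model.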
## Main Files
- `vlm_distill_LLaVA.py`: Training pipeline for LLaVA-style data (`llava_images_100k/`). Builds the model, trains, and saves checkpoints.
- `test_LLaVA.py`: Loads a trained checkpoint and runs single-sample inference on the dataset split.
- `run_model_LLaVA.py`: FastAPI server for inference (`/chat`) from a local image path or a base64-encoded image.
- `data_extract.py`: Dataset/data extraction helper.
- `requirements.txt`: Python dependencies.
- `checkpoints/`: Saved LoRA adapters + projector weights (see the reload sketch after this list).
- `images_test/`: Local images for quick inference testing.
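Because the LoRA adapters and projector weights are stored separately under `checkpoints/`, reloading a checkpoint roughly means attaching the adapters to the base language model and restoring the projector state. A hedged sketch, assuming the adapters were saved in PEFT format; the base-model name and file names below are placeholders, not the repo's actual paths:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder names: substitute the base LM and checkpoint paths that
# vlm_distill_LLaVA.py actually used when saving.
base_lm = AutoModelForCausalLM.from_pretrained("base-llm-name")
lm = PeftModel.from_pretrained(base_lm, "checkpoints/lora_adapter")   # LoRA adapters
projector_state = torch.load("checkpoints/projector.pt", map_location="cpu")
# projector.load_state_dict(projector_state)  # restore the projector before inference
lm.eval()
```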
## Quick Setup
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Run the API Server
```bash
python run_model_LLaVA.py
```
The server starts on `http://0.0.0.0:8000` with a single endpoint:

- `POST /chat` with fields: `prompt`, plus either `image_path` or `image_base64`
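Conceptually, the handler checks that one of the two image fields is present, decodes the image, and runs the model. The sketch below is an assumption about how `run_model_LLaVA.py` is structured, not its exact code; `run_inference` is a stub standing in for the actual model call.

```python
import base64
import io
from typing import Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from PIL import Image

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str
    image_path: Optional[str] = None      # path to a local image, e.g. under images_test/
    image_base64: Optional[str] = None    # base64-encoded image bytes

def run_inference(image: Image.Image, prompt: str) -> str:
    # Placeholder: in run_model_LLaVA.py this would run the projector + LoRA-tuned LM.
    raise NotImplementedError

@app.post("/chat")
def chat(req: ChatRequest):
    if req.image_path:
        image = Image.open(req.image_path).convert("RGB")
    elif req.image_base64:
        image = Image.open(io.BytesIO(base64.b64decode(req.image_base64))).convert("RGB")
    else:
        raise HTTPException(status_code=400, detail="Provide image_path or image_base64")
    return {"response": run_inference(image, req.prompt)}
```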
Example request body:
```json
{
  "prompt": "Summarize the image in one sentence.",
  "image_path": "images_test/bowl.jpg"
}
```
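An example call against the running server, using `requests`; the second variant encodes the image client-side and sends it as base64:

```python
import base64
import requests

# Send a path that is visible to the server process...
r = requests.post("http://localhost:8000/chat", json={
    "prompt": "Summarize the image in one sentence.",
    "image_path": "images_test/bowl.jpg",
})
print(r.json())

# ...or send the image itself as base64.
with open("images_test/bowl.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode()
r = requests.post("http://localhost:8000/chat", json={
    "prompt": "Summarize the image in one sentence.",
    "image_base64": encoded,
})
print(r.json())
```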