
VLM Distillation (LLaVA)

A small toolkit for training and serving a custom vision-language model (VLM) built from a vision encoder, a LoRA-tuned language model, and a projector connecting the two.
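At a high level, the projector maps vision-encoder patch features into the language model's embedding space so image tokens can be fed to the LM alongside text. A minimal sketch of that component (class name and dimensions are illustrative placeholders, not the actual code in vlm_distill_LLaVA.py):

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps vision-encoder patch features into the LM embedding space.

    vision_dim and lm_dim are placeholder sizes; the real values depend
    on the chosen vision encoder and language model.
    """
    def __init__(self, vision_dim: int = 1024, lm_dim: int = 4096):
        super().__init__()
        # Two-layer MLP, a common choice for LLaVA-style projectors
        self.net = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, lm_dim)
        return self.net(vision_feats)
```

During training, only the LoRA adapters and this projector are updated; both are what get written to checkpoints/.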

Main Files

  • vlm_distill_LLaVA.py: Train pipeline for LLaVA-style data (llava_images_100k/). Builds model, trains, and saves checkpoints.
  • test_LLaVA.py: Loads a trained checkpoint and runs single-sample inference on the dataset split.
  • run_model_LLaVA.py: FastAPI server for inference (/chat) from local image path or base64 image.
  • data_extract.py: Dataset/data extraction helper.
  • requirements.txt: Python dependencies.
  • checkpoints/: Saved LoRA adapters + projector weights.
  • images_test/: Local images for quick inference testing.

Quick Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the API Server

python run_model_LLaVA.py

The server starts on http://0.0.0.0:8000 and exposes one endpoint:

  • POST /chat
    • fields: prompt, and either image_path or image_base64

Example request body:

{
  "prompt": "Summarize the image in one sentence.",
  "image_path": "images_test/bowl.jpg"
}
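Images can also be sent inline via image_base64. A hedged client sketch (the helper name is ours, and the server from run_model_LLaVA.py must already be running):

```python
import base64

def build_chat_payload(prompt: str, image_path: str) -> dict:
    """Read a local image, base64-encode it, and build the /chat request body."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"prompt": prompt, "image_base64": image_b64}

# Usage against a running server:
# import requests
# body = build_chat_payload("Summarize the image in one sentence.",
#                           "images_test/bowl.jpg")
# print(requests.post("http://0.0.0.0:8000/chat", json=body).json())
```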