Anchor – PaliGemma2 Multi-LoRA Server

Load multiple LoRA adapters once. Switch between them at inference time in 216 ms, with no reload.

→ GitHub: recursia-lab/anchor

What is this?

Anchor is a lightweight inference server for PaliGemma2 with multiple LoRA adapters. Unlike frameworks that load adapters from disk on every request, Anchor keeps all adapters resident in GPU memory; switching between them is effectively a pointer swap.

Request: model="open_circuit"  β†’  set_adapter()  β†’  generate()  β†’  216ms
Request: model="missing_hole"  β†’  set_adapter()  β†’  generate()  β†’  216ms
Request: model="base"          β†’  disable_adapters()             β†’  generate()

Quick Start

git clone https://github.com/recursia-lab/anchor
docker build -t anchor .
docker run --gpus all -v /model:/model -v /lora:/lora -p 8080:8080 anchor
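The two bind mounts are the base checkpoint and the adapter directory. A plausible host layout, assuming one subdirectory per adapter named after the adapter (Anchor's actual discovery convention may differ):

/model/                       base PaliGemma2 checkpoint (config.json, *.safetensors, ...)
/lora/
  open_circuit/               one LoRA adapter per subdirectory
    adapter_config.json
    adapter_model.safetensors
  missing_hole/
    adapter_config.json
    adapter_model.safetensors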

API (OpenAI-compatible)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your_adapter", "messages": [...]}'

Framework Support

Framework   PaliGemma2   LoRA
Anchor      ✅           pre-loaded, 216 ms switch
vLLM        ✅           per-request load
SGLang      🚧           in progress (PR #24034)

Community Adapters

See recursia-lab/paligemma2-adapters for a curated index of community fine-tuned PaliGemma2 LoRA adapters.


Built by Recursia Lab • Apache 2.0
