Anchor – PaliGemma2 Multi-LoRA Server

Load multiple LoRA adapters once. Switch between them at inference time in 216 ms, with no reload.

→ GitHub: recursia-lab/anchor

What is this?

Anchor is a lightweight inference server for PaliGemma2 with multiple LoRA adapters. Unlike frameworks that load adapters from disk on every request, Anchor keeps all adapters resident in GPU memory; switching between them is effectively a pointer swap.

Request: model="open_circuit"  β†’  set_adapter()  β†’  generate()  β†’  216ms
Request: model="missing_hole"  β†’  set_adapter()  β†’  generate()  β†’  216ms
Request: model="base"          β†’  disable_adapters()             β†’  generate()

Quick Start

git clone https://github.com/recursia-lab/anchor
docker build -t anchor .
docker run --gpus all -v /model:/model -v /lora:/lora -p 8080:8080 anchor
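The two bind mounts are the base checkpoint and the adapter directory. A plausible host layout, assuming one subdirectory per adapter named after the adapter (Anchor's actual discovery convention may differ):

/model/                       base PaliGemma2 checkpoint (config.json, *.safetensors, ...)
/lora/
  open_circuit/               one LoRA adapter per subdirectory
    adapter_config.json
    adapter_model.safetensors
  missing_hole/
    adapter_config.json
    adapter_model.safetensors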

API (OpenAI-compatible)

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your_adapter", "messages": [...]}'

Framework Support

Framework   PaliGemma2   LoRA
Anchor      ✅           pre-loaded, 216 ms switch
vLLM        ✅           per-request load
SGLang      🚧           in progress (PR #24034)

Community Adapters

See recursia-lab/paligemma2-adapters for a curated index of community fine-tuned PaliGemma2 LoRA adapters.


Built by Recursia Lab • Apache 2.0
