# Anchor: PaliGemma2 Multi-LoRA Server
Load multiple LoRA adapters once. Switch between them at inference time in 216 ms, with no reload.
GitHub: recursia-lab/anchor
## What is this?
Anchor is a lightweight inference server for PaliGemma2 with multiple LoRA adapters. Unlike frameworks that load adapters from disk per request, Anchor keeps all adapters in GPU memory simultaneously; switching between them is a pointer swap.
```
Request: model="open_circuit" → set_adapter()      → generate()   (216 ms)
Request: model="missing_hole" → set_adapter()      → generate()   (216 ms)
Request: model="base"         → disable_adapters() → generate()
```
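The dispatch above can be sketched in a few lines. This is a hedged illustration, not Anchor's actual code: the `MultiLoraRouter` class and its methods are hypothetical, standing in for PEFT-style `set_adapter()` / `disable_adapters()` calls. The point it shows is that once every adapter is resident in memory, routing a request only updates a reference, so there is no per-request I/O.

```python
class MultiLoraRouter:
    """Toy sketch of pre-loaded adapter switching (hypothetical, not Anchor's code).

    All adapters are registered once at startup; routing a request just
    changes which adapter the forward pass will read, i.e. a pointer swap.
    """

    def __init__(self, adapters):
        # e.g. {"open_circuit": <lora weights>, "missing_hole": <lora weights>}
        self.adapters = dict(adapters)
        self.active = None  # None means base model, no adapter applied

    def route(self, model_name):
        if model_name == "base":
            self.active = None          # disable_adapters(): run base weights
        elif model_name in self.adapters:
            self.active = model_name    # set_adapter(): just a reference update
        else:
            raise KeyError(f"unknown adapter: {model_name}")
        return self.active


router = MultiLoraRouter({"open_circuit": "lora_A", "missing_hole": "lora_B"})
print(router.route("open_circuit"))  # open_circuit
print(router.route("base"))          # None
```

A real server would hold the actual LoRA weight tensors in the registry; the routing logic stays this simple.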
## Quick Start
```bash
git clone https://github.com/recursia-lab/anchor
docker build -t anchor .
docker run --gpus all -v /model:/model -v /lora:/lora -p 8080:8080 anchor
```
## API (OpenAI-compatible)
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "your_adapter", "messages": [...]}'
```
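Because the endpoint is OpenAI-compatible, any OpenAI-style client works. A minimal sketch using only Python's standard library; the adapter name and prompt are placeholders, and the commented-out `urlopen` call assumes an Anchor server running on `localhost:8080`:

```python
import json
import urllib.request

# Build an OpenAI-style chat payload; the "model" field selects the LoRA adapter.
payload = {
    "model": "your_adapter",
    "messages": [{"role": "user", "content": "Describe this image."}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a running server, this would send the request:
# response = json.load(urllib.request.urlopen(req))

print(json.loads(req.data)["model"])  # your_adapter
```

Switching adapters per request is just a matter of changing the `"model"` field.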
## Framework Support
| Framework | PaliGemma2 LoRA |
|---|---|
| Anchor | ✅ pre-loaded, 216 ms switch |
| vLLM | ⚠️ per-request load |
| SGLang | 🚧 PR #24034 |
## Community Adapters
See recursia-lab/paligemma2-adapters for a curated index of community-fine-tuned PaliGemma2 LoRA adapters.
Built by Recursia Lab • Apache 2.0