GDN1 32K Anchor 1B Full Fine-Tune
This is a research checkpoint from the Long-GDN workspace.
Base Model
- Base checkpoint: linear-moe-hub/Gated-Deltanet-1.3B
- Architecture: Gated DeltaNet / linear recurrent attention
- Base training data reported by the upstream model card: SlimPajama 100B-token sample
- License inherited from upstream model card: Apache-2.0
Training Run
- Local source path: runs/gdn1_32k_anchor_from_balanced200_1b_bs10_ft/final
- Tokenizer source: runs/gdn1_32k_anchor_from_balanced200_1b_bs10_ft/final
- Training mode: full fine-tuning, no LoRA/adapter
- Hardware target: 8x NVIDIA H200
- Sequence length: 32768
- Additional token budget: ~1B tokens
- Manifest/config: configs/gdn1_memory_mix_32k_anchor_recovery.json
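As a back-of-envelope check on the figures above, the token budget and sequence length imply a rough count of full-length training sequences. This is only a sketch: it assumes "~1B" means exactly 1e9 tokens and that every sequence is packed to the full 32768 tokens, neither of which is stated in this card. The function name is illustrative.

```python
SEQ_LEN = 32768                # sequence length listed above
TOKEN_BUDGET = 1_000_000_000   # assumption: "~1B" taken as exactly 1e9 tokens


def full_sequences(token_budget: int, seq_len: int) -> int:
    """Number of complete seq_len-token sequences that fit in the budget."""
    return token_budget // seq_len


n_sequences = full_sequences(TOKEN_BUDGET, SEQ_LEN)
print(n_sequences)  # 30517
```

Under these assumptions the run sees on the order of 30K full-length sequences, which is a small sample count compared with the base model's reported pretraining data.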
Intended Research Use
This checkpoint is intended for research on:
- long-context associative recall
- RULER/MQAR-style state tracking
- recurrent-state contamination during long generation
- Reference-State Reset with Rolling Replay, a GDN/RNN adaptation of the R-SWA idea
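For the associative-recall use case, a synthetic key-value prompt can be generated programmatically. The sketch below is not the project's evaluation harness; it is a hypothetical helper that mirrors the "Reference facts / Question / Answer" layout of the single-GPU example prompt in this card. The function name, key and value naming scheme, and seed handling are all assumptions.

```python
import random


def build_recall_prompt(num_facts: int, seed: int = 0):
    """Build a key-value recall prompt and return (prompt, query_key, expected_value).

    Hypothetical helper: the fact format follows the example prompt in this
    card, but the key/value naming scheme is made up for illustration.
    """
    if num_facts < 1:
        raise ValueError("num_facts must be at least 1")
    rng = random.Random(seed)  # local RNG so results are reproducible
    facts = {
        f"key_{i:04d}": f"value_{rng.randint(0, 99999):05d}"
        for i in range(num_facts)
    }
    query_key = rng.choice(sorted(facts))
    lines = ["Reference facts:"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    prompt = "\n".join(lines) + f"\n\nQuestion: {query_key}?\nAnswer:"
    return prompt, query_key, facts[query_key]


prompt, query_key, expected = build_recall_prompt(num_facts=4, seed=0)
print(prompt)
```

Increasing `num_facts` lengthens the context, which is one simple way to probe recall at different context lengths.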
Usage
These checkpoints use the FLA Gated DeltaNet implementation. In the current
Long-GDN environment, a plain GatedDeltaNetForCausalLM.from_pretrained() call
can hit a Transformers 5.x tied-weight metadata issue when the installed FLA
class declares _tied_weights_keys as a list. The robust path is to replace
that list with a dict mapping lm_head.weight to model.embeddings.weight
before loading.
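The patch can be wrapped in a small idempotent helper so it is applied the same way in every script. This is a minimal sketch: the helper name `patch_tied_weights_keys` is hypothetical, and the default mapping is the one used in the examples in this card. It is shown on a stand-in class so it runs without FLA installed.

```python
def patch_tied_weights_keys(model_cls, mapping=None) -> bool:
    """Replace a list-style `_tied_weights_keys` on model_cls with a dict.

    Returns True if the class was patched, False if nothing needed to change
    (attribute missing, or already not a list). Hypothetical helper.
    """
    if mapping is None:
        # Mapping taken from the loading examples in this card.
        mapping = {"lm_head.weight": "model.embeddings.weight"}
    if isinstance(getattr(model_cls, "_tied_weights_keys", None), list):
        model_cls._tied_weights_keys = dict(mapping)
        return True
    return False


class FakeModel:
    """Stand-in for a model class that still uses the list format."""
    _tied_weights_keys = ["lm_head.weight"]


print(patch_tied_weights_keys(FakeModel))  # first call patches the class
print(patch_tied_weights_keys(FakeModel))  # second call is a no-op
```

With FLA installed, the same helper would be called on GatedDeltaNetForCausalLM before from_pretrained().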
Install/runtime requirements:
pip install torch transformers safetensors huggingface_hub
# plus an FLA package/source tree that provides:
# fla.models.gated_deltanet.GatedDeltaNetForCausalLM
CPU Example
import torch
from transformers import AutoTokenizer
from fla.models.gated_deltanet import GatedDeltaNetForCausalLM

repo_id = "LLM-OS-Models/gdn1-32k-anchor-1b"

# Transformers 5.x compatibility patch for the installed FLA class.
if isinstance(getattr(GatedDeltaNetForCausalLM, "_tied_weights_keys", None), list):
    GatedDeltaNetForCausalLM._tied_weights_keys = {
        "lm_head.weight": "model.embeddings.weight"
    }

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = GatedDeltaNetForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,
)
model.eval()

prompt = "A special magic number is 12345. What is the special magic number?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
Single-GPU bf16 Example
import torch
from transformers import AutoTokenizer
from fla.models.gated_deltanet import GatedDeltaNetForCausalLM

repo_id = "LLM-OS-Models/gdn1-32k-anchor-1b"

if isinstance(getattr(GatedDeltaNetForCausalLM, "_tied_weights_keys", None), list):
    GatedDeltaNetForCausalLM._tied_weights_keys = {
        "lm_head.weight": "model.embeddings.weight"
    }

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)
model = GatedDeltaNetForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
).to("cuda")
model.eval()

prompt = "Reference facts:\n- key_alpha: value_123\n\nQuestion: key_alpha?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=32,
        do_sample=False,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
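Because generate() returns the prompt tokens followed by the new tokens, the decoded string in the examples above contains the prompt as well as the answer. A small post-processing step can isolate the continuation and check whether the expected value was recalled. The sketch below is a hypothetical helper, not part of the Long-GDN code; it uses a plain substring match, which is a deliberately loose scoring rule.

```python
def continuation(decoded: str, prompt: str) -> str:
    """Return the generated text that follows the prompt.

    Falls back to the full decoded string if the prompt is not an exact
    prefix (for example, if the tokenizer normalized whitespace).
    """
    if decoded.startswith(prompt):
        return decoded[len(prompt):]
    return decoded


def recalled(decoded: str, prompt: str, expected: str) -> bool:
    """True if the expected value appears in the generated continuation."""
    return expected in continuation(decoded, prompt)


# Illustrative strings only; these are not real model outputs.
example_prompt = "Reference facts:\n- key_alpha: value_123\n\nQuestion: key_alpha?\nAnswer:"
example_decoded = example_prompt + " value_123"
print(recalled(example_decoded, example_prompt, "value_123"))
```

Stripping the prompt first matters here: the expected value also appears inside the prompt, so a match on the full decoded string would always succeed.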
Long-GDN Local Loader
The project repository includes a more defensive loader at
scripts/gdn1_common.py::load_gdn1_causal_lm. It handles the compatibility
patch and older public-checkpoint key conversion used in local experiments.
from pathlib import Path
import torch
from transformers import AutoTokenizer
from scripts.gdn1_common import load_gdn1_causal_lm
repo_or_local_path = Path("path/to/downloaded/checkpoint")
tokenizer = AutoTokenizer.from_pretrained(repo_or_local_path, use_fast=True)
model = load_gdn1_causal_lm(repo_or_local_path, torch_dtype=torch.bfloat16).to("cuda")
Known Results
This run is an anchor-heavy continuation from the balanced checkpoint-200. The checkpoint sweep did not repair results at 32K and degraded results at 16K, so this checkpoint was not selected as the current best.
Caveats
This is not the current best checkpoint. It is uploaded for ablation and audit purposes only.
Citation Context
Relevant background papers include Gated Delta Networks, Gated DeltaNet-2, Log-Linear Attention, and Unlimited OCR / R-SWA. This checkpoint does not implement a new architecture by itself; it is part of a checkpoint-preserving full fine-tuning and inference-control study.