Metadata Conditioned LLMs
Collection
Pretraining Data: English NOW corpus (english-corpora.org/now). Paper: arxiv.org/abs/2601.15236. Code: github.com/iamshnoo/metadata_localization • 92 items
How to use iamshnoo/asia_with_metadata_1b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="iamshnoo/asia_with_metadata_1b")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("iamshnoo/asia_with_metadata_1b")
model = AutoModelForCausalLM.from_pretrained("iamshnoo/asia_with_metadata_1b")

How to use iamshnoo/asia_with_metadata_1b with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "iamshnoo/asia_with_metadata_1b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "iamshnoo/asia_with_metadata_1b",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the model with Docker:
docker model run hf.co/iamshnoo/asia_with_metadata_1b

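The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a vLLM server is already running on `localhost:8000` as started above, and the helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration. The response parsing assumes the standard OpenAI-compatible completions shape (`choices[0].text`).

```python
import json
import urllib.request

MODEL_ID = "iamshnoo/asia_with_metadata_1b"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route.

    Hypothetical helper: mirrors the JSON body of the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a live server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI-style completions response layout.
    return body["choices"][0]["text"]


# Example, only meaningful while the server is running:
# print(complete("Once upon a time,"))
```

The same helper should work against the SGLang server described in this card by passing `base_url="http://localhost:30000"`, since both servers expose the same route.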
How to use iamshnoo/asia_with_metadata_1b with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "iamshnoo/asia_with_metadata_1b" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "iamshnoo/asia_with_metadata_1b",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "iamshnoo/asia_with_metadata_1b" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "iamshnoo/asia_with_metadata_1b",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

How to use iamshnoo/asia_with_metadata_1b with Docker Model Runner:
docker model run hf.co/iamshnoo/asia_with_metadata_1b
This repo contains the asia 1b model at the final 10k-step checkpoint for the metadata localization project. It was trained from scratch on the project corpus, using the Llama 3.2 tokenizer and vocabulary.
pretrain | local_continent | 1b | with_metadata
Trained from scratch; tokenizer/vocabulary from meta-llama/Llama-3.2-1B
W&B run: 20/12/2025_09:26:57_combined_no_asia_with_metadata_1b (https://wandb.ai/iamshnoo/nanotron/runs/xbk5hpp1)
State: finished
Runtime: 114h 43m 7s
KPI/train_lm_loss: 2.0704
KPI/train_perplexity: 7.9279
KPI/val_loss: 2.1182
KPI/val_perplexity: 8.3164
KPI/consumed_tokens/train: 41,943,040,000
_step: 10,000
train_steps: 10,000
sequence_length: 2,048
micro_batch_size: 8
batch_accumulation_per_replica: 64
learning_rate: 0.003
min_decay_lr: 0.0003
checkpoint_interval: 1,000

Static plots below were exported from the private Weights & Biases run and embedded here for public access.
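The reported run figures can be cross-checked with a little arithmetic. The sketch below assumes the losses are mean cross-entropy in nats, so that perplexity is `exp(loss)`, and that each replica processes `sequence_length * micro_batch_size * batch_accumulation_per_replica` tokens per step. The number of data-parallel replicas is not stated in this card; the value computed here is only what the token count implies under those assumptions.

```python
import math

# Values copied from the run summary in this card.
train_loss, train_ppl = 2.0704, 7.9279
val_loss, val_ppl = 2.1182, 8.3164
consumed_tokens = 41_943_040_000
train_steps = 10_000
sequence_length = 2_048
micro_batch_size = 8
batch_accumulation_per_replica = 64

# Assumption: perplexity = exp(mean cross-entropy loss in nats).
train_ppl_gap = abs(math.exp(train_loss) - train_ppl)
val_ppl_gap = abs(math.exp(val_loss) - val_ppl)

# Tokens one replica processes per optimizer step.
tokens_per_replica_step = (
    sequence_length * micro_batch_size * batch_accumulation_per_replica
)

# Tokens consumed per optimizer step across all replicas.
tokens_per_step = consumed_tokens // train_steps

# Inferred, not reported: replicas needed to account for the token count.
implied_replicas = tokens_per_step / tokens_per_replica_step

print(f"perplexity gaps: train={train_ppl_gap:.4f}, val={val_ppl_gap:.4f}")
print(f"tokens per step: {tokens_per_step:,}")
print(f"implied data-parallel replicas: {implied_replicas:g}")
```

Small gaps between `exp(loss)` and the logged perplexity are expected, because the losses shown here are rounded to four decimal places.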
This model is part of the metadata localization release. Related checkpoints and variants are grouped in the public Hugging Face collection Metadata Conditioned LLMs.
Last synced: 2026-04-02 14:42:04 UTC