Instructions for using harims95/LoopLM-135M-naive with Transformers and with local serving apps (vLLM, SGLang, Docker Model Runner).

Libraries

How to use harims95/LoopLM-135M-naive with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="harims95/LoopLM-135M-naive", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("harims95/LoopLM-135M-naive", trust_remote_code=True, dtype="auto")
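
The pipeline created above is not actually called in the snippet. A minimal sketch of one sampled completion follows; the helper names `generate_text` and `demo` and the default values are illustrative assumptions, not part of the model card, and the sketch assumes the repository ships a tokenizer that `pipeline` can load.

```python
def generate_text(pipe, prompt, max_new_tokens=64, temperature=0.5):
    """Run one sampled completion through a text-generation pipeline.

    `pipe` is any callable that behaves like a Transformers
    text-generation pipeline: it returns a list of dicts with a
    "generated_text" key.
    """
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # temperature only takes effect when sampling
        temperature=temperature,
    )
    return outputs[0]["generated_text"]


def demo():
    # Imported lazily so generate_text can be reused without loading the model.
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model="harims95/LoopLM-135M-naive",
        trust_remote_code=True,  # the repo provides custom model code
    )
    print(generate_text(pipe, "Once upon a time,"))
```

Calling `demo()` downloads the checkpoint and prints the prompt followed by the sampled continuation.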

Local Apps

vLLM

How to use harims95/LoopLM-135M-naive with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "harims95/LoopLM-135M-naive"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "harims95/LoopLM-135M-naive",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
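
The same request can be sent from Python instead of curl. The sketch below uses only the standard library; the function names `build_completion_request` and `complete` are hypothetical helpers, and the response parsing assumes the server returns the usual OpenAI-style completions body (`choices[0].text`).

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60):
    """Send the request and return the text of the first completion choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete(build_completion_request("http://localhost:8000", "harims95/LoopLM-135M-naive", "Once upon a time,"))` returns the generated continuation. Pointing `base_url` at `http://localhost:30000` targets an SGLang server started with the port used in the SGLang commands.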


SGLang

How to use harims95/LoopLM-135M-naive with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "harims95/LoopLM-135M-naive" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "harims95/LoopLM-135M-naive",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "harims95/LoopLM-135M-naive" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "harims95/LoopLM-135M-naive",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use harims95/LoopLM-135M-naive with Docker Model Runner:
```
docker model run hf.co/harims95/LoopLM-135M-naive
```

LoopLM-135M-naive / spec.json

harims95

Initial release: LoopLM-135M-naive trained on FineWeb 4.6B tokens

12f0a98 verified 6 days ago


{
  "timestamp": "2026-06-28T11:27:05.869434+00:00",
  "run_name": "real_naive_fineweb_5B_2gpu",
  "git_commit": "unknown",
  "cli_args": {
    "preset": "135M",
    "run_name": "real_naive_fineweb_5B_2gpu",
    "data_dir": "/data/fineweb",
    "train_pattern": "fineweb_train_*.bin",
    "val_pattern": "fineweb_train_*.bin",
    "max_steps": 20000,
    "seq_len": 1024,
    "batch_tokens": 262144,
    "micro_batch_seqs": 32,
    "val_every": 250,
    "out_dir": "/data/runs",
    "save_every": 2500,
    "no_compile": false,
    "holdout_last_for_val": true,
    "set": [
      "use_a_matrix=false",
      "use_input_norm=false"
    ]
  },
  "train_config": {
    "data_dir": "/data/fineweb_edu",
    "train_pattern": "edu_fineweb_train_*.bin",
    "val_pattern": "edu_fineweb_val_*.bin",
    "seq_len": 1024,
    "batch_tokens": 262144,
    "micro_batch_seqs": 32,
    "max_steps": 20000,
    "warmup_steps": 100,
    "cooldown_frac": 0.4,
    "final_lr_frac": 0.1,
    "muon_lr": 0.02,
    "muon_momentum": 0.95,
    "muon_wd": 0.1,
    "muon_ns_steps": 5,
    "adam_lr": 0.0003,
    "adam_betas": [
      0.9,
      0.95
    ],
    "adam_wd": 0.1,
    "grad_clip": 1.0,
    "val_every": 250,
    "val_tokens": 10485760,
    "log_every": 10,
    "seed": 1337,
    "compile": true,
    "bf16": true,
    "out_dir": "/data/runs",
    "run_name": "real_naive_fineweb_5B_2gpu"
  },
  "model_config": {
    "vocab_size": 50304,
    "d_model": 1024,
    "n_prelude": 4,
    "n_coda": 2,
    "mu_rec": 6,
    "n_q_heads": 16,
    "n_kv_heads": 8,
    "head_dim": 64,
    "qk_norm": true,
    "rope_theta": 10000.0,
    "dense_ffn": 2816,
    "tie_embeddings": true,
    "final_z_loss_coef": 0.0001,
    "use_a_matrix": false,
    "use_input_norm": false,
    "init_std": 0.02
  },
  "hostname": "modal",
  "gpu_count": 2,
  "gpu_type": "NVIDIA H100 80GB HBM3",
  "pytorch_version": "2.12.0+cu130"
}
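
Several quantities implied by this spec are easier to see once computed. The sketch below derives them from values copied out of `spec.json`; the helper name `derive` is hypothetical, and two interpretations are assumptions rather than facts from the file: that `micro_batch_seqs` is a per-GPU micro-batch split evenly across GPUs, and that `cooldown_frac` denotes the final fraction of steps spent decaying the learning rate.

```python
# Values copied from spec.json (train_config, model_config, top level).
spec = {
    "seq_len": 1024,
    "batch_tokens": 262144,
    "micro_batch_seqs": 32,
    "max_steps": 20000,
    "val_every": 250,
    "save_every": 2500,
    "warmup_steps": 100,
    "cooldown_frac": 0.4,
    "gpu_count": 2,
    "d_model": 1024,
    "n_q_heads": 16,
    "n_kv_heads": 8,
    "head_dim": 64,
}


def derive(s):
    """Compute batch, budget, schedule, and attention-shape figures from the spec."""
    micro_batch_tokens = s["micro_batch_seqs"] * s["seq_len"]
    micro_batches_per_step = s["batch_tokens"] // micro_batch_tokens
    return {
        "micro_batch_tokens": micro_batch_tokens,
        "micro_batches_per_step": micro_batches_per_step,
        # Assumption: micro-batches are per GPU and divided evenly across GPUs.
        "accum_steps_per_gpu": micro_batches_per_step // s["gpu_count"],
        # Configured token budget if the run completes all max_steps.
        "token_budget": s["max_steps"] * s["batch_tokens"],
        "validation_runs": s["max_steps"] // s["val_every"],
        "checkpoints": s["max_steps"] // s["save_every"],
        # Assumption: cooldown covers the last cooldown_frac of training.
        "cooldown_start_step": s["max_steps"] - round(s["max_steps"] * s["cooldown_frac"]),
        "q_proj_dim": s["n_q_heads"] * s["head_dim"],
        "kv_proj_dim": s["n_kv_heads"] * s["head_dim"],
        "q_heads_per_kv_head": s["n_q_heads"] // s["n_kv_heads"],
    }


derived = derive(spec)
for name, value in derived.items():
    print(f"{name}: {value:,}")
```

Under these assumptions each optimizer step consumes 8 micro-batches of 32,768 tokens, and the full 20,000 steps amount to 5,242,880,000 tokens. The release note's 4.6B figure is lower than this configured budget; it is close to 17,500 steps × 262,144 tokens (about 4.59B), which would coincide with a `save_every` boundary, but the spec does not state which checkpoint was published.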