
Sage 1B

A custom 1.286-billion-parameter language model built entirely from scratch: no base models, no fine-tuning, no dependencies on existing LLM frameworks.

Architecture

Parameter              Value
Parameters             1,286,155,776
Layers                 30
Hidden Size            1536
Attention Heads        12
Head Dimension         128
Intermediate Size      6144
Vocabulary             50,000 (BPE)
Max Sequence Length    128 tokens
Activation             SwiGLU
Position Encoding      Rotary (RoPE)
Normalization          RMSNorm
Precision              FP16 / FP32
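
The listed parameter total can be reproduced from the other rows of the table. The sketch below is only an arithmetic check, not the model code. It assumes bias-free linear layers, four attention projections per layer (Q, K, V, output), three feed-forward projections for SwiGLU (gate, up, down), two RMSNorm weight vectors per layer plus one final norm, and untied input and output embeddings. None of these details are stated in the card; they are assumptions that happen to match the published count.

```python
def count_parameters(layers=30, hidden=1536, intermediate=6144,
                     vocab=50_000, tied_embeddings=False):
    """Estimate the parameter count under the assumptions stated above."""
    attention = 4 * hidden * hidden            # Q, K, V, and output projections
    feed_forward = 3 * hidden * intermediate   # gate, up, and down projections
    norms = 2 * hidden                         # two RMSNorm weights per layer
    per_layer = attention + feed_forward + norms

    # Assumption: separate input embedding and output head unless tied.
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


total = count_parameters()
print(f"{total:,}")  # 1,286,155,776
```

With `tied_embeddings=True` the same formula gives 1,209,355,776, which suggests the embedding matrix and output head are stored separately.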

Key Features

  • Built from scratch: Custom PyTorch implementation. Not a derivative of any existing model.
  • BPE Tokenizer: Trained a 50,000-token BPE tokenizer on the TinyStories dataset.
  • Modern Architecture: SwiGLU activations, Rotary Position Embeddings (RoPE), RMSNorm.
  • Open Source: MIT license. Weights, training code, and inference code are all available.
  • GGUF Format: Available for use with llama.cpp, Ollama, and other GGUF-compatible runners.
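
For readers unfamiliar with the three architecture components named above, here is a minimal pure-Python sketch of what each one computes on a single vector. It is illustrative only and is not taken from modeling_sage_1b.py. The function names, the `eps` value, the RoPE base of 10000, and the interleaved pairing of dimensions are all assumptions. The `swiglu` function shows only the element-wise gating step; in a full feed-forward block the `gate` and `up` inputs would come from two linear projections and the result would pass through a third.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Element-wise SwiGLU gating: SiLU(gate) * up."""
    return [silu(g) * u for g, u in zip(gate, up)]


def rope(x, position, base=10000.0):
    """Rotate pairs (x[0], x[1]), (x[2], x[3]), ... by position-dependent angles."""
    dim = len(x)  # must be even
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

Because RoPE is a pure rotation, it leaves the length of the vector unchanged and is the identity at position 0.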

Usage

With Hugging Face Hub

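import json

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download the config, tokenizer, and weights from the Hub (cached locally).
config_path = hf_hub_download('itriedcoding/Sage-1B', 'config.json')
tokenizer_path = hf_hub_download('itriedcoding/Sage-1B', 'tokenizer.json')
weights_path = hf_hub_download('itriedcoding/Sage-1B', 'pytorch_model_state.bin')

# Load the hyperparameters and the BPE tokenizer.
with open(config_path) as f:
    cfg = json.load(f)
tok = Tokenizer.from_file(tokenizer_path)

# Load the weights on CPU (assumes the file holds a plain state dict).
# Building the model itself requires the architecture code in modeling_sage_1b.py.
state_dict = torch.load(weights_path, map_location='cpu')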

With GGUF (llama.cpp)

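# Download the FP16 GGUF file.
wget https://huggingface.co/itriedcoding/Sage-1B/resolve/main/sage-1b-f16.gguf
# Generate 50 tokens. Newer llama.cpp builds name this binary llama-cli instead of main.
./main -m sage-1b-f16.gguf -p "Once upon a time" -n 50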
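
Since the maximum sequence length is 128 tokens, the number of tokens requested with `-n` has to fit alongside the prompt. The helper below is a hypothetical sketch of that budget. It assumes the 128-token limit covers prompt and generated tokens together, which the card does not state explicitly.

```python
def clamp_new_tokens(prompt_tokens, requested, max_seq_len=128):
    """Return how many tokens can be generated without exceeding the context."""
    if prompt_tokens >= max_seq_len:
        raise ValueError("prompt already fills the context window")
    return min(requested, max_seq_len - prompt_tokens)


print(clamp_new_tokens(prompt_tokens=4, requested=50))    # 50
print(clamp_new_tokens(prompt_tokens=100, requested=50))  # 28
```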

Web Interface

Chat with the model at: https://sage-ai.vercel.app/chat

API

curl -X POST https://sage-ai.vercel.app/api/v1/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"message": "Tell me a story"}'
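
The same request can be sketched in Python using only the standard library. The URL, the bearer token header, and the `message` field come from the curl command above. The `Content-Type` header and the function names are assumptions, and because the response format is not documented here, the sketch returns the raw response body rather than parsing it.

```python
import json
import urllib.request

API_URL = "https://sage-ai.vercel.app/api/v1/chat"


def build_chat_request(message, api_key):
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the curl example does not set this header.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def send_chat(message, api_key, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires network access and a valid key):
# print(send_chat("Tell me a story", "YOUR_API_KEY"))
```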

Training

The model was trained on the TinyStories dataset, a synthetic dataset of short stories designed for training compact language models. Training was performed on CPU with limited resources, making this a proof of concept for building LLMs from scratch without GPU access.

Files

File                       Size      Description
pytorch_model_state.bin    2.4 GB    FP16 model weights
sage-1b-f16.gguf           2.4 GB    GGUF format for llama.cpp
config.json                1 KB      Model hyperparameters
tokenizer.json             12 MB     BPE tokenizer (50K vocab)
modeling_sage_1b.py        6 KB      Model architecture code
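
As a rough consistency check, the size of the two weight files follows from the parameter count at two bytes per FP16 value. The sketch below ignores file headers and metadata, so it gives only an approximate lower bound on the file size.

```python
PARAMS = 1_286_155_776
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def weights_size_bytes(precision="fp16"):
    """Raw size of the weights alone, without any file-format overhead."""
    return PARAMS * BYTES_PER_PARAM[precision]


size = weights_size_bytes()
print(f"{size / 10**9:.2f} GB")   # 2.57 GB (decimal gigabytes)
print(f"{size / 2**30:.2f} GiB")  # 2.40 GiB (binary gibibytes)
```

The 2.4 GB figure in the table therefore matches the binary (GiB) reading of the FP16 weights.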

License

MIT
