metadata
license: mit
language:
  - en
tags:
  - language-model
  - transformer
  - pytorch
  - from-scratch
  - tiny-stories
datasets:
  - TinyStories
library_name: transformers
pipeline_tag: text-generation

Sage 1B

A custom 1.286-billion-parameter language model built from scratch in PyTorch: no base model, no fine-tuning, and no dependency on existing LLM frameworks.

Architecture

| Parameter | Value |
| --- | --- |
| Parameters | 1,286,155,776 |
| Layers | 30 |
| Hidden Size | 1536 |
| Attention Heads | 12 |
| Head Dimension | 128 |
| Intermediate Size | 6144 |
| Vocabulary | 50,000 (BPE) |
| Max Sequence Length | 128 tokens |
| Activation | SwiGLU |
| Position Encoding | Rotary (RoPE) |
| Normalization | RMSNorm |
| Precision | FP16 / FP32 |
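
The parameter total can be cross-checked from the other values in the table. The sketch below is an illustration, not the author's code: it assumes bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and separate (untied) input and output embedding matrices. Under those assumptions the arithmetic reproduces the stated count exactly.

```python
# Architecture values copied from the table above.
VOCAB = 50_000
HIDDEN = 1536
LAYERS = 30
HEADS = 12
HEAD_DIM = 128
INTERMEDIATE = 6144


def count_parameters() -> int:
    # Assumption: Q, K, V and output projections are bias-free HIDDEN x (HEADS * HEAD_DIM) matrices.
    attention = 4 * HIDDEN * (HEADS * HEAD_DIM)
    # Assumption: SwiGLU uses three bias-free matrices (gate, up, down).
    mlp = 3 * HIDDEN * INTERMEDIATE
    # Assumption: two RMSNorm weight vectors per layer.
    norms = 2 * HIDDEN
    per_layer = attention + mlp + norms
    # Assumption: input embedding and output head are separate (untied) matrices.
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


print(f"{count_parameters():,}")  # 1,286,155,776
```

Each layer contributes 37,751,808 parameters, so the 30 layers account for about 1.13 billion of the total and the two embedding matrices for about 0.15 billion.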

Key Features

  • Built from scratch — Custom PyTorch implementation. Not a derivative of any existing model.
  • BPE Tokenizer — Trained a 50,000-token BPE tokenizer on the TinyStories dataset.
  • Modern Architecture — SwiGLU activations, Rotary Position Embeddings (RoPE), RMSNorm.
  • Open Source — MIT license. Weights, training code, and inference code are all available.
  • GGUF Format — Available for use with llama.cpp, Ollama, and other GGUF-compatible runners.
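
For readers unfamiliar with the three components named above, here is a minimal dependency-free sketch of each operating on plain Python lists. It is not taken from modeling_sage_1b.py, which remains the authoritative implementation. The epsilon of 1e-6, the RoPE base of 10000, and the pairing of adjacent dimensions are assumptions chosen for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    # Divide x by its root mean square, then scale by a learned weight vector.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def rope(x, position, base=10000.0):
    # Rotate each adjacent pair (x[i], x[i+1]), i even, by angle position * base**(-i/d).
    d = len(x)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out


def swiglu_gate(gate, up):
    # Elementwise SiLU(gate) * up; in a full SwiGLU block, gate and up are two
    # linear projections of the input and the result feeds a down projection.
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Illustrative query/key vectors with the head dimension of 128.
q = [0.1 * (i % 7) for i in range(128)]
k = [0.05 * (i % 5) for i in range(128)]
# Both scores use a relative offset of 2, so they agree up to rounding.
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 12), rope(k, 10)))
```

The final print illustrates the property that makes RoPE useful: the attention score between a rotated query and key depends only on the distance between their positions, not on the absolute positions.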

Usage

With Hugging Face Hub

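from huggingface_hub import hf_hub_download
import json
import torch
from tokenizers import Tokenizer

REPO_ID = 'itriedcoding/Sage-1B'

# Download the config, tokenizer, and weights (or reuse them from the local cache).
config_path = hf_hub_download(REPO_ID, 'config.json')
tokenizer_path = hf_hub_download(REPO_ID, 'tokenizer.json')
weights_path = hf_hub_download(REPO_ID, 'pytorch_model_state.bin')

with open(config_path) as f:
    cfg = json.load(f)  # model hyperparameters
tok = Tokenizer.from_file(tokenizer_path)  # 50K-vocab BPE tokenizer
# Assumption: the .bin file holds a plain state dict; build the model from modeling_sage_1b.py to use it.
state_dict = torch.load(weights_path, map_location='cpu')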

With GGUF (llama.cpp)

wget https://huggingface.co/itriedcoding/Sage-1B/resolve/main/sage-1b-f16.gguf
# Recent llama.cpp builds name this binary llama-cli instead of main.
./main -m sage-1b-f16.gguf -p "Once upon a time" -n 50

Web Interface

Chat with the model at: https://sage-ai.vercel.app/chat

API

curl -X POST https://sage-ai.vercel.app/api/v1/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me a story"}'
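
The same request can be built from Python with only the standard library. This is a sketch under assumptions: the URL, the bearer header, and the `message` field come from the curl example, while the JSON content type and the helper name `build_chat_request` are illustrative. The response format is not documented here, so the send step is left commented out.

```python
import json
import urllib.request


def build_chat_request(message: str, api_key: str) -> urllib.request.Request:
    # URL, bearer header, and "message" field are taken from the curl example.
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        "https://sage-ai.vercel.app/api/v1/chat",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("Tell me a story", "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# Sending is left commented out because the response format is not documented here:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```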

Training

The model was trained on the TinyStories dataset — a synthetic dataset of short stories designed for training compact language models. Training was performed on CPU with limited resources, making this a proof-of-concept for building LLMs from scratch without GPU access.

Files

| File | Size | Description |
| --- | --- | --- |
| pytorch_model_state.bin | 2.4 GB | FP16 model weights |
| sage-1b-f16.gguf | 2.4 GB | GGUF format for llama.cpp |
| config.json | 1 KB | Model hyperparameters |
| tokenizer.json | 12 MB | BPE tokenizer (50K vocab) |
| modeling_sage_1b.py | 6 KB | Model architecture code |
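
As a rough sanity check, the size of the two weight files follows from the parameter count in the Architecture section at 2 bytes per FP16 weight. The sketch ignores file headers and metadata, and it assumes every tensor is stored in FP16.

```python
PARAMS = 1_286_155_776   # parameter count from the Architecture section
BYTES_PER_PARAM = 2      # FP16 stores each weight in 2 bytes

size_bytes = PARAMS * BYTES_PER_PARAM
size_gib = size_bytes / 2**30   # binary gigabytes (GiB)
size_gb = size_bytes / 10**9    # decimal gigabytes (GB)
print(f"{size_gib:.2f} GiB, {size_gb:.2f} GB")  # 2.40 GiB, 2.57 GB
```

The listed 2.4 GB therefore appears to be measured in binary units; in decimal units the same files come to roughly 2.57 GB.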

License

MIT