---
license: mit
language:
- en
tags:
- language-model
- transformer
- pytorch
- from-scratch
- tiny-stories
datasets:
- TinyStories
library_name: transformers
pipeline_tag: text-generation
---

# Sage 1B

Sage 1B is a custom **1.286-billion-parameter** language model built from scratch in PyTorch: no base model, no fine-tuning, and no dependency on existing LLM frameworks.

## Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 1,286,155,776 |
| Layers | 30 |
| Hidden Size | 1536 |
| Attention Heads | 12 |
| Head Dimension | 128 |
| Intermediate Size | 6144 |
| Vocabulary | 50,000 (BPE) |
| Max Sequence Length | 128 tokens |
| Activation | SwiGLU |
| Position Encoding | Rotary (RoPE) |
| Normalization | RMSNorm |
| Precision | FP16 / FP32 |

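
The parameter total can be reproduced from the other rows of the table. The sketch below is a sanity check, not the model code. It assumes bias-free projections, two RMSNorm weight vectors per layer plus a final one, and separate (untied) input-embedding and output-head matrices. These layout details are assumptions, chosen because they add up to the listed figure.

```python
# Hyperparameters copied from the architecture table.
VOCAB_SIZE = 50_000
HIDDEN_SIZE = 1536
INTERMEDIATE_SIZE = 6144
NUM_LAYERS = 30
NUM_HEADS = 12
HEAD_DIM = 128


def count_parameters() -> int:
    # Assumption: Q, K, V and output projections are square, bias-free matrices.
    attention = 4 * HIDDEN_SIZE * HIDDEN_SIZE
    # SwiGLU uses three matrices: gate, up and down projections (assumed bias-free).
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE
    # Assumption: two RMSNorm weight vectors per layer (before attention and MLP).
    norms = 2 * HIDDEN_SIZE
    per_layer = attention + mlp + norms
    # Assumption: input embedding and output head are separate (untied) matrices.
    embeddings = 2 * VOCAB_SIZE * HIDDEN_SIZE
    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * per_layer + embeddings + final_norm


# The heads must tile the hidden dimension exactly.
assert NUM_HEADS * HEAD_DIM == HIDDEN_SIZE
print(f"{count_parameters():,}")  # prints 1,286,155,776
```

Under these assumptions, roughly 1.13 billion parameters sit in the 30 transformer layers and about 154 million in the two vocabulary-sized matrices.
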
## Key Features

- **Built from scratch**: Custom PyTorch implementation, not derived from any existing model.
- **BPE Tokenizer**: A 50,000-token BPE tokenizer trained on the TinyStories dataset.
- **Modern Architecture**: SwiGLU activations, Rotary Position Embeddings (RoPE), and RMSNorm.
- **Open Source**: MIT license. Weights, training code, and inference code are all available.
- **GGUF Format**: Available for use with llama.cpp, Ollama, and other GGUF-compatible runners.

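
For readers unfamiliar with the three building blocks named above, here is a minimal pure-Python sketch of each, operating on plain lists. It is illustrative only and is not taken from `modeling_sage_1b.py`. The `eps` value, the RoPE base of 10000, and the adjacent-pair rotation convention are assumptions; the actual implementation may differ.

```python
import math

HEAD_DIM = 128  # head dimension from the architecture table


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by a learned weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) * up, element-wise."""
    return [g / (1.0 + math.exp(-g)) * u for g, u in zip(gate, up)]


def rope(x, position, base=10000.0):
    """Rotate each adjacent pair of x by an angle that grows with position."""
    out = list(x)
    for i in range(0, len(x), 2):
        # Pair i/2 uses frequency base**(-i / d), as in the RoPE formulation.
        angle = position * base ** (-i / len(x))
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def dot(a, b):
    """Plain dot product, used here as an unscaled attention score."""
    return sum(p * q for p, q in zip(a, b))
```

Because RoPE applies pure rotations, `dot(rope(q, m), rope(k, n))` depends only on the offset `m - n`, which is what lets attention scores encode relative position.
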
## Usage

### With Hugging Face Hub

```python
import json

import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download (or reuse cached copies of) the config, tokenizer, and weights.
config_path = hf_hub_download('itriedcoding/Sage-1B', 'config.json')
tokenizer_path = hf_hub_download('itriedcoding/Sage-1B', 'tokenizer.json')
weights_path = hf_hub_download('itriedcoding/Sage-1B', 'pytorch_model_state.bin')

# Load the hyperparameters and the BPE tokenizer.
with open(config_path) as f:
    cfg = json.load(f)
tok = Tokenizer.from_file(tokenizer_path)
# Assumes the weights file is a plain state dict written by torch.save.
state_dict = torch.load(weights_path, map_location='cpu')
```

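
The snippet above stops before any text is generated. As a rough sketch of what a generation loop could look like, the function below does greedy decoding while keeping the context within the 128-token limit from the architecture table. `next_token_logits` is a hypothetical stand-in for a forward pass through the model class in `modeling_sage_1b.py`, whose actual interface is not documented here.

```python
from typing import Callable, List, Sequence

MAX_SEQ_LEN = 128  # max sequence length from the architecture table


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
) -> List[int]:
    """Append the highest-scoring token one step at a time.

    next_token_logits is a hypothetical callable: it takes the current
    token ids and returns one score per vocabulary entry.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Only the most recent MAX_SEQ_LEN tokens fit in the context window.
        context = ids[-MAX_SEQ_LEN:]
        logits = next_token_logits(context)
        # Greedy choice: index of the largest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
    return ids
```

With the `tokenizers` library, prompt ids would come from `tok.encode("Once upon a time").ids`, and the result can be turned back into text with `tok.decode(ids)`.
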
### With GGUF (llama.cpp)

```bash
wget https://huggingface.co/itriedcoding/Sage-1B/resolve/main/sage-1b-f16.gguf
# Recent llama.cpp builds name this binary llama-cli instead of main.
./main -m sage-1b-f16.gguf -p "Once upon a time" -n 50
```

### Web Interface

Chat with the model at https://sage-ai.vercel.app/chat

### API

```bash
curl -X POST https://sage-ai.vercel.app/api/v1/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Tell me a story"}'
```

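
The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_chat_request` and `send_chat` are made up for illustration, the JSON content type is assumed, and the response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

API_URL = "https://sage-ai.vercel.app/api/v1/chat"


def build_chat_request(message: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(message: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(message, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `send_chat("Tell me a story", "YOUR_API_KEY")` performs the network call. Splitting the request construction into its own function keeps that part testable offline.
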
## Training

The model was trained on the **TinyStories** dataset, a synthetic dataset of short stories designed for training compact language models. Training ran on CPU with limited resources, which makes this a proof of concept for building LLMs from scratch without GPU access.

## Files

| File | Size | Description |
|------|------|-------------|
| `pytorch_model_state.bin` | 2.4 GB | FP16 model weights |
| `sage-1b-f16.gguf` | 2.4 GB | GGUF format for llama.cpp |
| `config.json` | 1 KB | Model hyperparameters |
| `tokenizer.json` | 12 MB | BPE tokenizer (50K vocab) |
| `modeling_sage_1b.py` | 6 KB | Model architecture code |

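
The weight-file sizes follow from the parameter count at two bytes per FP16 value. The short check below suggests that the 2.4 GB figure is in binary units (GiB); in decimal units the same payload would be about 2.57 GB. File headers and metadata are ignored, so this is an approximation.

```python
NUM_PARAMETERS = 1_286_155_776  # from the architecture table
BYTES_PER_FP16 = 2


def fp16_size_gib(num_parameters: int) -> float:
    """Size of the raw FP16 weights in GiB (2**30 bytes), ignoring file headers."""
    return num_parameters * BYTES_PER_FP16 / 2**30


print(f"{fp16_size_gib(NUM_PARAMETERS):.2f} GiB")  # prints 2.40 GiB
```
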
## License

MIT