---
license: mit
language:
- en
tags:
- language-model
- transformer
- pytorch
- from-scratch
- tiny-stories
datasets:
- TinyStories
library_name: transformers
pipeline_tag: text-generation
---
# Sage 1B

A **custom 1.286 billion parameter** language model built entirely from scratch: no base models, no fine-tuning, and no dependencies on existing LLM frameworks.
## Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 1,286,155,776 |
| Layers | 30 |
| Hidden Size | 1536 |
| Attention Heads | 12 |
| Head Dimension | 128 |
| Intermediate Size | 6144 |
| Vocabulary | 50,000 (BPE) |
| Max Sequence Length | 128 tokens |
| Activation | SwiGLU |
| Position Encoding | Rotary (RoPE) |
| Normalization | RMSNorm |
| Precision | FP16 / FP32 |
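
The parameter total can be reproduced from the other rows of the table. The sketch below is one accounting that matches it exactly. It assumes bias-free linear layers, four attention projections per layer (Q, K, V, output), three SwiGLU matrices per layer, two RMSNorm weight vectors per layer plus one final norm, and an output head that is not tied to the input embedding. These assumptions are inferred from the arithmetic, not read from `modeling_sage_1b.py`.

```python
# Hyperparameters from the Architecture table.
vocab_size = 50_000
hidden = 1536
layers = 30
intermediate = 6144

# Assumed breakdown (no biases, untied input/output embeddings).
attention = 4 * hidden * hidden          # Q, K, V, and output projections
mlp = 3 * hidden * intermediate          # SwiGLU: gate, up, and down matrices
norms = 2 * hidden                       # two RMSNorm weight vectors per layer
per_layer = attention + mlp + norms

embeddings = 2 * vocab_size * hidden     # token embedding + separate output head
final_norm = hidden

total = layers * per_layer + embeddings + final_norm
print(f"{total:,}")  # 1,286,155,776
```

Under this breakdown, roughly 88% of the parameters sit in the 30 transformer layers and the rest in the two vocabulary-sized matrices.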
## Key Features

- **Built from scratch**: Custom PyTorch implementation. Not a derivative of any existing model.
- **BPE Tokenizer**: A 50,000-token BPE tokenizer trained on the TinyStories dataset.
- **Modern Architecture**: SwiGLU activations, Rotary Position Embeddings (RoPE), RMSNorm.
- **Open Source**: MIT license. Weights, training code, and inference code are all available.
- **GGUF Format**: Available for use with llama.cpp, Ollama, and other GGUF-compatible runners.
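
For readers unfamiliar with two of the building blocks named above, here is a minimal standard-library sketch of RMSNorm and a SwiGLU feed-forward layer. It is an illustration of the general techniques, not the code in `modeling_sage_1b.py`; the function names, the `eps` value, and the toy weights are all assumptions, and RoPE is left out for brevity.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


# Toy sizes for illustration (the model uses hidden=1536, intermediate=6144).
x = [3.0, 4.0]
normed = rms_norm(x, weight=[1.0, 1.0])
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 x 2
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]]    # 3 x 2
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]     # 2 x 3
out = swiglu_mlp(normed, w_gate, w_up, w_down)
print(normed, out)
```

The three weight matrices in `swiglu_mlp` are why a SwiGLU layer costs three `hidden x intermediate` matrices rather than the two used by a plain two-layer MLP.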
## Usage

### With Hugging Face Hub

```python
import json
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer

# Download the config, tokenizer, and weights from the Hub (cached after the first call).
config_path = hf_hub_download('itriedcoding/Sage-1B', 'config.json')
tokenizer_path = hf_hub_download('itriedcoding/Sage-1B', 'tokenizer.json')
weights_path = hf_hub_download('itriedcoding/Sage-1B', 'pytorch_model_state.bin')

# Load the hyperparameters and the BPE tokenizer.
with open(config_path) as f:
    cfg = json.load(f)
tok = Tokenizer.from_file(tokenizer_path)
# Assumption: the .bin file is a plain state dict for the model in modeling_sage_1b.py.
state_dict = torch.load(weights_path, map_location='cpu')
```
### With GGUF (llama.cpp)

```bash
wget https://huggingface.co/itriedcoding/Sage-1B/resolve/main/sage-1b-f16.gguf
# Recent llama.cpp builds name this binary llama-cli instead of main.
./main -m sage-1b-f16.gguf -p "Once upon a time" -n 50
```
### Web Interface

Chat with the model at: https://sage-ai.vercel.app/chat

### API

```bash
curl -X POST https://sage-ai.vercel.app/api/v1/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"message": "Tell me a story"}'
```
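
The same call can be made from Python. The sketch below builds the request with the standard library only; the helper name `build_chat_request` is hypothetical, the `Content-Type` header is an assumption (the curl command does not set one), and the response format is not documented here, so the body is printed as raw text.

```python
import json
import urllib.request

API_URL = "https://sage-ai.vercel.app/api/v1/chat"


def build_chat_request(message, api_key):
    """Build a POST request carrying the same body and auth header as the curl call."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint accepts a JSON content type;
            # the curl example does not set one explicitly.
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("Tell me a story", "YOUR_API_KEY")
print(req.get_method(), req.full_url)

# Sending requires network access and a valid key:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```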
## Training

The model was trained on the **TinyStories** dataset, a synthetic dataset of short stories designed for training compact language models. Training was performed on CPU with limited resources, making this a proof of concept for building LLMs from scratch without GPU access.
## Files

| File | Size | Description |
|------|------|-------------|
| `pytorch_model_state.bin` | 2.4 GB | FP16 model weights |
| `sage-1b-f16.gguf` | 2.4 GB | GGUF format for llama.cpp |
| `config.json` | 1 KB | Model hyperparameters |
| `tokenizer.json` | 12 MB | BPE tokenizer (50K vocab) |
| `modeling_sage_1b.py` | 6 KB | Model architecture code |
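
As a quick sanity check, the weight file sizes follow from the parameter count at two bytes per FP16 weight. The sketch assumes the listed "GB" is measured in binary gigabytes (GiB) and ignores file-format overhead such as headers and metadata.

```python
params = 1_286_155_776      # from the Architecture table
bytes_per_param = 2         # FP16 stores each weight in two bytes

size_bytes = params * bytes_per_param
size_gib = size_bytes / 2**30
print(f"{size_bytes:,} bytes = {size_gib:.2f} GiB")  # 2,572,311,552 bytes = 2.40 GiB
```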
## License

MIT