---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation
- causal-lm
- from-scratch
- llama
- grouped-query-attention
- rope
- swiglu
- chatml
datasets:
- HuggingFaceFW/fineweb-edu
- HuggingFaceH4/ultrachat_200k
model-index:
- name: AlterEgo-373M
results:
- task: {type: text-generation}
dataset: {name: lambada_openai, type: lambada_openai}
metrics: [{type: acc, value: 0.3161}]
- task: {type: text-generation}
dataset: {name: hellaswag, type: hellaswag}
metrics: [{type: acc_norm, value: 0.38}]
- task: {type: text-generation}
dataset: {name: arc_easy, type: arc_easy}
metrics: [{type: acc_norm, value: 0.5269}]
- task: {type: text-generation}
dataset: {name: arc_challenge, type: arc_challenge}
metrics: [{type: acc_norm, value: 0.273}]
- task: {type: text-generation}
dataset: {name: piqa, type: piqa}
metrics: [{type: acc_norm, value: 0.6567}]
- task: {type: text-generation}
dataset: {name: winogrande, type: winogrande}
metrics: [{type: acc, value: 0.513}]
- task: {type: text-generation}
dataset: {name: openbookqa, type: openbookqa}
metrics: [{type: acc_norm, value: 0.322}]
- task: {type: text-generation}
dataset: {name: sciq, type: sciq}
metrics: [{type: acc_norm, value: 0.722}]
- task: {type: text-generation}
dataset: {name: boolq, type: boolq}
metrics: [{type: acc, value: 0.6177}]
---
<div align="center">
# 🧠 AlterEgo-373M
**A 373-million-parameter language model designed, trained, and served entirely from scratch.**
[GitHub: AlterEgo](https://github.com/J-bom/AlterEgo) ·
[GitHub: LLME](https://github.com/J-bom/LLME) ·
[GGUF quants](https://huggingface.co/jbomdev/AlterEgo-GGUF)
</div>
---
## Introduction
**AlterEgo** is a small, decoder-only language model built from the ground up - not a fine-tune of an existing model. Every part was written from scratch: the transformer architecture, the training loop, the tokenizer wiring, and the KV-cached inference engine. It was pre-trained on ~10B tokens of high-quality educational web text and then instruction-tuned for chat.
It is the model at the heart of **[LLME](https://github.com/J-bom/LLME)**, a self-hosted, end-to-end-encrypted LLM platform, also built from scratch (think LM Studio / Open WebUI / Ollama). LLME can serve AlterEgo alongside `llama.cpp` GGUF models and the Gemini API; AlterEgo is the "house" model it was designed around.
This repository contains the **model**. The training and architecture code lives in the [AlterEgo repo](https://github.com/J-bom/AlterEgo); the serving platform lives in the [LLME repo](https://github.com/J-bom/LLME).
> **Two formats are published.** This repo is the Hugging Face `LlamaForCausalLM` conversion, for drop-in use with `transformers`, vLLM, and GGUF tooling. The **original checkpoint** - in AlterEgo's own from-scratch architecture, exactly as trained - is published separately as [`jbomdev/AlterEgo_raw`](https://huggingface.co/jbomdev/AlterEgo_raw). This version is a **numerically lossless conversion** of it (verified: max logit difference ~1e-6).
> **What it is and isn't.** AlterEgo is a *research / learning artifact* - a demonstration of the full modern LLM pipeline (architecture → pretraining → SFT → serving) at a scale one person can train on a single GPU. It is **not** a production assistant and won't compete with billion-parameter models. See [Limitations](#limitations).
## Architecture
A modern Llama-style decoder. Because it follows the Llama layout, it loads as a standard `LlamaForCausalLM`.
| Component | Choice |
|---|---|
| Type | Decoder-only transformer (autoregressive) |
| Parameters | ~373M (input/output embeddings tied) |
| Layers | 24 |
| Model dimension | 1024 |
| Attention | **Grouped-Query Attention** - 16 query heads / 4 KV heads (head dim 64) |
| Positional encoding | **Rotary embeddings (RoPE)**, θ = 10,000 |
| Normalization | **RMSNorm** (pre-norm) |
| Feed-forward | **SwiGLU**, hidden dim 2816 |
| Context length | 2048 |
| Vocabulary | 100,352 |
| Tokenizer | `cl100k_base` (tiktoken) extended with ChatML special tokens |
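As a sanity check on the table, the ~373M figure can be reproduced by counting weights. The sketch below assumes the standard `LlamaForCausalLM` layout with no bias terms in the attention or MLP projections (an assumption, not stated above) and tied input/output embeddings. It also estimates the KV-cache size for one full-context sequence, which is where grouped-query attention pays off: only the 4 KV heads are cached, not all 16. The multi-head-attention comparison is hypothetical.

```python
def llama_param_count(vocab=100_352, d_model=1024, n_layers=24,
                      n_heads=16, n_kv_heads=4, ffn_hidden=2816,
                      tied_embeddings=True):
    """Count weights in a Llama-style decoder (assumes no bias terms)."""
    head_dim = d_model // n_heads                   # 1024 / 16 = 64
    kv_dim = n_kv_heads * head_dim                  # 4 * 64 = 256
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim  # q, o + k, v projections
    mlp = 3 * d_model * ffn_hidden                  # SwiGLU: gate, up, down
    norms = 2 * d_model                             # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * d_model
    lm_head = 0 if tied_embeddings else vocab * d_model
    final_norm = d_model
    return embed + n_layers * per_layer + final_norm + lm_head


def kv_cache_bytes(seq_len=2048, n_layers=24, n_kv_heads=4, head_dim=64,
                   bytes_per_value=2):
    """Memory for cached keys and values of one sequence (bf16 = 2 bytes)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


print(f"parameters: {llama_param_count():,}")  # parameters: 373,343,232
gqa = kv_cache_bytes()
mha = kv_cache_bytes(n_kv_heads=16)             # hypothetical: one KV head per query head
print(f"KV cache @2048 tokens: GQA {gqa / 2**20:.0f} MiB vs MHA {mha / 2**20:.0f} MiB")
# KV cache @2048 tokens: GQA 48 MiB vs MHA 192 MiB
```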
## Training
AlterEgo was trained in two stages on a single NVIDIA RTX 4090.
### Stage 1 - Pretraining
Pre-trained on **[FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)** (HuggingFaceFW), a quality-filtered educational subset of CommonCrawl.


The gradient norm settling at ~0.26 and the smooth, cosine-shaped loss curve indicate stable training with no divergence.
### Stage 2 - Supervised fine-tuning
Instruction-tuned on **[UltraChat-200K](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)** (HuggingFaceH4), formatted as multi-turn **ChatML**.

### Hyperparameters
| | Pretraining | SFT |
|---|---|---|
| Dataset | FineWeb-Edu | UltraChat-200K |
| Tokens / steps | ~10B / 19,073 | ~64M / 244 |
| Global batch | 524,288 tokens (micro 2 × 2048 × 128 grad-accum) | same scheme |
| Optimizer | AdamW (β = 0.9, 0.95; ε = 1e-8; fused) | same |
| Weight decay | 0.1 (decoupled; excluded from norms/biases) | same |
| LR schedule | linear warmup (1,900 steps) → cosine decay | cosine |
| Peak / min LR | 3e-4 → 3e-5 | low (fine-tune range) |
| Grad clipping | global-norm 1.0 | 1.0 |
| Precision | bfloat16 autocast | bfloat16 |
| Throughput / wall-clock | ~32k tok/s · ~86 GPU-h (3.6 days) | ~39k tok/s · ~28 min |
| Other | `torch.compile`, gradient checkpointing, FlashAttention (SDPA) | same |
| Final loss (train / val) | 2.94 / **2.89** | 1.83 / **1.81** |
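The pretraining column is internally consistent, and a short calculation makes the relationships explicit: tokens per optimizer step are micro-batch × sequence length × accumulation steps, and multiplying by the step count gives the ~10B-token budget. The `lr_at` function is a sketch of the common linear-warmup-then-cosine form using the peak/min LR and step counts above; the exact warmup formula (here `(step + 1) / warmup`) and the decay reaching the minimum exactly at the final step are assumptions, not taken from the training code.

```python
import math

MICRO_BATCH, SEQ_LEN, GRAD_ACCUM = 2, 2048, 128
STEPS = 19_073
THROUGHPUT = 32_000  # approximate tokens per second, from the table

tokens_per_step = MICRO_BATCH * SEQ_LEN * GRAD_ACCUM  # 524,288
total_tokens = tokens_per_step * STEPS                # ~10.0B
hours = total_tokens / THROUGHPUT / 3600              # roughly the ~86 GPU-h listed above


def lr_at(step, peak=3e-4, min_lr=3e-5, warmup=1_900, total=STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to min_lr."""
    if step < warmup:
        return peak * (step + 1) / warmup
    if step >= total:
        return min_lr
    progress = (step - warmup) / (total - warmup)     # goes 0 -> 1 over the decay phase
    return min_lr + 0.5 * (1 + math.cos(math.pi * progress)) * (peak - min_lr)


print(f"{tokens_per_step:,} tokens/step, {total_tokens / 1e9:.2f}B tokens, ~{hours:.0f} h")
for s in (0, 1_899, 1_900, 10_000, STEPS):
    print(s, f"{lr_at(s):.2e}")
```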
## Evaluation
Benchmarked with [EleutherAI's lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (0-shot).
| Benchmark | Metric | AlterEgo-373M | Random |
|---|---|---|---|
| lambada_openai | acc | 31.6% | ~0% |
| hellaswag | acc_norm | 38.0% | 25% |
| arc_easy | acc_norm | 52.7% | 25% |
| arc_challenge | acc_norm | 27.3% | 25% |
| piqa | acc_norm | 65.7% | 50% |
| winogrande | acc | 51.3% | 50% |
| openbookqa | acc_norm | 32.2% | 25% |
| sciq | acc_norm | 72.2% | 25% |
| boolq | acc | 61.8% | 50% |
For a 373M model trained on ~10B tokens these are solid: clearly above chance on science and commonsense (SciQ, PIQA, ARC-easy, HellaSwag) and on next-word prediction (LAMBADA, perplexity 62.3), with the expected near-chance results on the hardest reasoning sets (ARC-challenge, WinoGrande). BoolQ (61.8%) is above the 50% random baseline but roughly level with the ~62% majority-class baseline of always answering "yes".
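The `acc` / `acc_norm` distinction in the table comes from how lm-evaluation-harness scores multiple-choice tasks: it computes the model's log-likelihood of each answer string and predicts the highest-scoring one. `acc_norm` first divides each log-likelihood by the answer's length, so longer answers are not penalized just for containing more text. The sketch below illustrates both, plus the perplexity formula (exp of the negative mean per-example log-likelihood) behind the LAMBADA figure. It assumes normalization by the character length of the answer string, which is my reading of recent harness versions; `pick` and `perplexity` are hypothetical helpers and the log-likelihood numbers are made up.

```python
import math


def pick(loglikelihoods, choices, normalize=False):
    """Index of the predicted answer; normalize=True mimics acc_norm."""
    scores = [
        ll / len(text) if normalize else ll
        for ll, text in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def perplexity(loglikelihoods):
    """exp of the negative mean log-likelihood over scored targets."""
    return math.exp(-sum(loglikelihoods) / len(loglikelihoods))


# Illustrative numbers only: the longer answer has a lower total
# log-likelihood but a higher per-character one.
choices = [" a rock", " the water evaporates into the air"]
lls = [-6.0, -12.0]
print(pick(lls, choices))                  # 0 (acc-style: raw sum)
print(pick(lls, choices, normalize=True))  # 1 (acc_norm-style: per character)
```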
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("jbomdev/AlterEgo")
model = AutoModelForCausalLM.from_pretrained("jbomdev/AlterEgo", torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content":
        "You are Alter Ego, a small AI built from scratch. You're casual and direct. "
        "You're not great with facts, math, or current events - when you don't know "
        "something, just say so. You're better at chatting than at answering questions."},
    {"role": "user", "content": "Tell me something interesting about the ocean."},
]

# Render the ChatML prompt and tokenize it. return_dict=True also returns the
# attention mask, so generate() does not have to guess it.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=1.0,
    repetition_penalty=1.1,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```
### Recommended generation settings
These are the defaults AlterEgo was tuned and served with in LLME:
| Parameter | Value |
|---|---|
| `temperature` | 0.7 |
| `top_k` | 50 |
| `top_p` | 1.0 |
| `repetition_penalty` | 1.1 |
| `max_new_tokens` | 200 |
Lower the temperature toward 0.3–0.5 for steadier, more focused replies. Generation stops when the model emits `<|im_end|>` or `<|endoftext|>`.
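To make these settings concrete, here is a plain-Python sketch of what each knob does to the next-token logits. It roughly follows the order in which Hugging Face `generate` applies them (repetition penalty, then temperature, top-k, top-p), but it is an illustration rather than the library's implementation, and `sample_next_token` is a hypothetical helper.

```python
import math
import random


def sample_next_token(logits, previous_ids, rng, temperature=0.7, top_k=50,
                      top_p=1.0, repetition_penalty=1.1):
    """Sample one token id from a list of raw logits (hypothetical helper)."""
    scores = list(logits)
    # Repetition penalty (CTRL-style): push down tokens already in the context.
    for tid in set(previous_ids):
        s = scores[tid]
        scores[tid] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep the k highest-scoring candidates, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtracting the max for numerical stability).
    best = scores[ranked[0]]
    weights = [math.exp(scores[i] - best) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches
    # top_p. With top_p=1.0, as recommended, this step effectively does nothing.
    kept_ids, kept_probs, mass = [], [], 0.0
    for tid, p in zip(ranked, probs):
        kept_ids.append(tid)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0, -1.0]
print(sample_next_token(logits, previous_ids=[], rng=rng, top_k=1))   # 1 (greedy)
print(sample_next_token(logits, previous_ids=[1], rng=rng, top_k=1,
                        repetition_penalty=2.0))                      # 2
```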
### Chat format
AlterEgo uses **ChatML**:
```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{message}<|im_end|>
<|im_start|>assistant
```
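The template above can be reproduced with a few lines of string formatting, which is handy for checking exactly what the model sees or for sending a prompt to a raw-completion endpoint. This is a sketch: `to_chatml` is a hypothetical helper, the newline placement is inferred from the template shown here, and the tokenizer's own chat template (via `apply_chat_template`) remains the authority.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render [{'role': ..., 'content': ...}] dicts as a ChatML prompt string."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply;
        # generation should stop when it emits <|im_end|>.
        prompt += "<|im_start|>assistant\n"
    return prompt


print(to_chatml([
    {"role": "system", "content": "You are Alter Ego."},
    {"role": "user", "content": "Hi!"},
]))
```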
### Run it locally (GGUF)
Feel free to use my pre-made GGUFs and quants from the [GGUF and quants page](https://huggingface.co/jbomdev/AlterEgo-GGUF), or run the model with [Ollama](https://ollama.com/jbomdev/alterego).
Because this is a standard Llama-format checkpoint, you can also convert it to GGUF yourself for Ollama, LM Studio, or llama.cpp:
```bash
huggingface-cli download jbomdev/AlterEgo --local-dir ./AlterEgo
python llama.cpp/convert_hf_to_gguf.py ./AlterEgo --outfile alterego-f16.gguf --outtype f16
```
## Limitations
AlterEgo is a 373M-parameter model trained on a modest token budget, and it behaves like one:
- **Capability** - it can be factually wrong, repeat itself, and lose coherence on long or complex prompts. In the words of its default system prompt, it is "better at chatting than at answering questions."
- **Language** - English only.
- **Safety** - it is **not** safety- or preference-tuned (no RLHF/DPO). It can produce incorrect, biased, or undesirable content and must not be deployed in user-facing settings without additional safeguards.
- **Bias** - it inherits biases from FineWeb-Edu (web text) and UltraChat.
## License
Released under the Apache 2.0 license. Training data is governed by the respective licenses of FineWeb-Edu and UltraChat-200K.
## Citation
```bibtex
@misc{alterego2026,
title = {AlterEgo: A 373M language model trained from scratch},
author = {J-bom},
year = {2026},
url = {https://github.com/J-bom/AlterEgo}
}
```
**Credits** - datasets: FineWeb-Edu (HuggingFaceFW), UltraChat-200K (HuggingFaceH4). Architecture follows the modern Llama-style design (RoPE, GQA, SwiGLU, RMSNorm); implementation, training, and serving by the author.