jbomdev committed 12 days ago

Commit

603b7a6

verified ·

1 Parent(s): 25907b8

Update README.md

initial model card commit

Files changed (1)

README.md +236 -0

@@ -1,3 +1,239 @@
 ---
 license: apache-2.0
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- text-generation
+- causal-lm
+- from-scratch
+- llama
+- grouped-query-attention
+- rope
+- swiglu
+- chatml
+datasets:
+- HuggingFaceFW/fineweb-edu
+- HuggingFaceH4/ultrachat_200k
+model-index:
+- name: AlterEgo-373M
+  results:
+  - task: {type: text-generation}
+    dataset: {name: lambada_openai, type: lambada_openai}
+    metrics: [{type: acc, value: 0.3161}]
+  - task: {type: text-generation}
+    dataset: {name: hellaswag, type: hellaswag}
+    metrics: [{type: acc_norm, value: 0.38}]
+  - task: {type: text-generation}
+    dataset: {name: arc_easy, type: arc_easy}
+    metrics: [{type: acc_norm, value: 0.5269}]
+  - task: {type: text-generation}
+    dataset: {name: arc_challenge, type: arc_challenge}
+    metrics: [{type: acc_norm, value: 0.273}]
+  - task: {type: text-generation}
+    dataset: {name: piqa, type: piqa}
+    metrics: [{type: acc_norm, value: 0.6567}]
+  - task: {type: text-generation}
+    dataset: {name: winogrande, type: winogrande}
+    metrics: [{type: acc, value: 0.513}]
+  - task: {type: text-generation}
+    dataset: {name: openbookqa, type: openbookqa}
+    metrics: [{type: acc_norm, value: 0.322}]
+  - task: {type: text-generation}
+    dataset: {name: sciq, type: sciq}
+    metrics: [{type: acc_norm, value: 0.722}]
+  - task: {type: text-generation}
+    dataset: {name: boolq, type: boolq}
+    metrics: [{type: acc, value: 0.6177}]
 ---
+<div align="center">
+# 🧠 AlterEgo-373M
+**A 373-million-parameter language model designed, trained, and served entirely from scratch.**
+[![Code](https://img.shields.io/badge/GitHub-AlterEgo%20(training)-181717?logo=github)](https://github.com/J-bom/AlterEgo)
+[![Platform](https://img.shields.io/badge/GitHub-LLME%20(platform)-181717?logo=github)](https://github.com/J-bom/LLME)
+[![Architecture](https://img.shields.io/badge/arch-Llama--style-blue)]()
+[![Params](https://img.shields.io/badge/params-373M-green)]()
+</div>
+---
+## Introduction
+**AlterEgo** is a small, decoder-only language model built from the ground up - not a fine-tune of an existing model. Every part was written from zero: the transformer architecture, the training loop, the tokenizer wiring, and the KV-cached inference engine. It was pre-trained on ~10B tokens of high-quality educational web text and then instruction-tuned for chat.
+It is the model at the heart of **[LLME](https://github.com/J-bom/LLME)**, a self-hosted, end-to-end-encrypted LLM platform (think LM Studio / Open WebUI / Ollama, also built from scratch). LLME can serve AlterEgo alongside `llama.cpp` GGUF models and the Gemini API; AlterEgo is the "house" model it was designed around.
+This repository contains the **model**. The training and architecture code lives in the [AlterEgo repo](https://github.com/J-bom/AlterEgo); the serving platform lives in the [LLME repo](https://github.com/J-bom/LLME).
+> **Two formats are published.** This repo is the Hugging Face `LlamaForCausalLM` conversion, for drop-in use with `transformers`, vLLM, and GGUF tooling. The **original checkpoint** - in AlterEgo's own from-scratch architecture, exactly as trained - is published separately as [`J-bom/AlterEgo_raw`](https://huggingface.co/J-bom/AlterEgo_raw). This version is a **numerically lossless conversion** of it (verified: max logit difference ~1e-6).
+> **What it is and isn't.** AlterEgo is a *research / learning artifact* - a demonstration of the full modern LLM pipeline (architecture → pretraining → SFT → serving) at a scale one person can train on a single GPU. It is **not** a production assistant and won't compete with billion-parameter models. See [Limitations](#limitations).
+## Architecture
+A modern Llama-style decoder. Because it follows the Llama layout, the converted checkpoint loads as a standard `LlamaForCausalLM`.
+| Component | Choice |
+|---|---|
+| Type | Decoder-only transformer (autoregressive) |
+| Parameters | ~373M (input/output embeddings tied) |
+| Layers | 24 |
+| Model dimension | 1024 |
+| Attention | **Grouped-Query Attention** - 16 query heads / 4 KV heads (head dim 64) |
+| Positional encoding | **Rotary embeddings (RoPE)**, θ = 10,000 |
+| Normalization | **RMSNorm** (pre-norm) |
+| Feed-forward | **SwiGLU**, hidden dim 2816 |
+| Context length | 2048 |
+| Vocabulary | 100,352 |
+| Tokenizer | `cl100k_base` (tiktoken) extended with ChatML special tokens |
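
The ~373M figure can be cross-checked from the table alone. The sketch below is a back-of-the-envelope count, not code from the AlterEgo repo; it assumes a bias-free Llama-style layout (two RMSNorm weight vectors per layer plus one final norm) and counts the tied embedding matrix once. The function and constant names are illustrative.

```python
# Rough parameter count for a Llama-style decoder with the dimensions in the table.
# Assumptions: no bias terms anywhere, one weight vector per RMSNorm,
# and input/output embeddings tied (so the matrix is counted once).
VOCAB = 100_352
D_MODEL = 1024
N_LAYERS = 24
N_Q_HEADS = 16
N_KV_HEADS = 4
HEAD_DIM = 64
FFN_HIDDEN = 2816


def count_parameters() -> int:
    embedding = VOCAB * D_MODEL                      # shared with the output head
    q_proj = D_MODEL * N_Q_HEADS * HEAD_DIM          # 16 query heads
    kv_proj = 2 * D_MODEL * N_KV_HEADS * HEAD_DIM    # 4 key heads + 4 value heads
    o_proj = N_Q_HEADS * HEAD_DIM * D_MODEL
    attention = q_proj + kv_proj + o_proj
    feed_forward = 3 * D_MODEL * FFN_HIDDEN          # SwiGLU: gate, up, down
    norms = 2 * D_MODEL                              # pre-attention and pre-FFN RMSNorm
    per_layer = attention + feed_forward + norms
    final_norm = D_MODEL
    return embedding + N_LAYERS * per_layer + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the count comes to 373,343,232, of which about 103M is the tied embedding matrix. Grouped-query attention shows up in `kv_proj`: with 4 KV heads instead of 16, the key and value projections are a quarter of the size they would be under full multi-head attention.
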
+## Training
+AlterEgo was trained in two stages on a single NVIDIA RTX 4090.
+### Stage 1 - Pretraining
+Pre-trained on **[FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)** (HuggingFaceFW), a quality-filtered educational subset of CommonCrawl.
+![Pretraining loss](assets/pretraining_loss.png)
+![Training dynamics](assets/training_dynamics.png)
+The grad-norm settling to ~0.26 and the smooth cosine-shaped loss indicate stable training with no divergence.
+### Stage 2 - Supervised fine-tuning
+Instruction-tuned on **[UltraChat-200K](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)** (HuggingFaceH4), formatted as multi-turn **ChatML**.
+![SFT loss](assets/sft_loss.png)
+### Hyperparameters
+| | Pretraining | SFT |
+|---|---|---|
+| Dataset | FineWeb-Edu | UltraChat-200K |
+| Tokens / steps | ~10B / 19,073 | ~64M / 244 |
+| Global batch | 524,288 tokens (micro 2 × 2048 × 128 grad-accum) | same scheme |
+| Optimizer | AdamW (β = 0.9, 0.95; ε = 1e-8; fused) | same |
+| Weight decay | 0.1 (decoupled; excluded from norms/biases) | same |
+| LR schedule | linear warmup (1,900 steps) → cosine decay | cosine |
+| Peak / min LR | 3e-4 → 3e-5 | low (fine-tune range) |
+| Grad clipping | global-norm 1.0 | 1.0 |
+| Precision | bfloat16 autocast | bfloat16 |
+| Throughput / wall-clock | ~32k tok/s · ~86 GPU-h (3.6 days) | ~39k tok/s · ~28 min |
+| Other | `torch.compile`, gradient checkpointing, FlashAttention (SDPA) | same |
+| Final loss (train / val) | 2.94 / **2.89** | 1.83 / **1.81** |
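
The pretraining schedule in the table (linear warmup for 1,900 steps, then cosine decay from 3e-4 to 3e-5) can be sketched as a plain function of the step index. This is an illustration rather than the author's training code: it assumes the warmup ramps from near zero, that the cosine phase spans the remaining steps, and that the minimum is reached at step 19,073. The name `learning_rate` is hypothetical.

```python
import math

PEAK_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 1_900
TOTAL_STEPS = 19_073


def learning_rate(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Linear ramp; (step + 1) avoids a zero learning rate on the first step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    if step >= TOTAL_STEPS:
        return MIN_LR
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return MIN_LR + cosine * (PEAK_LR - MIN_LR)


# Tokens per optimizer step: micro-batch x context length x grad-accum steps.
tokens_per_step = 2 * 2048 * 128
print(f"{tokens_per_step:,} tokens/step, {tokens_per_step * TOTAL_STEPS:,} tokens total")

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{learning_rate(step):.2e}")
```

The same arithmetic confirms the pretraining token budget: 524,288 tokens per step over 19,073 steps is 9,999,745,024 tokens, which matches the ~10B in the table.
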
+## Evaluation
+Benchmarked with [EleutherAI's lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (0-shot).
+| Benchmark | Metric | AlterEgo-373M | Random |
+|---|---|---|---|
+| lambada_openai | acc | 31.6% | ~0% |
+| hellaswag | acc_norm | 38.0% | 25% |
+| arc_easy | acc_norm | 52.7% | 25% |
+| arc_challenge | acc_norm | 27.3% | 25% |
+| piqa | acc_norm | 65.7% | 50% |
+| winogrande | acc | 51.3% | 50% |
+| openbookqa | acc_norm | 32.2% | 25% |
+| sciq | acc_norm | 72.2% | 25% |
+| boolq | acc | 61.8% | 50% |
+For a 373M model trained on ~10B tokens, these results are solid: clearly above chance on science and commonsense tasks (SciQ, PIQA, BoolQ, ARC-easy, HellaSwag) and on next-word prediction (LAMBADA, perplexity 62.3), with the expected near-chance results on the hardest reasoning sets (ARC-challenge, WinoGrande).
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+tok = AutoTokenizer.from_pretrained("jbomdev/AlterEgo")
+model = AutoModelForCausalLM.from_pretrained("jbomdev/AlterEgo", torch_dtype=torch.bfloat16)
+messages = [
+    {"role": "system", "content":
+     "You are Alter Ego, a small AI built from scratch. You're casual and direct. "
+     "You're not great with facts, math, or current events - when you don't know "
+     "something, just say so. You're better at chatting than at answering questions."},
+    {"role": "user", "content": "Tell me something interesting about the ocean."},
+]
+ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+out = model.generate(
+    ids,
+    max_new_tokens=200,
+    do_sample=True,
+    temperature=0.7,
+    top_k=50,
+    top_p=1.0,
+    repetition_penalty=1.1,
+)
+print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
+```
+### Recommended generation settings
+These are the defaults AlterEgo was tuned and served with in LLME:
+| Parameter | Value |
+|---|---|
+| `temperature` | 0.7 |
+| `top_k` | 50 |
+| `top_p` | 1.0 |
+| `repetition_penalty` | 1.1 |
+| `max_new_tokens` | 200 |
+Lower the temperature toward 0.3–0.5 for steadier, more focused replies. Generation stops on `<|im_end|>` or `<|endoftext|>`.
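
To make the effect of these settings concrete, here is a simplified, dependency-free sketch of one sampling step. It is not how `transformers` or LLME implements sampling internally; it only mirrors the usual order of operations (repetition penalty, then temperature, then top-k) on a plain list of logits. The function name `sample_next_token` and the toy logits are made up for illustration.

```python
import math
import random


def sample_next_token(logits, previous_tokens, temperature=0.7, top_k=50,
                      repetition_penalty=1.1, rng=random):
    """Pick one token id from raw logits using the recommended settings."""
    scores = list(logits)

    # Repetition penalty: make tokens already in the sequence less likely.
    for token_id in set(previous_tokens):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # Top-k: keep only the k highest-scoring token ids.
    k = min(top_k, len(scores))
    kept = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the kept ids (top_p = 1.0 means no nucleus filtering).
    peak = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - peak) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
print(sample_next_token(toy_logits, previous_tokens=[0], top_k=3, rng=rng))
```

With `top_k=3` only token ids 0, 1, and 2 can be drawn here, and token 0 is slightly demoted because it already appears in `previous_tokens`.
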
+### Chat format
+AlterEgo uses **ChatML**:
+```
+<|im_start|>system
+{system prompt}<|im_end|>
+<|im_start|>user
+{message}<|im_end|>
+<|im_start|>assistant
+```
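
For cases where the prompt has to be assembled by hand (for example when calling a raw completion endpoint), the layout above can be rendered with a few lines of string formatting. This is a sketch under the assumption that each role line and each `<|im_end|>` is followed by a single newline, as the template suggests; the tokenizer's bundled chat template remains the authoritative source for exact whitespace. `format_chatml` is a hypothetical helper name.

```python
def format_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = format_chatml([
    {"role": "system", "content": "You are Alter Ego."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

When using `transformers`, `tok.apply_chat_template(...)` is the safer route, since it applies whatever template ships with the tokenizer.
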
+### Run it locally (GGUF)
+Because the weights are in standard Llama format, you can convert them to GGUF for Ollama / LM Studio / llama.cpp:
+```bash
+python llama.cpp/convert_hf_to_gguf.py ./AlterEgo --outfile alterego-f16.gguf --outtype f16
+```
+## Limitations
+AlterEgo is a 373M-parameter model trained on a modest token budget, and it behaves like one:
+- **Capability** - it can be factually wrong, repeat itself, and lose coherence on long or complex prompts. By its own (default) system prompt, it is "better at chatting than at answering questions."
+- **Language** - English only.
+- **Safety** - it is **not** safety- or preference-tuned (no RLHF/DPO). It can produce incorrect, biased, or undesirable content and must not be deployed in user-facing settings without additional safeguards.
+- **Bias** - it inherits biases from FineWeb-Edu (web text) and UltraChat.
+## License
+Released under the Apache 2.0 license. Training data is governed by the respective licenses of FineWeb-Edu and UltraChat-200K.
+## Citation
+```bibtex
+@misc{alterego2026,
+  title  = {AlterEgo: A 373M language model trained from scratch},
+  author = {J-bom},
+  year   = {2026},
+  url    = {https://github.com/J-bom/AlterEgo}
+}
+```
+**Credits** - datasets: FineWeb-Edu (HuggingFaceFW), UltraChat-200K (HuggingFaceH4). Architecture follows the modern Llama-style design (RoPE, GQA, SwiGLU, RMSNorm); implementation, training, and serving by the author.