CompactAI-O/Glint-0.1

@@ -1,5 +1,5 @@
 ---
-license: apache-2.0
 datasets:
 - shuyuej/English-Pretraining-Dataset
 - HuggingFaceFW/fineweb-edu
@@ -12,66 +12,61 @@ language:
 - en
 tags:
 - small
-- haiku
-new_version: CompactAI-O/TMLM-Haiku-1.3
 ---
-# TinyMemoryLM
-> **⚠️ IMPORTANT NOTICE**
-> 1. **The model is really dumb.** This is a ~1M parameter research model designed for experimentation, not production use.
-> 2. **Do not expect it to answer any questions.** It is prone to repetition, hallucination, and format collapse.
-## Overview
-TinyMemoryLM is an ultra-lightweight language model optimized for edge cases and architectural experimentation. Despite its small footprint, it incorporates several novel training innovations aimed at stabilizing tiny model convergence, including hybrid tokenization, loss boosting strategies, and context-aware relevance modeling.
-This release includes both **Pretrained Weights** (base language modeling) and **Instruction Weights** (fine-tuned for chat/completion).
-## Files Provided
-| File | Description |
 | :--- | :--- |
-| `tokenizer.json` | Hybrid word/character tokenizer vocabulary. |
-| `pretrain.pt` | Base pretrained checkpoint (language modeling). |
-| `model.pt` | Instruction-tuned checkpoint (SFT/Chat). |
-## Model Specifications
-| Parameter | Value |
 | :--- | :--- |
 | **Architecture** | Transformer Decoder |
 | **Parameters** | ~1 Million |
-| **Context Length** | 2,048 tokens |
-| **Dimensions** | `d_model=160`, `layers=6`, `heads=4`, `ffn=256` |
-| **Vocabulary** | ~2,111 tokens (Hybrid Char + Word) |
-| **Normalization** | RMSNorm + QK-Norm |
-| **Embeddings** | Rotary Embeddings (RoPE) |
 | **Activation** | SwiGLU |
-## Architecture Highlights
-TinyMemoryLM implements several research-focused modifications to standard transformer architectures:
-*   **Hybrid Tokenizer:** Combines character-level fallback with frequent word tokens to balance compression and vocabulary size.
-*   **QK-Norm:** Applies RMSNorm to Query and Key projections for improved stability in low-precision training.
-*   **Word Token Loss Boosting:** Upweights loss signals for multi-character tokens to prevent the model from ignoring them in favor of character-level spelling.
-*   **Response-Start Weighting:** Prioritizes the first tokens of assistant responses to improve prompt conditioning.
-*   **Pretrain Replay:** Mixes pretraining data during instruction tuning to prevent catastrophic forgetting of language fluency.
-## Training Loss Curve
-Below is the training loss progression during the instruction tuning phase. Note the stability measures taken to prevent collapse in such a small parameter regime.
-![Training Loss Curve]({model/loss_curve.png})
-## Limitations & Expectations
-Please manage your expectations when using TinyMemoryLM:
-*   **Repetition:** Tiny models are prone to collapsing into repetitive token loops.
-*   **Knowledge:** The model has limited world knowledge due to parameter constraints.
-*   **Usage:** This model is intended for **research, educational purposes, and architectural benchmarking**. It is not suitable for assistant tasks or reliable information retrieval.
 ---
-*Generated for research purposes. Use responsibly.*

 ---
+license: gpl-3.0
 datasets:
 - shuyuej/English-Pretraining-Dataset
 - HuggingFaceFW/fineweb-edu
 - en
 tags:
 - small
+- glint
+new_version: CompactAI-O/Glint-0.2
 ---
+# Glint-0.1
+> Once upon a time, there was a model that could only say `couldcouldoldbloodbloodbodybody`. This is its ancestor.
+Glint-0.1 is where the Glint line started. 1M parameters. Big dreams. Almost no ability to realize them. We look back on this one fondly, like a blurry photo of a puppy that chewed your shoes.
+## What you get
+| File | What it is |
 | :--- | :--- |
+| `tokenizer.json` | Hybrid word/char tokenizer (~2,111 tokens) |
+| `pretrain.pt` | Base pretrained checkpoint |
+| `model.pt` | Instruction-tuned checkpoint (SFT) |
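
For a rough feel of what "hybrid word/char" means in the table above, here is a minimal sketch. It is not the tokenizer shipped in `tokenizer.json`: the function name `hybrid_tokenize`, the regex split, and the three-word vocabulary are all assumptions made up for illustration.

```python
import re


def hybrid_tokenize(text, word_vocab):
    """Emit a whole-word token when the word is in word_vocab, else fall back to characters.

    Hypothetical helper for illustration only; the real tokenizer may split differently.
    """
    tokens = []
    # Split into runs of word characters and single non-word characters
    # (spaces, punctuation), so nothing in the input is dropped.
    for piece in re.findall(r"\w+|\W", text):
        if piece in word_vocab:
            tokens.append(piece)   # frequent word: one token
        else:
            tokens.extend(piece)   # unknown piece: one token per character
    return tokens


# Hypothetical three-word vocabulary; the card reports ~2,111 tokens in total.
word_vocab = {"the", "cat", "sat"}
print(hybrid_tokenize("the cat meowed", word_vocab))
```

Known words cost one token each, while anything unseen is spelled out character by character, which is how a vocabulary this small can still cover arbitrary text.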
+## Specs
+| Thing | Value |
 | :--- | :--- |
 | **Architecture** | Transformer Decoder |
 | **Parameters** | ~1 Million |
+| **Context** | 2,048 tokens |
+| **d_model** | 160 |
+| **Layers** | 6 |
+| **Heads** | 4 |
+| **FFN** | 256 |
+| **Vocab** | ~2,111 tokens (Hybrid Char + Word) |
+| **Norm** | RMSNorm + QK-Norm |
+| **Position** | RoPE |
 | **Activation** | SwiGLU |
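
The "RMSNorm + QK-Norm" row can be sketched in plain Python for a single query/key pair. This is an assumption-heavy illustration, not the model's code: the gain is fixed to 1, the `1/sqrt(head_dim)` scale is kept (implementations differ on this), and `head_dim = 40` is simply `d_model / heads` from the table.

```python
import math


def rms_norm(vec, gain=None, eps=1e-6):
    """Rescale vec so its root-mean-square is about 1, then apply an optional gain."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    gain = gain or [1.0] * len(vec)
    return [g * x / rms for g, x in zip(gain, vec)]


def qk_norm_score(q, k):
    """Attention logit for one query/key pair, with RMSNorm applied to both first."""
    qn, kn = rms_norm(q), rms_norm(k)
    head_dim = len(q)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(head_dim)


head_dim = 160 // 4  # d_model / heads from the specs table
q = [0.01 * (i + 1) for i in range(head_dim)]
k = [0.02 * (head_dim - i) for i in range(head_dim)]

# Blowing up the query by 1000x barely moves the logit, because the norm
# removes the magnitude before the dot product is taken.
print(qk_norm_score(q, k))
print(qk_norm_score([1000.0 * x for x in q], k))
```

With both vectors normalized, the logit magnitude cannot exceed roughly `sqrt(head_dim)`, which is the stability property the norm is there for.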
+## What made this one special
+- **Hybrid tokenizer** -- word-level where it helps, character-level where it gets confused
+- **QK-Norm** -- RMSNorm on queries and keys so training doesn't blow up
+- **Loss boosting** -- yelled at the model extra hard when it ignored multi-character words
+- **Response-start weighting** -- made it actually pay attention to the first tokens of its answers
+- **Pretrain replay** -- kept mixing in pretrain data during SFT so it wouldn't forget how to speak English
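
The loss boosting and response-start weighting bullets above both come down to per-token weights on the loss. Here is a minimal sketch of that idea. The function name `weighted_nll` and the values of `word_boost`, `start_boost`, and `start_window` are hypothetical; the card does not say what weights were actually used.

```python
import math


def weighted_nll(token_probs, tokens, response_start,
                 word_boost=2.0, start_boost=3.0, start_window=4):
    """Weighted average negative log-likelihood over target tokens.

    token_probs[i] is the probability the model assigned to the correct token i.
    All boost values are illustrative assumptions, not the model's settings.
    """
    total, weight_sum = 0.0, 0.0
    for i, (p, tok) in enumerate(zip(token_probs, tokens)):
        w = 1.0
        if len(tok) > 1:
            w *= word_boost    # multi-character (word) token gets extra weight
        if response_start <= i < response_start + start_window:
            w *= start_boost   # first few tokens of the assistant response
        total += w * -math.log(p)
        weight_sum += w
    return total / weight_sum


tokens = ["h", "i", " ", "there"]
probs = [0.9, 0.9, 0.9, 0.1]  # the model fumbles the word token

# response_start=10 puts the response window outside this short example,
# so only the word boost is active here.
boosted = weighted_nll(probs, tokens, response_start=10)
plain = weighted_nll(probs, tokens, response_start=10, word_boost=1.0)
print(boosted > plain)
```

Because the weights are normalized, a sequence the model predicts uniformly well scores the same either way; the boost only matters when the mistakes land on word tokens or on the start of the response.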
+## Training curve
+![loss curve]({model/loss_curve.png})
+It went down. Slowly. Painfully.
+## Limitations
+- Repeats itself. A lot.
+- Knows almost nothing about the world.
+- Not useful for anything real. Research only.
+- Will embarrass itself if asked a direct question.
 ---
+*Built by [CompactAI](https://huggingface.co/CompactAI-O). We started somewhere.*