---
license: apache-2.0
datasets:
- shuyuej/English-Pretraining-Dataset
- HuggingFaceFW/fineweb-edu
- mattwesney/General_Inquiry_Thinking-Chain-Of-Thought
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- TeichAI/Step-3.5-Flash-2600x
- TeichAI/convo-v1
language:
- en
tags:
- small
- haiku
---
# TinyMemoryLM

> **⚠️ IMPORTANT NOTICE**
>
> 1. **The inference script is not publicly available yet (soon!)** This release contains only the model weights and tokenizer.
> 2. **The model is really dumb.** This is a ~1M parameter research model designed for experimentation, not production use.
> 3. **Do not expect it to answer any questions.** It is prone to repetition, hallucination, and format collapse.
## Overview

TinyMemoryLM is an ultra-lightweight language model built for edge deployment and architectural experimentation. Despite its small footprint, it incorporates several training innovations aimed at stabilizing tiny-model convergence, including hybrid tokenization, loss-boosting strategies, and context-aware relevance modeling.

This release includes both **Pretrained Weights** (base language modeling) and **Instruction Weights** (fine-tuned for chat/completion).
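The hybrid tokenization idea can be sketched as a greedy longest-match over a small word vocabulary with character-level fallback. This is a toy illustration only — TinyMemoryLM's actual algorithm and its ~2,111-token vocabulary (shipped in `tokenizer.json`) are not published, and the three-word vocabulary below is invented:

```python
# Toy hybrid word/character tokenizer: greedy longest-match against a word
# vocabulary, falling back to single characters for anything unmatched.
WORD_VOCAB = {"the", "model", "token"}  # illustrative stand-in vocabulary

def hybrid_encode(text):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest word-vocab match starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in WORD_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # character fallback
            i += 1
    return tokens

hybrid_encode("the model!")  # → ["the", " ", "model", "!"]
```

The appeal for tiny models is that frequent words compress to one token while rare strings still encode losslessly, keeping the vocabulary (and thus the embedding table) small.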
## Files Provided

| File | Description |
| :--- | :--- |
| `tokenizer.json` | Hybrid word/character tokenizer vocabulary. |
| `pretrain.pt` | Base pretrained checkpoint (language modeling). |
| `model.pt` | Instruction-tuned checkpoint (SFT/Chat). |
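Since the inference script is not yet released, the exact checkpoint layout is unconfirmed. Assuming the `.pt` files are plain PyTorch state dicts (possibly wrapped under a `"model"` key — both are assumptions), a quick way to inspect them is:

```python
import os
import torch

def checkpoint_shapes(path):
    """Return {tensor_name: shape} from a .pt checkpoint, or None if missing.

    Assumes a plain state dict, or a dict wrapping one under a "model" key --
    an unverified assumption until the official inference script ships.
    """
    if not os.path.exists(path):
        return None
    state = torch.load(path, map_location="cpu")
    if isinstance(state, dict) and "model" in state:
        state = state["model"]
    return {k: tuple(v.shape) for k, v in state.items() if hasattr(v, "shape")}

shapes = checkpoint_shapes("model.pt")  # None until the weights are downloaded
```

Listing tensor shapes this way is a cheap sanity check against the dimensions in the table below before any inference code exists.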
## Model Specifications

| Parameter | Value |
| :--- | :--- |
| **Architecture** | Transformer Decoder |
| **Parameters** | ~1 Million |
| **Context Length** | 2,048 tokens |
| **Dimensions** | `d_model=160`, `layers=6`, `heads=4`, `ffn=256` |
| **Vocabulary** | ~2,111 tokens (Hybrid Char + Word) |
| **Normalization** | RMSNorm + QK-Norm |
| **Embeddings** | Rotary Embeddings (RoPE) |
| **Activation** | SwiGLU |
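As a sanity check, the dimensions above can be turned into a back-of-the-envelope parameter estimate. The breakdown below assumes tied input/output embeddings, a three-matrix SwiGLU FFN, and no biases — none of which is confirmed by this card, so treat the result as an order-of-magnitude figure:

```python
def estimate_params(vocab=2111, d_model=160, n_layers=6, d_ffn=256):
    emb = vocab * d_model          # token embeddings (assumed tied with output head)
    attn = 4 * d_model * d_model   # Q, K, V, O projections per layer
    ffn = 3 * d_model * d_ffn      # SwiGLU uses three weight matrices
    norms = 2 * d_model            # two RMSNorms per layer
    return emb + n_layers * (attn + ffn + norms) + d_model  # + final norm

n = estimate_params()
# Roughly 1.7M under these assumptions; an untied output head would add
# another vocab * d_model (~0.34M).
```

This lands in the same low-millions ballpark as the card's "~1 Million" figure; the exact count depends on details (embedding tying, QK-Norm parameters, biases) not listed here.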
## Architecture Highlights

TinyMemoryLM implements several research-focused modifications to standard transformer architectures:

* **Hybrid Tokenizer:** Combines character-level fallback with frequent word tokens to balance compression and vocabulary size.
* **QK-Norm:** Applies RMSNorm to Query and Key projections for improved stability in low-precision training.
* **Word Token Loss Boosting:** Upweights loss signals for multi-character tokens to prevent the model from ignoring them in favor of character-level spelling.
* **Response-Start Weighting:** Prioritizes the first tokens of assistant responses to improve prompt conditioning.
* **Pretrain Replay:** Mixes pretraining data during instruction tuning to prevent catastrophic forgetting of language fluency.
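Of these, word-token loss boosting is the easiest to illustrate: compute per-token cross-entropy, then upweight positions whose target id falls in the word-token range. The cutoff id and boost factor below are invented for illustration — TinyMemoryLM's actual values are not published:

```python
import torch
import torch.nn.functional as F

CHAR_ID_CUTOFF = 256  # assumption: ids below this are single-character tokens
WORD_BOOST = 2.0      # assumption: upweight factor for word tokens

def boosted_lm_loss(logits, targets):
    """Weighted cross-entropy: word-token targets count WORD_BOOST times."""
    # logits: (batch*seq, vocab), targets: (batch*seq,)
    per_token = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.where(targets >= CHAR_ID_CUTOFF,
                          torch.tensor(WORD_BOOST), torch.tensor(1.0))
    return (per_token * weights).sum() / weights.sum()
```

Normalizing by the weight sum (rather than the token count) keeps the loss scale comparable to plain cross-entropy regardless of the word/character mix in a batch.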
## Training Loss Curve

Below is the training loss progression during the instruction tuning phase. Note the stability measures taken to prevent collapse in such a small parameter regime.

![Training Loss Curve]({loss})
## Limitations & Expectations

Please manage your expectations when using TinyMemoryLM:

* **Reasoning:** While trained with Chain-of-Thought markers (`<|begin_of_thought|>`, `<|begin_of_solution|>`), the model often memorizes the format scaffolding without genuine reasoning capability.
* **Repetition:** Tiny models are prone to collapsing into repetitive token loops.
* **Knowledge:** The model has limited world knowledge due to parameter constraints.
* **Usage:** This model is intended for **research, educational purposes, and architectural benchmarking**. It is not suitable for assistant tasks or reliable information retrieval.
---

*Generated for research purposes. Use responsibly.*