shibatch committed · Commit b8df4ed · verified · 1 Parent(s): 56019ec

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,24 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tinybpe1m.BF16.gguf filter=lfs diff=lfs merge=lfs -text
37
+ tinybpe1m.F16.gguf filter=lfs diff=lfs merge=lfs -text
38
+ tinybpe1m.F32.gguf filter=lfs diff=lfs merge=lfs -text
39
+ tinybpe1m.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
40
+ tinybpe1m.IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
41
+ tinybpe1m.IQ4_NL.gguf filter=lfs diff=lfs merge=lfs -text
42
+ tinybpe1m.IQ4_XS.gguf filter=lfs diff=lfs merge=lfs -text
43
+ tinybpe1m.Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
44
+ tinybpe1m.Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
45
+ tinybpe1m.Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
46
+ tinybpe1m.Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
47
+ tinybpe1m.Q4_0.gguf filter=lfs diff=lfs merge=lfs -text
48
+ tinybpe1m.Q4_1.gguf filter=lfs diff=lfs merge=lfs -text
49
+ tinybpe1m.Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
50
+ tinybpe1m.Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
51
+ tinybpe1m.Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
52
+ tinybpe1m.Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
53
+ tinybpe1m.Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
54
+ tinybpe1m.Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
55
+ tinybpe1m.TQ1_0.gguf filter=lfs diff=lfs merge=lfs -text
56
+ tinybpe1m.TQ2_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,115 @@
1
+ ---
2
+ license: mit
3
+ base_model: karpathy/tinyllamas
4
+ tags:
5
+ - llama2
6
+ - gguf
7
+ - safetensors
8
+ - transformers
9
+ - tinyllamas
10
+ - validation
11
+ - test-suite
12
+ ---
13
+
14
+ # TinyStories Llama2 1M (tinybpe1m) GGUF & HF Validation Suite
15
+
16
+ This repository provides ultra-lightweight Llama2 model files across various formats (both **GGUF** and **Hugging Face / Safetensors**), trained on the TinyStories dataset and optimized for testing and validation.
17
+
18
+ ### Why this repository exists
19
+ When developing a custom LLM inference engine, debugging with a full-sized model is slow. This suite offers a true **1M parameter scale model** (~0.6 MB to ~3.8 MB depending on the quantization format), small enough that developers can validate their loaders, serialization, quantization kernels, and inference logic step by step with fast turnaround.
20
+
21
+ ### Difference from `tiny1m`
22
+ This is a **BPE-based model variant**. Unlike the standard `tiny1m` model, this model is **NOT compatible with `llama2.c`**.
23
+ The custom SentencePiece BPE tokenizer used here relies on the `byte_fallback` mechanism to handle unknown characters. Because `llama2.c`'s simplified C loader and tokenizer cannot process `byte_fallback`, text generation fails or produces corrupted output in that environment. This suite is designed only for **`llama.cpp` (GGUF)** and **Hugging Face `transformers` (Python)** execution.
24
+
25
+ ---
26
+
27
+ ## 📂 Repository Structure & File Descriptions
28
+
29
+ ### 1. GGUF Formats (Root Directory `./`)
30
+ A comprehensive validation suite converted for `llama.cpp` and compatible engines. The tokenizer vocabulary and special tokens are fully embedded within each GGUF binary. Every compiled quantization variant available in the root directory is explicitly covered below:
31
+
32
+ | Filename(s) / Wildcard Pattern | Type | Size | Purpose / Validation Target |
33
+ | :--- | :--- | :--- | :--- |
34
+ | **`tinybpe1m.F32.gguf`** | `F32` | ~3.8 MB | **Baseline Test.** Validates GGUF parsing, tensor layout, matrix multiplication, RoPE, and Attention logic without dequantization overhead. |
35
+ | **`tinybpe1m.F16.gguf`**<br>**`tinybpe1m.BF16.gguf`** | `F16`<br>`BF16` | ~1.9 MB | **Half-Precision Test.** Validates 16-bit floating point loading, type casting, and inference stability. |
36
+ | **`tinybpe1m.Q8_0.gguf`** | `Q8_0` | ~1.0 MB | **Quantization Level 1.** Validates block-based uniform scaling over 32-element blocks. |
37
+ | **`tinybpe1m.Q4_0.gguf`**<br>**`tinybpe1m.Q4_1.gguf`** | `Q4_0`<br>`Q4_1` | ~0.6 MB | **Quantization Level 2.** Validates classic 4-bit linear quantization and bit-unpacking logic. |
38
+ | **`tinybpe1m.Q2_K.gguf`** | `Q2_K` | ~0.6 MB | **Standard K-Quant (2-bit).** Validates 2-bit super-block quantization parsing. |
39
+ | **`tinybpe1m.Q3_K_*.gguf`**<br>↳ *`tinybpe1m.Q3_K_S.gguf`*<br>↳ *`tinybpe1m.Q3_K_M.gguf`*<br>↳ *`tinybpe1m.Q3_K_L.gguf`* | `Q3_K` | ~0.6 MB | **Standard K-Quant (3-bit).** Validates Small, Medium, and Large sub-variants of 3-bit multi-block structures. |
40
+ | **`tinybpe1m.Q4_K_*.gguf`**<br>↳ *`tinybpe1m.Q4_K_S.gguf`*<br>↳ *`tinybpe1m.Q4_K_M.gguf`* | `Q4_K` | ~0.7 MB | **Standard K-Quant (4-bit).** Validates Small and Medium sub-variants of modern 4-bit super-block structural parsing. |
41
+ | **`tinybpe1m.Q5_K_*.gguf`**<br>↳ *`tinybpe1m.Q5_K_S.gguf`*<br>↳ *`tinybpe1m.Q5_K_M.gguf`* | `Q5_K` | ~0.8 MB | **Standard K-Quant (5-bit).** Validates Small and Medium sub-variants of 5-bit mixed precision super-blocks. |
42
+ | **`tinybpe1m.Q6_K.gguf`** | `Q6_K` | ~1.0 MB | **Standard K-Quant (6-bit).** Validates 6-bit high-fidelity super-block quantization. |
43
+ | **`tinybpe1m.IQ3_*.gguf`**<br>↳ *`tinybpe1m.IQ3_XXS.gguf`*<br>↳ *`tinybpe1m.IQ3_S.gguf`* | `I-Quants` | ~0.6 MB | **Importance Quants (3-bit).** Non-linear 3-bit importance quantization targeting lookup table (codebook) decoding logic. |
44
+ | **`tinybpe1m.IQ4_*.gguf`**<br>↳ *`tinybpe1m.IQ4_NL.gguf`*<br>↳ *`tinybpe1m.IQ4_XS.gguf`* | `I-Quants` | ~0.6 MB | **Importance Quants (4-bit).** Non-linear 4-bit importance quantization variants (Non-Linear and Extra Small). |
45
+ | **`tinybpe1m.TQ1_0.gguf`**<br>**`tinybpe1m.TQ2_0.gguf`** | `Ternary` | ~0.6 MB | **Experimental.** Ternary (-1, 0, 1) state quantization for cutting-edge engine testing. |
46
+
47
+ ### 2. Hugging Face Native Format (`./hf/`)
48
+ This directory contains the standard files required to load the model with the Hugging Face `transformers` library (PyTorch backend):
49
+ * **`hf/model.safetensors`**: The raw, unquantized model weights stored securely in Safetensors format.
50
+ * **`hf/config.json`**: The architectural configuration file defining hyperparameters (layers, heads, dimensions).
51
+ * **`hf/generation_config.json`**: Default generation settings (BOS, EOS, and pad token IDs plus `use_cache`). It sets no sampling parameters such as temperature or top_p.
52
+ * **`hf/tokenizer.model`**: The custom 512-vocab SentencePiece tokenizer model file required for Python-side encoding/decoding.
53
+
54
+ ---
55
+
56
+ ## 🚀 Usage Examples
57
+
58
+ ### A. Running GGUF via llama.cpp
59
+ To verify your local setup or test custom execution backends using the official native utilities:
60
+ ```bash
61
+ ./llama-cli -m tinybpe1m.Q4_K_M.gguf -p "Tom and Jerry are " -n 64 --temp 0.0
62
+
63
+ ```
64
+
65
+ ### B. Loading Hugging Face Formats via Python
66
+
67
+ You can import the Hugging Face variant directly into Python using the `transformers` library.
68
+
69
+ ```python
70
+ import torch
71
+ from transformers import AutoTokenizer, AutoModelForCausalLM
72
+
73
+ repo_id = "shibatch/tinybpe1m"
74
+
75
+ # subfolder="hf" points both loaders at the hf/ directory of the repository
76
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="hf")
77
+ model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="hf")
78
+
79
+ prompt = "Tom and Jerry are "
80
+ inputs = tokenizer(prompt, return_tensors="pt")
81
+
82
+ with torch.no_grad():
83
+     outputs = model.generate(
84
+         **inputs,
85
+         max_new_tokens=64,
86
+         do_sample=False,
87
+         pad_token_id=tokenizer.eos_token_id
88
+     )
89
+
90
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
91
+
92
+ ```
93
+
94
+ ---
95
+
96
+ ## 📝 Model Specifications
97
+
98
+ The network architecture uses an untied output layer (`lm_head`, with `tie_word_embeddings` set to `false`) to keep memory structures consistent with standard Llama 2 definitions. Because the vocabulary has only 512 entries, the token embedding and output layers stay small (512 × 128 = 65,536 weights each).
99
+
100
+ * **Architecture:** Llama 2 (Scaled-down variant)
101
+ * **Dataset:** TinyStories
102
+ * **Total Parameters:** ~1M (Exactly 896,256 parameters)
103
+ * **Vocabulary Size:** 512 (Custom SentencePiece BPE Tokenizer with `byte_fallback` enabled)
104
+ * **Hidden Size (`hidden_size`):** 128
105
+ * **Number of Hidden Layers (`num_hidden_layers`):** 4
106
+ * **Number of Attention Heads (`num_attention_heads`):** 2
107
+ * **Number of Key-Value Heads (`num_key_value_heads`):** 2
108
+ * **Intermediate Size (`intermediate_size`):** 352
109
+ * **Max Position Embeddings (`max_position_embeddings`):** 256
110
+
111
+ ## 📜 Acknowledgments & License
112
+
113
+ * **Original Implementation:** Inspired by Andrej Karpathy's `llama2.c` project.
114
+ * **Dataset:** TinyStories dataset.
115
+ * **License:** **MIT License**. You are free to use, modify, and distribute these assets for any purpose.
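
The README above attributes the `llama2.c` incompatibility to `byte_fallback`. As a rough illustration of that mechanism (not the tokenizer shipped in this commit), SentencePiece represents a character that is missing from the vocabulary as one `<0xNN>` piece per UTF-8 byte. The sketch below uses a character-level stand-in for the real BPE merge step; `toy_vocab` and both function names are hypothetical.

```python
def encode_with_byte_fallback(text: str, vocab: set[str]) -> list[str]:
    """Emit each known character as a piece, or <0xNN> byte pieces if unknown."""
    pieces = []
    for ch in text:
        if ch in vocab:
            pieces.append(ch)
        else:
            # One piece per UTF-8 byte, using SentencePiece's <0xNN> naming.
            pieces.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return pieces


def decode_with_byte_fallback(pieces: list[str]) -> str:
    """Rebuild text, turning runs of <0xNN> pieces back into UTF-8 characters."""
    out = bytearray()
    for p in pieces:
        if len(p) == 6 and p.startswith("<0x") and p.endswith(">"):
            out.append(int(p[3:5], 16))
        else:
            out.extend(p.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


# Hypothetical vocabulary: lowercase ASCII letters and the space character.
toy_vocab = set("abcdefghijklmnopqrstuvwxyz ")
pieces = encode_with_byte_fallback("cat 猫", toy_vocab)
print(pieces)
print(decode_with_byte_fallback(pieces))
```

An engine that treats `<0xNN>` pieces as literal text instead of raw bytes would print the angle-bracket form, which is one way the corruption described in the README could show up.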
hf/config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "architectures": [
3
+ "LlamaForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 1,
8
+ "dtype": "float32",
9
+ "eos_token_id": 2,
10
+ "head_dim": 64,
11
+ "hidden_act": "silu",
12
+ "hidden_size": 128,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 352,
15
+ "max_position_embeddings": 256,
16
+ "mlp_bias": false,
17
+ "model_type": "llama",
18
+ "num_attention_heads": 2,
19
+ "num_hidden_layers": 4,
20
+ "num_key_value_heads": 2,
21
+ "pad_token_id": 2,
22
+ "pretraining_tp": 1,
23
+ "rms_norm_eps": 1e-06,
24
+ "rope_parameters": {
25
+ "rope_theta": 10000.0,
26
+ "rope_type": "default"
27
+ },
28
+ "tie_word_embeddings": false,
29
+ "transformers_version": "5.9.0",
30
+ "use_cache": false,
31
+ "vocab_size": 512
32
+ }
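
The values in `hf/config.json` are enough to count the weights by hand. The sketch below assumes the standard bias-free Llama layout (q/k/v/o projections, a gate/up/down MLP, two RMSNorm vectors per layer, one final norm); the function name is hypothetical. Under those assumptions the total is 935,040 weights, or 3,740,160 bytes in float32, just under the 3,744,288-byte `hf/model.safetensors` added in this commit. That does not match the 896,256 figure quoted in the README, and the diff alone does not settle which number is intended.

```python
# Values copied from hf/config.json in this commit.
config = {
    "vocab_size": 512,
    "hidden_size": 128,
    "intermediate_size": 352,
    "num_hidden_layers": 4,
    "num_attention_heads": 2,
    "num_key_value_heads": 2,
    "head_dim": 64,
    "tie_word_embeddings": False,
}


def count_llama_parameters(cfg: dict) -> int:
    """Count weights for a bias-free Llama-style decoder (assumed layout)."""
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down
    norms = 2 * h                                       # input and post-attention RMSNorm
    per_layer = attention + mlp + norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_llama_parameters(config)
print(total)      # number of weights
print(total * 4)  # bytes when stored as float32
```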
hf/generation_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "output_attentions": false,
6
+ "output_hidden_states": false,
7
+ "pad_token_id": 2,
8
+ "transformers_version": "5.9.0",
9
+ "use_cache": true
10
+ }
hf/model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:39c5ff9c53cbda295e88a57bb25009f6a23045555cb76a1d66501230d001a7bb
3
+ size 3744288
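
The binary files in this commit appear in the diff only as three-line Git LFS pointers (`version`, `oid`, `size`). A minimal sketch of how a downloaded file could be checked against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is the one shown above for `hf/model.safetensors`.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True when the byte length and digest both agree with the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:39c5ff9c53cbda295e88a57bb25009f6a23045555cb76a1d66501230d001a7bb
size 3744288
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

With the real file on disk, `matches_pointer(open("hf/model.safetensors", "rb").read(), pointer)` would perform the check; the files are small enough here to read in one go.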
hf/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4b432388a969eb49dbb64e25d9ebcffd9de04919f88b13f7805cde6c8ad6684f
3
+ size 247523
tinybpe1m.BF16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dfd2563eaf811ae1d12a3f38365a8315601d3539c864734fd59b600820ae59e1
3
+ size 1886240
tinybpe1m.F16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:17d21f397845a9ce92f18a55c7f0698a4016ef475041b996fd5a39b3f30d8d89
3
+ size 1886240
tinybpe1m.F32.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:66b6c2075b1e774643faaf46adaeea54b8500d24c6a903350dbf5480e6d11a9e
3
+ size 3754016
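
The README names GGUF parsing as the first thing the F32 file is meant to validate. As a starting point, the fixed part of a GGUF header (versions 2 and 3, little-endian) is a 4-byte magic `GGUF`, a `uint32` version, a `uint64` tensor count and a `uint64` metadata key/value count. The sketch below parses just those 24 bytes. The demo header is synthetic, and its counts are made-up values, not read from the file in this commit.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_FORMAT = "<4sIQQ"  # magic, version, tensor_count, metadata_kv_count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size prefix of a GGUF file (v2/v3 layout assumed)."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    magic, version, n_tensors, n_kv = struct.unpack_from(HEADER_FORMAT, buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError(f"unexpected magic: {magic!r}")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}


# Synthetic header for demonstration only; the counts are placeholders.
demo = struct.pack(HEADER_FORMAT, GGUF_MAGIC, 3, 39, 20)
print(read_gguf_header(demo))
```

Against the real file, the same function could be fed `open("tinybpe1m.F32.gguf", "rb").read(24)`.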
tinybpe1m.IQ3_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ef0a3c2116a41a1b1d78f6768bef230034c57e2bb1650ec04d5dfcc200e512d
3
+ size 576544
tinybpe1m.IQ3_XXS.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0cfae34d0fb9d30d0ee4156702dc6b80c31d2240a3c99b76f2f27ad24d2e5680
3
+ size 576544
tinybpe1m.IQ4_NL.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7d377c65479f5a62e91a1eca730d884d9fe164e0808a8fbed74df119ea81c7e1
3
+ size 576544
tinybpe1m.IQ4_XS.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:44279e858ae0f4a7a3b3cf4a4f8031148acfb514eb0bf4bfa7a13d40a6ce62cc
3
+ size 576544
tinybpe1m.Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:417e611d483bbf79802cb23cdc22bbd9339a0cbc95136f2b2708475936a1a7f9
3
+ size 576544
tinybpe1m.Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:be5e249341719b7de52ba7714445885103c720cc03a9167c1bdb717bb28d093a
3
+ size 634912
tinybpe1m.Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b8cbac852529e34c3753891143f8bb09e31b23140a2496ba2271f4fecc4d81d2
3
+ size 617504
tinybpe1m.Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:eac6e38760029f893db587041f154648827994c5ebf85a078698eb16ba4bf741
3
+ size 576544
tinybpe1m.Q4_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8f37ac9de3a907abf512ce20ac4545de5091a2509e059289fb1eefde14342d79
3
+ size 576544
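
The README lists bit-unpacking as what the `Q4_0` file exercises. A minimal sketch of dequantizing one `Q4_0` block follows, assuming the ggml layout as I understand it: 18 bytes per 32 weights, made of a little-endian float16 scale `d` and 16 bytes of packed nibbles, where the low nibble of byte `j` is element `j`, the high nibble is element `j + 16`, and each weight is `(nibble - 8) * d`. The demo block is hand-built, not taken from the file.

```python
import struct

QK4_0 = 32          # weights per block
BLOCK_BYTES = 18    # 2-byte float16 scale + 16 bytes of packed 4-bit values


def dequantize_q4_0_block(block: bytes) -> list[float]:
    """Expand one Q4_0 block into 32 floats (layout assumed as described above)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a Q4_0 block is 18 bytes")
    (d,) = struct.unpack_from("<e", block, 0)  # float16 scale
    out = [0.0] * QK4_0
    for j, byte in enumerate(block[2:]):
        out[j] = ((byte & 0x0F) - 8) * d               # low nibble: first half
        out[j + QK4_0 // 2] = ((byte >> 4) - 8) * d    # high nibble: second half
    return out


# Hand-built block: scale 0.5, low nibbles 0..15, every high nibble 8 (decodes to 0).
block = struct.pack("<e", 0.5) + bytes(0x80 | j for j in range(16))
values = dequantize_q4_0_block(block)
print(values[:16])
print(values[16:])
```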
tinybpe1m.Q4_1.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da7c6cd0e27bce416487e229be783ed95ab08bd2eb48f23a982141fa0110b1ea
3
+ size 630816
tinybpe1m.Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7f5bcd15b90c9e2937dd0551a875a0ae45865617253e0a4c337b0eceb8e36bd
3
+ size 731168
tinybpe1m.Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1aca8ddadfe86ab9ee0c5d6bbeb40a91c3bcf673bc32e372ec6bd4db6a76b8a3
3
+ size 689184
tinybpe1m.Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:320ede8f17d2c0007754f2a4d8668b86c7f01dc52efde3c90e468b9b6b585f0a
3
+ size 777760
tinybpe1m.Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c5a595f24fbefd6686996ddf637a02c007df148a50303503d8087d57bb1d87cc
3
+ size 739360
tinybpe1m.Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bd581fbd61a6404bdf74e02efa6c8d11611f7840bfc1c30a3e78e0c9dcd41db7
3
+ size 1010720
tinybpe1m.Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:abbde54169767f110f1ccedf26b4f42ad219d902d83ea4f44ec8176f33c2d0cf
3
+ size 1010720
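
The README describes `Q8_0` as block-based uniform scaling over 32 elements. A minimal round-trip sketch of that idea follows, assuming the usual ggml block shape of one float16 scale followed by 32 signed bytes (34 bytes per block), with the scale chosen as `max(|x|) / 127`. Python's `round` breaks ties differently from C's `roundf`, so individual values may differ from a reference implementation by one step on exact ties.

```python
import struct

QK8_0 = 32  # weights per block


def quantize_q8_0_block(values: list[float]) -> bytes:
    """Pack 32 floats as a float16 scale plus 32 int8 values."""
    if len(values) != QK8_0:
        raise ValueError("a Q8_0 block holds 32 values")
    amax = max(abs(v) for v in values)
    d = amax / 127.0
    qs = [round(v / d) if d else 0 for v in values]  # each lies in [-127, 127]
    return struct.pack("<e", d) + struct.pack("<32b", *qs)


def dequantize_q8_0_block(block: bytes) -> list[float]:
    """Recover approximate floats: each int8 value times the stored scale."""
    (d,) = struct.unpack_from("<e", block, 0)
    qs = struct.unpack_from("<32b", block, 2)
    return [q * d for q in qs]


original = [i / 16 - 1.0 for i in range(QK8_0)]  # evenly spaced demo values in [-1, 1)
block = quantize_q8_0_block(original)
restored = dequantize_q8_0_block(block)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(len(block), max_error)
```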
tinybpe1m.TQ1_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:944bfa1694ae5683b5e5e39f496e71f3ea4c0240d5d1a7cef53bf4fda699f2aa
3
+ size 584736
tinybpe1m.TQ2_0.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:18c653e44063a5589d748658319b3e8b03a5c8e35dc1ff9f2c6091b09ded2d44
3
+ size 584736
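
The LFS pointer sizes above can be cross-checked against the tensor shapes implied by `hf/config.json`. The sketch below assumes that every 2-D weight matrix (933,888 weights in total) is stored in the file's nominal type, that the 1,152 RMSNorm weights stay in float32, and that metadata and alignment overhead is the same in every file. None of this is stated in the commit. With the overhead taken from the F32 file, the F16, BF16 and Q8_0 sizes come out exactly.

```python
N_MATRIX = 933_888  # weights in 2-D tensors, derived from the config shapes
N_NORM = 1_152      # RMSNorm weights: 4 layers * 2 * 128 + 128 (assumed kept as float32)

# Bytes needed to store 32 weights in each format (Q8_0: float16 scale + 32 int8).
BYTES_PER_32 = {"F32": 128, "F16": 64, "BF16": 64, "Q8_0": 34}

# Sizes taken from the Git LFS pointers in this commit.
ACTUAL = {"F32": 3_754_016, "F16": 1_886_240, "BF16": 1_886_240, "Q8_0": 1_010_720}


def tensor_bytes(fmt: str) -> int:
    """Bytes of tensor data under the assumptions stated above."""
    return N_MATRIX * BYTES_PER_32[fmt] // 32 + N_NORM * 4


# Treat whatever is left in the F32 file as metadata and alignment overhead.
overhead = ACTUAL["F32"] - tensor_bytes("F32")

for fmt, actual in ACTUAL.items():
    print(fmt, tensor_bytes(fmt) + overhead, actual)
```

Several other files share a single size: `Q2_K`, `Q3_K_S`, `Q4_0`, `IQ3_S`, `IQ3_XXS`, `IQ4_NL` and `IQ4_XS` are all 576,544 bytes, and `Q6_K` matches `Q8_0` at 1,010,720 bytes. That would be expected if those files ended up with the same per-tensor block layouts, for example if the converter fell back to a 32-element block type because the row lengths here (128 and 352) are not multiples of 256. This is only a hypothesis drawn from the sizes. Inspecting the tensor types in each file would confirm or rule it out before relying on a file to exercise a particular kernel.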