shibatch committed on
Commit 4724c96 · verified · 1 Parent(s): 7e90238

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +64 -9
README.md CHANGED
@@ -4,22 +4,27 @@ base_model: karpathy/tinyllamas
  tags:
  - llama2
  - gguf
  - tinyllamas
  - validation
  - test-suite
  ---

- # TinyStories Llama2 GGUF Validation Suite

- This repository provides a collection of ultra-lightweight GGUF models converted and quantized from Andrej Karpathy's `llama2.c` project.

  ### Why this repository exists?
- When developing a custom LLM inference engine from scratch (C/C++, Vulkan, WebAssembly, etc.), debugging with a full-sized 7B model is slow and inefficient. This suite offers **10MB - 60MB scale GGUF models** across various quantization levels, allowing developers to implement and validate their loaders and quantization kernels step-by-step.

  ---

  ## 📦 Included Formats & Testing Roadmap

  | Filename | Type | Size | Purpose / Validation Target |
  | :--- | :--- | :--- | :--- |
  | **`stories15M.F32.gguf`** | `F32` | ~60 MB | **Baseline Test.** Validates GGUF parsing, tensor layout, matrix multiplication, RoPE, and Attention logic without any dequantization overhead. |
@@ -29,16 +34,66 @@ When developing a custom LLM inference engine from scratch (C/C++, Vulkan, WebAs
  | **`stories15M.Q2_K`** 〜 **`Q6_K.gguf`** | `K-Quants` | 9~15 MB | **Standard Quants.** Validates modern super-block structural parsing with mixed precision. |
  | **`stories15M.IQ3_XXS`** 〜 **`IQ4_XS.gguf`** | `I-Quants` | 8~12 MB | **Advanced Quants.** Non-linear quantization targeting lookup table (codebook) decoding logic. |
  | **`stories15M.TQ1_0.gguf`**<br>`stories15M.TQ2_0.gguf` | `Ternary` | 7~9 MB | **Experimental.** Ternary (-1, 0, 1) state quantization for cutting-edge engine testing. |
- | **`stories260K.F32.gguf`**<br>`stories260K.F16.gguf` | `F32`<br>`F16` | ~1 MB | **Ultra-Mini Check.** Extreme low-resource baseline utilizing a tiny 512-token vocabulary (`tok512`). |

  ---

  ## Model Specifications
- - **Architecture:** Llama 2 (with scaled-down dimensions)
- - **Dataset:** TinyStories (synthetic text dataset focused on 3 to 4-year-old vocabulary)
- - **Vocabulary Size:** 32,000 for 15M models, 512 for 260K models.

  ## 📜 Acknowledgments & License
- - **Original Weights:** Trained by Andrej Karpathy ([karpathy/tinyllamas](https://huggingface.co/karpathy/tinyllamas)).
- - **License:** **MIT License** (inherited from the original `llama2.c` repository).

  tags:
  - llama2
  - gguf
+ - safetensors
+ - transformers
  - tinyllamas
  - validation
  - test-suite
  ---

+ # TinyStories Llama2 GGUF & HF Validation Suite

+ This repository provides a comprehensive collection of ultra-lightweight Llama2 models across various formats (both **GGUF** and **Hugging Face/Safetensors**), converted from Andrej Karpathy's `llama2.c` project.

  ### Why this repository exists?
+ When developing a custom LLM inference engine from scratch (C/C++, Vulkan, WebAssembly, etc.) or testing custom hardware kernels, debugging with a full-sized 7B model is slow and inefficient. This suite offers **models from roughly 1 MB to 60 MB**, so developers can validate their loaders, serialization, quantization kernels, and inference logic step by step with short iteration times.

  ---

  ## 📦 Included Formats & Testing Roadmap

+ ### 1. GGUF Formats (For Native Inference Engines)
+ Recommended validation order when developing a custom native GGUF engine:
+
  | Filename | Type | Size | Purpose / Validation Target |
  | :--- | :--- | :--- | :--- |
  | **`stories15M.F32.gguf`** | `F32` | ~60 MB | **Baseline Test.** Validates GGUF parsing, tensor layout, matrix multiplication, RoPE, and Attention logic without any dequantization overhead. |

  | **`stories15M.Q2_K`** 〜 **`Q6_K.gguf`** | `K-Quants` | 9~15 MB | **Standard Quants.** Validates modern super-block structural parsing with mixed precision. |
  | **`stories15M.IQ3_XXS`** 〜 **`IQ4_XS.gguf`** | `I-Quants` | 8~12 MB | **Advanced Quants.** Non-linear quantization targeting lookup table (codebook) decoding logic. |
  | **`stories15M.TQ1_0.gguf`**<br>`stories15M.TQ2_0.gguf` | `Ternary` | 7~9 MB | **Experimental.** Ternary (-1, 0, 1) state quantization for cutting-edge engine testing. |
+ | **`stories260K.F32.gguf`**<br>`stories260K.F16.gguf` | `F32`<br>`F16` | ~1 MB | **Ultra-Mini Check.** Extreme low-resource baseline utilizing a tiny 512-token vocabulary. |
+
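
The first step in the validation order above, GGUF parsing, starts with the fixed-size file header. The following is a minimal sketch, assuming GGUF version 2 or later (little-endian: 4-byte magic `GGUF`, `uint32` version, `uint64` tensor count, `uint64` metadata key/value count). The function name `read_gguf_header` is hypothetical and not part of this repository, and the header values in the self-check are made up.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_header(stream):
    """Parse the fixed 24-byte GGUF header (assumes GGUF v2+, little-endian)."""
    raw = stream.read(24)
    if len(raw) != 24:
        raise ValueError("file too short to contain a GGUF header")
    # 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata KV count
    magic, version, tensor_count, kv_count = struct.unpack("<4sIQQ", raw)
    if magic != GGUF_MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Self-check against a synthetic header (made-up counts), not a real file.
fake = struct.pack("<4sIQQ", GGUF_MAGIC, 3, 57, 20)
header = read_gguf_header(io.BytesIO(fake))
print(header)
```

With a real file, pass `open("stories15M.F32.gguf", "rb")` in place of the in-memory buffer; the metadata key/value pairs and tensor descriptors follow immediately after these 24 bytes.
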
+ ### 2. Hugging Face / Transformers Formats (For PyTorch Validation)
+ Safetensors weights with standard `config.json` files, usable out of the box with the Hugging Face `transformers` library. Ideal for computing reference outputs or testing upstream conversion scripts (like `convert_hf_to_gguf.py`).
+
+ - **`hf_stories15M/`**: The 15M parameter model mapped to standard Hugging Face Llama architecture. Includes pre-bundled Llama-2 compatible tokenizer configurations.
+ - **`hf_stories260K/`**: The ultra-mini 260K parameter model with its custom architecture parameters intact.
+
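
One way to use the Transformers weights as a reference is to dump the logits for a fixed prompt and compare them with the logits your own engine produces at the same token position. The sketch below uses plain Python lists and toy numbers. `compare_logits` and the tolerance are illustrative assumptions, and a sensible tolerance depends on the quantization level under test.

```python
def compare_logits(reference, candidate, atol=1e-3):
    """Compare two logit vectors taken from the same token position."""
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    # Largest element-wise deviation between the two vectors
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    # Index of the highest logit, i.e. the greedy next-token choice
    ref_top = max(range(len(reference)), key=reference.__getitem__)
    cand_top = max(range(len(candidate)), key=candidate.__getitem__)
    return {
        "max_abs_diff": max_abs_diff,
        "same_argmax": ref_top == cand_top,
        "within_tolerance": max_abs_diff <= atol,
    }

# Toy vectors standing in for real model outputs.
reference = [0.10, 2.50, -1.20, 0.75]
candidate = [0.10, 2.49, -1.20, 0.76]
report = compare_logits(reference, candidate, atol=0.05)
print(report)
```

Checking `same_argmax` alongside the numeric tolerance is useful because heavily quantized files can drift numerically while still selecting the same greedy token.
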
+ ---
+
+ ## 🚀 Quick Start & Usage Examples
+
+ ### A. Running GGUF via llama.cpp
+ To verify your local setup or compare tokens using the official native utilities:
+
+ ```bash
+ ./llama-cli -m stories15M.Q4_K_M.gguf -p "One day, Timmy went to" -n 30 --temp 0.0
+
+ ```
+
+ ### B. Loading Hugging Face Formats via Python
+
+ You can load the Hugging Face variants directly in Python via the `transformers` library using the `subfolder` argument.
+
+ #### Example for `hf_stories15M`
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ repo_id = "shibatch/stories-converted"
+
+ # Load directly from the subfolder in this repository
+ tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="hf_stories15M")
+ model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder="hf_stories15M")
+
+ prompt = "One day, Timmy went to"
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ # Greedy decoding (do_sample=False) keeps the output deterministic
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=30,
+         do_sample=False,
+         pad_token_id=tokenizer.eos_token_id,
+     )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+
+ ```

  ---

  ## Model Specifications
+
+ * **Architecture:** Llama 2 (scaled down variants)
+ * **Dataset:** TinyStories (focused on simple vocabulary suited for 3 to 4-year-olds)
+ * **Vocabulary Size:** 32,000 for 15M models, 512 for 260K models.
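
The sizes quoted for the unquantized GGUF files can be sanity-checked from the parameter count, since file size is roughly parameters times bytes per weight. This is a back-of-the-envelope sketch that assumes the nominal counts implied by the model names (15M and 260K) and ignores metadata and tokenizer overhead. `estimate_size_mb` is a hypothetical helper.

```python
# Bytes used per weight by the unquantized formats
BYTES_PER_WEIGHT = {"F32": 4, "F16": 2}

def estimate_size_mb(param_count, dtype):
    """Approximate weight payload in megabytes (1 MB = 1,000,000 bytes)."""
    return param_count * BYTES_PER_WEIGHT[dtype] / 1_000_000

# Nominal parameter counts taken from the model names (assumption)
for name, params in [("stories15M", 15_000_000), ("stories260K", 260_000)]:
    for dtype in ("F32", "F16"):
        print(f"{name}.{dtype}: ~{estimate_size_mb(params, dtype):.2f} MB")
```

The F32 estimate for the 15M model lines up with the ~60 MB listed in the formats table; a file that deviates far from its estimate is worth inspecting before debugging kernels against it.
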

  ## 📜 Acknowledgments & License

+ * **Original Weights:** Trained by Andrej Karpathy ([karpathy/tinyllamas](https://huggingface.co/karpathy/tinyllamas)).
+ * **License:** **MIT License** (inherited from the original `llama2.c` repository). You are free to use, modify, and distribute these assets for any purpose.