nologik committed · verified · commit e9c9d33 · parent: 640cf5c

Add comprehensive README

Files changed (1): README.md (+167 −0)

---
license: other
license_name: iquestcoder
license_link: https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct
base_model: IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct
tags:
- gguf
- quantized
- loop-attention
- recurrent-transformer
- code-generation
- iquest
language:
- en
pipeline_tag: text-generation
---

# IQuest-Coder-V1-40B-Loop-Instruct - GGUF

**World's first GGUF conversion** of IQuestLab's IQuest-Coder-V1-40B-Loop-Instruct model, which uses a recurrent loop attention mechanism.

## Model Details

- **Base Model**: [IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct)
- **Architecture**: Llama with loop attention (recurrent transformer, 2 iterations)
- **Parameters**: 40B
- **Context Length**: 131,072 tokens
- **Vocabulary**: 76,800 tokens
- **Conversion Date**: 2026-01-07
- **Converted By**: Avarok (dual NVIDIA DGX Spark with GB10 GPUs)

## Files Included

| Filename | Size | Quant Type | Use Case |
|----------|------|------------|----------|
| `IQuest-Coder-V1-40B-Loop-Instruct-f16.gguf` | 75 GB | F16 | Full-precision reference |
| `IQuest-Coder-V1-40B-Loop-Instruct-q8_0.gguf` | 40 GB | Q8_0 | Excellent quality, minimal loss |
| `IQuest-Coder-V1-40B-Loop-Instruct-q5_k_m.gguf` | 27 GB | Q5_K_M | Good quality balance |
| `IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf` | 23 GB | Q4_K_M | **Recommended**: best size/quality balance |

## SHA256 Checksums

```
b70d3bb48753e786c8afca7556b818341fc9258e29083be4b0375c5a8b788289  IQuest-Coder-V1-40B-Loop-Instruct-f16.gguf
a9323b7ca583a842737dd4ec1f7422101c68ededf2a86c75a8d5e9da70eaae06  IQuest-Coder-V1-40B-Loop-Instruct-q8_0.gguf
a15814998038c8c6334f69bc11b776bce785350c933ce95fe9c41c4c7ec708ba  IQuest-Coder-V1-40B-Loop-Instruct-q5_k_m.gguf
b665999c8d6660ba0ea29cbbb072056052ef965a233ef65661ec16a16b39a9e3  IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf
```

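After downloading, the digests above can be verified without external tools. A minimal sketch using Python's `hashlib` (only one digest shown for brevity; the file is assumed to be in the working directory):

```python
import hashlib

# Expected digest from the checksum list above (q4_k_m file shown as an example).
EXPECTED = {
    "IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf":
        "b665999c8d6660ba0ea29cbbb072056052ef965a233ef65661ec16a16b39a9e3",
}

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MB chunks so multi-GB GGUF files use constant memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str) -> bool:
    """True if the file's digest matches the expected value for its name."""
    return sha256_of(path) == EXPECTED.get(path)
```

Equivalently, `sha256sum -c` with a checksum file achieves the same on the command line.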
## Current Status

⚠️ **IMPORTANT**: These GGUF files contain all loop attention tensors and metadata, but **runtime support is pending** in llama.cpp.

**What works**:
- ✅ GGUF files load correctly
- ✅ All 883 tensors preserved (721 standard + 160 loop gates + 2 embeddings)
- ✅ Loop parameters stored in metadata (loop_num=2, loop_window_size=64)
- ✅ Quantization tested and verified

**What's pending**:
- ⏳ Loop attention runtime implementation in llama.cpp
- ⏳ Inference will fail until runtime support is added

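A file's basic integrity can be spot-checked without llama.cpp by parsing just the fixed-size GGUF header (magic, version, tensor count, metadata KV count, per the public GGUF specification). A minimal sketch:

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Parse the fixed GGUF header fields; all integers are little-endian."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        version, = struct.unpack("<I", f.read(4))       # uint32 format version
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))  # uint64 counts
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}
```

For the files in this repository, `n_tensors` should come back as 883 if the conversion claims above hold; reading the loop_num/loop_window_size metadata values themselves requires walking the KV section, which the `gguf` Python package can do.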
## Technical Details

### Loop Architecture

The IQuest Loop Coder uses a **recurrent transformer design** with:
- **loop_num**: 2 attention iterations per layer
- **loop_window_size**: 64-token attention window
- **Gate projections**: 160 additional tensors (weight + bias across 80 layers) for the gating mechanism
  - `blk.{0..79}.loop_gate.weight`: [128, 40] per layer
  - `blk.{0..79}.loop_gate.bias`: [40] per layer

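The role of the gates can be illustrated with a toy recurrence. The following numpy sketch is an illustrative assumption, not the actual kernel: it assumes each loop iteration's output is blended with the carried hidden state through a sigmoid gate (names `gate_w`/`gate_b` and the blend rule are hypothetical; the real implementation would operate on full attention blocks with the shapes listed above):

```python
import numpy as np

def loop_block(h, attn_fn, gate_w, gate_b, loop_num=2):
    """Run a block `loop_num` times, gating each iteration's output
    against the previous hidden state (toy sigmoid-gate version)."""
    for _ in range(loop_num):
        candidate = attn_fn(h)                               # one attention pass
        gate = 1.0 / (1.0 + np.exp(-(h @ gate_w + gate_b)))  # per-dim gate in (0, 1)
        h = gate * candidate + (1.0 - gate) * h              # blend new vs. carried state
    return h
```

The point of the sketch is only that the 160 extra tensors parameterize how much of each iteration's output is kept, which is why they must survive quantization intact.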
### Conversion Process

Converted using a custom `IQuestLoopCoderModel` class that:
- Inherits from `LlamaModel` (compatible base architecture)
- Maps gate projections to GGUF tensor names
- Preserves loop parameters in metadata
- Is tested with all quantization levels

Conversion time: **2-7 minutes** per quantization on an NVIDIA GB10.

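The tensor-name mapping step can be sketched in isolation. The HF-side key pattern `model.layers.N.gate_projections.*` below is a hypothetical example, not taken from the actual checkpoint; only the GGUF-side `blk.N.loop_gate.*` names appear in this document:

```python
import re

# Hypothetical HF checkpoint key; the real key layout may differ.
_GATE_RE = re.compile(r"^model\.layers\.(\d+)\.gate_projections\.(weight|bias)$")

def map_tensor_name(name: str) -> str:
    """Map a gate-projection tensor to its GGUF name; pass others through."""
    m = _GATE_RE.match(name)
    if m:
        layer, kind = m.groups()
        return f"blk.{layer}.loop_gate.{kind}"
    return name
```

Standard Llama tensors are handled by the inherited base-class mapping; only the loop-specific keys need the custom rule.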
## Usage (When Runtime Support Is Available)

### With Ollama

```bash
# Create Modelfile
cat > Modelfile <<EOF
FROM IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create the model
ollama create iquest-loop:q4 -f Modelfile

# Run
ollama run iquest-loop:q4 "Write a Python function for fibonacci"
```

### With llama.cpp

```bash
./llama-cli \
  --model IQuest-Coder-V1-40B-Loop-Instruct-q4_k_m.gguf \
  --prompt "def fibonacci(n):" \
  --n-predict 100
```

**Note**: This will fail until the loop attention runtime is implemented.

## Implementation Status

### Converter ✅ (Complete)

The converter successfully creates GGUF files with all loop-specific components:
- Custom tensor mapping for gate projections
- Loop parameter metadata storage
- Tested with the 40B-parameter model
- All quantization levels verified

### Runtime ⏳ (In Progress)

Runtime implementation requires:
1. A C++ implementation of the loop attention mechanism
2. CUDA kernels for GPU acceleration
3. Integration into the llama.cpp forward pass
4. Testing against the PyTorch reference

See `RUNTIME_IMPLEMENTATION_GUIDE.md` for detailed implementation requirements.

## Contribution & Support

- **Converter implementation**: Available in a llama.cpp PR (pending)
- **Runtime development**: Community contributions welcome
- **Technical documentation**: Included in this repository

## Resources

- **Original Model**: [IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct](https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct)
- **Conversion Guide**: See `CONVERSION_SUMMARY.md`
- **Runtime Guide**: See `RUNTIME_IMPLEMENTATION_GUIDE.md`
- **llama.cpp Issue**: [#18517](https://github.com/ggerganov/llama.cpp/issues/18517)
- **vLLM Support**: [PR #31575](https://github.com/vllm-project/vllm/pull/31575)

## Credits

- **Original Model**: IQuestLab team
- **Conversion**: Avarok (dual DGX Spark hardware)
- **Tools**: llama.cpp (ggerganov), vLLM project
- **Achievement**: First Loop-Instruct variant in GGUF format

156
+ ## License
157
+
158
+ Same as base model: IQuestCoder license
159
+ - Link: https://huggingface.co/IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct
160
+
## Acknowledgments

This is the first publicly available GGUF conversion of an IQuest Loop-Instruct model. The conversion preserves all architectural components needed for loop attention, paving the way for future runtime support.

---

**Status**: Converter complete ✅ | Runtime pending ⏳ | Community contributions welcome 🤝