# CaaLM/CaaLM-v1-GGUF

## Overview
This repository contains official GGUF quantizations of CaaLM/CaaLM-v1, provided by CaaLM.
CaaLM-v1 is a 1.5B-parameter model that predicts the output of code without a compiler, runtime, or interpreter. It was trained on real programming languages (Python, JavaScript, Lua, COBOL) alongside 200 synthetically generated fake programming languages, enabling it to predict execution output even for languages it has never seen before.
- Original model: CaaLM/CaaLM-v1
- Base model: Qwen/Qwen2.5-1.5B
- Architecture: Qwen2
- Parameters: 1,543.7M
- License: Apache 2.0
- Task: Code output prediction (text-generation)
## Available Quants
All quantizations listed below are official releases from CaaLM.
| Filename | Quantization | Description | Recommended Use |
|---|---|---|---|
| CaaLM-v1-F32.gguf | F32 | Full 32-bit float | Maximum precision, highest VRAM |
| CaaLM-v1-F16.gguf | F16 | 16-bit float | High precision, large memory footprint |
| CaaLM-v1-BF16.gguf | BF16 | Brain float 16 | Good precision, modern hardware |
| CaaLM-v1-Q8_0.gguf | Q8_0 | 8-bit quantization | Near-lossless, recommended if you have the VRAM |
| CaaLM-v1-Q6_K.gguf | Q6_K | 6-bit K-quant | Excellent quality, good balance |
| CaaLM-v1-Q5_K_M.gguf | Q5_K_M | 5-bit K-quant (medium) | Recommended: great quality/size balance |
| CaaLM-v1-Q5_K_S.gguf | Q5_K_S | 5-bit K-quant (small) | Good quality, smaller than Q5_K_M |
| CaaLM-v1-Q5_1.gguf | Q5_1 | 5-bit legacy | Legacy format |
| CaaLM-v1-Q5_0.gguf | Q5_0 | 5-bit legacy | Legacy format |
| CaaLM-v1-Q4_K_M.gguf | Q4_K_M | 4-bit K-quant (medium) | Recommended: best 4-bit option |
| CaaLM-v1-Q4_K_S.gguf | Q4_K_S | 4-bit K-quant (small) | Smaller than Q4_K_M, slight quality drop |
| CaaLM-v1-Q4_1.gguf | Q4_1 | 4-bit legacy | Legacy format |
| CaaLM-v1-Q4_0.gguf | Q4_0 | 4-bit legacy | Legacy format, widely compatible |
| CaaLM-v1-IQ4_XS.gguf | IQ4_XS | 4-bit iQuant (extra small) | Smaller than Q4_K_S, competitive quality |
| CaaLM-v1-IQ4_NL.gguf | IQ4_NL | 4-bit iQuant (non-linear) | Good alternative to Q4_0 |
| CaaLM-v1-Q3_K_L.gguf | Q3_K_L | 3-bit K-quant (large) | Low memory, acceptable quality |
| CaaLM-v1-Q3_K_M.gguf | Q3_K_M | 3-bit K-quant (medium) | Low memory use |
| CaaLM-v1-Q3_K_S.gguf | Q3_K_S | 3-bit K-quant (small) | Very low memory use |
| CaaLM-v1-IQ3_M.gguf | IQ3_M | 3-bit iQuant (medium) | Better than Q3_K_M at similar size |
| CaaLM-v1-IQ3_S.gguf | IQ3_S | 3-bit iQuant (small) | Very small footprint |
| CaaLM-v1-Q2_K.gguf | Q2_K | 2-bit K-quant | Minimum quality, maximum compression |
| CaaLM-v1-TQ2_0.gguf | TQ2_0 | 2-bit ternary quant | Experimental ternary quantization |
| CaaLM-v1-TQ1_0.gguf | TQ1_0 | 1-bit ternary quant | Extreme compression, experimental |
## Which Quant Should I Use?
By available memory:
| Available VRAM / RAM | Recommended Quant |
|---|---|
| 6 GB+ | Q8_0 |
| 4 GB+ | Q5_K_M or Q6_K |
| 3 GB+ | Q4_K_M |
| 2 GB+ | Q3_K_M or IQ3_M |
| < 2 GB | Q2_K (quality will degrade) |
General guidance: for most users, `Q4_K_M` or `Q5_K_M` offers the best trade-off between file size and output quality. If you need maximum fidelity, use `Q8_0` or `BF16`.
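As a sanity check for the memory table above, a quant's file size can be roughly estimated as parameters × bits per weight ÷ 8. The sketch below uses the 1,543.7M parameter count from this card; the bits-per-weight figures are nominal approximations (actual GGUF files also carry metadata and some higher-precision tensors, so real sizes run slightly larger):

```python
# Rough lower-bound size estimate for each quant of a 1.5B model.
# PARAMS comes from the model card; bits-per-weight values are approximate.
PARAMS = 1_543_700_000

def approx_size_gb(bits_per_weight: float) -> float:
    """Parameters x bits-per-weight / 8, in decimal gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q5_K_M", 5.5),
                  ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    print(f"{name}: ~{approx_size_gb(bpw):.2f} GB")
```

Remember to budget extra memory beyond the file size for the KV cache and compute buffers.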
## Usage

### llama.cpp

```bash
./llama-cli \
  -m CaaLM-v1-Q4_K_M.gguf \
  -p "Code:\na = 6\nb = 7\nprint(a * b)\n\nOutput:\n" \
  --temp 0 \
  -n 64
```
### Ollama

```bash
# Create a Modelfile
cat > Modelfile <<EOF
FROM ./CaaLM-v1-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER stop "<|im_end|>"
SYSTEM "You predict the output of code snippets."
EOF

ollama create caalm-v1 -f Modelfile
ollama run caalm-v1
```
### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="CaaLM-v1-Q4_K_M.gguf",
    n_ctx=512,
)

def predict_output(code: str) -> str:
    prompt = f"Code:\n{code}\n\nOutput:\n"
    result = llm(
        prompt,
        max_tokens=128,
        temperature=0,
        stop=["<|im_end|>", "\n\n\n"],
    )
    return result["choices"][0]["text"].strip()

# Real language
print(predict_output("a = 6\nb = 7\nprint(a * b)"))
# → 42

# Novel fake language
print(predict_output("STORE X := 10\nSTORE Y := 5\nSPEAK X + Y"))
# → 15
```
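Because the model only predicts output rather than executing anything, predictions for real languages can be cross-checked against actual execution. A minimal sketch for Python snippets, capturing stdout from `exec` (the `run_python` helper is illustrative, not part of the model's API, and should only be used on trusted snippets):

```python
import io
import contextlib

def run_python(code: str) -> str:
    """Actually execute a Python snippet and capture its stdout,
    for cross-checking the model's predicted output.
    WARNING: exec() runs arbitrary code; only use on trusted input."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()

# The model predicts "42" for this snippet; real execution agrees:
print(run_python("a = 6\nb = 7\nprint(a * b)"))  # → 42
```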
## Input Format

Always use the following prompt format; the model completes the `Output:` section:

```
Code:
<your code here>

Output:
```
### Example: Python

```
Code:
a = 10
b = 20
print(a + b)

Output:
30
```
### Example: Novel Fake Language (never seen during training)

```
Code:
SCRIBBLE @x BECOMES 7
SCRIBBLE @y BECOMES 3
YELL @x + @y

Output:
10
```
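When scripting against the model, it is easy to drift from this template; a tiny helper keeps prompts consistent (the `build_prompt` name is illustrative, mirroring the format used in the Python example above):

```python
def build_prompt(code: str) -> str:
    """Wrap a code snippet in the Code:/Output: template the model completes."""
    return f"Code:\n{code}\n\nOutput:\n"

print(build_prompt("a = 10\nb = 20\nprint(a + b)"))
```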
## Performance
Overall benchmark accuracy: 96.2% (50/52 tests)
| Category | Accuracy | Passed/Total |
|---|---|---|
| Real: Python | 100% | 10/10 |
| Real: JavaScript | 100% | 8/8 |
| Real: Lua | 100% | 6/6 |
| Real: COBOL | 75% | 3/4 |
| Novel Fake: Tier 1 (assign + print) | 100% | 8/8 |
| Novel Fake: Tier 2 (conditionals) | 86% | 6/7 |
| Novel Fake: Tier 3 (loops) | 100% | 4/4 |
| Edge Cases | 100% | 5/5 |
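The headline figure follows directly from the per-category rows: summing the Passed/Total column gives 50/52 ≈ 96.2%. A quick arithmetic check:

```python
# (passed, total) pairs transcribed from the benchmark table above.
results = {
    "Real: Python": (10, 10),
    "Real: JavaScript": (8, 8),
    "Real: Lua": (6, 6),
    "Real: COBOL": (3, 4),
    "Novel Fake: Tier 1": (8, 8),
    "Novel Fake: Tier 2": (6, 7),
    "Novel Fake: Tier 3": (4, 4),
    "Edge Cases": (5, 5),
}

passed = sum(p for p, _ in results.values())
total = sum(t for _, t in results.values())
print(f"{passed}/{total} = {100 * passed / total:.1f}%")  # → 50/52 = 96.2%
```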
For full benchmark details and known failure cases, see the original model card.
## Supported Operations
The model reliably handles:
- Variable assignment and arithmetic
- Print / output statements
- Conditionals (if/else)
- While loops with accumulator patterns
- String output
- Basic error behavior (empty output when conditions not met)
It does not reliably handle: functions, recursion, file I/O, complex data structures, pipes, or multi-line string manipulation.
## Limitations

- No actual code execution: outputs are predictions, not guarantees
- If-without-else edge cases may produce hallucinated else branches
- COBOL numeric padding format is inconsistent
- Long programs may degrade in accuracy as state complexity grows
- Context window is limited to ~512 tokens
- Quantization at Q3 and below may introduce additional errors vs. the original model
## Model Lineage
| Model | Base | Description |
|---|---|---|
| LaaLM-v1 | T5-base | Fine-tuned to simulate Linux shell commands |
| LaaLM-exp-v1 | Qwen 3B | Conversational Linux terminal emulation |
| CaaLM-v1 | Qwen 1.5B | Language-agnostic code output prediction (this model) |
## License

Apache 2.0, inherited from the Qwen 2.5 base model and the original CaaLM-v1.
## Links
- Original model: CaaLM/CaaLM-v1
- Demo Space: CaaLM-v1-Demo
- Base model: Qwen/Qwen2.5-1.5B