CaaLM/CaaLM-v1-GGUF

CaaLM-v1 Logo

Overview

This repository contains official GGUF quantizations of CaaLM/CaaLM-v1, provided by CaaLM.

CaaLM-v1 is a 1.5B parameter model that predicts the output of code without a compiler, runtime, or interpreter. It was trained on real programming languages (Python, JavaScript, Lua, COBOL) alongside 200 synthetically generated fake programming languages, enabling it to predict execution output even for languages it has never seen before.

  • Original model: CaaLM/CaaLM-v1
  • Base model: Qwen/Qwen2.5-1.5B
  • Architecture: Qwen2
  • Parameters: 1,543.7M
  • License: Apache 2.0
  • Task: Code output prediction (text-generation)

Available Quants

All quantizations listed below are official releases from CaaLM.

| Filename | Quantization | Description | Recommended Use |
|---|---|---|---|
| CaaLM-v1-F32.gguf | F32 | Full 32-bit float | Maximum precision, highest VRAM |
| CaaLM-v1-F16.gguf | F16 | 16-bit float | High precision, large memory footprint |
| CaaLM-v1-BF16.gguf | BF16 | Brain float 16 | Good precision, modern hardware |
| CaaLM-v1-Q8_0.gguf | Q8_0 | 8-bit quantization | Near-lossless, recommended if you have the VRAM |
| CaaLM-v1-Q6_K.gguf | Q6_K | 6-bit K-quant | Excellent quality, good balance |
| CaaLM-v1-Q5_K_M.gguf | Q5_K_M | 5-bit K-quant (medium) | Recommended: great quality/size balance |
| CaaLM-v1-Q5_K_S.gguf | Q5_K_S | 5-bit K-quant (small) | Good quality, smaller than Q5_K_M |
| CaaLM-v1-Q5_1.gguf | Q5_1 | 5-bit legacy | Legacy format |
| CaaLM-v1-Q5_0.gguf | Q5_0 | 5-bit legacy | Legacy format |
| CaaLM-v1-Q4_K_M.gguf | Q4_K_M | 4-bit K-quant (medium) | Recommended: best 4-bit option |
| CaaLM-v1-Q4_K_S.gguf | Q4_K_S | 4-bit K-quant (small) | Smaller than Q4_K_M, slight quality drop |
| CaaLM-v1-Q4_1.gguf | Q4_1 | 4-bit legacy | Legacy format |
| CaaLM-v1-Q4_0.gguf | Q4_0 | 4-bit legacy | Legacy format, widely compatible |
| CaaLM-v1-IQ4_XS.gguf | IQ4_XS | 4-bit iQuant (extra small) | Smaller than Q4_K_S, competitive quality |
| CaaLM-v1-IQ4_NL.gguf | IQ4_NL | 4-bit iQuant (non-linear) | Good alternative to Q4_0 |
| CaaLM-v1-Q3_K_L.gguf | Q3_K_L | 3-bit K-quant (large) | Low memory, acceptable quality |
| CaaLM-v1-Q3_K_M.gguf | Q3_K_M | 3-bit K-quant (medium) | Low memory use |
| CaaLM-v1-Q3_K_S.gguf | Q3_K_S | 3-bit K-quant (small) | Very low memory use |
| CaaLM-v1-IQ3_M.gguf | IQ3_M | 3-bit iQuant (medium) | Better than Q3_K_M at similar size |
| CaaLM-v1-IQ3_S.gguf | IQ3_S | 3-bit iQuant (small) | Very small footprint |
| CaaLM-v1-Q2_K.gguf | Q2_K | 2-bit K-quant | Minimum quality, maximum compression |
| CaaLM-v1-TQ2_0.gguf | TQ2_0 | 2-bit ternary quant | Experimental ternary quantization |
| CaaLM-v1-TQ1_0.gguf | TQ1_0 | 1-bit ternary quant | Extreme compression, experimental |

Which Quant Should I Use?

By available memory:

| Available VRAM / RAM | Recommended Quant |
|---|---|
| 6 GB+ | Q8_0 |
| 4 GB+ | Q5_K_M or Q6_K |
| 3 GB+ | Q4_K_M |
| 2 GB+ | Q3_K_M or IQ3_M |
| < 2 GB | Q2_K (quality will degrade) |

General guidance: For most users, Q4_K_M or Q5_K_M offer the best trade-off between file size and output quality. If you need maximum fidelity, use Q8_0 or BF16.
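
For a rough sense of file size, the weights occupy roughly parameters * bits-per-weight / 8 bytes. The Python sketch below uses approximate bits-per-weight figures (assumed values that vary across llama.cpp versions and tensor types) and ignores KV-cache and runtime overhead, so treat the results as ballpark estimates only:

# Rough weight-file size per quant; bits-per-weight values are approximate assumptions.
PARAMS = 1.5437e9  # CaaLM-v1 parameter count

APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 3.4,
}

for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    gib = PARAMS * bpw / 8 / 1024**3
    print(f"{quant}: ~{gib:.2f} GiB of weights")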


Usage

llama.cpp

./llama-cli \
  -m CaaLM-v1-Q4_K_M.gguf \
  -p "Code:\na = 6\nb = 7\nprint(a * b)\n\nOutput:\n" \
  --temp 0 \
  -n 64
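
llama.cpp can also serve the model over HTTP. The sketch below uses the llama-server binary and its /completion endpoint; flag and field names follow recent llama.cpp builds and may differ in older releases:

# Start a local server (assumes a recent llama.cpp build that ships llama-server)
./llama-server -m CaaLM-v1-Q4_K_M.gguf -c 512 --port 8080

# Query it with the same Code:/Output: prompt format
curl http://localhost:8080/completion -d '{
  "prompt": "Code:\na = 6\nb = 7\nprint(a * b)\n\nOutput:\n",
  "n_predict": 64,
  "temperature": 0
}'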

Ollama

# Create a Modelfile
cat > Modelfile <<EOF
FROM ./CaaLM-v1-Q4_K_M.gguf
PARAMETER temperature 0
PARAMETER stop "<|im_end|>"
SYSTEM "You predict the output of code snippets."
EOF

ollama create caalm-v1 -f Modelfile
ollama run caalm-v1
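
Once the model is created, you can also query it through Ollama's local REST API. The request below is a minimal sketch against the default port, using the caalm-v1 name from the Modelfile step; field names follow the current Ollama /api/generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "caalm-v1",
  "prompt": "Code:\na = 6\nb = 7\nprint(a * b)\n\nOutput:\n",
  "stream": false,
  "options": { "temperature": 0 }
}'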

Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama(
    model_path="CaaLM-v1-Q4_K_M.gguf",
    n_ctx=512,
)

def predict_output(code: str) -> str:
    prompt = f"Code:\n{code}\n\nOutput:\n"
    result = llm(
        prompt,
        max_tokens=128,
        temperature=0,
        stop=["<|im_end|>", "\n\n\n"],
    )
    return result["choices"][0]["text"].strip()

# Real language
print(predict_output("a = 6\nb = 7\nprint(a * b)"))
# → 42

# Novel fake language
print(predict_output("STORE X := 10\nSTORE Y := 5\nSPEAK X + Y"))
# → 15

Input Format

Always use the following prompt format; the model completes the Output: section:

Code:
<your code here>

Output:

Example: Python

Code:
a = 10
b = 20
print(a + b)

Output:
30

Example: Novel Fake Language (never seen during training)

Code:
SCRIBBLE @x BECOMES 7
SCRIBBLE @y BECOMES 3
YELL @x + @y

Output:
10

Performance

Overall benchmark accuracy: 96.2% (50/52 tests)

| Category | Accuracy | Passed/Total |
|---|---|---|
| Real: Python | 100% | 10/10 |
| Real: JavaScript | 100% | 8/8 |
| Real: Lua | 100% | 6/6 |
| Real: COBOL | 75% | 3/4 |
| Novel Fake: Tier 1 (assign + print) | 100% | 8/8 |
| Novel Fake: Tier 2 (conditionals) | 86% | 6/7 |
| Novel Fake: Tier 3 (loops) | 100% | 4/4 |
| Edge Cases | 100% | 5/5 |

For full benchmark details and known failure cases, see the original model card.


Supported Operations

The model reliably handles:

  • Variable assignment and arithmetic
  • Print / output statements
  • Conditionals (if/else)
  • While loops with accumulator patterns (see the example after this list)
  • String output
  • Basic error behavior (empty output when conditions not met)
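
For example, a while loop with an accumulator falls within the supported set. The true output of the snippet below is 15; as with every prediction, the model can still get it wrong, particularly at aggressive quantization levels:

Code:
total = 0
i = 1
while i <= 5:
    total += i
    i += 1
print(total)

Output:
15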

It does not reliably handle: functions, recursion, file I/O, complex data structures, pipes, or multi-line string manipulation.


Limitations

  • No actual code execution: outputs are predictions, not guarantees
  • If-without-else edge cases may produce hallucinated else branches
  • COBOL numeric padding format is inconsistent
  • Long programs may degrade in accuracy as state complexity grows
  • Context window is limited to ~512 tokens
  • Quantization at Q3 and below may introduce additional errors vs. the original model

Model Lineage

| Model | Base | Description |
|---|---|---|
| LaaLM-v1 | T5-base | Fine-tuned to simulate Linux shell commands |
| LaaLM-exp-v1 | Qwen 3B | Conversational Linux terminal emulation |
| CaaLM-v1 | Qwen 1.5B | Language-agnostic code output prediction (this model) |

License

Apache 2.0, inherited from the Qwen 2.5 base model and the original CaaLM-v1.

