# TensaLang Example Models
Example model weights for TensaLang, a programming language for LLM inference.
## What is TensaLang?
A programming language for LLM inference. Implement new models with ease and compile through MLIR to CUDA, CPU-SIMD, MLX, or ROCm. The runtime is the program.
```
fn attention_f16(q: Tensor<f32, [D]>,
                 key_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 value_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 layer: i32, pos: i32, H: i32, scale: f32) -> Tensor<f32, [D]>
    with tile=[8, 64], parallel=[h, t] {
    var att: Tensor<f32, [H, SeqLen]> = zeros([H, SeqLen])
    # Compute attention scores
    att[h, t] = if t > pos { -inf } else {
        sum(i) q[h * Dh + i] * (key_cache[layer, t, h * Dh + i] as f32) * scale
    }
    var weights: Tensor<f32, [H, SeqLen]> = softmax(att)
    # ... weighted sum over values
}
```
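The kernel above is standard causal single-query attention over an fp16 key/value cache. As a reference for what it computes, here is a minimal NumPy sketch of the same math (assuming `KvDim == D == H * Dh`, i.e. no grouped-query key/value sharing; this is an illustration, not TensaLang output):

```python
import numpy as np

def attention_f16(q, key_cache, value_cache, layer, pos, H, scale):
    """Causal attention for one query vector against an fp16 KV cache.

    q:           (D,) float32 query, where D = H * Dh
    key_cache:   (L, SeqLen, KvDim) float16
    value_cache: (L, SeqLen, KvDim) float16
    Returns a (D,) float32 output vector.
    """
    D = q.shape[0]
    Dh = D // H

    k = key_cache[layer].astype(np.float32)    # (SeqLen, KvDim)
    v = value_cache[layer].astype(np.float32)  # (SeqLen, KvDim)

    out = np.zeros(D, dtype=np.float32)
    for h in range(H):
        qh = q[h * Dh:(h + 1) * Dh]
        kh = k[:, h * Dh:(h + 1) * Dh]         # (SeqLen, Dh)
        vh = v[:, h * Dh:(h + 1) * Dh]

        att = (kh @ qh) * scale                # scores, (SeqLen,)
        att[pos + 1:] = -np.inf                # mask future positions (t > pos)

        att -= att.max()                       # numerically stable softmax
        w = np.exp(att)
        w /= w.sum()

        out[h * Dh:(h + 1) * Dh] = w @ vh      # weighted sum over values
    return out
```

The `t > pos` mask and the f16-to-f32 cast mirror the TensaLang snippet; the tiling and parallelization hints (`tile`, `parallel`) have no NumPy equivalent and are what the TensaLang compiler lowers to the target backend.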
From the creator of Datarus-R1-14B.
## Example models
| Model | Parameters | Format | Description |
|---|---|---|---|
| `llama2_7b_f16.safetensors` | 7B | FP16 | Llama2-7B |
| `qwen2.5_coder_0.5b_bf16.safetensors` | 0.5B | BF16 | Qwen2.5-Coder-0.5B-Instruct |
## Usage
```bash
# Clone TensaLang
git clone https://github.com/BenChaliah/Tensa-Lang.git
cd Tensa-Lang && ./build.sh

# Download all models
huggingface-cli download BenChaliah/TensaLang-models --local-dir ./models

# Or download a specific model
huggingface-cli download BenChaliah/TensaLang-models llama2_7b_f16.safetensors --local-dir ./Llama2-assets
```
### Run Llama2
```bash
./bin/tensalang-run examples/llama2_manual_tiling_fp16.tl \
  --model Llama2-assets/llama2_7b_f16.safetensors \
  --tokenizer Llama2-assets/tokenizer.json \
  --prompt "Once upon a time" \
  --target cuda \
  --steps 128 \
  --fused-attention 2 \
  --cuda-arch sm_89
```
### Run Qwen2.5-Coder
```bash
./bin/tensalang-run examples/qwen25_coder_bf16.tl \
  --model Qwen25-assets/qwen2.5_coder_0.5b_bf16.safetensors \
  --tokenizer Qwen25-assets/tokenizer.json \
  --prompt "def quicksort(arr):" \
  --target cuda \
  --steps 64 \
  --cuda-arch sm_89
```
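Before running, a downloaded `.safetensors` file can be sanity-checked by reading its header: the format starts with an 8-byte little-endian length followed by that many bytes of JSON mapping tensor names to their dtype, shape, and data offsets. A small stdlib-only sketch (the example path follows the layout used above):

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    Layout: 8-byte little-endian u64 header length N, then N bytes of JSON
    mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Example: list tensor names, dtypes, and shapes
# for name, info in read_safetensors_header("Llama2-assets/llama2_7b_f16.safetensors").items():
#     if name != "__metadata__":
#         print(name, info["dtype"], info["shape"])
```

This is a quick way to confirm the file's dtypes (F16 vs BF16) match the model variant you intend to run.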
## Source
Weights converted from:
## License
The model weights retain their original licenses; the TensaLang compiler is MIT-licensed.