---
license: mit
tags:
- tensalang
- llm-inference
- mlir
- safetensors
language:
- en
---
# TensaLang Example Models
Example model weights for [TensaLang](https://github.com/BenChaliah/Tensa-Lang), a programming language for LLM inference.
## What is TensaLang?
TensaLang is a programming language for LLM inference: implement new models concisely and compile them through MLIR to CUDA, CPU-SIMD, MLX, or ROCm backends. The runtime is the program.
```
fn attention_f16(q: Tensor<f32, [D]>,
                 key_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 value_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 layer: i32, pos: i32, H: i32, scale: f32) -> Tensor<f32, [D]>
    with tile=[8, 64], parallel=[h, t] {
    var att: Tensor<f32, [H, SeqLen]> = zeros([H, SeqLen])
    # Compute attention scores
    att[h, t] = if t > pos { -inf } else {
        sum(i) q[h * Dh + i] * (key_cache[layer, t, h * Dh + i] as f32) * scale
    }
    var weights: Tensor<f32, [H, SeqLen]> = softmax(att)
    # ... weighted sum over values
}
```
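For readers new to the notation: the snippet computes causally masked, scaled dot-product attention scores per head, then applies softmax. A rough pure-Python sketch of the same math (illustrative only — the helper names and flat `H * Dh` layout mirror the snippet's indexing, not the actual TensaLang runtime):

```python
import math

def attention_scores(q, key_cache, pos, H, Dh, scale):
    """Masked scaled dot-product scores, mirroring the snippet above.

    q         : flat list of length H * Dh (one query vector per head)
    key_cache : key_cache[t] is a flat list of length H * Dh
    Returns att[h][t] = scale * (q_h . k_{t,h}) for t <= pos, else -inf.
    """
    seq_len = len(key_cache)
    att = [[0.0] * seq_len for _ in range(H)]
    for h in range(H):
        for t in range(seq_len):
            if t > pos:
                att[h][t] = float("-inf")  # causal mask: never attend to future positions
            else:
                att[h][t] = scale * sum(
                    q[h * Dh + i] * key_cache[t][h * Dh + i] for i in range(Dh)
                )
    return att

def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) underflows to 0.0,
    # so masked positions get zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]
```

In the TensaLang version the `h` and `t` loops are expressed declaratively and scheduled by the `with tile=[8, 64], parallel=[h, t]` clause rather than written out.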
From the creator of [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B).
## Example models
| Model | Parameters | Format | Description |
|-------|------------|--------|-------------|
| `llama2_7b_f16.safetensors` | 7B | FP16 | Llama2-7B |
| `qwen2.5_coder_0.5b_bf16.safetensors` | 0.5B | BF16 | Qwen2.5-Coder-0.5B-Instruct |
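The files above use the standard safetensors layout: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then raw tensor data. A minimal stdlib-only sketch of that layout (a simplified writer/reader for illustration — the tensor name `w` is made up, and real files may also carry a `__metadata__` entry; use the `safetensors` library in practice):

```python
import json
import struct

def write_safetensors(path, tensors):
    """Minimal writer sketch. tensors: dict name -> (dtype_str, shape, raw_bytes)."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(data)]}
        offset += len(data)
        blobs.append(data)
    hdr = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hdr)))  # 8-byte little-endian header size
        f.write(hdr)                          # JSON header
        for b in blobs:                       # raw tensor bytes, in offset order
            f.write(b)

def read_header(path):
    """Return the JSON header without loading any tensor data."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))
```

Reading only the header this way is why safetensors files can be inspected (names, dtypes, shapes) without loading the weights.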
## Usage
```bash
# Clone TensaLang
git clone https://github.com/BenChaliah/Tensa-Lang.git
cd Tensa-Lang && ./build.sh
# Download models
huggingface-cli download BenChaliah/TensaLang-models --local-dir ./models
# Or download a specific model
huggingface-cli download BenChaliah/TensaLang-models llama2_7b_f16.safetensors --local-dir ./Llama2-assets
```
### Run Llama2
```bash
./bin/tensalang-run examples/llama2_manual_tiling_fp16.tl \
--model Llama2-assets/llama2_7b_f16.safetensors \
--tokenizer Llama2-assets/tokenizer.json \
--prompt "Once upon a time" \
--target cuda \
--steps 128 \
--fused-attention 2 \
--cuda-arch sm_89
```
### Run Qwen2.5-Coder
```bash
./bin/tensalang-run examples/qwen25_coder_bf16.tl \
--model Qwen25-assets/qwen2.5_coder_0.5b_bf16.safetensors \
--tokenizer Qwen25-assets/tokenizer.json \
--prompt "def quicksort(arr):" \
--target cuda \
--steps 64 \
--cuda-arch sm_89
```
## Source
Weights converted from:
- [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b)
- [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)
## License
Model weights retain their original licenses. The TensaLang compiler is MIT-licensed.
## Links
- [TensaLang GitHub](https://github.com/BenChaliah/Tensa-Lang)
- [Documentation](https://tensa-lang.org/docs.html)
- [Website](https://tensa-lang.org/)
- [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B)