---
license: mit
tags:
  - tensalang
  - llm-inference
  - mlir
  - safetensors
language:
  - en
---

# TensaLang Example Models

Example model weights for [TensaLang](https://github.com/BenChaliah/Tensa-Lang), a programming language for LLM inference.

## What is TensaLang?

TensaLang is a programming language for LLM inference: implement new models concisely and compile them through MLIR to CUDA, CPU-SIMD, MLX, or ROCm backends. The runtime is the program.

```
fn attention_f16(q: Tensor<f32, [D]>,
                 key_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 value_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 layer: i32, pos: i32, H: i32, scale: f32) -> Tensor<f32, [D]>
    with tile=[8, 64], parallel=[h, t] {

  var att: Tensor<f32, [H, SeqLen]> = zeros([H, SeqLen])

  # Compute attention scores
  att[h, t] = if t > pos { -inf } else {
    sum(i) q[h * Dh + i] * (key_cache[layer, t, h * Dh + i] as f32) * scale
  }

  var weights: Tensor<f32, [H, SeqLen]> = softmax(att)
  # ... weighted sum over values
}
```
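For comparison, the kernel above computes standard causal scaled dot-product attention: scores for positions past `pos` are masked to negative infinity before the softmax. A plain-Python single-head sketch (stdlib only, with hypothetical tiny dimensions; not TensaLang's actual runtime):

```python
import math

def causal_attention_scores(q, keys, pos, scale):
    """Scores for one head: dot(q, key_t) * scale for t <= pos, else -inf."""
    att = []
    for t, k in enumerate(keys):
        if t > pos:
            att.append(float("-inf"))  # future positions are masked out
        else:
            att.append(sum(qi * ki for qi, ki in zip(q, k)) * scale)
    return att

def softmax(xs):
    m = max(xs)                         # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) underflows to exactly 0.0
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical tiny example: head dim 2, sequence length 3, decoding at pos=1.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = softmax(causal_attention_scores(q, keys, pos=1, scale=1.0))
```

The masked position receives exactly zero weight, and the remaining weights sum to one, mirroring what the tiled TensaLang kernel produces per head.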

From the creator of [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B).

## Example models

| Model | Parameters | Format | Description |
|-------|------------|--------|-------------|
| `llama2_7b_f16.safetensors` | 7B | FP16 | Llama2-7B |
| `qwen2.5_coder_0.5b_bf16.safetensors` | 0.5B | BF16 | Qwen2.5-Coder-0.5B-Instruct |

## Usage

```bash
# Clone and build TensaLang
git clone https://github.com/BenChaliah/Tensa-Lang.git
cd Tensa-Lang && ./build.sh

# Download models
huggingface-cli download BenChaliah/TensaLang-models --local-dir ./models

# Or download a specific model
huggingface-cli download BenChaliah/TensaLang-models llama2_7b_f16.safetensors --local-dir ./Llama2-assets
```
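Once downloaded, a checkpoint's tensor names, dtypes, and shapes can be inspected without any ML framework: a `.safetensors` file starts with an 8-byte little-endian header length, followed by a JSON header, followed by the raw tensor bytes. A stdlib-only sketch (the tiny dummy file written here stands in for the real checkpoints, whose paths and contents differ):

```python
import json
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # little-endian u64
        return json.loads(f.read(header_len))

# Write a tiny dummy file in the same layout (stands in for a real checkpoint):
# one f16 tensor "w" of shape [2, 2], i.e. 4 values * 2 bytes = 8 data bytes.
header = {"w": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(header).encode("utf-8")
with open("dummy.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))
    f.write(header_bytes)
    f.write(b"\x00" * 8)  # raw tensor bytes

meta = read_safetensors_header("dummy.safetensors")
```

Pointing `read_safetensors_header` at `llama2_7b_f16.safetensors` instead would list every tensor's dtype and shape in a fraction of a second, which is a quick sanity check before launching a run.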

### Run Llama2

```bash
./bin/tensalang-run examples/llama2_manual_tiling_fp16.tl \
  --model Llama2-assets/llama2_7b_f16.safetensors \
  --tokenizer Llama2-assets/tokenizer.json \
  --prompt "Once upon a time" \
  --target cuda \
  --steps 128 \
  --fused-attention 2 \
  --cuda-arch sm_89
```

### Run Qwen2.5-Coder

```bash
./bin/tensalang-run examples/qwen25_coder_bf16.tl \
  --model Qwen25-assets/qwen2.5_coder_0.5b_bf16.safetensors \
  --tokenizer Qwen25-assets/tokenizer.json \
  --prompt "def quicksort(arr):" \
  --target cuda \
  --steps 64 \
  --cuda-arch sm_89
```

## Source

Weights converted from:
- [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b)
- [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)

## License

Model weights retain their original licenses. The TensaLang compiler is MIT-licensed.

## Links

- [TensaLang GitHub](https://github.com/BenChaliah/Tensa-Lang)
- [Documentation](https://tensa-lang.org/docs.html)
- [Website](https://tensa-lang.org/)
- [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B)