license: mit
tags:
  - tensalang
  - llm-inference
  - mlir
  - safetensors
language:
  - en

TensaLang Example Models

Example model weights for TensaLang, a programming language for LLM inference.

What is TensaLang?

TensaLang is a programming language for LLM inference. You implement a model in the language and compile it through MLIR to CUDA, CPU-SIMD, MLX, or ROCm. The runtime is the program.

fn attention_f16(q: Tensor<f32, [D]>,
                 key_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 value_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 layer: i32, pos: i32, H: i32, scale: f32) -> Tensor<f32, [D]>
    with tile=[8, 64], parallel=[h, t] {

  var att: Tensor<f32, [H, SeqLen]> = zeros([H, SeqLen])

  # Compute attention scores
  att[h, t] = if t > pos { -inf } else {
    sum(i) q[h * Dh + i] * (key_cache[layer, t, h * Dh + i] as f32) * scale
  }

  var weights: Tensor<f32, [H, SeqLen]> = softmax(att)
  # ... weighted sum over values
}
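For readers who want to check the math, the computation above can be sketched in plain Python. This is a reference sketch and not part of TensaLang: the function name `attention_reference` is hypothetical, `Dh` is assumed to be `D / H`, `KvDim` is assumed equal to `D` (no grouped-query attention), the softmax is assumed to run over the sequence axis, and the weighted sum over values (elided in the snippet) is written in its standard form.

```python
import math

def attention_reference(q, key_cache, value_cache, layer, pos, H, scale):
    """Plain-Python reference for single-token causal attention.

    q:           list of D floats (query for the current position)
    key_cache:   nested lists indexed [layer][t][k]; KvDim == D is assumed
    value_cache: same layout as key_cache
    """
    D = len(q)
    Dh = D // H                      # per-head width (assumes H divides D)
    seq_len = len(key_cache[layer])
    out = [0.0] * D

    for h in range(H):
        base = h * Dh

        # Scores: positions after `pos` are masked with -inf, as in the snippet.
        scores = []
        for t in range(seq_len):
            if t > pos:
                scores.append(-math.inf)
            else:
                dot = sum(q[base + i] * key_cache[layer][t][base + i]
                          for i in range(Dh))
                scores.append(dot * scale)

        # Numerically stable softmax over t (the axis is an assumption).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted sum over values (standard form; elided in the snippet).
        for i in range(Dh):
            out[base + i] = sum(weights[t] * value_cache[layer][t][base + i]
                                for t in range(seq_len))
    return out
```

Because masked positions receive a weight of exactly zero, cache entries beyond `pos` do not affect the result, which is what allows the cache to be preallocated at full `SeqLen`.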

From the creator of Datarus-R1-14B.

Example models

| Model | Parameters | Format | Description |
|-------|------------|--------|-------------|
| llama2_7b_f16.safetensors | 7B | FP16 | Llama2-7B |
| qwen2.5_coder_0.5b_bf16.safetensors | 0.5B | BF16 | Qwen2.5-Coder-0.5B-Instruct |
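Once a file is downloaded, its stored dtypes can be checked against the Format column without loading any weights. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_safetensors_header` and `summarize_dtypes` are hypothetical and not part of TensaLang.

```python
import json
import struct

def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # Next header_len bytes: UTF-8 JSON mapping tensor names to metadata.
        return json.loads(f.read(header_len).decode("utf-8"))

def summarize_dtypes(header):
    """Count tensors per dtype string, skipping the optional metadata entry."""
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts
```

Calling `summarize_dtypes(read_safetensors_header(path))` on a downloaded file returns a dict keyed by safetensors dtype strings such as `F16` or `BF16`. Whether every tensor in a given file shares one dtype is not stated here, so treat the result as a check and not a guarantee.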

Usage

# Clone TensaLang
git clone https://github.com/BenChaliah/Tensa-Lang.git
cd Tensa-Lang && ./build.sh

# Download models
huggingface-cli download BenChaliah/TensaLang-models --local-dir ./models

# Or download a specific model
huggingface-cli download BenChaliah/TensaLang-models llama2_7b_f16.safetensors --local-dir ./Llama2-assets

Run Llama2

./bin/tensalang-run examples/llama2_manual_tiling_fp16.tl \
  --model Llama2-assets/llama2_7b_f16.safetensors \
  --tokenizer Llama2-assets/tokenizer.json \
  --prompt "Once upon a time" \
  --target cuda \
  --steps 128 \
  --fused-attention 2 \
  --cuda-arch sm_89

Run Qwen2.5-Coder

./bin/tensalang-run examples/qwen25_coder_bf16.tl \
  --model Qwen25-assets/qwen2.5_coder_0.5b_bf16.safetensors \
  --tokenizer Qwen25-assets/tokenizer.json \
  --prompt "def quicksort(arr):" \
  --target cuda \
  --steps 64 \
  --cuda-arch sm_89

Source

Weights converted from:

License

Model weights retain their original licenses. The TensaLang compiler is MIT-licensed.

Links