---
license: mit
tags:
- tensalang
- llm-inference
- mlir
- safetensors
language:
- en
---

# TensaLang Example Models

Example model weights for [TensaLang](https://github.com/BenChaliah/Tensa-Lang), a programming language for LLM inference.

## What is TensaLang?

TensaLang is a programming language for LLM inference. It lets you implement new models with ease and compile through MLIR to CUDA, CPU-SIMD, MLX, or ROCm. The runtime is the program.

```
fn attention_f16(q: Tensor<f32, [D]>,
                 key_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 value_cache: Tensor<f16, [L, SeqLen, KvDim]>,
                 layer: i32, pos: i32, H: i32, scale: f32) -> Tensor<f32, [D]>
    with tile=[8, 64], parallel=[h, t] {

    var att: Tensor<f32, [H, SeqLen]> = zeros([H, SeqLen])

    # Compute attention scores
    att[h, t] = if t > pos { -inf } else {
        sum(i) q[h * Dh + i] * (key_cache[layer, t, h * Dh + i] as f32) * scale
    }

    var weights: Tensor<f32, [H, SeqLen]> = softmax(att)
    # ... weighted sum over values
}
```
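
For readers unfamiliar with TensaLang's notation, here is a rough NumPy equivalent of the score-and-softmax steps above. This is a minimal sketch, not the actual runtime: the per-head dimension `Dh` and the `KvDim == H * Dh` cache layout are assumptions inferred from the kernel's signature.

```python
import numpy as np

def attention_scores(q, key_cache, layer, pos, H, scale):
    """Rough NumPy equivalent of the TensaLang kernel's score + softmax steps.

    q:         (D,) float32 query, with D == H * Dh (assumed)
    key_cache: (L, SeqLen, KvDim) float16 keys, KvDim == H * Dh (assumed layout)
    """
    _, seq_len, kv_dim = key_cache.shape
    dh = kv_dim // H                      # per-head dimension Dh (assumption)
    att = np.full((H, seq_len), -np.inf, dtype=np.float32)
    for h in range(H):
        for t in range(pos + 1):          # causal mask: t > pos stays -inf
            k = key_cache[layer, t, h * dh:(h + 1) * dh].astype(np.float32)
            att[h, t] = np.dot(q[h * dh:(h + 1) * dh], k) * scale
    # Row-wise softmax; masked positions (exp(-inf) == 0) get zero weight
    weights = np.exp(att - att.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights
```

The TensaLang version expresses the same computation declaratively and lets the compiler derive the tiling and parallelization from the `with tile=[...], parallel=[...]` clause.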

From the creator of [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B).

## Example models

| Model | Parameters | Format | Description |
|-------|------------|--------|-------------|
| `llama2_7b_f16.safetensors` | 7B | FP16 | Llama2-7B |
| `qwen2.5_coder_0.5b_bf16.safetensors` | 0.5B | BF16 | Qwen2.5-Coder-0.5B-Instruct |
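
Because the files above use the safetensors format, their contents (tensor names, dtypes, shapes) can be listed without loading any weights: the format stores a JSON header before the raw data. A minimal sketch using only the Python standard library, following the published safetensors layout (the path is whatever you downloaded to):

```python
import json
import struct

def safetensors_header(path):
    """List tensors in a .safetensors file without loading the weights.

    Layout: an 8-byte little-endian header length, then that many bytes of
    JSON mapping tensor names to {"dtype", "shape", "data_offsets"}.
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)  # optional metadata block, if present
    return {name: (t["dtype"], t["shape"]) for name, t in header.items()}
```

Calling `safetensors_header("models/llama2_7b_f16.safetensors")` returns a mapping from each tensor name to its `(dtype, shape)` pair, which is handy for checking a download before pointing TensaLang at it.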

## Usage

```bash
# Clone TensaLang
git clone https://github.com/BenChaliah/Tensa-Lang.git
cd Tensa-Lang && ./build.sh

# Download models
huggingface-cli download BenChaliah/TensaLang-models --local-dir ./models

# Or download a specific model
huggingface-cli download BenChaliah/TensaLang-models llama2_7b_f16.safetensors --local-dir ./Llama2-assets
```

### Run Llama2

```bash
./bin/tensalang-run examples/llama2_manual_tiling_fp16.tl \
  --model Llama2-assets/llama2_7b_f16.safetensors \
  --tokenizer Llama2-assets/tokenizer.json \
  --prompt "Once upon a time" \
  --target cuda \
  --steps 128 \
  --fused-attention 2 \
  --cuda-arch sm_89
```

### Run Qwen2.5-Coder

```bash
./bin/tensalang-run examples/qwen25_coder_bf16.tl \
  --model Qwen25-assets/qwen2.5_coder_0.5b_bf16.safetensors \
  --tokenizer Qwen25-assets/tokenizer.json \
  --prompt "def quicksort(arr):" \
  --target cuda \
  --steps 64 \
  --cuda-arch sm_89
```

## Source

Weights converted from:

- [meta-llama/Llama-2-7b](https://huggingface.co/meta-llama/Llama-2-7b)
- [Qwen/Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct)

## License

The model weights retain their original licenses. The TensaLang compiler is MIT licensed.

## Links

- [TensaLang GitHub](https://github.com/BenChaliah/Tensa-Lang)
- [Documentation](https://tensa-lang.org/docs.html)
- [Website](https://tensa-lang.org/)
- [Datarus-R1-14B](https://huggingface.co/Datarus/Datarus-R1-14B)