TinyLlama 1.1B Chat - Big-Endian GGUF

This is a big-endian version of TinyLlama-1.1B-Chat-v1.0 in GGUF format, optimized for IBM AIX on POWER architecture.

Model Details

  • Base Model: TinyLlama 1.1B Chat v1.0
  • Format: GGUF (Q4_K_M quantization)
  • Endianness: Big-endian (for AIX on IBM Power Systems)
  • Size: 638 MB
  • License: Apache 2.0

Usage

This model is designed for use with llama-aix, a port of llama.cpp for IBM AIX.

# Download model
wget https://huggingface.co/librepower/tinyllama-1.1b-chat-be/resolve/main/tinyllama-1.1b-q4_k_m-be.gguf

# Run inference on AIX
./llama-simple -m tinyllama-1.1b-q4_k_m-be.gguf -n 128 -p "Hello, world!"
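
TinyLlama-1.1B-Chat was fine-tuned with the Zephyr chat template, so wrapping your question in that template usually gives better answers than a bare prompt. A minimal sketch (the system message and question are placeholders):

```shell
# Assemble a Zephyr-style chat prompt. The role markers (<|system|>,
# <|user|>, <|assistant|>) and the </s> turn separators come from the
# template TinyLlama-Chat was trained on.
PROMPT='<|system|>
You are a helpful assistant.</s>
<|user|>
What is AIX?</s>
<|assistant|>
'
printf '%s' "$PROMPT"
```

The assembled prompt can then be passed to the runner, e.g. `./llama-simple -m tinyllama-1.1b-q4_k_m-be.gguf -n 128 -p "$PROMPT"`.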

Performance

On IBM POWER9 (16 cores, 128GB RAM):

  • Speed: ~18 tokens/second
  • Memory: ~800 MB RAM

Why Big-Endian?

AIX on IBM Power Systems runs big-endian, while most modern platforms (x86-64, ARM64, and ppc64le Linux) are little-endian, and standard GGUF files are written little-endian. This model has been byte-swapped ahead of time using llama.cpp's endianness conversion script, so it loads natively on AIX without runtime conversion overhead.
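
As a concrete illustration (a throwaway sketch, unrelated to the model files): the same 32-bit integer is laid out with opposite byte order on the two kinds of systems, which is why a file written on one cannot be memory-mapped directly on the other.

```shell
# Write the 32-bit value 1 in both byte orders and dump the raw bytes.
printf '\000\000\000\001' > be.bin   # big-endian: most significant byte first
printf '\001\000\000\000' > le.bin   # little-endian: least significant byte first
od -An -tu1 be.bin   # bytes appear as: 0 0 0 1
od -An -tu1 le.bin   # bytes appear as: 1 0 0 0
```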

Conversion

This model was converted from the original little-endian GGUF using llama.cpp's gguf_convert_endian.py script, which byte-swaps the header fields and tensor data in place (so work on a copy):

cp model.gguf model-be.gguf
python gguf-py/scripts/gguf_convert_endian.py model-be.gguf big
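
A quick sanity check after converting: GGUF stores a 32-bit version number at byte offset 4, so dumping those four bytes reveals the file's byte order. The sketch below builds a tiny stand-in header so it runs anywhere; point od at your real .gguf instead.

```shell
# Stand-in 8-byte header: magic "GGUF" followed by version 3 stored
# big-endian (this is NOT a real model file, just the first 8 bytes).
printf 'GGUF\000\000\000\003' > fake-be.gguf

# Dump the version field (4 bytes at offset 4). A big-endian file shows
# the most significant byte first ("0 0 0 3"); a little-endian file
# would show "3 0 0 0".
od -An -tu1 -j4 -N4 fake-be.gguf
```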

About LibrePower

Unlocking Power Systems through open source.

LibrePower brings modern AI and open-source tools to IBM Power Systems, extending the life and capabilities of enterprise infrastructure.

Citation

Original model by Zhang et al. (TinyLlama team):

@article{tinyllama,
  title={TinyLlama: An Open-Source Small Language Model},
  author={Zhang, Peiyuan and Zeng, Guangtao and Wang, Tianduo and Lu, Wei},
  journal={arXiv preprint arXiv:2401.02385},
  year={2024}
}