metadata

title: README
emoji: 🚀
colorFrom: indigo
colorTo: yellow
sdk: static
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6634fc18d94421fe1c02f97c/48breLiEtms1xr-xl36dc.png
short_description: Embedl - efficient AI for the edge

Embedl

Embedl develops advanced tools and algorithms for Edge AI. Our mission is to make AI models run faster, more energy-efficiently, and more reliably across diverse hardware platforms, while significantly reducing development time.

We help teams deploy high-performance AI on real-world, resource-constrained devices.

Embedl Models (Community)

Pre-optimized models that can be used off-the-shelf or customized for specific hardware targets, supported by the embedl-models package.

First release highlights:

- The fastest Small Language Models (SLMs) using FlashHead, a novel architectural improvement to the language-model head
- Works with popular models like Llama, Gemma, and Qwen
- Provides speedups on top of:
  - Quantization
  - Flash Attention
  - Other standard optimizations

Device: Nvidia Jetson Thor

| Model | Generation speed (tokens/s) |
| --- | --- |
| embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16 | 100 |
| Llama-3.2-3B-Instruct-W4A16* | 80 |
| RedHatAI/Llama-3.2-3B-Instruct-FP8 | 64 |
| meta-llama/Llama-3.2-3B-Instruct | 37 |

*A model quantized by Embedl as a benchmarking baseline. It is similar to the FlashHead-W4A16 model but does not include FlashHead or the custom generation loop.
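
To put the benchmark numbers in relative terms, the short Python sketch below computes how many times faster the FlashHead model generates than each baseline. The throughput figures are copied from the Jetson Thor results above; the names `speeds`, `FLASHHEAD`, and `speedup_over` are illustrative only and are not part of the embedl-models package.

```python
# Generation speeds (tokens/s) on Nvidia Jetson Thor, copied from the benchmark results.
speeds = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}

# Hypothetical helper names, used only for this illustration.
FLASHHEAD = "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"


def speedup_over(baseline: str) -> float:
    """Return how many times faster the FlashHead model generates than `baseline`."""
    return speeds[FLASHHEAD] / speeds[baseline]


# Relative speedup of the FlashHead model over every other model in the results.
speedups = {name: speedup_over(name) for name in speeds if name != FLASHHEAD}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures, the FlashHead model is roughly 1.25x faster than the W4A16 baseline without FlashHead, 1.56x faster than the FP8 model, and 2.70x faster than the unquantized model.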

Contact

Headquarters (Sweden)
Gamla Almedalsvägen 39
412 63 Gothenburg, Sweden

Email: info@embedl.com