---
title: README
emoji: 🚀
colorFrom: indigo
colorTo: yellow
sdk: static
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6634fc18d94421fe1c02f97c/48breLiEtms1xr-xl36dc.png
short_description: Embedl - efficient AI for the edge
---

# Embedl

Embedl develops advanced tools and algorithms for **Edge AI**.

Our mission is to make AI models run **faster**, **more energy-efficiently**, and **reliably across diverse hardware platforms**, while significantly reducing development time.

We help teams deploy high-performance AI on real-world, resource-constrained devices.

### **Embedl Models** ([Community](https://github.com/embedl/embedl-models))

Pre-optimized models that can be used **off-the-shelf** or customized for specific hardware targets supported by the [embedl-models](https://github.com/embedl/embedl-models) package.

**First release highlights:**

- The **fastest Small Language Models (SLMs)** using **[FlashHead](https://www.embedl.com/knowledge/ultra-efficient-llms-embedls-breakthrough-for-on-device-ai)**, a novel architectural improvement to the language-model head
- Works with popular models like **Llama, Gemma, and Qwen**
- Provides speedups on top of:
  - Quantization
  - Flash Attention
  - Other standard optimizations

Device: NVIDIA Jetson Thor

| Model                                            | Generation speed (tokens/s) |
| ------------------------------------------------ | --------------------------- |
| embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16     | 100                         |
| Llama-3.2-3B-Instruct-W4A16*                     | 80                          |
| RedHatAI/Llama-3.2-3B-Instruct-FP8               | 64                          |
| meta-llama/Llama-3.2-3B-Instruct                 | 37                          |

\*Model quantized by Embedl for benchmarking. It is similar to the FlashHead-W4A16 model but does not use FlashHead or the custom generation loop.

---

## Contact

**Headquarters (Sweden)**

Gamla Almedalsvägen 39, 412 63 Gothenburg, Sweden

**Email:** info@embedl.com