---
title: README
emoji: 🚀
colorFrom: indigo
colorTo: yellow
sdk: static
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6634fc18d94421fe1c02f97c/48breLiEtms1xr-xl36dc.png
short_description: Embedl - efficient AI for the edge
---

# Embedl

Embedl develops advanced tools and algorithms for **Edge AI**. Our mission is to make AI models 
**faster**, **more energy-efficient**, and **reliable across diverse hardware platforms**, while 
significantly reducing development time.

We help teams deploy high-performance AI on real-world, resource-constrained devices.


### **Embedl Models** ([Community](https://github.com/embedl/embedl-models))

Pre-optimized models that can be used **off-the-shelf** or customized for specific hardware targets
supported by the [embedl-models](https://github.com/embedl/embedl-models) package.

**First release highlights:**

- The **fastest Small Language Models (SLMs)** using **[FlashHead](https://www.embedl.com/knowledge/ultra-efficient-llms-embedls-breakthrough-for-on-device-ai)**,
  a novel architectural improvement to the language-model head
- Works with popular models like **Llama, Gemma, and Qwen**
- Provides speedups on top of:
  - Quantization  
  - Flash Attention  
  - Other standard optimizations

**Device:** NVIDIA Jetson Thor
| Model                                            | Generation speed (tokens/s) |
| ------------------------------------------------ | ----------------------------|
| embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16     | 100                         |
| Llama-3.2-3B-Instruct-W4A16*                     | 80                          |
| RedHatAI/Llama-3.2-3B-Instruct-FP8               | 64                          |
| meta-llama/Llama-3.2-3B-Instruct                 | 37                          |

\*An Embedl-quantized model used for benchmarking; it matches the FlashHead-W4A16 configuration but
omits the faster FlashHead and the custom generation loop.
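The table reads naturally as relative speedups. A quick sketch in plain Python (the numbers are taken directly from the table above; the model names are only used as dictionary keys here):

```python
# Generation speeds (tokens/s) on an NVIDIA Jetson Thor, from the table above.
speeds = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}

# Speedup of each model over the unmodified baseline.
baseline = speeds["meta-llama/Llama-3.2-3B-Instruct"]
for model, tps in speeds.items():
    print(f"{model}: {tps / baseline:.2f}x over baseline")

# FlashHead's contribution on top of an otherwise similar W4A16 model.
flashhead_gain = (
    speeds["embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"]
    / speeds["Llama-3.2-3B-Instruct-W4A16"]
)
print(f"FlashHead alone: {flashhead_gain:.2f}x")
```

In other words, quantization alone accounts for roughly a 2.2x speedup over the baseline, and FlashHead adds a further 1.25x on top of that.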

---

## Contact

**Headquarters (Sweden)**  
Gamla Almedalsvägen 39  
412 63 Gothenburg, Sweden  

**Email:** info@embedl.com