swaze committed on
Commit
65ca721
·
verified ·
1 Parent(s): fccce16

Update README.md

  1. README.md +173 -7
README.md CHANGED
@@ -1,10 +1,176 @@
1
  ---
2
- title: README
3
- emoji:
4
- colorFrom: indigo
5
- colorTo: yellow
6
- sdk: static
7
- pinned: false
8
  ---
9
 
10
- Edit this `README.md` markdown file to author your organization card.
1
+ ---
2
+ title: README
3
+ emoji: 🚀
4
+ colorFrom: indigo
5
+ colorTo: yellow
6
+ sdk: static
7
+ pinned: true
8
+ thumbnail: >-
9
+ https://cdn-uploads.huggingface.co/production/uploads/6634fc18d94421fe1c02f97c/48breLiEtms1xr-xl36dc.png
10
+ short_description: Embedl - efficient AI for the edge
11
+ ---
12
+
13
+ # Embedl
14
+
15
+ Embedl develops advanced tools and algorithms for **Edge AI**. Our mission is to make AI models run
16
+ **faster**, **more energy-efficient**, and **reliably across diverse hardware platforms**, while
17
+ significantly reducing development time.
18
+
19
+ We help teams deploy high-performance AI on real-world, resource-constrained devices.
20
+
21
  ---
22
+
23
+ ## Core Products
24
+
25
+ ### **Embedl SDK** *[Enterprise](https://www.embedl.com/sdk)*
26
+
27
+ A **source-available** toolkit that supports the **entire Edge AI development lifecycle**.
28
+ The SDK integrates with **any hardware or toolchain**, including:
29
+
30
+ - NVIDIA GPUs (Jetson AGX Orin, Nano, Thor, Drive Thor … )
31
+ - Qualcomm & TI accelerators
32
+ - CPUs, MCUs, FPGAs, and custom accelerators
33
+
34
+ The SDK follows Embedl’s **C3PO Framework**:
35
+
36
+ - **Compatibility** – through model surgery to fix op support issues
37
+ - **Provisioning** – of hardware access, runtimes, and compilation servers
38
+ - **Pipeline** – to compile, quantize, and run on-device inference for any model and hardware
39
+ - **Profiling** – of on-device latency, memory, and performance, with layerwise statistics
40
+ - **Optimization** – with public and proprietary algorithms for:
41
+   - Mixed-precision quantization
42
+   - Pruning
43
+   - Neural Architecture Search (NAS)
44
+   - Knowledge Distillation
45
+
46
  ---
47
 
48
+ ### **Embedl Hub** *[Free Beta](https://hub.embedl.com)*
49
+
50
+ A cloud-based platform for **quantization, compilation, benchmarking, and deployment** on real edge
51
+ devices (Android & iOS) through device farms.
52
+
53
+ Key features:
54
+
55
+ - Run models on real devices via the cloud
56
+ - Profile latency and performance
57
+ - Track experiments and compare results across devices
58
+ - Deploy optimized models directly to edge environments
59
+
60
+ ---
61
+
62
+ ### **Embedl Visualizer** *(Enterprise)*
63
+ A powerful visualization tool for understanding model performance across the stack.
64
+
65
+ Supports:
66
+ - PyTorch & ONNX models
67
+ - Compiled artifacts (TensorRT engines, TIDL graphs, etc.)
68
+
69
+ Capabilities:
70
+ - Identify latency bottlenecks quickly
71
+ - Debug QAT issues caused by operator fusion
72
+ - Compare multiple models and configurations
73
+ - Track how a model evolves across compilation stages — from Python code to final deployable binaries
74
+
75
+ ---
76
+
77
+ ### **Embedl Models** ([Community](https://huggingface.co/embedl))
78
+
79
+ Pre-optimized models that can be used **off-the-shelf** or customized for specific hardware targets,
80
+ supported by the [embedl-models](https://github.com/embedl/embedl-models) package.
81
+
82
+ **First release highlights:**
83
+
84
+ - The **fastest Small Language Models (SLMs)** using **[FlashHead](https://www.embedl.com/knowledge/ultra-efficient-llms-embedls-breakthrough-for-on-device-ai)**,
85
+ a novel architectural improvement to the language-model head
86
+ - Works with popular models like **Llama, Gemma, and Qwen**
87
+ - Provides speedups on top of:
88
+   - Quantization
89
+   - Flash Attention
90
+   - Other standard optimizations
91
+
92
+ Device: NVIDIA Jetson Thor
93
+ | Model | Generation speed (tokens/s) |
94
+ | ------------------------------------------------ | ----------------------------|
95
+ | embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16 | 100 |
96
+ | Llama-3.2-3B-Instruct-W4A16* | 80 |
97
+ | RedHatAI/Llama-3.2-3B-Instruct-FP8 | 64 |
98
+ | meta-llama/Llama-3.2-3B-Instruct | 37 |
99
+
100
+ *Embedl-quantized model used as a benchmarking baseline; similar to the FlashHead-W4A16 model but without
101
+ FlashHead and the custom generation loop.
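
As a rough illustration of how to read the table above, the following sketch computes the relative speedup of the FlashHead model over each baseline. The token rates are copied from the table; the `speeds` dictionary and the `speedup_over` helper are hypothetical names introduced only for this example and are not part of any Embedl package.

```python
# Generation speeds in tokens/s, copied from the Jetson Thor table above.
speeds = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}

# Hypothetical helper: ratio of the candidate's speed to a baseline's speed.
FLASHHEAD = "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"


def speedup_over(baseline: str, candidate: str = FLASHHEAD) -> float:
    """Return how many times faster `candidate` generates tokens than `baseline`."""
    return speeds[candidate] / speeds[baseline]


if __name__ == "__main__":
    for name in speeds:
        if name != FLASHHEAD:
            print(f"{name}: {speedup_over(name):.2f}x")
```

Read this way, the FlashHead model is 1.25x faster than the W4A16 baseline without FlashHead and roughly 2.7x faster than the unquantized `meta-llama/Llama-3.2-3B-Instruct` model on this device.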
102
+
103
+ ---
104
+
105
+ ## Why It Matters
106
+
107
+ - Enables AI deployment on **resource-constrained hardware**:
108
+   - Embedded systems
109
+   - Mobile devices
110
+   - IoT
111
+   - Robotics
112
+ - Reduces **latency, memory usage, and energy consumption**, enabling real-time inference without
113
+ cloud dependence
114
+ - Saves development time through a **hardware-agnostic workflow** reusable across models and platforms
115
+ - Bridges the gap between **academic ML research** and **industrial embedded AI applications**
116
+
117
+ ---
118
+
119
+ ## Company Information
120
+
121
+ - **Founded:** 2018 (spin-out from Chalmers University of Technology)
122
+ - **Commercial Operations:** Since 2022
123
+ - **Headquarters:** Gothenburg, Sweden
124
+ - **US Office:** Palo Alto, California
125
+ - **Recognition:**
126
+ - Named to **CB Insights “AI 100” (2025)** list of the most promising private AI companies
127
+
128
+ ---
129
+
130
+ ## Typical Use Cases
131
+
132
+ Embedl is used where **real-time performance and efficiency are critical**:
133
+
134
+ - **Automotive & Autonomous Systems**
135
+   - Autonomous driving
136
+   - ADAS
137
+   - Driver monitoring
138
+   - Predictive maintenance
139
+   - *Example: Kodiak Robotics*
140
+
141
+ - **Defense & Aerospace**
142
+   - Secure, energy-constrained AI inference
143
+   - *Example: Airbus*
144
+
145
+ - **Mobile & Edge AI**
146
+   - Running deep-learning models directly on phones and embedded devices
147
+   - No cloud dependency via Embedl Hub
148
+
149
+ ---
150
+
151
+ ## How to Get Started
152
+
153
+ ### **Quick Start with Embedl Hub**
154
+ - Upload your model (PyTorch or ONNX)
155
+ - Quantize, compile, and benchmark on supported devices
156
+ - No physical hardware required
157
+
158
+ ### **Full Control with Embedl SDK**
159
+ - Integrate Embedl directly into your training and deployment pipeline
160
+ - Works with TensorRT, QNN, TIDL, and more
161
+ - Access advanced hardware-aware optimization and performance insights
162
+ - Deploy to your own infrastructure
163
+
164
+ ### **Custom & Enterprise Needs**
165
+ For tailored optimizations, specialized hardware support, and engineering collaboration, contact
166
+ Embedl for full SDK access and support.
167
+
168
+ ---
169
+
170
+ ## Contact
171
+
172
+ **Headquarters (Sweden)**
173
+ Gamla Almedalsvägen 39
174
+ 412 63 Gothenburg, Sweden
175
+
176
+ **Email:** info@embedl.com