---
library_name: transformers
---

# whaletech.ai / W1-4B-dLLM-Base

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->

<div align="center">
<img src="https://huggingface.co/WhaletechAI/W1-4B-dLLM-Base/resolve/main/assets/banner.png" width="760" alt="Whaletech banner" />
</div>
<div align="center" style="line-height: 1; margin-top: 12px;">
<a href="https://huggingface.co/WhaletechAI/W1-4B-dLLM-Base/resolve/main/assets/wechat.jpg" target="_blank" style="margin: 2px;">
<img alt="WeChat" src="https://img.shields.io/badge/WeChat-WhaleTech%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;" />
</a>
<a href="https://x.com/whaletech_AI_" target="_blank" style="margin: 2px;">
<img alt="X" src="https://img.shields.io/badge/X-@whaletech__AI__-ffffff?logo=x&logoColor=white&labelColor=555555&color=ffffff" style="display: inline-block; vertical-align: middle;" />
</a>
<a href="mailto:info@whaletech.ai" style="margin: 2px;">
<img alt="Email" src="https://img.shields.io/badge/%E2%9C%89%EF%B8%8F%20Email-info%40whaletech.ai-1f6feb" style="display: inline-block; vertical-align: middle;" />
</a>
</div>
<div align="center" style="margin-top: 16px;">
<img src="https://huggingface.co/WhaletechAI/W1-4B-dLLM-Base/resolve/main/assets/whaledemo.gif" width="720" alt="W1-4B dLLM Demo" />
</div>

> Technical usage and inference docs: [TECHNICAL_README.md](https://huggingface.co/WhaletechAI/W1-4B-dLLM-Base/blob/main/TECHNICAL_README.md)
## ✨ Overview

W1-dLLM is a diffusion language model.

It is built on a simple idea that is now becoming increasingly clear:

> **Powerful language modeling does not have to be autoregressive.**

Today, for the first time, that idea feels like more than a hypothesis.

We are seeing a remarkably significant set of results emerge together:

- stable optimization
- genuine scaling on variable-length text
- strong self-revision and error-correction behavior
- clear generalization in a relatively small model
- an increasingly strong signal that diffusion language models are moving beyond their earlier “semi-autoregressive” transitional phase

What is beginning to emerge is something more fundamental:

> **True parallel generation.**

This model does not merely generate text. It behaves more like a system that **first forms, then revises, then refines** its expression in latent space before finally settling it into language.

At one mid-training checkpoint, when the model first showed this behavior, the entire company gathered around the same screen as if welcoming a newborn.

That moment mattered.
---

## 🧠 Model Architecture

W1-dLLM uses a diffusion Transformer architecture for language modeling rather than a standard autoregressive decoder.

### Core Configuration

- **48 diffusion Transformer blocks**
- **50 layers total**, if the input embedding layer and the final output layer are counted
- **Vocabulary size:** 64,512
- **Hidden size:** 2,048
- **Attention inner dimension:** 3,072
- **FFN dimension:** 7,168
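As a rough sanity check on how these dimensions translate into a parameter budget, the sketch below counts only the large weight matrices. Every structural detail in it is an assumption not stated in this card:

- separate Q, K, and V projections from the hidden size to the attention inner dimension, plus an output projection back;
- a three-matrix SwiGLU FFN;
- no bias terms;
- an optional untied output head.

The name `W1Config` is hypothetical. The sketch deliberately leaves out the adaLN modulation layers, the timestep-embedding network, and the norm parameters, because their sizes are not given. It should not be read as the official parameter count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class W1Config:
    """Hypothetical container for the dimensions listed above."""
    num_blocks: int = 48
    vocab_size: int = 64_512
    hidden_size: int = 2_048
    attn_inner_dim: int = 3_072
    ffn_dim: int = 7_168


def approx_param_count(cfg: W1Config, tied_embeddings: bool = False) -> int:
    """Counts only the big matrices, under the assumptions stated in the comments."""
    d, a, f = cfg.hidden_size, cfg.attn_inner_dim, cfg.ffn_dim
    # Assumed attention: Q, K, V projections (d -> a) plus an output projection (a -> d), no biases.
    attn = 3 * d * a + a * d
    # Assumed SwiGLU FFN: gate and up projections (d -> f) plus a down projection (f -> d).
    ffn = 3 * d * f
    # Input embedding table; an untied output head adds a second vocab x d matrix.
    embeddings = cfg.vocab_size * d * (1 if tied_embeddings else 2)
    # Not counted: adaLN modulation, timestep-embedding MLP, norm scales (sizes not published here).
    return cfg.num_blocks * (attn + ffn) + embeddings


if __name__ == "__main__":
    cfg = W1Config()
    print(f"untied: {approx_param_count(cfg) / 1e9:.2f}B")                        # untied: 3.59B
    print(f"tied:   {approx_param_count(cfg, tied_embeddings=True) / 1e9:.2f}B")  # tied:   3.45B
```

Under these assumptions, the counted matrices come to about 3.59B parameters with an untied output head, or about 3.45B tied. Any gap to the "4B" in the model name would come from components the sketch omits, or from assumptions that do not match the real architecture.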
### Architectural Ingredients

- timestep embeddings
- rotary positional encoding (RoPE)
- adaptive LayerNorm (adaLN) modulation
- RMSNorm
- SwiGLU feed-forward networks
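To make the ingredient list concrete, here is a minimal NumPy sketch of one feed-forward sub-layer. It is wired the way the public DiT recipe uses adaptive LayerNorm:

1. A sinusoidal timestep embedding is projected to a per-channel shift, scale, and gate.
2. These modulate an RMS-normalized input.
3. The modulated input goes through a SwiGLU network and back via a gated residual.

How W1 actually conditions its blocks (projection shape, norm placement, initialization) is not stated in this card, so the function names and the `w_mod` projection are hypothetical. The attention sub-layer, which would typically be modulated the same way, is omitted for brevity.

```python
import numpy as np


def timestep_embedding(t: float, dim: int) -> np.ndarray:
    """Sinusoidal embedding of a scalar diffusion time t (dim assumed even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10_000.0) * np.arange(half) / half)
    return np.concatenate([np.cos(t * freqs), np.sin(t * freqs)])


def rms_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm over the last axis, without a learned gain (adaLN supplies scale and shift)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU FFN: (SiLU(x W_gate) * (x W_up)) W_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def adaln_ffn_sublayer(x, t_emb, w_mod, ffn_weights):
    """Hypothetical adaLN-modulated FFN sub-layer with a gated residual."""
    shift, scale, gate = np.split(t_emb @ w_mod, 3)   # each has shape (d,)
    h = rms_norm(x) * (1.0 + scale) + shift           # condition the normalized input on t
    return x + gate * swiglu(h, *ffn_weights)         # gate controls the residual update


rng = np.random.default_rng(0)
d, f, seq = 8, 16, 5                                  # toy sizes, not W1's
x = rng.standard_normal((seq, d))
t_emb = timestep_embedding(0.3, d)
ffn_weights = (rng.standard_normal((d, f)), rng.standard_normal((d, f)), rng.standard_normal((f, d)))
w_mod = 0.02 * rng.standard_normal((d, 3 * d))
y = adaln_ffn_sublayer(x, t_emb, w_mod, ffn_weights)
print(y.shape)  # (5, 8)
```

Setting the modulation weights to zero makes the gate zero, so the sub-layer becomes an exact identity. The adaLN-Zero initialization in the DiT paper relies on this property.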
### What We Believe Matters Most

#### 1. The diffusion Transformer structure itself is essential

This is not a cosmetic change of wrapper. It is the foundation that makes these capabilities possible.

#### 2. Adaptive LayerNorm modulation is critical

Without it, both conditional control over the diffusion process and training stability degrade significantly.

#### 3. Some classic autoregressive Transformer optimizations transfer well

For example, increasing the attention inner dimension remains beneficial.

#### 4. Bidirectional attention must be designed in from pretraining

> **Bidirectionality cannot simply be patched in at inference time.**

If the model is to genuinely think, revise, and converge in parallel, that capability must be built into both the architecture and the pretraining setup from the start.
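Whether bidirectionality is patched in at inference time or trained in from the start comes down to the attention mask the model learned under. The NumPy sketch below contrasts a causal mask with full bidirectional attention. It applies rotary position encoding (RoPE, rotate-half variant) to queries and keys.

This is a single-head toy with no learned projections: queries, keys, and values are the raw inputs. The helper names are hypothetical. It illustrates the masking concept only, not W1's attention implementation.

```python
import numpy as np


def apply_rope(x: np.ndarray, base: float = 10_000.0) -> np.ndarray:
    """Rotary position encoding for a (seq, dim) array, dim even (rotate-half variant)."""
    seq, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-np.arange(half) / half)
    angles = np.outer(np.arange(seq), inv_freq)          # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def attention(q, k, v, causal: bool) -> np.ndarray:
    """Single-head scaled dot-product attention with RoPE applied to q and k."""
    q, k = apply_rope(q), apply_rope(k)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Autoregressive decoders hide every position to the right.
        allowed = np.tril(np.ones(scores.shape, dtype=bool))
        scores = np.where(allowed, scores, -np.inf)
    # Bidirectional attention applies no mask: every position sees the whole sequence.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def first_position_sees_last(causal: bool, seq: int = 6, dim: int = 8) -> bool:
    """Does perturbing only the last token change the output at position 0?"""
    x = np.random.default_rng(0).standard_normal((seq, dim))
    x_edit = x.copy()
    x_edit[-1] += 1.0
    before = attention(x, x, x, causal)[0]
    after = attention(x_edit, x_edit, x_edit, causal)[0]
    return not np.allclose(before, after)


print(first_position_sees_last(causal=True))   # False: position 0 cannot look ahead
print(first_position_sees_last(causal=False))  # True: position 0 attends to the edited token
```

Under the causal mask, changing the last token cannot affect position 0. Under bidirectional attention it does, which is what lets every position be revised in light of the whole sequence. A model pretrained only with the causal pattern has never learned to use that extra context.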
---

## ⚙️ Training Characteristics

One of the clearest engineering conclusions from this training run is:

> **Diffusion language models require sufficiently large batch sizes.**

In our training, once the batch size exceeded **500**, the model's training behavior, stability, and efficiency all improved noticeably.

This differs from many familiar autoregressive training heuristics, and it suggests that diffusion training has its own optimal operating range at scale.

### Hardware Utilization

We also observed **model FLOPs utilization (MFU) holding steady at around 45%** throughout training.

That number matters. It shows that this approach is not merely *parallel in theory*: it is already delivering real engineering value on actual hardware.

For a large-batch, highly parallel diffusion language model, **45% MFU** is already in an engineering-relevant range.

And it is clearly not the upper bound.
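To relate the 45% figure to throughput: MFU is usually defined as the model FLOPs actually achieved per second, divided by the hardware's peak FLOPs per second. The sketch below uses the common estimate of 6 × N FLOPs per training token for a dense Transformer with N parameters (about 2N for the forward pass and 4N for the backward pass).

The card does not say which FLOP-counting convention produced the 45% number. The parameter count, throughput, and peak rate in the example are hypothetical round numbers, not W1's hardware or measurements.

```python
def mfu(n_params: float, tokens_per_sec: float, peak_flops_per_sec: float) -> float:
    """Model FLOPs utilization under the 6*N FLOPs-per-token training estimate.

    6*N covers forward (~2N) and backward (~4N) matmul work for a dense model;
    it ignores attention-score FLOPs, so it slightly understates the true cost.
    """
    return 6.0 * n_params * tokens_per_sec / peak_flops_per_sec


def tokens_per_sec_for(target_mfu: float, n_params: float, peak_flops_per_sec: float) -> float:
    """Inverse: per-device training throughput needed to reach a target MFU."""
    return target_mfu * peak_flops_per_sec / (6.0 * n_params)


# Hypothetical: a 4e9-parameter model on a device with a 1e15 FLOP/s peak.
needed = tokens_per_sec_for(0.45, n_params=4e9, peak_flops_per_sec=1e15)
print(f"{needed:,.0f} tokens/s per device for 45% MFU")  # 18,750 tokens/s per device for 45% MFU
print(f"{mfu(4e9, needed, 1e15):.0%}")                   # 45%
```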
### Optimization Behavior

Training loss declined continuously throughout the run, and did so with unusual stability:

- no collapse
- no rollback
- no obvious instability events

More importantly, the loss kept going down even when we continued training on the same data.

That gives us a strong signal:

> **Diffusion-style training may significantly alleviate data exhaustion.**

High-quality data remains extremely important, perhaps even more so for smaller diffusion language models.

But what we are seeing now is this:

> **High-quality data is not only valuable; it may also be reused more fully and more often.**
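The card does not explain why the loss keeps falling on repeated data, nor which corruption process W1 trains with. One intuition assumes the widely used masked (absorbing-state) discrete diffusion objective. Under that objective, each pass over a sequence draws a fresh noise level t and a fresh random mask, so the same text yields many different denoising problems.

The sketch below shows that forward process and a 1/t-weighted cross-entropy on masked positions, in the form used by several public masked-diffusion LMs. `MASK_ID` and the function names are hypothetical, and nothing here confirms W1 uses this objective.

```python
import numpy as np

MASK_ID = -1  # hypothetical mask-token id


def corrupt(tokens: np.ndarray, rng: np.random.Generator):
    """Forward process: draw t ~ U(eps, 1) and mask each token independently with prob t."""
    t = rng.uniform(1e-3, 1.0)          # keep t away from 0, where the 1/t weight blows up
    mask = rng.random(tokens.shape) < t
    return np.where(mask, MASK_ID, tokens), mask, t


def masked_diffusion_loss(log_probs, tokens, mask, t) -> float:
    """Cross-entropy on masked positions only, weighted by 1/t and averaged over length.

    log_probs: (length, vocab) model log-probabilities for the clean tokens.
    """
    nll = -log_probs[np.arange(len(tokens)), tokens]
    return float((nll * mask).sum() / (t * len(tokens)))


rng = np.random.default_rng(0)
sequence = np.array([5, 17, 3, 42, 8, 11, 29, 4])
for epoch in range(3):                  # three passes over the *same* sequence
    noisy, mask, t = corrupt(sequence, rng)
    print(f"epoch {epoch}: t={t:.2f} input={noisy.tolist()}")
```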
---

## 🔍 What We Found

### 1️⃣ Latent-space thinking is not an illusion: it is real

From the beginning of pretraining, we deliberately encouraged the model to learn a more global style of latent-space reasoning.

What emerged was not merely a more convincing surface-level chain of thought. Before expressing anything, the model appears to genuinely **organize, revisit, and reconstruct its own ideas**.

Even more exciting, we observed the model briefly switching between languages during this process:

- English
- Korean
- Japanese
- Chinese
- and sometimes symbolic fragments

These appear transiently and then gradually settle into a final language form.

This does not look like simple multilingual mixing. It looks more like the model is searching for a better internal representation before compressing it into the final visible language.

> **The model's thinking is not the same thing as its final output text.**
---

### 2️⃣ It does not just continue text: it revises while generating in parallel

One of the most exciting observations from this run was the model's strong ability to:

- revise
- correct itself
- reflect
- converge toward better answers

The key point is that this revision happens **inside the parallel generation process itself**.

This is not linear generation followed by a repair pass. Instead, the model seems to unfold the whole output at once while continuously revising and converging on its current predictions.

During benchmark runs, we often saw the model:

- delay committing to a final answer until late in generation
- repeatedly refine answer candidates
- repair its final answer choice near the end

This feels fundamentally different from many earlier systems that were still semi-autoregressive underneath a diffusion shell.

Our conclusion is becoming increasingly clear:

> **The power of bidirectional attention cannot be recovered only at inference time.**
>
> **It has to be designed jointly with the model architecture from the pretraining stage onward.**

Once this is done correctly, what emerges is no longer semi-autoregression. It is much closer to:

> **True parallel diffusion generation.**
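Here is one way "revising inside the parallel process" can work mechanically. The toy sampler below follows the style of publicly described masked-diffusion decoders. It is not W1's sampler, which this card does not describe. At each step:

- all positions are re-predicted at once;
- only the most confident predictions are kept;
- everything else is re-masked.

So a token committed early can still be overwritten or withdrawn later. The number of kept positions grows linearly to the full length over `steps` iterations. `toy_denoiser` is a hypothetical stand-in for the model that simply favors a fixed target sequence.

```python
import numpy as np

MASK = -1  # hypothetical mask-token id


def toy_denoiser(canvas, target, vocab, rng):
    """Hypothetical stand-in for the model: noisy logits biased toward `target`.
    A real denoiser would condition on the prompt and on `canvas`; this toy ignores it."""
    logits = rng.standard_normal((len(canvas), vocab))
    logits[np.arange(len(canvas)), target] += 8.0
    return logits


def parallel_decode(length, steps, denoise):
    canvas = np.full(length, MASK)
    for step in range(1, steps + 1):
        logits = denoise(canvas)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        pred, conf = probs.argmax(axis=-1), probs.max(axis=-1)  # every position predicted in parallel
        n_keep = int(np.ceil(length * step / steps))             # linear unmasking schedule
        keep = np.argsort(-conf)[:n_keep]                        # most confident positions survive
        canvas = np.full(length, MASK)                           # the rest are re-masked and can be revised
        canvas[keep] = pred[keep]
        print(f"step {step}: {canvas.tolist()}")
    return canvas


rng = np.random.default_rng(0)
target = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8])
out = parallel_decode(len(target), steps=4, denoise=lambda c: toy_denoiser(c, target, 10, rng))
```

With 12 positions and 4 steps, the sampler commits 3, 6, 9, and finally all 12 positions, so each step fills several tokens at once. Keeping committed tokens fixed instead of re-predicting them would remove the revision behavior.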
---

### 3️⃣ We really made flexible-length text work

Training a high-quality diffusion language model is already difficult. Handling **flexible-length text** well is even harder.

Different prompts should naturally lead to responses of different lengths. So length is not a peripheral variable: it is one of the core variables of quality itself.

Once we started treating length as a first-class modeling problem, the model's flexible-length generation capability improved significantly.

> **Length is not a side detail. It is a core modeling problem.**
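The card does not say how W1 represents length. Some public masked-diffusion LMs use the following convention, shown here only as an assumption. The model generates into a fixed-size canvas and is trained to fill the unused tail with end-of-sequence tokens. Choosing the response length then becomes part of the denoising problem rather than a separate decision. The helpers and the `EOS` id below are hypothetical.

```python
from typing import List

EOS = 2  # hypothetical end-of-sequence token id


def pad_to_canvas(response: List[int], canvas_len: int) -> List[int]:
    """Training side: the target is the response followed by EOS up to canvas_len.
    Responses longer than the canvas are truncated."""
    return (response + [EOS] * canvas_len)[:canvas_len]


def trim_at_eos(canvas: List[int]) -> List[int]:
    """Inference side: the visible answer is everything before the first EOS."""
    return canvas[: canvas.index(EOS)] if EOS in canvas else list(canvas)


short, long_ = [7, 8, 9], [7, 8, 9, 10, 11, 12, 13]
print(pad_to_canvas(short, 6))               # [7, 8, 9, 2, 2, 2]
print(pad_to_canvas(long_, 6))               # [7, 8, 9, 10, 11, 12]
print(trim_at_eos(pad_to_canvas(short, 6)))  # [7, 8, 9]
```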
---

### 4️⃣ Small models can generalize too, and that matters a lot

Even at a relatively modest scale, the model demonstrated clear generalization ability.

With only a small amount of supervised fine-tuning data, it could be steered toward the behaviors we wanted.

This directly changes the deployment story. It suggests that diffusion language models may be useful not only for research.

They may also fit a very practical path:

- fast
- cheap
- usable
---

### 5️⃣ Parallelism is finally more than a slogan

In the past, many diffusion language model systems were still semi-autoregressive in essence.

This time feels different. What we see now is a much stronger and clearer signal of **real parallel generation**.

This parallelism is not simply about producing multiple tokens at once. It means three things happening at the same time:

- generating simultaneously
- revising simultaneously
- converging simultaneously

> **Generate simultaneously. Revise simultaneously. Converge simultaneously.**

This directly affects:

- throughput
- latency
- cost
- product viability

In a world of exploding token consumption, that becomes especially compelling.
---

### 6️⃣ Self-distillation is not imaginary

We increasingly believe that **self-distillation** will be a key next step in unlocking more of this paradigm's parallel potential.

The level of parallelism we see today is still far from the ceiling.

Through self-distillation, we may be able to:

- reduce the number of diffusion steps
- process more tokens per step
- lower the total number of steps needed for generation
- improve throughput while preserving quality

In practical terms, that means:

> **Bigger steps, fewer total steps, faster generation, cheaper tokens.**
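The arithmetic behind "bigger steps, fewer total steps" is simple. If each denoising step commits k tokens on average, an n-token output needs about `ceil(n / k)` forward passes of the denoiser, versus n passes for strictly token-by-token decoding.

The sketch below tabulates this for a hypothetical 512-token response. It ignores the per-pass cost (a diffusion step typically processes the whole canvas) and any quality trade-off. Both matter in practice and are not quantified in this card.

```python
import math


def forward_passes(num_tokens: int, tokens_per_step: float) -> int:
    """Denoiser calls needed when each step commits `tokens_per_step` tokens on average."""
    return math.ceil(num_tokens / tokens_per_step)


for k in (1, 2, 4, 8, 16):  # hypothetical tokens committed per step
    print(f"{k:>2} tokens/step -> {forward_passes(512, k):>3} forward passes")
```

Going from 4 to 16 tokens per step cuts the passes from 128 to 32. Self-distillation, as described above, aims to make larger k usable without losing quality.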
The significance goes beyond inference cost alone. It suggests that the efficiency curve of diffusion language models is still far from fully explored.

> **The speed we see today is not yet the true speed of this paradigm.**
---

## 🌍 Why This Matters

For a long time, diffusion language models were often treated as:

- an interesting alternative
- a research curiosity
- or merely a different decoding strategy

But these results point to something deeper:

> **This may truly be a new language modeling paradigm.**

What matters is not any single metric. It is the rare possibility that many advantages may hold **at the same time**:

- cheaper, more stable training dynamics
- more efficient GPU batch utilization
- stronger throughput
- lower time-to-first-token latency
- cheaper per-token pricing
- a natural path toward latent-space thinking
- a natural path toward self-revision
- more flexible modeling
- a large post-training optimization space still waiting to be explored

To put it plainly:

> **The value of diffusion language models is no longer a philosophical question.**
>
> **It is becoming an engineering reality.**
---

## 🖼️ Toward Broader Fusion

We are also actively exploring **larger-scale fused diffusion models across text, images, and beyond**.

We do not believe diffusion-based modeling is limited to language. It may also offer an alternative path for embodied intelligence systems built on vision-language-action (VLA) models, especially in robotics.

So what we are seeing today is not the end.

> **It may not even be the real beginning yet.**
---

## ⚠️ Limitations

We also want to stay clear-eyed about where things stand.

This system does **not** mean diffusion models have already replaced autoregressive models on every dimension.

The most exciting capabilities are still early:

- latent-space reasoning
- iterative self-revision
- unified cross-modal diffusion modeling

None of these yet has a sufficiently mature, standardized evaluation framework.

So we should remain ambitious, but also honest.

Even so, the signal is already strong enough for us to say this clearly:

> **The upper bound of diffusion language models is still nowhere in sight.**
---

## 💥 What We Believe

We believe we are working on something genuinely important.

We want to drive token prices down dramatically and build **fast, cheap, and useful** into the model's DNA.

> **There will always be a market for a ruthless price-cutter.**

And there is still so much left unexplored.
---

## 🤝 Join Us

If you also bring:

- honesty and reliability
- deep curiosity
- the courage to explore the unknown

we would love for you to consider joining us.

At Whaletech, we have:

- abundant GPU resources
- fast iteration cycles
- exceptional, deeply committed teammates
- an intensely energetic technical culture

We look forward to exploring, together with you, a wider world of possibilities that have not yet been defined.