Eve-2 Swarm

Eve-2-MoE-IT-272M

The Foundation for Nano-Scale Swarm Intelligence

Eve-2-MoE-IT-272M is a 272M-parameter instruction-tuned model designed as the foundational base for the Eve Swarm: a collection of hyper-specialized, CPU-deployable adapters.

Unlike massive generalist LLMs, Eve is built for deterministic-ish transformations. She is designed to be "overfitted" into specialists that perform one job perfectly (e.g., SQL generation, Git commits, JSON extraction) with negligible latency and cost.

Author: Anthony Maio / Public Outputs


🐝 The Eve Swarm (Specialist Ecosystem)

This model serves as the parent for the following Full Fine-Tuned (FFT) specialists. All members were trained on an NVIDIA H200 SXM to ensure optimal embedding alignment.

| Specialist Model | Task | Dataset Source | Size | Loss |
|---|---|---|---|---|
| Eve-NanoFunction | Strict JSON Function Calling – produces valid JSON outputs from natural language. | glaive-function-calling-v2 | 272M | <0.4 (35k samples) |
| Eve-NanoSummary | Conversation Summarization – condenses dialogues into concise summaries. | knkarthick/dialogsum | 272M | <1.0 (12.5k samples) |
| Eve-NanoCommit | Git Diff → Commit Message – writes conventional commits from raw code diffs. | bigcode/commitpackft | 272M | <1.0 (20k samples) |
| Eve-NanoExtract | Text → Structured Data – extracts parameters/entities into strict JSON schemas. | Salesforce/xlam-function-calling | 272M | <0.4 (20k samples) |
| Eve-NanoSQL | Natural Language → SQL – converts questions to SQL using table context. | b-mc2/sql-create-context | 272M | <0.2 (25k samples) |
| Eve-NanoPrompt | Prompt Expansion – expands simple ideas into rich image gen prompts. | Stable-Diffusion-Prompts | 272M | <1.0 (15k samples) |
| Eve-NanoRouter | Intent Classification – routes user queries to the correct swarm member. | bitext/customer-support | 272M | <0.3 (25k samples) |
| Eve-NanoPII | PII Redaction – identifies and masks sensitive entities. | ai4privacy/pii-masking-200k | 272M | <0.1 (35k samples) |
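In practice the router sits in front of the other specialists as a thin dispatch layer. A minimal sketch in plain Python, assuming hypothetical intent labels (the actual label set produced by Eve-NanoRouter is not listed here):

```python
# Hypothetical mapping from router intent labels to swarm specialists.
# The label strings are illustrative, not Eve-NanoRouter's real outputs.
SWARM = {
    "function_call": "Eve-NanoFunction",
    "summarize": "Eve-NanoSummary",
    "commit_message": "Eve-NanoCommit",
    "extract": "Eve-NanoExtract",
    "sql": "Eve-NanoSQL",
    "prompt_expand": "Eve-NanoPrompt",
    "pii_redact": "Eve-NanoPII",
}

def dispatch(intent: str) -> str:
    """Pick a specialist for the intent; unknown intents fall back to the base model."""
    return SWARM.get(intent, "Eve-2-MoE-IT-272M")

print(dispatch("sql"))        # Eve-NanoSQL
print(dispatch("chitchat"))   # falls back to Eve-2-MoE-IT-272M
```

Because every specialist shares the same 272M base, the router and the specialist it picks can share a single set of weights on disk.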

Technical Specifications

Architecture: Nano-MoE

Eve uses a DeepSeek-style Mixture-of-Experts architecture scaled down to the "Nano" range.

  • Total Parameters: 272M
  • Active Parameters: ~80M (per token)
  • Experts: 8 routed + 1 shared
  • Top-K: 2
  • Context Window: 2048 tokens
  • Vocab: 50,304 (GPT-2 compatible)
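The top-2 setting means only 2 of the 8 routed experts (plus the always-on shared expert) fire per token, which is why only ~80M of the 272M parameters are active. A minimal sketch of top-k gating, assuming a plain softmax over hypothetical gate logits (the real router weights are learned):

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalise their softmax weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 8 routed experts; the shared expert always runs and is not gated.
print(top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, 0.3, -0.2]))
```

The token's output is then the weighted sum of the two selected experts' outputs plus the shared expert's output.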

Training Config (H200 SXM)

This model was trained using Full Fine-Tuning (FFT). We found that LoRA was insufficient for aligning the embeddings of such a small model; unfreezing all weights yielded significant performance gains. You don't need an H200; it's absurdly overkill. I love it.

  • Hardware: NVIDIA H200 SXM (141GB VRAM)
  • Method: Full Fine-Tuning (No PEFT/LoRA)
  • Precision: bfloat16
  • Batch Size: 128 (Global)
  • Learning Rate: 5e-5 (Cosine Schedule)
  • Collator: DataCollatorForCompletionOnlyLM (Masked User Prompts)
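Gathered as a plain config dict for reference. The micro-batch/accumulation split below is an assumption; only the global batch size of 128 is stated above:

```python
# Hypothetical training config mirroring the H200 run above.
# per_device_train_batch_size / gradient_accumulation_steps is an assumed
# split; the card only specifies the global batch size of 128.
config = {
    "precision": "bfloat16",
    "per_device_train_batch_size": 16,   # assumption
    "gradient_accumulation_steps": 8,    # 16 * 8 = 128 global
    "learning_rate": 5e-5,
    "lr_scheduler_type": "cosine",
    "peft": None,                        # full fine-tuning: all weights unfrozen
}

print(config["per_device_train_batch_size"] * config["gradient_accumulation_steps"])
```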

How to Tune Eve 2

If you want to train your own Eve specialist, follow these rules derived from our H200 experiments:

  1. Abandon LoRA: For a 272M model, LoRA restricts the embedding space too much. You have the VRAM; use Full Fine-Tuning.
  2. Mask User Prompts: You must use a collator that masks the prompt (loss only on Assistant: response). If the model calculates loss on the "User:" instructions, it wastes capacity learning English grammar instead of the task.
  3. Batch Size Matters: We saturated the H200 with batch_size=128. High batch sizes stabilize the gradients for these volatile small architectures.
  4. Dataset Quality > Quantity:
  • Bad: 100k rows of scraped web text.
  • Good: 10k rows of "Input -> Ideal Output" pairs.
  • Sweet Spot: 2 Epochs. Do not over-train; these models memorize quickly.
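Rule 2 (prompt masking) can be sketched in plain Python: every token up to and including the response marker gets label -100, so the loss ignores it. The toy token ids below are illustrative:

```python
# Sketch of completion-only label masking, assuming the "User: ...\nAssistant: ..."
# format from the usage example. Tokens through the end of the response marker
# get label -100 so loss is computed only on the assistant's reply.
IGNORE_INDEX = -100

def mask_prompt_labels(token_ids, marker_ids):
    """Return labels where everything up to the end of the marker is ignored."""
    labels = list(token_ids)
    # Find the first occurrence of the marker subsequence.
    for start in range(len(token_ids) - len(marker_ids) + 1):
        if token_ids[start:start + len(marker_ids)] == marker_ids:
            cut = start + len(marker_ids)
            labels[:cut] = [IGNORE_INDEX] * cut
            return labels
    return labels  # marker not found: leave labels untouched

# Toy ids: pretend [7, 8] encodes "Assistant:".
print(mask_prompt_labels([1, 2, 3, 7, 8, 40, 41, 42], [7, 8]))
```

This is the behavior `DataCollatorForCompletionOnlyLM` provides out of the box when given the response marker as its `response_template`.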

🚀 Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "anthonym21/Eve-2-MoE-IT-272M"

# Load with trust_remote_code=True for custom MoE architecture
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, 
    trust_remote_code=True, 
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Standard formatting
prompt = "User: Explain the concept of Semantic Quantization.\nAssistant:"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.6)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Citation

@misc{maio2026eve2moeit,
  author = {Maio, Anthony D.},
  title = {Eve-2-MoE-IT-272M: A Nano-MoE Foundation for Swarm Intelligence},
  year = {2026},
  publisher = {Maio, Anthony D.},
  url = {https://huggingface.co/anthonym21/Eve-2-MoE-IT-272M}
}