Upload folder using huggingface_hub


Files changed (5)

README.md +108 -44
configuration_eve.py +50 -0
modeling_eve.py +1 -43
push_to_hub.py +17 -0
tokenizer_config.json +1 -1

README.md CHANGED

@@ -1,83 +1,147 @@
 ---
 license: mit
-language:
-  - en
-pipeline_tag: text-generation
 tags:
   - pytorch
-  - safetensors
   - text-generation
-  - moe
-  - peft
   - lora
-  - instruct
-  - custom-architecture
-  - trust_remote_code
 base_model: anthonym21/Eve-2-MoE-272M
-datasets: []
 ---
-# Model Card for Eve-2-MoE-IT-272M
-<!-- Provide a quick summary of what the model is/does. -->
-Eve-2-MoE-IT-272M is an instruction-tuned (IT) variant of Eve-2-MoE-272M, packaged as a merged checkpoint for direct inference.
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This repository contains a custom MoE causal language model implemented with `transformers` remote code (see `modeling_eve.py`) and weights in `model.safetensors`.
-- **Developed by:** Anthony Maio / Making Minds AI (Independent)
-- **Model type:** Causal language model, Mixture-of-Experts (MoE)
-- **Language(s) (NLP):** English
-- **License:** MIT
-- **Finetuned from model [optional]:** `anthonym21/Eve-2-MoE-272M`
-### Model Sources [optional]
-- **Repository:** https://huggingface.co/anthonym21/Eve-2-MoE-IT-272M
-- **Base model:** https://huggingface.co/anthonym21/Eve-2-MoE-272M
-## Uses
-### Direct Use
-- Instruction-following text generation (experimental).
-- Research on small MoE models and adapter-based specialization.
-### Downstream Use [optional]
-- Train fresh LoRA adapters on top of this IT checkpoint for narrow tasks (e.g., coding, agent-style tool use, classification-style prompting).
-### Out-of-Scope Use
-- Safety-critical or high-stakes domains (medical, legal, financial advice).
-- Any use requiring guarantees about factuality, bias, or safety alignment.
-## Bias, Risks, and Limitations
-This is a small model and may hallucinate, produce incorrect information, or follow instructions unreliably. It is not presented as safety-aligned, and users should implement their own safety and validation layers.
-### Recommendations
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model and evaluate it for their specific use case.
-## How to Get Started with the Model
-Use the code below to get started with the model.
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 model_id = "anthonym21/Eve-2-MoE-IT-272M"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
-prompt = "Write a short function that reverses a string in Python."
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
-print(tokenizer.decode(out, skip_special_tokens=True))

 ---
 license: mit
 tags:
+  - moe
+  - deepseek
+  - instruction-tuned
+  - nvidia-h200
   - pytorch
   - text-generation
+  - nano-lm
+  - edge-ai
   - lora
+  - sft
+datasets:
+  - mlabonne/open-perfectblend
 base_model: anthonym21/Eve-2-MoE-272M
+language:
+  - en
+pipeline_tag: text-generation
+library_name: transformers
 ---
+# Eve-2-MoE-IT-272M
+**Instruction-tuned** version of [Eve-2-MoE-272M](https://huggingface.co/anthonym21/Eve-2-MoE-272M), fine-tuned with a **high-rank LoRA** (r=128) and **merged** into a standalone model.
+This is the foundation for **Eve specialist adapters**: narrow, measurable transforms that run on CPU or low-VRAM hardware.
+**Author:** [Anthony Maio](https://making-minds.ai/) / Making Minds AI Research
+## Specialist Use Cases
+The community adopts small models when (a) the task is **narrow**, (b) quality is **measurable**, and (c) deployment is **easy** (CPU/low VRAM). The best targets are "deterministic-ish transforms" with clear pass/fail.
+### Top 5 Eve Adapters (train in ~20 min each on RTX 4080)
+| Adapter | Task | Metrics |
+|---------|------|---------|
+| **Eve-JSON** | Strict structured output (function calling lite) | Parse rate, schema-valid rate, field accuracy |
+| **Eve-Extract** | Text → structured extraction (receipts, tickets, logs → JSON) | Exact match per field, F1 entities, parse+schema rate |
+| **Eve-Repair** | Fix invalid JSON, CSV quoting, normalize formats | Parse success, diff-to-gold, validator pass rate |
+| **Eve-Format** | Constraint obeyer (one paragraph, max N chars, bullet lists) | Constraint compliance %, length adherence |
+| **Eve-Router** | Intent classifier (which specialist to call + confidence) | Accuracy, calibration (ECE), abstain correctness |
+**Why these?** Crisp evals, immediate usefulness, CPU deployment. Train `r=16` LoRAs on top of this IT base.
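The parse-rate and schema-valid-rate metrics listed in the table can be scored with a few lines of standard-library Python. The sketch below is illustrative only: `REQUIRED_FIELDS`, `score_outputs`, and the sample outputs are hypothetical, and "schema-valid" is simplified here to "a JSON object whose required keys hold values of the expected type".

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "amount": float}


def score_outputs(outputs, required=REQUIRED_FIELDS):
    """Return (parse_rate, schema_valid_rate) over a list of raw model outputs."""
    if not outputs:
        return 0.0, 0.0
    parsed = valid = 0
    for text in outputs:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # unparseable output counts against both rates
        parsed += 1
        if isinstance(obj, dict) and all(
            isinstance(obj.get(key), typ) for key, typ in required.items()
        ):
            valid += 1
    return parsed / len(outputs), valid / len(outputs)


# Made-up sample outputs, one per failure mode.
samples = [
    '{"name": "John Doe", "amount": 150.23}',  # parses and satisfies the schema
    '{"name": "John Doe"}',                    # parses, but "amount" is missing
    '{"name": "John Doe", "amount": ',         # truncated, does not parse
]
parse_rate, schema_rate = score_outputs(samples)
print(f"parse rate: {parse_rate:.2f}, schema-valid rate: {schema_rate:.2f}")
```

Because both rates are simple pass/fail counts, they can be tracked per adapter without any model-based judging.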
+## Training Details
+### Training Data
+[mlabonne/open-perfectblend](https://huggingface.co/datasets/mlabonne/open-perfectblend) — ~1.2M instruction examples (math, code, chat, reasoning).
+### Training Procedure
+**Supervised fine-tuning (SFT)** with a high-rank LoRA (settings below), then merged into the base weights.
+| Parameter | Value |
+|-----------|-------|
+| **Base Model** | [Eve-2-MoE-272M](https://huggingface.co/anthonym21/Eve-2-MoE-272M) |
+| **LoRA Rank** | 128 |
+| **LoRA Alpha** | 256 |
+| **LoRA Dropout** | 0.05 |
+| **Target Modules** | `c_attn`, `c_proj`, `w1`, `w2`, `router` |
+| **NOT Targeted** | `lm_head` |
+### Training Hyperparameters
+| Parameter | Value |
+|-----------|-------|
+| **Hardware** | NVIDIA H200 SXM (141 GB VRAM) |
+| **Precision** | BFloat16 |
+| **Epochs** | 1 |
+| **Batch Size** | 128 (per device, no grad accum) |
+| **Learning Rate** | 5e-5 |
+| **LR Schedule** | Cosine (3% warmup) |
+| **Weight Decay** | 0.01 |
+| **Optimizer** | AdamW |
+| **Total Steps** | ~37,000 |
+### Speeds & Sizes
+| Metric | Value |
+|--------|-------|
+| **Training Time** | ~1.7 hours (1× H200) |
+| **Throughput** | ~6 it/s (batch 128) |
+| **Model Size** | 272M params (1.09 GB at fp32; ~0.54 GB at bf16) |
+| **Cost** | ~$5 RunPod |
+## Lessons Learned
+- **Don't target `lm_head`**: Doing so causes blank outputs despite low loss.
+- **r=128 at LR 2e-4**: Loss drops to ~0.01 but the model learns nothing. Use 5e-5 instead.
+- **Custom models must expose embedding hooks**: PEFT checkpointing requires `get_input_embeddings()` / `get_output_embeddings()`.
+- **RunPod: don't `pip install torch`**: It silently breaks CUDA.
+- **Broken tokenizer = fake zero loss**: Verify that the tokenizer's vocabulary size is consistent with `config.vocab_size`.
+## Architecture
+| Parameter | Value |
+|-----------|-------|
+| **Params** | 272M |
+| **MoE** | 8 routed + 1 shared (top-2) |
+| **Active/Token** | ~80M |
+| **Layers** | 12 |
+| **Hidden** | 512 |
+| **Heads** | 8×64 (RoPE) |
+| **FFN** | SwiGLU 1408 |
+| **Context** | 2048 |
+| **Vocab** | 50,304 (GPT-2) |
+## Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 model_id = "anthonym21/Eve-2-MoE-IT-272M"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
+prompt = "User: Extract name and amount from: 'Paid John Doe $150.23'\nAssistant:"
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
+print(tokenizer.decode(out[0]))  # generate() returns a batch; decode the first sequence
+```
+**Prompt format:** `User: ... \nAssistant:`
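A small helper can keep the prompt format consistent across calls. This is a minimal sketch, not part of the repository: `build_prompt` and `extract_reply` are hypothetical names, and cutting the reply at the next `User:` turn is an assumption about how the model may continue generating.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the User/Assistant format described above."""
    return f"User: {user_message.strip()}\nAssistant:"


def extract_reply(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt and cut at the next 'User:' turn, if any."""
    reply = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    return reply.split("\nUser:", 1)[0].strip()


prompt = build_prompt("Extract name and amount from: 'Paid John Doe $150.23'")
# `decoded` stands in for tokenizer.decode(...) output; the text is made up.
decoded = prompt + ' {"name": "John Doe", "amount": 150.23}\nUser: next question'
print(extract_reply(decoded, prompt))
```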
+## Limitations
+At 272M parameters, the model makes factual errors, cannot handle complex reasoning, and has limited knowledge. **Use it as a specialist base only.**
+## Citation
+```bibtex
+@misc{maio2026eve2moeit,
+  author = {Maio, Anthony},
+  title = {Eve-2-MoE-IT-272M: Nano-MoE for Measurable Specialist Tasks},
+  year = {2026},
+  url = {https://huggingface.co/anthonym21/Eve-2-MoE-IT-272M}
+}
+```
+## License
+MIT

configuration_eve.py ADDED

	@@ -0,0 +1,50 @@

+# configuration_eve.py
+from __future__ import annotations
+from typing import Any, Optional
+from transformers import PretrainedConfig
+class EveConfig(PretrainedConfig):
+    model_type = "eve_moe"
+    attribute_map = {
+        "num_hidden_layers": "n_layer",
+        "num_attention_heads": "n_head",
+        "hidden_size": "n_embd",
+        "max_position_embeddings": "block_size",
+    }
+    def __init__(
+        self,
+        vocab_size: int = 50304,
+        n_layer: int = 12,
+        n_embd: int = 512,
+        n_head: int = 8,
+        head_dim: int = 64,
+        block_size: int = 2048,
+        num_experts: int = 8,
+        top_k: int = 2,
+        expert_intermediate_size: int = 1408,
+        shared_expert_intermediate_size: int = 1408,
+        router_aux_loss_coef: float = 0.01,
+        use_checkpointing: bool = False,
+        rope_theta: float = 10000.0,
+        **kwargs: Any,
+    ):
+        self.vocab_size = vocab_size
+        self.n_layer = n_layer
+        self.n_embd = n_embd
+        self.n_head = n_head
+        self.head_dim = head_dim
+        self.block_size = block_size
+        self.num_experts = num_experts
+        self.top_k = top_k
+        self.expert_intermediate_size = expert_intermediate_size
+        self.shared_expert_intermediate_size = shared_expert_intermediate_size
+        self.router_aux_loss_coef = router_aux_loss_coef
+        self.use_checkpointing = use_checkpointing
+        self.rope_theta = rope_theta
+        super().__init__(**kwargs)
+__all__ = ["EveConfig"]
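As a rough sanity check on these defaults, the sketch below estimates the total parameter count. The layer layout is an assumption, since `modeling_eve.py` is not shown in full here: bias-free linear layers, a fused QKV projection, three weight matrices per SwiGLU expert, two RMSNorm vectors per layer, and tied input/output embeddings. `EveDefaults` and `estimate_params` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class EveDefaults:
    # Values copied from the EveConfig.__init__ defaults above.
    vocab_size: int = 50304
    n_layer: int = 12
    n_embd: int = 512
    n_head: int = 8
    head_dim: int = 64
    num_experts: int = 8
    expert_intermediate_size: int = 1408
    shared_expert_intermediate_size: int = 1408


def estimate_params(cfg: EveDefaults) -> int:
    """Rough parameter count under an ASSUMED layout (not read from the model code)."""
    attn_width = cfg.n_head * cfg.head_dim
    # Assumption: fused QKV projection plus an output projection, no biases.
    attn = cfg.n_embd * 3 * attn_width + attn_width * cfg.n_embd
    # Assumption: each SwiGLU expert holds three weight matrices.
    routed = cfg.num_experts * 3 * cfg.n_embd * cfg.expert_intermediate_size
    shared = 3 * cfg.n_embd * cfg.shared_expert_intermediate_size
    router = cfg.n_embd * cfg.num_experts
    norms = 2 * cfg.n_embd  # assumption: two RMSNorm weight vectors per layer
    per_layer = attn + routed + shared + router + norms
    # Assumption: tied embeddings, so the embedding table is counted once.
    embeddings = cfg.vocab_size * cfg.n_embd
    return cfg.n_layer * per_layer + embeddings + cfg.n_embd  # plus final norm


total = estimate_params(EveDefaults())
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands close to the 272M stated in the README, which suggests the assumed layout is plausible, though it remains an estimate.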

modeling_eve.py CHANGED

@@ -25,49 +25,7 @@ from transformers import PreTrainedModel, PretrainedConfig, GenerationMixin
 from transformers.modeling_outputs import CausalLMOutputWithPast
-class EveConfig(PretrainedConfig):
-    model_type = "eve_moe"
-    # Mapping for Transformers compatibility
-    attribute_map = {
-        "num_hidden_layers": "n_layer",
-        "num_attention_heads": "n_head",
-        "hidden_size": "n_embd",
-        "max_position_embeddings": "block_size",
-    }
-    def __init__(
-        self,
-        vocab_size: int = 50304,
-        n_layer: int = 12,
-        n_embd: int = 512,
-        n_head: int = 8,
-        head_dim: int = 64,
-        block_size: int = 2048,
-        num_experts: int = 8,
-        top_k: int = 2,
-        expert_intermediate_size: int = 1408,
-        shared_expert_intermediate_size: int = 1408,
-        router_aux_loss_coef: float = 0.01,
-        use_checkpointing: bool = False,
-        rope_theta: float = 10000.0,
-        **kwargs: Any,
-    ):
-        self.vocab_size = vocab_size
-        self.n_layer = n_layer
-        self.n_embd = n_embd
-        self.n_head = n_head
-        self.head_dim = head_dim
-        self.block_size = block_size
-        self.num_experts = num_experts
-        self.top_k = top_k
-        self.expert_intermediate_size = expert_intermediate_size
-        self.shared_expert_intermediate_size = shared_expert_intermediate_size
-        self.router_aux_loss_coef = router_aux_loss_coef
-        self.use_checkpointing = use_checkpointing
-        self.rope_theta = rope_theta
-        super().__init__(**kwargs)
 class RMSNorm(nn.Module):

 from transformers.modeling_outputs import CausalLMOutputWithPast
+from .configuration_eve import EveConfig
 class RMSNorm(nn.Module):

push_to_hub.py ADDED

	@@ -0,0 +1,17 @@

+from huggingface_hub import HfApi
+api = HfApi()
+repo_id = "anthonym21/Eve-2-MoE-IT-272M"
+folder_path = "."
+print(f"Uploading {folder_path} to {repo_id}...")
+api.upload_folder(
+    folder_path=folder_path,
+    repo_id=repo_id,
+    repo_type="model",
+    ignore_patterns=[".git", ".cache", "__pycache__", "*.ipynb", "*.lock", ".DS_Store"],
+)
+print("Upload complete! You can now reload the model in your notebook.")
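The `ignore_patterns` argument takes glob-style patterns. As a minimal sketch, assuming the patterns are matched against repo-relative paths with `fnmatch`-style rules (check the `huggingface_hub` documentation for the exact behavior in your version), the standard library can preview which files a pattern list would skip. The file names below are hypothetical.

```python
from fnmatch import fnmatch

# Pattern list copied from push_to_hub.py above.
IGNORE = [".git", ".cache", "__pycache__", "*.ipynb", "*.lock", ".DS_Store"]


def is_ignored(path, patterns=IGNORE):
    """True if any glob pattern matches the repo-relative path."""
    return any(fnmatch(path, pattern) for pattern in patterns)


# Hypothetical folder contents, used only to exercise the patterns.
candidates = [
    "README.md",
    "modeling_eve.py",
    "train.ipynb",
    "uv.lock",
    "__pycache__/modeling_eve.cpython-311.pyc",
]
for path in candidates:
    print(f"{path}: {'skip' if is_ignored(path) else 'upload'}")
```

Under plain `fnmatch` rules, a bare directory name such as `__pycache__` matches only that exact path, so files inside the directory would need a pattern like `__pycache__/*` to be skipped.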

tokenizer_config.json CHANGED

@@ -5,7 +5,7 @@
   "eos_token": "<|endoftext|>",
   "errors": "replace",
   "is_local": false,
-  "model_max_length": 1024,
   "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"

   "eos_token": "<|endoftext|>",
   "errors": "replace",
   "is_local": false,
+  "model_max_length": 2048,
   "pad_token": "<|endoftext|>",
   "tokenizer_class": "GPT2Tokenizer",
   "unk_token": "<|endoftext|>"
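The change raises `model_max_length` from 1024 to 2048, which matches the `block_size` default in `EveConfig`. A quick consistency check along these lines can catch a tokenizer/model mismatch before training. This is a sketch with hypothetical names (`check_tokenizer`, `model_defaults`), and it assumes the tokenizer is the standard 50,257-entry GPT-2 vocabulary.

```python
import json

# Fragments copied from the diffs above; only the fields needed for the check.
tokenizer_config = json.loads(
    '{"model_max_length": 2048, "tokenizer_class": "GPT2Tokenizer"}'
)
model_defaults = {"block_size": 2048, "vocab_size": 50304}

GPT2_VOCAB = 50257  # assumption: standard GPT-2 BPE vocabulary size


def check_tokenizer(tok_cfg, model_cfg, tokenizer_vocab=GPT2_VOCAB):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if tok_cfg["model_max_length"] > model_cfg["block_size"]:
        problems.append("model_max_length exceeds the model's block_size")
    if tokenizer_vocab > model_cfg["vocab_size"]:
        problems.append("tokenizer vocabulary is larger than the embedding table")
    return problems


print(check_tokenizer(tokenizer_config, model_defaults) or "tokenizer and model agree")
```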