euhidaman/embervlm-tiny

@@ -10,7 +10,7 @@ tags:
 - tiny-vlm
 - repvit
 - tinyllm
-- stage2
 base_model:
 - tinyllm
 library_name: transformers
@@ -21,23 +21,24 @@ pipeline_tag: image-text-to-text
 **🔥 Efficient Vision-Language Model for Edge Deployment & Robotic Applications**
-This model is currently in training - **STAGE2 (Epoch 1)**.
 ## 📊 Current Training Status
-- **Stage**: Multimodal Instruction Tuning - Following complex instructions
 - **Epoch**: 1
-- **Last Updated**: 2026-01-28 15:04:17 UTC
 ### Latest Metrics
-- **instruction_loss**: 0.0000
-- **loss**: 5.1544
 ## 🏗️ Model Architecture
 - **Size**: Tiny (~35M parameters)
-- **Total Parameters**: 37,237,665
-- **Trainable Parameters**: 23,254,337 (62.4%)
 - **Vision Encoder**: RepViT-M0.9 (~5M params)
 - **Language Model**: TinyLLM-30M (30M params)
@@ -50,7 +51,7 @@ EmberVLM follows a 4-stage training curriculum:
 3. ✅ **Stage 3: Robot Fleet Selection** - Task-robot matching
 4. ⏳ **Stage 4: Chain-of-Thought Reasoning** - Reasoning generation
-**Current Stage**: STAGE2
 ## 💻 Usage
@@ -125,5 +126,5 @@ Apache 2.0
 ---
-**Note**: This is a checkpoint from stage2 training (epoch 1).
 The model will be updated after each epoch with improved performance.

 - tiny-vlm
 - repvit
 - tinyllm
+- stage1
 base_model:
 - tinyllm
 library_name: transformers
 **🔥 Efficient Vision-Language Model for Edge Deployment & Robotic Applications**
+This model is currently in training - **STAGE1 (Epoch 1)**.
 ## 📊 Current Training Status
+- **Stage**: Visual-Language Alignment - Learning to ground vision and language
 - **Epoch**: 1
+- **Last Updated**: 2026-02-01 16:00:11 UTC
 ### Latest Metrics
+- **captioning_loss**: 8.5561
+- **contrastive_loss**: 2.7994
+- **loss**: 5.6777
 ## 🏗️ Model Architecture
 - **Size**: Tiny (~35M parameters)
+- **Total Parameters**: 40,196,257
+- **Trainable Parameters**: 26,212,929 (65.2%)
 - **Vision Encoder**: RepViT-M0.9 (~5M params)
 - **Language Model**: TinyLLM-30M (30M params)
 3. ✅ **Stage 3: Robot Fleet Selection** - Task-robot matching
 4. ⏳ **Stage 4: Chain-of-Thought Reasoning** - Reasoning generation
+**Current Stage**: STAGE1
 ## 💻 Usage
 ---
+**Note**: This is a checkpoint from stage1 training (epoch 1).
 The model will be updated after each epoch with improved performance.

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ed94af8def51ab039a8be350aa1df789d1b6f2a3b10d54e42fd0d01f22d1ec6b
-size 88817547

 version https://git-lfs.github.com/spec/v1
+oid sha256:c6be11d39bd7c475a6e51883249a0a9ba175c11618e424678674cb2ef649fe66
+size 100663623
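
The two pointer versions above can be compared programmatically. The following is a minimal sketch, assuming only the three-line Git LFS pointer layout shown in this diff (`version`, `oid`, `size`); the helper name `parse_lfs_pointer` is hypothetical and not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the removed and added sides of the diff.
old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:ed94af8def51ab039a8be350aa1df789d1b6f2a3b10d54e42fd0d01f22d1ec6b
size 88817547
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:c6be11d39bd7c475a6e51883249a0a9ba175c11618e424678674cb2ef649fe66
size 100663623
""")

# Growth of the checkpoint file in bytes between the two commits.
size_delta = new_pointer["size"] - old_pointer["size"]
print(f"checkpoint grew by {size_delta} bytes")
```

To verify a downloaded `pytorch_model.bin` against the pointer, one could hash the file with `hashlib.sha256` and compare the hex digest with the `digest` field.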

training_info.json CHANGED

@@ -1,14 +1,15 @@
 {
-  "stage": "stage2",
   "epoch": 1,
   "metrics": {
-    "loss": 5.154378942541174,
-    "instruction_loss": 0.0
   },
   "carbon_emissions_kg": 0.0,
-  "timestamp": "2026-01-28T15:04:17.887928",
   "vision_backbone": "repvit",
   "language_backbone": "tinyllm",
-  "total_parameters": 37237665,
-  "trainable_parameters": 23254337
 }

 {
+  "stage": "stage1",
   "epoch": 1,
   "metrics": {
+    "loss": 5.6777140368586005,
+    "contrastive_loss": 2.7993588654891304,
+    "captioning_loss": 8.556068959443465
   },
   "carbon_emissions_kg": 0.0,
+  "timestamp": "2026-02-01T16:00:11.852746",
   "vision_backbone": "repvit",
   "language_backbone": "tinyllm",
+  "total_parameters": 40196257,
+  "trainable_parameters": 26212929
 }
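
The figures in the two `training_info.json` versions can be cross-checked against the percentages and losses shown in the model card. The sketch below uses only values copied from this diff; the helper names are hypothetical. Note that the comparison of `loss` with the mean of the two component losses is an observation about the reported numbers, not a statement of how the training code combines them, since the source does not give the weighting.

```python
import math

# Values copied from the removed (stage2) and added (stage1) sides of the diff.
old_info = {
    "stage": "stage2",
    "total_parameters": 37237665,
    "trainable_parameters": 23254337,
}
new_info = {
    "stage": "stage1",
    "total_parameters": 40196257,
    "trainable_parameters": 26212929,
    "metrics": {
        "loss": 5.6777140368586005,
        "contrastive_loss": 2.7993588654891304,
        "captioning_loss": 8.556068959443465,
    },
}


def trainable_pct(info: dict) -> float:
    """Share of trainable parameters, in percent."""
    return 100.0 * info["trainable_parameters"] / info["total_parameters"]


def frozen_params(info: dict) -> int:
    """Parameters that are not trainable."""
    return info["total_parameters"] - info["trainable_parameters"]


for info in (old_info, new_info):
    print(info["stage"], f"{trainable_pct(info):.1f}% trainable,",
          frozen_params(info), "frozen")

# Observation only: compare the reported loss with the unweighted mean
# of the two component losses (the actual weighting is an assumption).
metrics = new_info["metrics"]
mean_loss = (metrics["contrastive_loss"] + metrics["captioning_loss"]) / 2
loss_matches_mean = math.isclose(metrics["loss"], mean_loss, abs_tol=1e-6)
print("loss close to unweighted mean:", loss_matches_mean)
```

The frozen-parameter count is identical in both versions, so the change in `total_parameters` comes entirely from trainable parameters.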