---
tags:
  - ssm
  - mamba
  - meta-learning
  - few-shot
  - pytorch
  - experimental
license: mit
library_name: transformers
pipeline_tag: text-generation
model_type: ssm
---

# HyperMambaLM-300M


> ⚠️ **This is an architecture-only repository**; no pretrained weights are available yet.

HyperMambaLM is a research prototype that combines modern state-space modeling with meta-learning components.
It is inspired by Mamba, but extended with additional mechanisms for few-shot adaptation, neuro-symbolic reasoning, and progressive learning.


## 🧠 Highlights

- 🌀 **Mamba-style SSM**: parallel scan for efficient sequence modeling
- 🧬 **Meta-learning (MAML)**: learns to adapt from only a few examples
- 🧠 **Neuro-symbolic layer**: combines neural networks with logic reasoning
- 🌱 **Progressive & continual learning**: learns new tasks without forgetting old ones
- 💡 **Adaptive precision**: dynamic control over compute cost
- 🧩 **Built for**: NAS, federated learning, knowledge distillation, and more
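At its core, a Mamba-style SSM block is a linear recurrence over the sequence. The repository computes it with a parallel scan, but the same result can be sketched sequentially. A minimal, dependency-free illustration (the scalar `A`, `B`, `C` values here are made up for the demo; the real model uses input-dependent, selective parameters over many channels):

```python
# Minimal diagonal (scalar) SSM recurrence: h_t = A*h_{t-1} + B*x_t, y_t = C*h_t.
# Illustrative only: this is the sequential form of the scan, not the parallel
# implementation used in modeling_hypermamba.py.
def ssm_scan(xs, A=0.9, B=1.0, C=0.5):
    h, ys = 0.0, []
    for x in xs:
        h = A * h + B * x   # state update
        ys.append(C * h)    # readout
    return ys

ys = ssm_scan([1.0, 0.0, 0.0])
print(ys)  # decaying impulse response: [0.5, 0.45, 0.405]
```

Because the recurrence is linear in `h`, it can be evaluated with an associative parallel scan in O(log T) depth, which is what makes SSMs efficient on long sequences.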

## 📂 Files included

| File | Description |
|------|-------------|
| `config.json` | Model hyperparameters |
| `modeling_hypermamba.py` | Core model definition |
| `modeling_utils.py` | Optional utility components |
| `demo.py` | Quick usage test |
| `__init__.py` | Python module loader |
| `README.md` | This file |

## 🚀 Quickstart (Colab / Local)

> 📌 This model is not yet trained, so only the architecture is available.

```python
# Step 1: Download the model code (e.g. in Colab, if the repo is not cloned)
!wget https://huggingface.co/hoanghai2110/HyperMambaLM-300M/resolve/main/modeling_hypermamba.py

# Step 2: Import and initialize
import torch
from modeling_hypermamba import HyperMambaLM, HyperMambaConfig

config = HyperMambaConfig.from_pretrained("hoanghai2110/HyperMambaLM-300M")
model = HyperMambaLM(config)  # randomly initialized, no pretrained weights yet

# Step 3: Run a dummy forward pass
input_ids = torch.randint(0, config.vocab_size, (1, 16))
output = model(input_ids)

print("✅ Output shape:", output.logits.shape)  # [1, 16, vocab_size]
```
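Since the meta-learning component is MAML-based, few-shot adaptation follows the usual inner-loop/outer-loop pattern: adapt a copy of the parameters on a handful of support examples, then meta-update the initialization from the post-adaptation query loss. A toy, dependency-free sketch of the inner loop on a one-parameter model (illustrative only; `maml_adapt` is a hypothetical helper, not part of this repo's API):

```python
# Toy MAML inner loop for a 1-parameter model y = w * x with squared-error loss.
# loss(w) = (w*x - y)^2, so dloss/dw = 2*(w*x - y)*x.
def grad(w, x, y):
    return 2.0 * (w * x - y) * x

def maml_adapt(w, support, inner_lr=0.1, steps=1):
    # Inner loop: a few gradient steps on the support (few-shot) examples.
    for _ in range(steps):
        g = sum(grad(w, x, y) for x, y in support) / len(support)
        w = w - inner_lr * g
    return w

# Few-shot "task": the true mapping is y = 2x; start from a meta-learned init w0.
w0 = 1.0
support = [(1.0, 2.0), (2.0, 4.0)]
w_adapted = maml_adapt(w0, support)  # 1.0 -> 1.5 after one inner step

# The outer loop (meta-update) would backprop the post-adaptation query loss
# through the inner steps to improve w0; omitted here for brevity.
query_loss_before = (w0 * 3.0 - 6.0) ** 2         # 9.0
query_loss_after = (w_adapted * 3.0 - 6.0) ** 2   # 2.25
print(query_loss_before, query_loss_after)  # adaptation reduces query loss
```

In the full model the same idea applies to all adapted parameters, with autograd handling the second-order gradients of the outer loop.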