Add HFA checkpoint analysis and parameter mappings
Files changed:

- README.md (+23, -59)
- config.json (+9, -12)
- model_analysis.json (+53, -0)
README.md
CHANGED
````diff
@@ -1,69 +1,33 @@
----
-license: mit
-tags:
-- pytorch
-- attention-mechanism
-- long-context
-- hfa
-- hierarchical-flow-anchoring
-language:
-- en
-pipeline_tag: text-generation
----
-
-##
-
-- **Architecture**: Hierarchical Flow Anchoring (HFA)
-- **Parameters**: ~Unknown
-- **Training Step**: 220,000
-- **Context Length**: 2048 tokens
-- **Checkpoint Frequency**: 64
-
-## Key Features
-
-- **Linear Complexity**: O(n) scaling for long sequences
-- **Adaptive Scaling Laws**: Learnable efficiency parameters that reduce computational cost during training
-- **Position-Free Design**: Eliminates positional embeddings through temporal evolution
-- **Memory Anchoring**: Discrete checkpoint memories for long-term retention
-
-## Usage
-
-```python
-# Load model and tokenizer
-model = AutoModel.from_pretrained("eyad-silx/400")
-tokenizer = AutoTokenizer.from_pretrained("eyad-silx/400")
-
-#
-```
-
-## Training Details
-
-- **Dataset**: 4B tokens
-- **Training Steps**: 220,000
-- **Tokens per Second**: 8825 (final, showing 2.3% improvement during training)
-- **Architecture**: Hybrid design with temporal evolution components
-
-If you use this model, please cite:
-
-```bibtex
-@article{hfa2024,
-  title={Hierarchical Flow Anchoring: A Novel Attention Mechanism for Long-Context Processing},
-  author={Your Name},
-  year={2024}
-}
-```
-
-##
+# HFA Model Checkpoint Analysis
+
+## Model Architecture
+
+- **Type**: Hierarchical Flow Anchoring (HFA)
+- **Parameters**: ~20M
+- **Layers**: 6 HFA layers
+- **Hidden Size**: 256
+- **Vocab Size**: 128,000 (DeepSeek-V3 tokenizer)
+
+## Checkpoint Structure
+
+The checkpoint contains a nested `model_state_dict` with 274 parameters.
+
+## Parameter Mapping
+
+To load this checkpoint:
+
+```python
+checkpoint = torch.load("pytorch_model.bin")
+state_dict = checkpoint['model_state_dict']
+
+# Map parameter names:
+# hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
+# token_embedding -> token_embedding
+# lm_head -> lm_head
+# layer_norm -> layer_norm
+
+model.load_state_dict(state_dict, strict=False)
+```
+
+## Usage
+
+See `model_analysis.json` for detailed parameter mappings and loading instructions.
+
+## Performance Issues
+
+Note: Current checkpoint shows extremely low prediction probabilities (~0.000008), indicating potential parameter loading issues.
````
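The new README's Parameter Mapping comments can be turned into a small renaming helper before calling `load_state_dict`. This is a minimal sketch, assuming checkpoint keys follow the `hfa_layers.X.*` pattern listed in the comments; `remap_key` is an illustrative name, not part of the repository.

```python
def remap_key(key: str) -> str:
    """Rename a checkpoint parameter to the model's module layout.

    hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
    token_embedding, lm_head and layer_norm already match directly.
    """
    prefix = "hfa_layers."
    if key.startswith(prefix):
        layer_idx, rest = key[len(prefix):].split(".", 1)
        return f"blocks.{layer_idx}.attention.hierarchical_flow.{rest}"
    return key

print(remap_key("hfa_layers.0.q_proj.weight"))
# -> blocks.0.attention.hierarchical_flow.q_proj.weight
print(remap_key("token_embedding.weight"))
# -> token_embedding.weight (unchanged)
```

Applied to a loaded checkpoint, this would be `{remap_key(k): v for k, v in state_dict.items()}` before `model.load_state_dict(...)`.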
config.json
CHANGED
```diff
@@ -1,17 +1,14 @@
 {
-  "model_type": "hfa",
   "architectures": [
-    "
+    "HFALanguageModel"
   ],
+  "model_type": "hfa",
+  "vocab_size": 128000,
+  "hidden_size": 256,
+  "num_hidden_layers": 6,
+  "num_attention_heads": 8,
+  "intermediate_size": 1024,
+  "max_position_embeddings": 715,
-  "
-  "
-  "
-  "
-  "
-  "
-  "
-  "memory_decay": 0.95,
-  "temporal_evolution_rate": 0.1,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.36.0"
 }
```
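As a quick plausibility check on the new config, `hidden_size` must split evenly across `num_attention_heads`. This hypothetical snippet just restates the fields above as a dict and derives the per-head dimension:

```python
# Restates the config.json fields above; purely an illustrative sanity check.
config = {
    "vocab_size": 128000,
    "hidden_size": 256,
    "num_hidden_layers": 6,
    "num_attention_heads": 8,
    "intermediate_size": 1024,
    "max_position_embeddings": 715,
}

# Attention requires the hidden dimension to divide evenly across heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(head_dim)  # 32
```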
model_analysis.json
ADDED
@@ -0,0 +1,53 @@

```json
{
  "checkpoint_path": "/QuasarV4/checkpoints/step_220000",
  "analysis_timestamp": "0",
  "model_architecture": "HFA (Hierarchical Flow Anchoring)",
  "parameter_mappings": {
    "checkpoint_params": [
      "token_embedding.weight",
      "blocks.0.attention.hierarchical_flow.evolution_rate",
      "blocks.0.attention.hierarchical_flow.memory_decay",
      "blocks.0.attention.hierarchical_flow.attention_memory",
      "blocks.0.attention.hierarchical_flow.q_proj.weight",
      "blocks.0.attention.hierarchical_flow.k_proj.weight",
      "blocks.0.attention.hierarchical_flow.v_proj.weight",
      "blocks.0.attention.hierarchical_flow.out_proj.weight",
      "blocks.0.attention.hierarchical_flow.attention_evolution.weight",
      "blocks.0.attention.hierarchical_flow.attention_evolution.bias",
      "blocks.0.attention.hierarchical_flow.memory_gate.weight",
      "blocks.0.attention.hierarchical_flow.memory_gate.bias",
      "blocks.0.attention.hierarchical_flow.temporal_dynamics.weight",
      "blocks.0.attention.hierarchical_flow.temporal_dynamics.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.checkpoint_frequency",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.weight",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.weight",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.2.weight"
    ]
  },
  "checkpoint_structure": {
    "type": "nested_model_state_dict",
    "num_parameters": 274
  },
  "loading_instructions": [
    "1. Load checkpoint with torch.load()",
    "2. Extract model_state_dict from checkpoint dictionary",
    "3. Map parameter names:",
    "   - hfa_layers.X -> blocks.X.attention.hierarchical_flow",
    "   - token_embedding -> token_embedding (direct match)",
    "   - lm_head -> lm_head (direct match)",
    "   - layer_norm -> layer_norm (direct match)",
    "4. Use strict=False for loading to handle mismatches"
  ],
  "training_metadata": {
    "step": 220000,
    "epoch": 1,
    "train_loss": 4.591190338134766,
    "val_loss": 4.591190338134766,
    "timestamp": 1757907906.1673536,
    "save_duration": 2.1279516220092773,
    "checkpoint_type": "hybrid_fast",
    "file_size_mb": 881.7255353927612
  }
}
```
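Before falling back on `strict=False` (step 4 of the loading instructions), it helps to see exactly which keys fail to line up, since silent mismatches would explain the near-zero prediction probabilities noted in the README. A sketch of that check with plain key sets; the function name and the two-key example are illustrative:

```python
def diff_keys(checkpoint_keys, model_keys):
    """Return (missing_from_checkpoint, unexpected_in_checkpoint)."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)

# Illustrative example: one key still under the old naming, one direct match.
missing, unexpected = diff_keys(
    ["hfa_layers.0.q_proj.weight", "token_embedding.weight"],
    ["blocks.0.attention.hierarchical_flow.q_proj.weight", "token_embedding.weight"],
)
print(missing)     # ['blocks.0.attention.hierarchical_flow.q_proj.weight']
print(unexpected)  # ['hfa_layers.0.q_proj.weight']
```

With PyTorch, the same information is returned directly by `model.load_state_dict(state_dict, strict=False)` as its `missing_keys` and `unexpected_keys` fields; either list being non-empty means some weights stayed randomly initialized.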