Add HFA checkpoint analysis and parameter mappings
Files changed:

- README.md (+23, -59)
- config.json (+9, -12)
- model_analysis.json (+53, -0)
README.md
CHANGED
````diff
@@ -1,69 +1,33 @@
----
-license: mit
-tags:
-- pytorch
-- attention-mechanism
-- long-context
-- hfa
-- hierarchical-flow-anchoring
-language:
-- en
-pipeline_tag: text-generation
----
-
-##
-
-- **Architecture**: Hierarchical Flow Anchoring (HFA)
-- **Parameters**: ~Unknown
-- **Training Step**: 220,000
-- **Context Length**: 2048 tokens
-- **Checkpoint Frequency**: 64
-
-## Key Features
-
-- **Linear Complexity**: O(n) scaling for long sequences
-- **Adaptive Scaling Laws**: Learnable efficiency parameters that reduce computational cost during training
-- **Position-Free Design**: Eliminates positional embeddings through temporal evolution
-- **Memory Anchoring**: Discrete checkpoint memories for long-term retention
-
-## Usage
-
-```python
-# Load model and tokenizer
-model = AutoModel.from_pretrained("eyad-silx/400")
-tokenizer = AutoTokenizer.from_pretrained("eyad-silx/400")
-
-#
-```
-
-## Training Details
-
-- **Dataset**: 4B tokens
-- **Training Steps**: 220,000
-- **Tokens per Second**: 8825 (final, showing 2.3% improvement during training)
-- **Architecture**: Hybrid design with temporal evolution components
-
-If you use this model, please cite:
-
-```bibtex
-@article{hfa2024,
-  title={Hierarchical Flow Anchoring: A Novel Attention Mechanism for Long-Context Processing},
-  author={Your Name},
-  year={2024}
-}
-```
-
-##
+# HFA Model Checkpoint Analysis
+
+## Model Architecture
+
+- **Type**: Hierarchical Flow Anchoring (HFA)
+- **Parameters**: ~20M
+- **Layers**: 6 HFA layers
+- **Hidden Size**: 256
+- **Vocab Size**: 128,000 (DeepSeek-V3 tokenizer)
+
+## Checkpoint Structure
+
+The checkpoint contains a nested `model_state_dict` with 274 parameters.
+
+## Parameter Mapping
+
+To load this checkpoint:
+
+```python
+checkpoint = torch.load("pytorch_model.bin")
+state_dict = checkpoint['model_state_dict']
+
+# Map parameter names:
+# hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
+# token_embedding -> token_embedding
+# lm_head -> lm_head
+# layer_norm -> layer_norm
+
+model.load_state_dict(state_dict, strict=False)
+```
+
+## Usage
+
+See `model_analysis.json` for detailed parameter mappings and loading instructions.
+
+## Performance Issues
+
+Note: Current checkpoint shows extremely low prediction probabilities (~0.000008), indicating potential parameter loading issues.
````
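The new README's Parameter Mapping comments can be turned into a small renaming helper before calling `load_state_dict`. This is a minimal sketch, assuming checkpoint keys follow the `hfa_layers.X.*` pattern listed in the comments; `remap_key` is an illustrative name, not part of the repository.

```python
def remap_key(key: str) -> str:
    """Rename a checkpoint parameter to the model's module layout.

    hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
    token_embedding, lm_head and layer_norm already match directly.
    """
    prefix = "hfa_layers."
    if key.startswith(prefix):
        layer_idx, rest = key[len(prefix):].split(".", 1)
        return f"blocks.{layer_idx}.attention.hierarchical_flow.{rest}"
    return key

print(remap_key("hfa_layers.0.q_proj.weight"))
# -> blocks.0.attention.hierarchical_flow.q_proj.weight
print(remap_key("token_embedding.weight"))
# -> token_embedding.weight (unchanged)
```

Applied to a loaded checkpoint, this would be `{remap_key(k): v for k, v in state_dict.items()}` before `model.load_state_dict(...)`.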
config.json
CHANGED
```diff
@@ -1,17 +1,14 @@
 {
-  "model_type": "hfa",
   "architectures": [
-    "
+    "HFALanguageModel"
   ],
+  "model_type": "hfa",
+  "vocab_size": 128000,
+  "hidden_size": 256,
+  "num_hidden_layers": 6,
+  "num_attention_heads": 8,
+  "intermediate_size": 1024,
+  "max_position_embeddings": 715,
-  "
-  "
-  "
-  "
-  "
-  "
-  "
-  "memory_decay": 0.95,
-  "temporal_evolution_rate": 0.1,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.36.0"
 }
```
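As a quick plausibility check on the new config, `hidden_size` must split evenly across `num_attention_heads`. This hypothetical snippet just restates the fields above as a dict and derives the per-head dimension:

```python
# Restates the config.json fields above; purely an illustrative sanity check.
config = {
    "vocab_size": 128000,
    "hidden_size": 256,
    "num_hidden_layers": 6,
    "num_attention_heads": 8,
    "intermediate_size": 1024,
    "max_position_embeddings": 715,
}

# Attention requires the hidden dimension to divide evenly across heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
head_dim = config["hidden_size"] // config["num_attention_heads"]
print(head_dim)  # 32
```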
model_analysis.json
ADDED
@@ -0,0 +1,53 @@

```json
{
  "checkpoint_path": "/QuasarV4/checkpoints/step_220000",
  "analysis_timestamp": "0",
  "model_architecture": "HFA (Hierarchical Flow Anchoring)",
  "parameter_mappings": {
    "checkpoint_params": [
      "token_embedding.weight",
      "blocks.0.attention.hierarchical_flow.evolution_rate",
      "blocks.0.attention.hierarchical_flow.memory_decay",
      "blocks.0.attention.hierarchical_flow.attention_memory",
      "blocks.0.attention.hierarchical_flow.q_proj.weight",
      "blocks.0.attention.hierarchical_flow.k_proj.weight",
      "blocks.0.attention.hierarchical_flow.v_proj.weight",
      "blocks.0.attention.hierarchical_flow.out_proj.weight",
      "blocks.0.attention.hierarchical_flow.attention_evolution.weight",
      "blocks.0.attention.hierarchical_flow.attention_evolution.bias",
      "blocks.0.attention.hierarchical_flow.memory_gate.weight",
      "blocks.0.attention.hierarchical_flow.memory_gate.bias",
      "blocks.0.attention.hierarchical_flow.temporal_dynamics.weight",
      "blocks.0.attention.hierarchical_flow.temporal_dynamics.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.checkpoint_frequency",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.weight",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.weight",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.bias",
      "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.2.weight"
    ]
  },
  "checkpoint_structure": {
    "type": "nested_model_state_dict",
    "num_parameters": 274
  },
  "loading_instructions": [
    "1. Load checkpoint with torch.load()",
    "2. Extract model_state_dict from checkpoint dictionary",
    "3. Map parameter names:",
    "   - hfa_layers.X -> blocks.X.attention.hierarchical_flow",
    "   - token_embedding -> token_embedding (direct match)",
    "   - lm_head -> lm_head (direct match)",
    "   - layer_norm -> layer_norm (direct match)",
    "4. Use strict=False for loading to handle mismatches"
  ],
  "training_metadata": {
    "step": 220000,
    "epoch": 1,
    "train_loss": 4.591190338134766,
    "val_loss": 4.591190338134766,
    "timestamp": 1757907906.1673536,
    "save_duration": 2.1279516220092773,
    "checkpoint_type": "hybrid_fast",
    "file_size_mb": 881.7255353927612
  }
}
```
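Before falling back on `strict=False` (step 4 of the loading instructions), it helps to see exactly which keys fail to line up, since silent mismatches would explain the near-zero prediction probabilities noted in the README. A sketch of that check with plain key sets; the function name and the two-key example are illustrative:

```python
def diff_keys(checkpoint_keys, model_keys):
    """Return (missing_from_checkpoint, unexpected_in_checkpoint)."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)

# Illustrative example: one key still under the old naming, one direct match.
missing, unexpected = diff_keys(
    ["hfa_layers.0.q_proj.weight", "token_embedding.weight"],
    ["blocks.0.attention.hierarchical_flow.q_proj.weight", "token_embedding.weight"],
)
print(missing)     # ['blocks.0.attention.hierarchical_flow.q_proj.weight']
print(unexpected)  # ['hfa_layers.0.q_proj.weight']
```

With PyTorch, the same information is returned directly by `model.load_state_dict(state_dict, strict=False)` as its `missing_keys` and `unexpected_keys` fields; either list being non-empty means some weights stayed randomly initialized.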