eyad-silx committed on
Commit 2cdc30b · verified · 1 Parent(s): 80b7891

Add HFA checkpoint analysis and parameter mappings

Files changed (3):
  1. README.md +23 -59
  2. config.json +9 -12
  3. model_analysis.json +53 -0
README.md CHANGED
@@ -1,69 +1,33 @@
- ---
- license: mit
- tags:
- - pytorch
- - attention-mechanism
- - long-context
- - hfa
- - hierarchical-flow-anchoring
- language:
- - en
- pipeline_tag: text-generation
- ---
-
- # HFA Model - Hierarchical Flow Anchoring
-
- This model implements Hierarchical Flow Anchoring (HFA), a novel attention mechanism that combines temporal evolution with discrete memory anchoring for efficient long-context processing.
-
- ## Model Details
-
- - **Architecture**: Hierarchical Flow Anchoring (HFA)
- - **Parameters**: ~Unknown
- - **Training Step**: 220,000
- - **Context Length**: 2048 tokens
- - **Checkpoint Frequency**: 64
-
- ## Key Features
-
- - **Linear Complexity**: O(n) scaling for long sequences
- - **Adaptive Scaling Laws**: Learnable efficiency parameters that reduce computational cost during training
- - **Position-Free Design**: Eliminates positional embeddings through temporal evolution
- - **Memory Anchoring**: Discrete checkpoint memories for long-term retention
-
- ## Usage
-
- ```python
- from transformers import AutoModel, AutoTokenizer
- import torch
-
- # Load model and tokenizer
- model = AutoModel.from_pretrained("eyad-silx/400")
- tokenizer = AutoTokenizer.from_pretrained("eyad-silx/400")
-
- # Example usage
- inputs = tokenizer("Your text here", return_tensors="pt")
- outputs = model(**inputs)
- ```
-
- ## Training Details
-
- - **Dataset**: 4B tokens
- - **Training Steps**: 220,000
- - **Tokens per Second**: 8825 (final, showing 2.3% improvement during training)
- - **Architecture**: Hybrid design with temporal evolution components
-
- ## Citation
-
- If you use this model, please cite:
-
- ```bibtex
- @article{hfa2024,
-   title={Hierarchical Flow Anchoring: A Novel Attention Mechanism for Long-Context Processing},
-   author={Your Name},
-   year={2024}
- }
- ```
-
- ## License
-
- MIT License
+ # HFA Model Checkpoint Analysis
+
+ ## Model Architecture
+ - **Type**: Hierarchical Flow Anchoring (HFA)
+ - **Parameters**: ~20M
+ - **Layers**: 6 HFA layers
+ - **Hidden Size**: 256
+ - **Vocab Size**: 128,000 (DeepSeek-V3 tokenizer)
+
+ ## Checkpoint Structure
+ The checkpoint contains a nested `model_state_dict` with 274 parameters.
+
+ ## Parameter Mapping
+ To load this checkpoint:
+
+ ```python
+ checkpoint = torch.load("pytorch_model.bin")
+ state_dict = checkpoint['model_state_dict']
+
+ # Map parameter names:
+ # hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
+ # token_embedding -> token_embedding
+ # lm_head -> lm_head
+ # layer_norm -> layer_norm
+
+ model.load_state_dict(state_dict, strict=False)
+ ```
+
+ ## Usage
+ See `model_analysis.json` for detailed parameter mappings and loading instructions.
+
+ ## Performance Issues
+ Note: Current checkpoint shows extremely low prediction probabilities (~0.000008), indicating potential parameter loading issues.
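The README snippet above relies on an already-built model object and leaves the actual key rename implicit. A self-contained sketch of that rename pass (the helper name and regex are ours; the `hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*` mapping is the one stated in this commit):

```python
import re

def remap_hfa_keys(state_dict):
    """Rename checkpoint keys per the commit's mapping; the other keys
    (token_embedding, lm_head, layer_norm) already match and pass through."""
    remapped = {}
    for key, value in state_dict.items():
        # hfa_layers.X.* -> blocks.X.attention.hierarchical_flow.*
        new_key = re.sub(r"^hfa_layers\.(\d+)\.",
                         r"blocks.\1.attention.hierarchical_flow.", key)
        remapped[new_key] = value
    return remapped
```

With torch available, this would slot between `torch.load(...)` and `model.load_state_dict(remap_hfa_keys(state_dict), strict=False)`.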
config.json CHANGED
@@ -1,17 +1,14 @@
  {
- "model_type": "hfa",
  "architectures": [
- "HFAModel"
+ "HFALanguageModel"
  ],
- "d_model": 512,
- "n_layers": 6,
- "n_heads": 8,
- "d_ff": 2048,
- "vocab_size": 50257,
- "max_position_embeddings": Null ,
- "checkpoint_frequency": 64,
- "memory_decay": 0.95,
- "temporal_evolution_rate": 0.1,
+ "model_type": "hfa",
+ "vocab_size": 128000,
+ "hidden_size": 256,
+ "num_hidden_layers": 6,
+ "num_attention_heads": 8,
+ "intermediate_size": 1024,
+ "max_position_embeddings": 715,
  "torch_dtype": "float32",
- "transformers_version": "4.21.0"
+ "transformers_version": "4.36.0"
  }
model_analysis.json ADDED
@@ -0,0 +1,53 @@
+ {
+   "checkpoint_path": "/QuasarV4/checkpoints/step_220000",
+   "analysis_timestamp": "0",
+   "model_architecture": "HFA (Hierarchical Flow Anchoring)",
+   "parameter_mappings": {
+     "checkpoint_params": [
+       "token_embedding.weight",
+       "blocks.0.attention.hierarchical_flow.evolution_rate",
+       "blocks.0.attention.hierarchical_flow.memory_decay",
+       "blocks.0.attention.hierarchical_flow.attention_memory",
+       "blocks.0.attention.hierarchical_flow.q_proj.weight",
+       "blocks.0.attention.hierarchical_flow.k_proj.weight",
+       "blocks.0.attention.hierarchical_flow.v_proj.weight",
+       "blocks.0.attention.hierarchical_flow.out_proj.weight",
+       "blocks.0.attention.hierarchical_flow.attention_evolution.weight",
+       "blocks.0.attention.hierarchical_flow.attention_evolution.bias",
+       "blocks.0.attention.hierarchical_flow.memory_gate.weight",
+       "blocks.0.attention.hierarchical_flow.memory_gate.bias",
+       "blocks.0.attention.hierarchical_flow.temporal_dynamics.weight",
+       "blocks.0.attention.hierarchical_flow.temporal_dynamics.bias",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.checkpoint_frequency",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.weight",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.entropy_analyzer.bias",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.weight",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.0.bias",
+       "blocks.0.attention.hierarchical_flow.checkpoint_trigger.semantic_detector.2.weight"
+     ]
+   },
+   "checkpoint_structure": {
+     "type": "nested_model_state_dict",
+     "num_parameters": 274
+   },
+   "loading_instructions": [
+     "1. Load checkpoint with torch.load()",
+     "2. Extract model_state_dict from checkpoint dictionary",
+     "3. Map parameter names:",
+     "   - hfa_layers.X -> blocks.X.attention.hierarchical_flow",
+     "   - token_embedding -> token_embedding (direct match)",
+     "   - lm_head -> lm_head (direct match)",
+     "   - layer_norm -> layer_norm (direct match)",
+     "4. Use strict=False for loading to handle mismatches"
+   ],
+   "training_metadata": {
+     "step": 220000,
+     "epoch": 1,
+     "train_loss": 4.591190338134766,
+     "val_loss": 4.591190338134766,
+     "timestamp": 1757907906.1673536,
+     "save_duration": 2.1279516220092773,
+     "checkpoint_type": "hybrid_fast",
+     "file_size_mb": 881.7255353927612
+   }
+ }
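Step 4's `strict=False` silently ignores mismatched keys, which is one plausible route to the near-random predictions noted in the README. A small sketch of an explicit coverage check to run before loading (the helper is ours, not part of this repo):

```python
def check_state_dict_coverage(model_keys, checkpoint_keys):
    """Compare the model's expected parameter names against the checkpoint's.
    Under strict=False, missing keys stay randomly initialised and unexpected
    keys are silently dropped -- both are worth surfacing explicitly."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }
```

Running this on the remapped `model_state_dict` before `load_state_dict` would immediately show whether the 274 checkpoint parameters actually cover the model.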