KaiyueWen committed
Commit bbba26d · verified · 1 parent: 0ee7ad3

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +3 -51
README.md CHANGED
@@ -18,8 +18,8 @@ This model predicts the performance of neural network configurations using scaling laws
 
  **NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that:
 
- - Takes neural network configurations and partial performance observations as input
- - Predicts future performance metrics using learned scaling law patterns
+ - Takes pretraining configurations as input
+ - Predicts intermediate performance metrics using learned scaling law patterns
  - Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP
  - Supports multiple scaling law formulations (Marin, StepLaw)
 
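The retained description above (text embeddings from a base transformer combined with a numeric MLP, and a linear head emitting one scalar per token position) can be sketched in plain Python. This is an illustrative toy under assumed shapes and weights, not the actual NCPL implementation; every name and number here is made up.

```python
# Toy sketch of the hybrid input path (illustrative only, not NCPL's code):
# numeric positions get their value run through a small MLP and added to the
# token embedding; a linear head then maps each position to a scalar forecast.

HIDDEN = 4

def numeric_mlp(value):
    # Stand-in for the dedicated numeric MLP: 1 -> HIDDEN with a ReLU.
    w1 = [0.5, -0.25, 1.0, 0.1]
    return [max(value * w, 0.0) for w in w1]

def scalar_head(hidden_vec):
    # Stand-in for the linear layer mapping hidden_size -> scalar.
    w = [0.2, 0.4, -0.1, 0.3]
    return sum(h * wi for h, wi in zip(hidden_vec, w))

# Fake per-token transformer embeddings, plus raw numeric values and a mask
# marking which positions are numeric tokens (e.g. a learning rate, a batch size).
text_emb = [[0.1] * HIDDEN, [0.2] * HIDDEN, [0.3] * HIDDEN]
values = [0.0, 5e-5, 480.0]
numeric_mask = [False, True, True]

hidden_states = []
for emb, val, is_num in zip(text_emb, values, numeric_mask):
    bonus = numeric_mlp(val) if is_num else [0.0] * HIDDEN
    hidden_states.append([e + b for e, b in zip(emb, bonus)])

# One performance forecast per token position, as the card describes.
preds = [scalar_head(h) for h in hidden_states]
```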
@@ -39,13 +39,6 @@ The model consists of:
  - Linear layer mapping from hidden_size to scalar predictions
  - Outputs performance forecasts for each token position
 
- ### Key Features
-
- - **Hybrid Input Processing**: Combines text tokens and numeric values seamlessly
- - **Token-level Predictions**: Generates predictions at each sequence position
- - **FP32 Precision**: Trained in full float32 precision for numerical stability
- - **Intermediate Predictions**: Capable of predicting intermediate performance checkpoints
-
  ## Training Data
 
  The model was trained on:
@@ -58,13 +51,6 @@ The model was trained on:
  - Weight decay: 0.01
  - Loss: MSE (Mean Squared Error)
 
- ### Checkpoint Information
-
- - **Epoch**: 46
- - **Training iterations**: 4800
- - **Validation loss**: 0.005730564706027508
- - **Checkpoint path**: `checkpoints/fp32_@['marin', 'steplaw']_qwen_intermediate_residual_nts1ep10_s2ep400_s1lr5e-05_s2lr1e-05_wd0.01_bs480_rs42_20260216_095527/checkpoints/checkpoint_min_val_loss.pt`
-
  ## Usage
 
  ```python
@@ -120,34 +106,8 @@ This model is designed for:
 
  ## Limitations
 
- - Trained specifically on Marin and StepLaw datasets; generalization to other scaling laws may vary
+ - Trained specifically on Marin and StepLaw datasets; generalization to other settings likely requires at least fine-tuning
  - Requires properly formatted inputs with numeric tokens replaced and masked
- - Performance predictions are probabilistic estimates based on training data patterns
- - Best suited for configurations within the training distribution
-
- ## Training Procedure
-
- ### Two-Stage Training
-
- **Stage 1** (10 epochs):
- - Learning rate: 5e-5
- - Base model frozen
- - Trains only the numeric MLP and prediction head
- - Warmup ratio: 0.1
-
- **Stage 2** (400 epochs):
- - Learning rate: 1e-5
- - Full model fine-tuning
- - All parameters trainable
- - Warmup steps: 1000
-
- ### Training Configuration
-
- - Optimizer: AdamW (β1=0.9, β2=0.99)
- - Gradient clipping: 1.0
- - Loss function: Mean Squared Error (MSE)
- - Distributed training: FSDP (Fully Sharded Data Parallel)
- - Precision: FP32
 
  ## Citation
 
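For context on the two-stage schedule in the Training Procedure section removed above (stage 1: base model frozen, only the numeric MLP and prediction head train at lr 5e-5; stage 2: everything trains at lr 1e-5), here is a minimal sketch of the freeze/unfreeze logic. The dict-based parameter registry and all parameter names are hypothetical, not the card's actual training code.

```python
# Hypothetical freeze/unfreeze sketch of the removed two-stage schedule.
# Parameter names are invented; a real run would flip requires_grad on
# framework parameters rather than "trainable" flags in a dict.

params = {
    "base_model.layer0.weight": {"trainable": False},
    "numeric_mlp.w1": {"trainable": True},
    "prediction_head.weight": {"trainable": True},
}

def configure_stage(params, stage):
    """Set which parameters train and return the stage's learning rate."""
    if stage == 1:
        lr = 5e-5  # stage 1: train only the numeric MLP and prediction head
        for name, p in params.items():
            p["trainable"] = name.startswith(("numeric_mlp", "prediction_head"))
    else:
        lr = 1e-5  # stage 2: full fine-tuning, all parameters trainable
        for p in params.values():
            p["trainable"] = True
    return lr

lr1 = configure_stage(params, 1)
trainable1 = [n for n, p in params.items() if p["trainable"]]
lr2 = configure_stage(params, 2)
trainable2 = [n for n, p in params.items() if p["trainable"]]
```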
@@ -162,11 +122,3 @@ If you use this model in your research, please cite:
    url = {https://www.arxiv.org/abs/2602.10300}
  }
  ```
-
- ## Model Card Authors
-
- OptimizerStudy Team
-
- ## Model Card Contact
-
- For questions or issues, please open an issue in the [repository](https://github.com/OptimizerStudy/Configuration-to-Performance-Scaling-Law).
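The Limitations section requires "properly formatted inputs with numeric tokens replaced and masked". As a rough illustration of what such preprocessing could look like (the `<NUM>` placeholder, the regex, and the whitespace tokenization are all assumptions; the model's real input pipeline is not shown in this card):

```python
import re

# Assumed preprocessing sketch: replace numeric literals in a config string
# with a placeholder token, collect the raw values, and build a mask marking
# which token positions are numeric. Not the model's actual input pipeline.

NUM_TOKEN = "<NUM>"  # placeholder name is an assumption
NUM_RE = re.compile(r"-?\d+\.?\d*(?:e-?\d+)?")

def replace_numbers(config: str):
    values = [float(m) for m in NUM_RE.findall(config)]
    text = NUM_RE.sub(NUM_TOKEN, config)
    tokens = text.split()  # crude whitespace tokenization for illustration
    mask = [tok == NUM_TOKEN for tok in tokens]
    return tokens, values, mask

tokens, values, mask = replace_numbers("lr 5e-05 wd 0.01 bs 480")
```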