Upload README.md with huggingface_hub

This model predicts the performance of neural network configurations using scaling laws.
**NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that:

- Takes pretraining configurations as input
- Predicts intermediate performance metrics using learned scaling law patterns
- Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP
- Supports multiple scaling law formulations (Marin, StepLaw)
The model consists of:

- Linear layer mapping from hidden_size to scalar predictions
- Outputs performance forecasts for each token position
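The per-token prediction head described above can be sketched in plain Python. This is a minimal illustration of the idea (names and sizes here are assumptions, not the model's actual implementation): each token's hidden state is mapped to a single scalar by a linear layer, giving one forecast per sequence position.

```python
# Sketch of a per-token scalar prediction head: each hidden state
# (one row per token) is mapped to one scalar by a linear layer.
# Names and dimensions are illustrative assumptions.

def linear_head(hidden_states, weights, bias):
    """hidden_states: seq_len x hidden_size list of lists;
    weights: one vector of length hidden_size; bias: scalar.
    Returns one scalar prediction per token position."""
    return [
        sum(h_i * w_i for h_i, w_i in zip(h, weights)) + bias
        for h in hidden_states
    ]

# Toy example: hidden_size = 3, sequence of 2 tokens.
hidden = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
w = [0.1, 0.2, 0.3]
preds = linear_head(hidden, w, bias=0.05)
print(preds)  # one scalar forecast per token position
```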
### Key Features

- **Hybrid Input Processing**: Combines text tokens and numeric values seamlessly
- **Token-level Predictions**: Generates predictions at each sequence position
- **FP32 Precision**: Trained in full float32 precision for numerical stability
- **Intermediate Predictions**: Capable of predicting intermediate performance checkpoints
## Training Data

The model was trained on:

- Weight decay: 0.01
- Loss: MSE (Mean Squared Error)
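As a quick illustration of the MSE objective listed above (the helper name is ours, not part of the training code):

```python
# Mean squared error between predicted and observed performance.
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

print(mse([0.75, 0.35], [0.8, 0.3]))  # ~0.0025
```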
### Checkpoint Information

- **Epoch**: 46
- **Training iterations**: 4800
- **Validation loss**: 0.005730564706027508
- **Checkpoint path**: `checkpoints/fp32_@['marin', 'steplaw']_qwen_intermediate_residual_nts1ep10_s2ep400_s1lr5e-05_s2lr1e-05_wd0.01_bs480_rs42_20260216_095527/checkpoints/checkpoint_min_val_loss.pt`
## Usage
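The full usage snippet is omitted here. As a rough, hypothetical sketch of the input formatting the model expects (numeric tokens replaced and masked, as noted in the Limitations section), preprocessing might look like the following. The `<NUM>` placeholder, the regex, and the function name are all assumptions, not the model's actual API:

```python
import re

# Hypothetical preprocessing sketch: replace each number in a
# configuration string with a placeholder token, and keep the
# numeric values plus a mask marking which tokens are numeric.
# The "<NUM>" token and this exact scheme are assumptions.

NUM_RE = re.compile(r"\d+(?:\.\d+)?(?:e-?\d+)?")

def extract_numeric_inputs(config_text):
    values = [float(m) for m in NUM_RE.findall(config_text)]
    text = NUM_RE.sub("<NUM>", config_text)
    tokens = text.split()
    mask = [1 if "<NUM>" in tok else 0 for tok in tokens]
    return tokens, values, mask

tokens, values, mask = extract_numeric_inputs("lr 5e-05 wd 0.01 bs 480")
print(tokens)  # ['lr', '<NUM>', 'wd', '<NUM>', 'bs', '<NUM>']
print(values)  # [5e-05, 0.01, 480.0]
```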
## Limitations

- Trained specifically on Marin and StepLaw datasets; generalization to other settings likely requires at least fine-tuning
- Requires properly formatted inputs with numeric tokens replaced and masked
- Performance predictions are probabilistic estimates based on training data patterns
- Best suited for configurations within the training distribution
## Training Procedure

### Two-Stage Training

**Stage 1** (10 epochs):

- Learning rate: 5e-5
- Base model frozen
- Trains only the numeric MLP and prediction head
- Warmup ratio: 0.1

**Stage 2** (400 epochs):

- Learning rate: 1e-5
- Full model fine-tuning
- All parameters trainable
- Warmup steps: 1000
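The two stages above can be sketched as a parameter-selection rule. Module names such as `base_model`, `numeric_mlp`, and `prediction_head` are illustrative assumptions, not the actual checkpoint layout:

```python
# Sketch of stage-wise parameter selection: stage 1 freezes the
# base model and trains only the new modules; stage 2 fine-tunes
# everything. Module names are illustrative assumptions.

ALL_PARAMS = [
    "base_model.layer0.weight",
    "base_model.layer1.weight",
    "numeric_mlp.fc.weight",
    "prediction_head.weight",
]

def trainable_params(stage):
    if stage == 1:
        # Base model frozen: only numeric MLP and prediction head.
        return [p for p in ALL_PARAMS
                if not p.startswith("base_model.")]
    # Stage 2: full fine-tuning, all parameters trainable.
    return list(ALL_PARAMS)

print(trainable_params(1))  # numeric MLP and head only
print(len(trainable_params(2)))  # all 4 parameters
```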
### Training Configuration

- Optimizer: AdamW (β1=0.9, β2=0.99)
- Gradient clipping: 1.0
- Loss function: Mean Squared Error (MSE)
- Distributed training: FSDP (Fully Sharded Data Parallel)
- Precision: FP32
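For concreteness, a single AdamW update with these hyperparameters can be sketched for one scalar parameter. This is a toy illustration of the decoupled weight-decay update rule, not the actual FSDP training loop:

```python
import math

# One AdamW step for a single scalar parameter, using the
# hyperparameters above: beta1=0.9, beta2=0.99, weight decay 0.01,
# gradient clipping at norm 1.0. Illustrative sketch only.

def adamw_step(theta, grad, m, v, t,
               lr=1e-5, beta1=0.9, beta2=0.99,
               eps=1e-8, weight_decay=0.01, clip_norm=1.0):
    # Clip the gradient to a maximum norm of clip_norm.
    norm = abs(grad)
    if norm > clip_norm:
        grad = grad * clip_norm / norm
    # Exponential moving averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias-corrected estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay, as in AdamW.
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v

theta, m, v = adamw_step(theta=0.5, grad=3.0, m=0.0, v=0.0, t=1)
print(theta)  # parameter nudged against the (clipped) gradient
```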
## Citation

If you use this model in your research, please cite:

```
url = {https://www.arxiv.org/abs/2602.10300}
}
```
## Model Card Authors

OptimizerStudy Team

## Model Card Contact

For questions or issues, please open an issue in the [repository](https://github.com/OptimizerStudy/Configuration-to-Performance-Scaling-Law).