---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- scaling-laws
- neural-scaling
- performance-prediction
- configuration-to-performance
- pytorch
library_name: transformers
---
|
|
|
|
|
# NCPL-intermediate: Neural Configuration to Performance Scaling Law |
|
|
|
|
|
This model predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics based on model configurations. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
**NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that: |
|
|
|
|
|
- Takes pretraining configurations as input |
|
|
- Predicts intermediate performance metrics using learned scaling law patterns |
|
|
- Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP |
|
|
- Supports multiple scaling law formulations (Marin, StepLaw) |
|
|
|
|
|
### Architecture |
|
|
|
|
|
The model consists of: |
|
|
|
|
|
1. **Base Model**: Qwen/Qwen3-1.7B |
|
|
- Provides contextual embeddings for text tokens |
|
|
|
|
|
2. **Numeric MLP**: |
|
|
- Processes numeric values (performance metrics, configuration parameters) |
|
|
- Projects numeric inputs to the same hidden dimension as text embeddings |
|
|
- Architecture: Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size) |
|
|
|
|
|
3. **Prediction Head**: |
|
|
- Linear layer mapping from hidden_size to scalar predictions |
|
|
- Outputs performance forecasts for each token position |
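
The numeric path described above can be sketched as a small PyTorch module. This is a minimal illustration, not the repository's exact code: the toy `hidden_size`, the random stand-in embeddings, and the `torch.where` fusion step are assumptions for demonstration; the real model uses the base model's hidden dimension and its own fusion logic.

```python
import torch
import torch.nn as nn

class NumericMLP(nn.Module):
    """Projects scalar values into the text embedding space:
    Linear(1 -> 2*hidden) -> ReLU -> Linear(2*hidden -> hidden)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, 2 * hidden_size),
            nn.ReLU(),
            nn.Linear(2 * hidden_size, hidden_size),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch, seq_len) scalars -> (batch, seq_len, hidden_size)
        return self.net(values.unsqueeze(-1))

# Fuse numeric embeddings into the token embeddings at masked positions.
hidden_size = 64  # toy size; the real model uses the base model's hidden size
mlp = NumericMLP(hidden_size)
text_emb = torch.randn(2, 5, hidden_size)  # stand-in for base-model embeddings
is_number_mask = torch.tensor([[0, 1, 0, 0, 1],
                               [1, 0, 0, 0, 0]], dtype=torch.bool)
number_values = torch.tensor([[0.0, 3.5, 0.0, 0.0, 1e9],
                              [0.01, 0.0, 0.0, 0.0, 0.0]])
num_emb = mlp(number_values)
fused = torch.where(is_number_mask.unsqueeze(-1), num_emb, text_emb)
print(fused.shape)  # torch.Size([2, 5, 64])
```

Only the masked positions receive MLP-projected values; all other positions keep their original text embeddings.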
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on: |
|
|
|
|
|
- **Datasets**: Marin and StepLaw scaling law datasets |
|
|
- **Training configuration**: |
|
|
- Stage 1: 10 epochs with learning rate 5e-5 (frozen base model) |
|
|
- Stage 2: 400 epochs with learning rate 1e-5 (full fine-tuning) |
|
|
- Batch size: 480 (across 8 GPUs) |
|
|
- Weight decay: 0.01 |
|
|
- Loss: MSE (Mean Squared Error) |
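
The two-stage schedule can be sketched as follows. This is a simplified illustration under stated assumptions: a toy linear model stands in for the full forecaster, epoch counts are shortened, and AdamW is an assumed optimizer choice; the learning rates, weight decay, and MSE loss follow the configuration above.

```python
import torch
import torch.nn as nn

# Toy stand-in: a "base" encoder plus a scalar prediction head.
base = nn.Linear(8, 8)
head = nn.Linear(8, 1)
model = nn.Sequential(base, head)
loss_fn = nn.MSELoss()

x, y = torch.randn(16, 8), torch.randn(16, 1)

# Stage 1: freeze the base model, train only the head (lr 5e-5).
for p in base.parameters():
    p.requires_grad = False
opt = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad],
    lr=5e-5, weight_decay=0.01,
)
for _ in range(10):  # 10 epochs in the schedule above
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

# Stage 2: unfreeze everything and fine-tune end to end (lr 1e-5).
for p in base.parameters():
    p.requires_grad = True
opt = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)
for _ in range(5):  # 400 epochs in the actual run; shortened here
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```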
|
|
|
|
|
## Usage |
|
|
|
|
|
The `ScalingLawForecaster` class can be found in the [GitHub repository](https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law). |
|
|
|
|
|
```python
import torch
from transformers import AutoTokenizer

# Get ScalingLawForecaster from: https://github.com/zhqwqwq/Configuration-to-Performance-Scaling-Law
from model import ScalingLawForecaster

# Load model
model = ScalingLawForecaster(
    base_model_name="Qwen/Qwen3-1.7B",
    init_from_pretrained=True,
    force_fp32=True,
)

# Load checkpoint (map to CPU first; move to GPU afterwards if desired)
checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Prepare inputs:
# - input_ids: tokenized text sequence
# - is_number_mask: boolean mask indicating which tokens are numeric
# - number_values_filled: actual numeric values (0 for non-numeric tokens)

with torch.no_grad():
    predictions = model(
        input_ids=input_ids,
        is_number_mask=is_number_mask,
        number_values_filled=number_values_filled,
        attention_mask=attention_mask,
    )
```
|
|
|
|
|
## Input Format |
|
|
|
|
|
The model expects three key inputs: |
|
|
|
|
|
1. **input_ids** (torch.LongTensor): Tokenized sequence with special numeric tokens |
|
|
2. **is_number_mask** (torch.BoolTensor): Boolean mask marking numeric token positions |
|
|
3. **number_values_filled** (torch.FloatTensor): Actual numeric values at marked positions |
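
One plausible way to build these three tensors from a configuration string is sketched below. This is illustrative only: the `<num>` placeholder token, the regex, and the whitespace "tokenizer" are assumptions for demonstration; the repository defines the actual numeric-token scheme, and the real pipeline uses the Qwen3 tokenizer.

```python
import re
import torch

text = "layers: 24 heads: 16 lr: 0.0003"
NUM_TOKEN = "<num>"  # hypothetical placeholder; the repo defines the real one
NUM_RE = r"\d+\.?\d*(?:[eE][+-]?\d+)?"

# Replace each numeric literal with a placeholder and remember its value.
values = [float(m) for m in re.findall(NUM_RE, text)]
templated = re.sub(NUM_RE, NUM_TOKEN, text)

# Toy whitespace "tokenizer" standing in for the real Qwen3 tokenizer.
tokens = templated.split()
is_number_mask = torch.tensor([t == NUM_TOKEN for t in tokens])
number_values_filled = torch.zeros(len(tokens))
number_values_filled[is_number_mask] = torch.tensor(values)

print(tokens)                # ['layers:', '<num>', 'heads:', '<num>', 'lr:', '<num>']
print(number_values_filled)  # zeros except 24.0, 16.0, 0.0003 at masked positions
```

The key invariant is that `is_number_mask` and `number_values_filled` align with the token sequence: positions holding the placeholder token carry their original numeric value, and every other position carries 0.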
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is designed for: |
|
|
|
|
|
- **Scaling law research**: Understanding how neural network performance scales with configuration |
|
|
- **Performance forecasting**: Predicting model performance before full training |
|
|
- **Configuration optimization**: Finding optimal hyperparameters based on scaling patterns |
|
|
- **Resource planning**: Estimating computational requirements for different model sizes |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Trained specifically on the Marin and StepLaw datasets; generalization to other settings likely requires at least fine-tuning
|
|
- Requires properly formatted inputs with numeric tokens replaced and masked |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model in your research, please cite: |
|
|
|
|
|
```bibtex
@article{ncpl2026,
  title   = {Neural Configuration to Performance Scaling Law},
  author  = {Huaqing Zhang and Kaiyue Wen and Tengyu Ma},
  journal = {arXiv preprint arXiv:2602.10300},
  year    = {2026},
  url     = {https://www.arxiv.org/abs/2602.10300}
}
```
|
|
|