Upload folder using huggingface_hub

Files changed (12)

.gitattributes +1 -0
README.md +171 -0
added_tokens.json +28 -0
chat_template.jinja +89 -0
config.json +11 -0
merges.txt +0 -0
model.py +78 -0
pytorch_model.bin +3 -0
special_tokens_map.json +31 -0
tokenizer.json +3 -0
tokenizer_config.json +239 -0
vocab.json +0 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,171 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen3-1.7B
+tags:
+- scaling-laws
+- neural-scaling
+- performance-prediction
+- configuration-to-performance
+- pytorch
+library_name: transformers
+---
+# NCPL-intermediate: Neural Configuration to Performance Scaling Law
+This model predicts the performance of neural network configurations using scaling laws. It is trained on the Marin and StepLaw datasets to forecast performance metrics based on model configurations.
+## Model Description
+**NCPL-intermediate** (Neural Configuration to Performance Scaling Law - Intermediate) is a specialized forecasting model that:
+- Takes neural network configurations and partial performance observations as input
+- Predicts future performance metrics using learned scaling law patterns
+- Combines text embeddings from a base transformer with numeric value processing through a dedicated MLP
+- Draws on two scaling-law datasets (Marin, StepLaw)
+### Architecture
+The model consists of:
+1. **Base Model**: Qwen/Qwen3-1.7B
+   - Provides contextual embeddings for text tokens
+2. **Numeric MLP**:
+   - Processes numeric values (performance metrics, configuration parameters)
+   - Projects numeric inputs to the same hidden dimension as text embeddings
+   - Architecture: Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size)
+3. **Prediction Head**:
+   - Linear layer mapping from hidden_size to scalar predictions
+   - Outputs performance forecasts for each token position
+### Key Features
+- **Hybrid Input Processing**: Combines text tokens and numeric values seamlessly
+- **Token-level Predictions**: Generates predictions at each sequence position
+- **FP32 Precision**: Trained in full float32 precision for numerical stability
+- **Intermediate Predictions**: Capable of predicting intermediate performance checkpoints
+## Training Data
+The model was trained on:
+- **Datasets**: Marin and StepLaw scaling law datasets
+- **Training configuration**:
+  - Stage 1: 10 epochs with learning rate 5e-5 (frozen base model)
+  - Stage 2: 400 epochs with learning rate 1e-5 (full fine-tuning)
+  - Batch size: 480 (across 8 GPUs)
+  - Weight decay: 0.01
+  - Loss: MSE (Mean Squared Error)
+### Checkpoint Information
+- **Epoch**: 46
+- **Training iterations**: 4800
+- **Validation loss**: 0.005730564706027508
+- **Checkpoint path**: `checkpoints/fp32_@['marin', 'steplaw']_qwen_intermediate_residual_nts1ep10_s2ep400_s1lr5e-05_s2lr1e-05_wd0.01_bs480_rs42_20260216_095527/checkpoints/checkpoint_min_val_loss.pt`
+## Usage
+```python
+import torch
+from transformers import AutoTokenizer
+from model import ScalingLawForecaster  # model.py from this repository must be on the import path
+# Load model
+model = ScalingLawForecaster(
+    base_model_name="Qwen/Qwen3-1.7B",
+    init_from_pretrained=True,
+    force_fp32=True
+)
+# Load checkpoint
+checkpoint = torch.load("pytorch_model.bin", map_location="cpu")  # load on CPU first, then move the model as needed
+model.load_state_dict(checkpoint["model_state_dict"])
+model.eval()
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")
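+# Prepare inputs (not constructed here); each tensor has shape (batch, seq_len)
+# input_ids: tokenized text sequence
+# is_number_mask: boolean mask indicating which tokens are numeric
+# number_values_filled: actual numeric values (0 for non-numeric tokens)
+# attention_mask: optional mask over the positions the model should attend to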
+with torch.no_grad():
+    predictions = model(
+        input_ids=input_ids,
+        is_number_mask=is_number_mask,
+        number_values_filled=number_values_filled,
+        attention_mask=attention_mask
+    )
+```
+## Input Format
+The model expects three key inputs, each of shape (batch, seq_len), plus an optional `attention_mask` of the same shape:
+1. **input_ids** (torch.LongTensor): Tokenized sequence with special numeric tokens
+2. **is_number_mask** (torch.BoolTensor): Boolean mask marking numeric token positions
+3. **number_values_filled** (torch.FloatTensor): Actual numeric values at marked positions
+## Intended Use
+This model is designed for:
+- **Scaling law research**: Understanding how neural network performance scales with configuration
+- **Performance forecasting**: Predicting model performance before full training
+- **Configuration optimization**: Finding optimal hyperparameters based on scaling patterns
+- **Resource planning**: Estimating computational requirements for different model sizes
+## Limitations
+- Trained specifically on Marin and StepLaw datasets; generalization to other scaling laws may vary
+- Requires properly formatted inputs with numeric tokens replaced and masked
+- Performance predictions are point estimates learned from training data patterns; the model does not output an uncertainty estimate
+- Best suited for configurations within the training distribution
+## Training Procedure
+### Two-Stage Training
+**Stage 1** (10 epochs):
+- Learning rate: 5e-5
+- Base model frozen
+- Trains only the numeric MLP and prediction head
+- Warmup ratio: 0.1
+**Stage 2** (400 epochs):
+- Learning rate: 1e-5
+- Full model fine-tuning
+- All parameters trainable
+- Warmup steps: 1000
+### Training Configuration
+- Optimizer: AdamW (β1=0.9, β2=0.99)
+- Gradient clipping: 1.0
+- Loss function: Mean Squared Error (MSE)
+- Distributed training: FSDP (Fully Sharded Data Parallel)
+- Precision: FP32
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@software{ncpl_intermediate_2026,
+  title = {NCPL-intermediate: Neural Configuration to Performance Scaling Law},
+  author = {OptimizerStudy},
+  year = {2026},
+  url = {https://huggingface.co/OptimizerStudy/NCPL-intermediate}
+}
+```
+## Model Card Authors
+OptimizerStudy Team
+## Model Card Contact
+For questions or issues, please open an issue in the [repository](https://github.com/OptimizerStudy/Configuration-to-Performance-Scaling-Law).
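
The Usage snippet in the README leaves `input_ids`, `is_number_mask`, and `number_values_filled` undefined. The sketch below shows one way the three aligned sequences could be assembled, assuming the caller already knows which token positions carry numbers. The helper name `build_numeric_inputs` and the token ids are hypothetical; this commit does not show the repository's actual preprocessing.

```python
from typing import Dict, List, Tuple


def build_numeric_inputs(
    token_ids: List[int],
    numeric_values: Dict[int, float],
) -> Tuple[List[int], List[bool], List[float]]:
    """Return (input_ids, is_number_mask, number_values_filled) of equal length.

    `numeric_values` maps a token position to the number carried at that position.
    Hypothetical helper: the repository's real preprocessing is not shown here.
    """
    for pos in numeric_values:
        if not 0 <= pos < len(token_ids):
            raise IndexError(f"numeric position {pos} is outside the sequence")
    is_number_mask = [i in numeric_values for i in range(len(token_ids))]
    # Non-numeric positions are filled with 0.0, as the README describes.
    number_values_filled = [
        float(numeric_values.get(i, 0.0)) for i in range(len(token_ids))
    ]
    return list(token_ids), is_number_mask, number_values_filled


# Made-up token ids; positions 1 and 3 carry numbers.
ids, mask, values = build_numeric_inputs([11, 7, 12, 7], {1: 3e-4, 3: 0.5})
```

Wrapping each list in `torch.tensor([...])` with dtypes `torch.long`, `torch.bool`, and `torch.float32` respectively would give tensors of shape (1, seq_len) that match the `forward` signature in `model.py`.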

added_tokens.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}

chat_template.jinja ADDED

	@@ -0,0 +1,89 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is false %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}

config.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "model_type": "scaling_law_forecaster",
+  "base_model_name": "Qwen/Qwen3-1.7B",
+  "architectures": [
+    "ScalingLawForecaster"
+  ],
+  "hidden_size": 2048,
+  "auto_map": {
+    "AutoModel": "model.ScalingLawForecaster"
+  }
+}
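
As a quick sanity check on the `hidden_size` value above, the parameter counts of the two small modules added on top of the base model can be worked out by hand. This is a back-of-the-envelope sketch that assumes the layer shapes given in the README (Linear(1 → 2*hidden_size) → ReLU → Linear(2*hidden_size → hidden_size), plus a Linear(hidden_size → 1) head); `linear_params` is a helper introduced here for illustration.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


hidden_size = 2048  # value from config.json

# Numeric MLP: Linear(1, 2h) followed by Linear(2h, h); ReLU has no parameters.
num_mlp_params = (
    linear_params(1, 2 * hidden_size)
    + linear_params(2 * hidden_size, hidden_size)
)
# Prediction head: Linear(h, 1).
head_params = linear_params(hidden_size, 1)

print(num_mlp_params, head_params)
```

Under these assumptions the numeric MLP contributes about 8.4 million parameters and the head about two thousand, both small next to the base model.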

merges.txt ADDED

The diff for this file is too large to render. See raw diff

model.py ADDED

	@@ -0,0 +1,78 @@

+import torch
+import torch.nn as nn
+from transformers import AutoModel, AutoConfig
+class ScalingLawForecaster(nn.Module):
+    def __init__(
+        self,
+        base_model_name: str = "HuggingFaceTB/SmolLM2-135M",
+        init_from_pretrained: bool = True,
+        force_fp32: bool = False,
+    ):
+        super().__init__()
+        self.config = AutoConfig.from_pretrained(base_model_name)
+        if force_fp32:
+            self.config.torch_dtype = torch.float32
+        if init_from_pretrained:
+            if force_fp32:
+                self.base = AutoModel.from_pretrained(
+                    base_model_name,
+                    config=self.config,
+                    torch_dtype=torch.float32,
+                )
+            else:
+                self.base = AutoModel.from_pretrained(base_model_name, config=self.config)
+        else:
+            self.base = AutoModel.from_config(self.config)
+        hidden_size = self.config.hidden_size
+        act_cls = nn.ReLU
+        self.num_mlp = nn.Sequential(
+            nn.Linear(1, hidden_size * 2),
+            act_cls(),
+            nn.Linear(hidden_size * 2, hidden_size)
+        )
+        self.head = nn.Linear(hidden_size, 1)
+    def forward(
+        self,
+        input_ids: torch.LongTensor,
+        is_number_mask: torch.BoolTensor,
+        number_values_filled: torch.FloatTensor,
+        attention_mask: torch.BoolTensor = None
+    ) -> torch.FloatTensor:
+        """
+        Args:
+            input_ids:          (batch, seq_len)
+            is_number_mask:     (batch, seq_len)    bool mask for numeric tokens
+            number_values_filled: (batch, seq_len)  float values (0 for non-numeric)
+            attention_mask:     (batch, seq_len)    optional
+        Returns:
+            logits: (batch, seq_len) scalar predictions per token
+        """
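+        # Text embeddings
+        # Remap id 49152 to 0 on a copy so the caller's tensor is not modified in place.
+        input_ids = input_ids.masked_fill(input_ids == 49152, 0)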
+        text_emb = self.base.get_input_embeddings()(input_ids)
+        # Numeric MLP embeddings
+        flat_vals = number_values_filled.reshape(-1, 1)  # reshape also handles non-contiguous inputs
+        mlp_out = self.num_mlp(flat_vals)
+        mlp_out = mlp_out.view_as(text_emb)
+        mask = is_number_mask.unsqueeze(-1)
+        inputs_embeds = torch.where(mask, mlp_out, text_emb)
+        outputs = self.base(
+            inputs_embeds=inputs_embeds,
+            attention_mask=attention_mask,
+            return_dict=True
+        )
+        hidden = outputs.last_hidden_state
+        # Final scalar head
+        logits = self.head(hidden).squeeze(-1)
+        return logits
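
The embedding substitution at the heart of `forward` can be tried in isolation. The sketch below reproduces only that step with toy sizes; the `nn.Embedding` layer is a stand-in for the base model's input embeddings, and the sizes and token ids are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size, vocab_size = 8, 16  # toy sizes, not the real model's

embed = nn.Embedding(vocab_size, hidden_size)  # stand-in for the base embedding table
num_mlp = nn.Sequential(
    nn.Linear(1, hidden_size * 2),
    nn.ReLU(),
    nn.Linear(hidden_size * 2, hidden_size),
)

input_ids = torch.tensor([[3, 5, 9, 5]])
is_number_mask = torch.tensor([[False, True, False, True]])
number_values_filled = torch.tensor([[0.0, 0.25, 0.0, 4.0]])

text_emb = embed(input_ids)                             # (1, 4, 8)
mlp_out = num_mlp(number_values_filled.reshape(-1, 1))  # (4, 8)
mlp_out = mlp_out.view_as(text_emb)                     # (1, 4, 8)

# Numeric positions take the MLP output; all others keep the text embedding.
inputs_embeds = torch.where(is_number_mask.unsqueeze(-1), mlp_out, text_emb)
```

Because the mask is broadcast over the hidden dimension, each position is replaced as a whole vector. The token id at a numeric position therefore has no effect on what the transformer sees there.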

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d4a2c1fb93f2824e48d36c49abbcfa0fd661006f97bcd786f49504199d9d3c0a
+size 6916029463
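
The three lines above are a Git LFS pointer, not the weights themselves: each line is a key followed by a space and a value. A minimal sketch of reading such a pointer and converting the recorded size to GiB follows; `parse_lfs_pointer` is a helper written for this illustration, not part of any library.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d4a2c1fb93f2824e48d36c49abbcfa0fd661006f97bcd786f49504199d9d3c0a
size 6916029463
"""

fields = parse_lfs_pointer(pointer)
size_bytes = int(fields["size"])
size_gib = size_bytes / 2**30
print(f"{size_gib:.2f} GiB")
```

The recorded size works out to roughly 6.4 GiB, which is worth keeping in mind before calling `torch.load` on a machine with limited memory.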

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654

tokenizer_config.json ADDED

	@@ -0,0 +1,239 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

vocab.json ADDED

The diff for this file is too large to render. See raw diff