# DeepSeek-Coder-7B-Instruct-v1.5
This model is a fine-tuned version of [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5) specifically optimized for generating high-quality PyTorch neural network architectures for image classification tasks.
## Model Details
### Base Model
- **Base Model**: `deepseek-ai/deepseek-coder-7b-instruct-v1.5`
- **Architecture**: LLaMA-based (30 layers, 4096 hidden size, 32 attention heads)
- **Parameters**: 7 billion
- **Context Length**: 4096 tokens
- **Vocabulary Size**: 102,400
### LoRA Configuration
- **LoRA Rank (r)**: 32
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.05
- **Target Modules**:
  - Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
  - MLP: `up_proj`, `down_proj`, `gate_proj`
- **Layers**: 0-23 (24 of the model's 30 layers)
- **Task Type**: Causal Language Modeling
### Training Hyperparameters
- **Learning Rate**: 1e-5
- **Batch Size**: 1 per device
- **Gradient Accumulation**: 4 steps
- **Optimizer**: paged AdamW 8-bit
- **Scheduler**: Cosine decay with 20 warmup steps
- **Weight Decay**: 0.01
- **Max Gradient Norm**: 1.0
- **Training Epochs**: 5 per cycle
- **Precision**: bfloat16
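The warmup-then-cosine schedule above can be sketched in plain Python (the function name and `total_steps` value are illustrative; in practice the scheduler is supplied by the training framework):

```python
import math

def lr_at_step(step, base_lr=1e-5, warmup_steps=20, total_steps=1000):
    """Cosine decay with linear warmup, mirroring the schedule above.

    `total_steps` is a placeholder; during training it equals the number
    of optimizer steps (epochs * batches / gradient_accumulation).
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

# With batch size 1 per device and 4 accumulation steps,
# the effective batch size per optimizer step is:
effective_batch = 1 * 4
```

Note that gradient accumulation trades memory for throughput: gradients from 4 forward/backward passes are summed before a single optimizer step, emulating a batch of 4 on hardware that only fits a batch of 1.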
## Performance Metrics
### Generation Performance
- **Generation Success Rate**: 59.13%
- **Valid Generation Rate**: 59.13% (123 valid out of 208 generated)
### Model Quality
- **Average Accuracy**: 50.99% (95% CI: 50.06% - 51.92%)
- **Best Accuracy**: 63.98%
- **Median Accuracy**: 51.14%
- **Quality Distribution**:
  - Models ≥ 40% accuracy: 96.81%
  - Models ≥ 35% accuracy: 100.00%
  - Models ≥ 30% accuracy: 100.00%
## Intended Use
### Primary Use Case
This model is designed to generate PyTorch neural network architectures for image classification tasks, specifically optimized for:
- **Dataset**: CIFAR-10 (32×32 RGB images, 10 classes)
- **Task**: Image classification
- **Framework**: PyTorch
- **Optimization Target**: First-epoch accuracy
### Model Capabilities
- Generates complete, compilable PyTorch `nn.Module` classes
- Creates architectures with proper method signatures:
  - `__init__(self, in_shape, out_shape, prm, device)`
  - `forward(self, x)`
  - `train_setup(self, prm)`
  - `learn(self, train_data)`
- Produces novel, structurally diverse architectures
- Respects parameter constraints and resource limits
- Generates architectures optimized for fast convergence
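A minimal skeleton that satisfies the interface listed above might look as follows. The layer choices here are purely illustrative placeholders, not representative of what the model generates:

```python
import torch
import torch.nn as nn

def supported_hyperparameters():
    return {'lr', 'momentum'}

class Net(nn.Module):
    def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict,
                 device: torch.device) -> None:
        super().__init__()
        self.device = device
        channels, h, w = in_shape        # e.g. (3, 32, 32) for CIFAR-10
        num_classes = out_shape[0]
        # Illustrative tiny architecture; generated models are more elaborate.
        self.features = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)
        self.to(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

    def train_setup(self, prm):
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.SGD(
            self.parameters(), lr=prm['lr'], momentum=prm['momentum'])

    def learn(self, train_data):
        self.train()
        for inputs, labels in train_data:
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criterion(self(inputs), labels)
            loss.backward()
            self.optimizer.step()
```

Any generated architecture that follows this contract can be instantiated and trained by the evaluation harness without per-model glue code.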
### Out-of-Scope Use Cases
- Not optimized for other datasets (MNIST, ImageNet, etc.)
- Not designed for other tasks (object detection, segmentation, etc.)
- Not optimized for multi-epoch training (focuses on first-epoch performance)
## How to Use
### Installation
```bash
pip install torch transformers peft accelerate
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model"
)

# Prepare prompt
system_prompt = "You are an expert PyTorch architecture designer specializing in creating UNIQUE, high-performing neural networks optimized for first-epoch accuracy."

user_prompt = """Task: Design a PyTorch CV model for image classification.
Dataset: CIFAR-10 (32×32 RGB, channels-first C×H×W).
Resource limits: params ≤ 500000; latency budget: tight (edge-friendly).
Constraints: use standard layers only; no pretrained weights.

**REQUIRED FORMAT**:
- Class name: `Net(nn.Module)`
- Constructor: `def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict, device: torch.device) -> None`
- Forward: `def forward(self, x: torch.Tensor) -> torch.Tensor`
- REQUIRED METHODS: `train_setup(self, prm)` and `learn(self, train_data)`
- REQUIRED FUNCTION: `def supported_hyperparameters(): return {'lr', 'momentum'}`
- REQUIRED IMPORTS: `import torch` and `import torch.nn as nn`

**PRIMARY OBJECTIVE**: Achieve MAXIMUM ACCURACY after FIRST EPOCH of training on CIFAR-10."""

# Format as chat
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.20,
        top_k=50,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```
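The decoded response typically wraps the architecture in a fenced code block. A small stdlib sketch for pulling the code out before validation (the function name is illustrative):

```python
import re

def extract_code(response: str) -> str:
    """Return the first fenced Python block in a model response,
    or the raw text if no fence is present."""
    match = re.search(r"```(?:python)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()
```

Extracting the fence first avoids syntax errors from surrounding prose when the code is later compiled and trained.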
### Generation Parameters (Recommended)
- **Temperature**: 0.20 (low, for focused, near-deterministic sampling)
- **Top-k**: 50
- **Top-p**: 0.9
- **Max New Tokens**: 2048
- **Do Sample**: True
## Training Data
### Initial Training Data
- **Source**: Curated from LEMUR database
- **Size**: 1,698 examples (after deduplication)
- **Format**: Chat format with system/user/assistant messages
- **Content**: PyTorch neural network architectures with accuracy scores
## Evaluation
### Evaluation Protocol
- **Dataset**: CIFAR-10
- **Training**: 1 epoch only
- **Hyperparameters** (fixed):
  - Learning rate: 0.01
  - Momentum: 0.9
  - Batch size: 10
  - Optimizer: SGD
  - Data augmentation: Normalization + random horizontal flip
- **Metric**: First-epoch accuracy
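The fixed protocol above can be sketched as a single evaluation function. This is a hedged illustration, not the project's actual harness: `evaluate_first_epoch` is an assumed name, the model class is expected to follow the `Net` interface from this card, and the loaders are assumed to already apply the normalization, random-flip augmentation, and batch size 10:

```python
import torch

def evaluate_first_epoch(model_cls, train_loader, test_loader,
                         device=torch.device('cpu')):
    """Train for exactly one epoch under the fixed protocol, then
    return test-set accuracy."""
    prm = {'lr': 0.01, 'momentum': 0.9}    # fixed hyperparameters
    model = model_cls((3, 32, 32), (10,), prm, device)
    model.train_setup(prm)                  # SGD with the fixed lr/momentum
    model.learn(train_loader)               # exactly one epoch
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            preds = model(inputs.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.size(0)
    return correct / total
```

Holding the optimizer and augmentation fixed across all candidates means accuracy differences are attributable to the generated architecture alone.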
### Validation Process
1. **Compilation Check**: Verify Python syntax and PyTorch compatibility
2. **Training**: Train for 1 epoch on CIFAR-10
3. **Evaluation**: Compute accuracy on test set
4. **Novelty Check**: AST-based structural analysis to ensure uniqueness
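Steps 1 and 4 can be sketched with the standard library. The fingerprint below is an assumed illustration of an AST-based structural check, not the project's exact novelty metric: it hashes the sequence of AST node types while ignoring identifiers and literal values, so trivially renamed copies of an architecture collide:

```python
import ast
import hashlib

def compile_check(source: str) -> bool:
    """Step 1 (sketch): does the generated source parse as valid Python?"""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

def structural_fingerprint(source: str) -> str:
    """Step 4 (sketch): hash the AST node-type sequence, ignoring names
    and literal values, so structural duplicates map to the same digest."""
    nodes = [type(node).__name__ for node in ast.walk(ast.parse(source))]
    return hashlib.sha256(" ".join(nodes).encode()).hexdigest()
```

Two candidates with the same fingerprint would be treated as structurally identical even if every variable and layer name differs.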
## Limitations
1. **Dataset Specificity**: Optimized for CIFAR-10; may not generalize to other datasets
2. **Single Epoch Focus**: Optimized for first-epoch performance, not long-term training
3. **Fixed Evaluation Protocol**: Uses fixed hyperparameters; may not reflect best-case performance
4. **Computational Cost**: Requires significant GPU memory (~20-30GB for inference)
5. **Generation Variability**: Success rate is ~59%; some generations may fail validation
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{nn_novelty_generation_2025,
  title={Emergent Architectural Novelty in Deep Models via LLM-Driven Synthesis},
  author={Waleed Khalid and Dimytro Ignatove and Radu Timofte},
  booktitle={Proceedings of ACL 2025},
  year={2025}
}
```
## Model Card Information
- **Model Type**: Causal Language Model (Decoder-only)
- **Language**: Python (PyTorch code generation)
- **License**: Check base model license (DeepSeek-Coder-7B-Instruct-v1.5)
- **Fine-Tuning Date**: 2025
- **Fine-Tuning Method**: Iterative Supervised Fine-Tuning with LoRA
- **Base Model**: deepseek-ai/deepseek-coder-7b-instruct-v1.5
## Acknowledgments
- Base model: [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
- Training framework: HuggingFace Transformers, PEFT (LoRA)
- Evaluation: CIFAR-10 dataset
## Developer Information
- Developed by: Waleed Khalid / ABrain
- Finetuned from model: deepseek-ai/deepseek-coder-7b-instruct-v1.5
- Model type: Causal Language Model (Transformer-based)
- Language(s): English prompts; generates Python (PyTorch) code
- License: MIT (see also the base model's license terms)
## Model Sources
- Repository: ABrain/NNGPT-UniqueArch-Rag
---
**Note**: This model was trained through an iterative fine-tuning process over 22 cycles. Cycle 18 (this checkpoint) is the best-performing one, balancing accuracy, quality, and generation success rate.