# DeepSeek-Coder-7B-Instruct-v1.5

This model is a fine-tuned version of [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5) specifically optimized for generating high-quality PyTorch neural network architectures for image classification tasks.

## Model Details

### Base Model
- **Base Model**: `deepseek-ai/deepseek-coder-7b-instruct-v1.5`
- **Architecture**: LLaMA-based (30 layers, 4096 hidden size, 32 attention heads)
- **Parameters**: 7 billion
- **Context Length**: 4096 tokens
- **Vocabulary Size**: 102,400

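To sanity-check these figures, the base model's configuration can be inspected without downloading any weights. A minimal sketch, assuming the attribute names of the LLaMA-style config in `transformers`:

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights) and confirm the figures above.
config = AutoConfig.from_pretrained("deepseek-ai/deepseek-coder-7b-instruct-v1.5")

print(config.num_hidden_layers)        # 30 layers
print(config.hidden_size)              # 4096
print(config.num_attention_heads)      # 32
print(config.max_position_embeddings)  # 4096-token context
print(config.vocab_size)               # 102,400
```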

### LoRA Configuration
- **LoRA Rank (r)**: 32
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.05
- **Target Modules**: 
  - Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
  - MLP: `up_proj`, `down_proj`, `gate_proj`
- **Layers**: 0-23 (the first 24 of the base model's 30 transformer layers)
- **Task Type**: Causal Language Modeling

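These settings map directly onto a PEFT `LoraConfig`. A minimal sketch, assuming the standard `peft` API; the exact training script may differ:

```python
from peft import LoraConfig, TaskType, get_peft_model

# Mirror of the LoRA settings listed above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",   # attention
                    "up_proj", "down_proj", "gate_proj"],     # MLP
    layers_to_transform=list(range(24)),                      # layers 0-23
    task_type=TaskType.CAUSAL_LM,
)

# model = get_peft_model(base_model, lora_config)  # wrap a loaded base model
```
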
### Training Hyperparameters
- **Learning Rate**: 1e-5
- **Batch Size**: 1 per device
- **Gradient Accumulation**: 4 steps
- **Optimizer**: paged AdamW 8-bit
- **Scheduler**: Cosine decay with 20 warmup steps
- **Weight Decay**: 0.01
- **Max Gradient Norm**: 1.0
- **Training Epochs**: 5 per cycle
- **Precision**: bfloat16

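For reference, these hyperparameters translate into HuggingFace `TrainingArguments` roughly as follows. A hedged sketch, not the exact training script; `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out/sft",        # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",    # paged AdamW 8-bit
    lr_scheduler_type="cosine",
    warmup_steps=20,
    weight_decay=0.01,
    max_grad_norm=1.0,
    num_train_epochs=5,          # per cycle
    bf16=True,
)
```
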
## Performance Metrics

### Generation Performance
- **Generation Success Rate**: 59.13% (123 valid architectures out of 208 generated)

### Model Quality
- **Average Accuracy**: 50.99% (95% CI: 50.06% - 51.92%)
- **Best Accuracy**: 63.98%
- **Median Accuracy**: 51.14%
- **Quality Distribution**:
  - Models ≥ 40% accuracy: 96.81%
  - Models ≥ 35% accuracy: 100.00%
  - Models ≥ 30% accuracy: 100.00%


## Intended Use

### Primary Use Case
This model is designed to generate PyTorch neural network architectures for image classification tasks, specifically optimized for:
- **Dataset**: CIFAR-10 (32×32 RGB images, 10 classes)
- **Task**: Image classification
- **Framework**: PyTorch
- **Optimization Target**: First-epoch accuracy

### Model Capabilities
- Generates complete, compilable PyTorch `nn.Module` classes
- Creates architectures with the required method signatures (see the skeleton after this list):
  - `__init__(self, in_shape, out_shape, prm, device)`
  - `forward(self, x)`
  - `train_setup(self, prm)`
  - `learn(self, train_data)`
- Produces novel, structurally diverse architectures
- Respects parameter constraints and resource limits
- Generates architectures optimized for fast convergence

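For concreteness, a minimal skeleton of that interface is shown below. The layer stack is a placeholder, and treating `in_shape` as `(C, H, W)` is an assumption; generated architectures are far more elaborate:

```python
import torch
import torch.nn as nn

def supported_hyperparameters():
    return {'lr', 'momentum'}

class Net(nn.Module):
    def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict, device: torch.device) -> None:
        super().__init__()
        self.device = device
        # Placeholder body; a real generation builds a full architecture here.
        self.backbone = nn.Sequential(
            nn.Conv2d(in_shape[0], 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, out_shape[0]),
        )
        self.to(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)

    def train_setup(self, prm):
        # Criterion and optimizer built from the supported hyperparameters.
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.SGD(self.parameters(),
                                         lr=prm['lr'], momentum=prm['momentum'])

    def learn(self, train_data):
        # One pass over the training data.
        self.train()
        for x, y in train_data:
            x, y = x.to(self.device), y.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criterion(self(x), y)
            loss.backward()
            self.optimizer.step()
```
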
### Out-of-Scope Use Cases
- Not optimized for other datasets (MNIST, ImageNet, etc.)
- Not designed for other tasks (object detection, segmentation, etc.)
- Not optimized for multi-epoch training (focuses on first-epoch performance)

## How to Use

### Installation
```bash
pip install torch transformers peft accelerate
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model",
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model"
)

# Prepare prompt
system_prompt = "You are an expert PyTorch architecture designer specializing in creating UNIQUE, high-performing neural networks optimized for first-epoch accuracy."

user_prompt = """Task: Design a PyTorch CV model for image classification.
Dataset: CIFAR-10 (32×32 RGB, channels-first C×H×W).
Resource limits: params ≤ 500000; latency budget: tight (edge-friendly).
Constraints: use standard layers only; no pretrained weights.
**REQUIRED FORMAT**:
- Class name: `Net(nn.Module)`
- Constructor: `def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict, device: torch.device) -> None`
- Forward: `def forward(self, x: torch.Tensor) -> torch.Tensor`
- REQUIRED METHODS: `train_setup(self, prm)` and `learn(self, train_data)`
- REQUIRED FUNCTION: `def supported_hyperparameters(): return {'lr', 'momentum'}`
- REQUIRED IMPORTS: `import torch` and `import torch.nn as nn`
**PRIMARY OBJECTIVE**: Achieve MAXIMUM ACCURACY after FIRST EPOCH of training on CIFAR-10."""

# Format as chat
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.20,
        top_k=50,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

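Generations typically wrap the architecture in a fenced code block; a minimal sketch for pulling out the Python source (the fence convention is an assumption about the model's output style):

```python
import re

# Extract the first fenced Python block; fall back to the raw response
# if the model emitted bare code without fences.
fence = "`" * 3  # avoids nesting literal fences inside this example
match = re.search(fence + r"(?:python)?\n(.*?)" + fence, response, re.DOTALL)
code = match.group(1) if match else response
print(code)
```
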
### Generation Parameters (Recommended)
- **Temperature**: 0.20 (low; near-deterministic sampling)
- **Top-k**: 50
- **Top-p**: 0.9
- **Max New Tokens**: 2048
- **Do Sample**: True

## Training Data

### Initial Training Data
- **Source**: Curated from LEMUR database
- **Size**: 1,698 examples (after deduplication)
- **Format**: Chat format with system/user/assistant messages
- **Content**: PyTorch neural network architectures with accuracy scores


## Evaluation

### Evaluation Protocol
- **Dataset**: CIFAR-10
- **Training**: 1 epoch only
- **Hyperparameters** (fixed):
  - Learning rate: 0.01
  - Momentum: 0.9
  - Batch size: 10
  - Optimizer: SGD
  - Data augmentation: Normalization + random horizontal flip
- **Metric**: First-epoch accuracy on the CIFAR-10 test set (see the sketch below)

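A minimal sketch of this protocol, assuming the `Net` interface shown earlier and the usual CIFAR-10 normalization statistics; the harness's exact code may differ:

```python
import torch
import torchvision
import torchvision.transforms as T

# Fixed protocol: SGD(lr=0.01, momentum=0.9), batch size 10, one epoch,
# normalization + random horizontal flip on the training split.
MEAN, STD = (0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)
train_tf = T.Compose([T.RandomHorizontalFlip(), T.ToTensor(), T.Normalize(MEAN, STD)])
test_tf = T.Compose([T.ToTensor(), T.Normalize(MEAN, STD)])

train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=train_tf)
test_set = torchvision.datasets.CIFAR10("data", train=False, download=True, transform=test_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=10, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=10)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
prm = {"lr": 0.01, "momentum": 0.9}
net = Net(in_shape=(3, 32, 32), out_shape=(10,), prm=prm, device=device)  # generated class
net.train_setup(prm)
net.learn(train_loader)  # exactly one epoch

# First-epoch accuracy on the test set.
net.eval()
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        correct += (net(x.to(device)).argmax(1) == y.to(device)).sum().item()
        total += y.size(0)
print(f"first-epoch accuracy: {correct / total:.4f}")
```
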
### Validation Process
1. **Compilation Check**: Verify Python syntax and PyTorch compatibility
2. **Training**: Train for 1 epoch on CIFAR-10
3. **Evaluation**: Compute accuracy on test set
4. **Novelty Check**: AST-based structural analysis to ensure uniqueness (sketched below)

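Steps 1 and 4 can be approximated with the standard library alone. The fingerprinting scheme below is illustrative, not the exact analysis used:

```python
import ast
import hashlib

def compiles(source: str) -> bool:
    """Step 1 (syntax level): does the generated source parse as Python?"""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def structural_fingerprint(source: str) -> str:
    """Step 4 (illustrative): hash the sequence of AST node types, so two
    architectures differing only in identifiers or constants collide."""
    node_types = [type(node).__name__ for node in ast.walk(ast.parse(source))]
    return hashlib.sha256(" ".join(node_types).encode()).hexdigest()

# Keep a candidate only if it parses and its structure is unseen.
seen = set()
if compiles(code):
    fp = structural_fingerprint(code)
    if fp not in seen:
        seen.add(fp)  # candidate proceeds to the 1-epoch training step
```
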
## Limitations

1. **Dataset Specificity**: Optimized for CIFAR-10; may not generalize to other datasets
2. **Single Epoch Focus**: Optimized for first-epoch performance, not long-term training
3. **Fixed Evaluation Protocol**: Uses fixed hyperparameters; may not reflect best-case performance
4. **Computational Cost**: Requires significant GPU memory (~20-30GB for inference)
5. **Generation Variability**: Success rate is ~59%; some generations may fail validation

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{nn_novelty_generation_2025,
  title={Emergent Architectural Novelty in Deep Models via LLM-Driven Synthesis},
  author={Waleed Khalid and Dmytro Ignatov and Radu Timofte},
  booktitle={Proceedings of ACL 2025},
  year={2025}
}
```

## Model Card Information

- **Model Type**: Causal Language Model (Decoder-only)
- **Language**: Python (PyTorch code generation)
- **License**: Check base model license (DeepSeek-Coder-7B-Instruct-v1.5)
- **Fine-Tuning Date**: 2025
- **Fine-Tuning Method**: Iterative Supervised Fine-Tuning with LoRA
- **Base Model**: deepseek-ai/deepseek-coder-7b-instruct-v1.5

## Acknowledgments

- Base model: [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
- Training framework: HuggingFace Transformers, PEFT (LoRA)
- Evaluation: CIFAR-10 dataset

## Model Details
- Developed by: Waleed Khalid (ABrain)
- Fine-tuned from model: deepseek-ai/deepseek-coder-7b-instruct-v1.5
- Model type: Causal Language Model (Transformer-based)
- Language(s): English prompts; generates Python (PyTorch) code
- License: MIT (fine-tuned weights); the base model license also applies

## Model Sources
- Repository: ABrain/NNGPT-UniqueArch-Rag
---

**Note**: This model was trained through an iterative fine-tuning process over 22 cycles. Cycle 18 (this checkpoint) achieved the best balance of accuracy, model quality, and generation success rate.