kwaleed384 committed on
Commit
9f2cf0d
·
verified ·
1 Parent(s): 82e9cbd

Add README.md

  1. README.md +227 -0
README.md ADDED
# DeepSeek-Coder-7B-Instruct-v1.5

This model is a fine-tuned version of [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5) specifically optimized for generating high-quality PyTorch neural network architectures for image classification tasks.

## Model Details

### Base Model
- **Base Model**: `deepseek-ai/deepseek-coder-7b-instruct-v1.5`
- **Architecture**: LLaMA-based (30 layers, 4096 hidden size, 32 attention heads)
- **Parameters**: 7 billion
- **Context Length**: 4096 tokens
- **Vocabulary Size**: 102,400

### LoRA Configuration
- **LoRA Rank (r)**: 32
- **LoRA Alpha**: 32
- **LoRA Dropout**: 0.05
- **Target Modules**:
  - Attention: `q_proj`, `k_proj`, `v_proj`, `o_proj`
  - MLP: `up_proj`, `down_proj`, `gate_proj`
- **Layers**: 0-23 (24 of the base model's 30 layers)
- **Task Type**: Causal Language Modeling

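The settings above map fairly directly onto a PEFT `LoraConfig`. The sketch below is an assumption about how they might be expressed, not the authors' training script; in particular, restricting adapters to layers 0-23 through `layers_to_transform` is a guess at how the layer range was applied. The `peft` import is deferred so the values can be inspected without the library installed.

```python
# Hypothetical reconstruction of the LoRA settings listed above.
LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "up_proj", "down_proj", "gate_proj",     # MLP projections
    ],
    "layers_to_transform": list(range(24)),      # layers 0-23 (assumed mechanism)
    "task_type": "CAUSAL_LM",
}

# LoRA scales the low-rank update by alpha / r; with these values the factor is 1.0.
lora_scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Create a peft.LoraConfig from the values above (requires `peft`)."""
    from peft import LoraConfig  # imported lazily so the dict is usable without peft

    return LoraConfig(**LORA_KWARGS)
```

With rank equal to alpha, the adapter update is applied at unit scale, so the learning rate alone controls how strongly the adapters move the base weights.
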
### Training Hyperparameters
- **Learning Rate**: 1e-5
- **Batch Size**: 1 per device
- **Gradient Accumulation**: 4 steps
- **Optimizer**: paged AdamW 8-bit
- **Scheduler**: Cosine decay with 20 warmup steps
- **Weight Decay**: 0.01
- **Max Gradient Norm**: 1.0
- **Training Epochs**: 5 per cycle
- **Precision**: bfloat16

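To make the schedule concrete, here is a minimal sketch of a cosine decay with 20 warmup steps and a peak learning rate of 1e-5. It assumes linear warmup from zero and decay to zero, which is how common implementations behave but is not stated in this card; `total = 1000` is a hypothetical step count. With a per-device batch size of 1 and 4 accumulation steps, each optimizer step sees 4 examples per device.

```python
import math

PEAK_LR = 1e-5
WARMUP_STEPS = 20
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4

# Examples contributing to one optimizer step on a single device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 10, 20, 510, 1000):
        print(s, f"{lr_at(s, total):.2e}")
```
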
## Performance Metrics

### Generation Performance
- **Generation Success Rate**: 59.13%
- **Valid Generation Rate**: 59.13% (123 valid out of 208 generated)

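The valid generation rate follows directly from the counts reported above; a quick check:

```python
valid, generated = 123, 208

# Share of generated architectures that passed validation, as a percentage.
valid_rate = 100 * valid / generated
print(f"{valid_rate:.2f}%")  # 59.13%
```
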
### Model Quality
- **Average Accuracy**: 50.99% (95% CI: 50.06% - 51.92%)
- **Best Accuracy**: 63.98%
- **Median Accuracy**: 51.14%
- **Quality Distribution**:
  - Models ≥ 40% accuracy: 96.81%
  - Models ≥ 35% accuracy: 100.00%
  - Models ≥ 30% accuracy: 100.00%

## Intended Use

### Primary Use Case
This model is designed to generate PyTorch neural network architectures for image classification tasks, specifically optimized for:
- **Dataset**: CIFAR-10 (32×32 RGB images, 10 classes)
- **Task**: Image classification
- **Framework**: PyTorch
- **Optimization Target**: First-epoch accuracy

### Model Capabilities
- Generates complete, compilable PyTorch `nn.Module` classes
- Creates architectures with proper method signatures:
  - `__init__(self, in_shape, out_shape, prm, device)`
  - `forward(self, x)`
  - `train_setup(self, prm)`
  - `learn(self, train_data)`
- Produces novel, structurally diverse architectures
- Respects parameter constraints and resource limits
- Generates architectures optimized for fast convergence

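For illustration, the sketch below pairs a hand-written skeleton that follows the method signatures listed above with a small standard-library check that a piece of source code defines them. The layer choices, the reading of `in_shape` as batch-first (so `in_shape[1]` is the channel count), the reading of `out_shape[0]` as the class count, and the helper `missing_methods` are all assumptions; the skeleton is not output from the model.

```python
import ast

# Hand-written skeleton (not model output) matching the listed interface.
EXAMPLE_SOURCE = '''
import torch
import torch.nn as nn


def supported_hyperparameters():
    return {'lr', 'momentum'}


class Net(nn.Module):
    def __init__(self, in_shape, out_shape, prm, device):
        super().__init__()
        self.device = device
        self.features = nn.Sequential(
            nn.Conv2d(in_shape[1], 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, out_shape[0])

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

    def train_setup(self, prm):
        self.to(self.device)
        self.criterion = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.SGD(
            self.parameters(), lr=prm['lr'], momentum=prm['momentum'])

    def learn(self, train_data):
        self.train()
        for inputs, labels in train_data:
            inputs, labels = inputs.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            loss = self.criterion(self(inputs), labels)
            loss.backward()
            self.optimizer.step()
'''

REQUIRED_METHODS = {"__init__", "forward", "train_setup", "learn"}


def missing_methods(source: str, class_name: str = "Net") -> set:
    """Return required method names that `class_name` does not define."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            defined = {n.name for n in node.body if isinstance(n, ast.FunctionDef)}
            return REQUIRED_METHODS - defined
    return set(REQUIRED_METHODS)  # class not found: everything is missing


print(missing_methods(EXAMPLE_SOURCE))  # set() when all methods are present
```

The check only inspects the syntax tree, so it runs without PyTorch installed and never executes the generated code.
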
### Out-of-Scope Use Cases
- Not optimized for other datasets (MNIST, ImageNet, etc.)
- Not designed for other tasks (object detection, segmentation, etc.)
- Not optimized for multi-epoch training (focuses on first-epoch performance)
- May not generalize to different input/output shapes

## How to Use

### Installation
```bash
pip install torch transformers peft accelerate
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer (local path to the merged cycle-18 checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model",
    torch_dtype=torch.float16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "out/iterative_cycles_v2/cycle_18/merged_model"
)

# Prepare prompt
system_prompt = "You are an expert PyTorch architecture designer specializing in creating UNIQUE, high-performing neural networks optimized for first-epoch accuracy."

user_prompt = """Task: Design a PyTorch CV model for image classification.
Dataset: CIFAR-10 (32×32 RGB, channels-first C×H×W).
Resource limits: params ≤ 500000; latency budget: tight (edge-friendly).
Constraints: use standard layers only; no pretrained weights.
**REQUIRED FORMAT**:
- Class name: `Net(nn.Module)`
- Constructor: `def __init__(self, in_shape: tuple, out_shape: tuple, prm: dict, device: torch.device) -> None`
- Forward: `def forward(self, x: torch.Tensor) -> torch.Tensor`
- REQUIRED METHODS: `train_setup(self, prm)` and `learn(self, train_data)`
- REQUIRED FUNCTION: `def supported_hyperparameters(): return {'lr', 'momentum'}`
- REQUIRED IMPORTS: `import torch` and `import torch.nn as nn`
**PRIMARY OBJECTIVE**: Achieve MAXIMUM ACCURACY after FIRST EPOCH of training on CIFAR-10."""

# Format as chat
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Tokenize
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate (sampling with a low temperature)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.20,
        top_k=50,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

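The decoded `response` is free text and may wrap the generated class in a Markdown code fence. Assuming that convention (it is not specified in this card), a small helper such as the hypothetical `extract_code` below can pull out the Python source before it is validated.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

# Opening fence with an optional language tag, the body, then the closing fence.
_BLOCK = re.compile(FENCE + r"\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return the first fenced code block in `response`, or the raw text if none."""
    match = _BLOCK.search(response)
    return (match.group(1) if match else response).strip()


demo = "Here is the model:\n" + FENCE + "python\nimport torch\n" + FENCE + "\nDone."
print(extract_code(demo))  # import torch
```

Falling back to the raw text keeps the helper usable when the model emits bare code without a fence.
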
### Generation Parameters (Recommended)
- **Temperature**: 0.20 (focused, low-variance sampling)
- **Top-k**: 50
- **Top-p**: 0.9
- **Max New Tokens**: 2048
- **Do Sample**: True

## Training Data

### Initial Training Data
- **Source**: Curated from LEMUR database
- **Size**: 1,698 examples (after deduplication)
- **Format**: Chat format with system/user/assistant messages
- **Content**: PyTorch neural network architectures with accuracy scores

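As an illustration of the chat format, a single training example might be stored as one JSON object per line. The field names below follow the common `messages` convention used by chat templates and are an assumption, as are the placeholder strings; the card does not specify the exact schema or how accuracy scores are attached to each example.

```python
import json

# Hypothetical record layout; all strings are placeholders, not real training data.
record = {
    "messages": [
        {"role": "system", "content": "<system prompt>"},
        {"role": "user", "content": "<task description and constraints>"},
        {"role": "assistant", "content": "<PyTorch source code for Net>"},
    ]
}

line = json.dumps(record)  # one line of a JSONL file
roles = [m["role"] for m in json.loads(line)["messages"]]
print(roles)  # ['system', 'user', 'assistant']
```
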
## Evaluation

### Evaluation Protocol
- **Dataset**: CIFAR-10
- **Training**: 1 epoch only
- **Hyperparameters** (fixed):
  - Learning rate: 0.01
  - Momentum: 0.9
  - Batch size: 10
  - Optimizer: SGD
  - Preprocessing and augmentation: normalization + random horizontal flip
- **Metric**: First-epoch accuracy

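A minimal sketch of this protocol is shown below, assuming `torchvision` for data loading and a `Net` class with the interface described under Model Capabilities. The normalization statistics, the `in_shape`/`out_shape` conventions, and the function name `first_epoch_accuracy` are assumptions, not the authors' evaluation code. Imports are deferred so the configuration can be inspected without PyTorch installed.

```python
EVAL_PRM = {"lr": 0.01, "momentum": 0.9}  # fixed hyperparameters from the protocol
BATCH_SIZE = 10
EPOCHS = 1


def first_epoch_accuracy(net_cls, data_root="./data"):
    """Train `net_cls` for one epoch on CIFAR-10 and return test accuracy in [0, 1]."""
    import torch
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    norm = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # assumed statistics
    train_tf = transforms.Compose(
        [transforms.RandomHorizontalFlip(), transforms.ToTensor(), norm])
    test_tf = transforms.Compose([transforms.ToTensor(), norm])

    train_set = datasets.CIFAR10(data_root, train=True, download=True, transform=train_tf)
    test_set = datasets.CIFAR10(data_root, train=False, download=True, transform=test_tf)
    train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=BATCH_SIZE)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Assumed conventions: in_shape is (batch, C, H, W), out_shape is (num_classes,).
    net = net_cls((BATCH_SIZE, 3, 32, 32), (10,), EVAL_PRM, device)
    net.to(device)
    net.train_setup(EVAL_PRM)  # builds the SGD optimizer from lr and momentum
    for _ in range(EPOCHS):
        net.learn(train_loader)

    net.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            correct += (net(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total
```
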
### Validation Process
1. **Compilation Check**: Verify Python syntax and PyTorch compatibility
2. **Training**: Train for 1 epoch on CIFAR-10
3. **Evaluation**: Compute accuracy on test set
4. **Novelty Check**: AST-based structural analysis to ensure uniqueness

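Steps 1 and 4 can be approximated with the standard library alone. The sketch below is one possible reading of "AST-based structural analysis", not the authors' implementation: it checks that the source is valid Python, then hashes the sequence of AST node types so that two programs differing only in names and constants share a fingerprint. The helper names are hypothetical, and the syntax check does not cover PyTorch compatibility.

```python
import ast
import hashlib


def compiles(source: str) -> bool:
    """Step 1 (syntax only): True if `source` is valid Python."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


def structural_fingerprint(source: str) -> str:
    """Hash the AST node-type sequence, ignoring identifiers and literal values."""
    node_types = [type(node).__name__ for node in ast.walk(ast.parse(source))]
    return hashlib.sha256(" ".join(node_types).encode()).hexdigest()


def is_novel(source: str, seen: set) -> bool:
    """Step 4: True if this structure has not been seen; records it if so."""
    fp = structural_fingerprint(source)
    if fp in seen:
        return False
    seen.add(fp)
    return True


seen = set()
a = "x = conv(3, 16)\ny = relu(x)"
b = "h = conv(3, 32)\nz = relu(h)"        # same structure, different names/values
c = "x = conv(3, 16)\ny = relu(pool(x))"  # extra call changes the structure
print(is_novel(a, seen), is_novel(b, seen), is_novel(c, seen))  # True False True
```

A node-type hash is deliberately coarse: it treats renamed or re-parameterized copies as duplicates, which suits a uniqueness filter, but it would also merge architectures that differ only in layer widths.
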
## Limitations

1. **Dataset Specificity**: Optimized for CIFAR-10; may not generalize to other datasets
2. **Single Epoch Focus**: Optimized for first-epoch performance, not long-term training
3. **Fixed Evaluation Protocol**: Uses fixed hyperparameters; may not reflect best-case performance
4. **Computational Cost**: Requires significant GPU memory (~20-30GB for inference)
5. **Generation Variability**: Success rate is ~59%; some generations may fail validation

## Citation

If you use this model, please cite:

```bibtex
@article{nn_novelty_generation_2025,
  title={Emergent Architectural Novelty in Deep Models via LLM–Driven Synthesis},
  author={Waleed Khalid and Dimytro Ignatove and Radu Timofte},
  journal={Proceedings of ACL 2025},
  year={2025}
}
```

## Model Card Information

- **Model Type**: Causal Language Model (Decoder-only)
- **Language**: Python (PyTorch code generation)
- **License**: Check base model license (DeepSeek-Coder-7B-Instruct-v1.5)
- **Fine-Tuning Date**: 2025
- **Fine-Tuning Method**: Iterative Supervised Fine-Tuning with LoRA
- **Base Model**: deepseek-ai/deepseek-coder-7b-instruct-v1.5

## Acknowledgments

- Base model: [DeepSeek-Coder-7B-Instruct-v1.5](https://huggingface.co/deepseek-ai/deepseek-coder-7b-instruct-v1.5)
- Training framework: HuggingFace Transformers, PEFT (LoRA)
- Evaluation: CIFAR-10 dataset

## Model Details
- Developed by: Roman Kochnev / ABrain
- Finetuned from model: deepseek-ai/deepseek-coder-7b-instruct-v1.5
- Model type: Causal Language Model (Transformer-based)
- Language(s) (NLP): Primarily English
- License: MIT

## Model Sources
- Repository: ABrain/NNGPT-UniqueArch-Rag

---

**Note**: This model was trained through an iterative fine-tuning process over 22 cycles. This checkpoint, from cycle 18, was the best-performing one, balancing accuracy, quality, and generation success rate.