---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
library_name: peft
---

# DeepSeek-R1-Distill-Qwen-7B-R

The DeepSeek-R1-Distill-Qwen-7B model has been fine-tuned **to predict training hyperparameters for neural network models**. Given a network architecture and task description, it generates a hyperparameter configuration (learning rate, batch size, dropout, momentum, etc.) aimed at maximizing task performance, offering an LLM-based alternative to traditional hyperparameter optimization frameworks such as Optuna.

The model is used in the <a href='https://github.com/ABrain-One/NN-GPT'>NNGPT</a> project to generate training hyperparameters for neural networks from the <a href='https://github.com/ABrain-One/NN-Dataset'>LEMUR NN Dataset</a>.

# How to Use
This repository provides a **fine-tuned version** of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) using the [PEFT](https://github.com/huggingface/peft) library with LoRA. The final model is **merged** so it can be loaded in one step via:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The LoRA adapter has already been merged into the base weights,
# so the model loads like any standard causal LM checkpoint.
model_path = "ABrain/HPGPT-DeepSeek-R1-Distill-Qwen-7B-R"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```
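Once loaded, hyperparameter predictions can be generated as with any causal LM. A minimal sketch, assuming greedy decoding (`max_new_tokens=128` is an arbitrary choice, and `prompt` should be built from the template in the next section):

```python
# Build `prompt` from the prompt template below, then:
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens (drop the echoed prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```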

# Prompt Example
```python
"""
Generate only the values (do not provide any explanation) of the hyperparameters ({prm_names}) of a given model:
{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, with transformation: {entry['transform_code']},
so that the model achieves the HIGHEST accuracy with number of training epochs = {entry['epoch']}.
Code of that model: {entry['nn_code']}
"""
```
Replace the placeholders `{prm_names}`, `{entry['task']}`, `{entry['dataset']}`, etc., with your actual values, as in the sketch below.
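For illustration, a minimal sketch of filling the template in Python; the `entry` values here are hypothetical stand-ins, not real records from the LEMUR NN Dataset:

```python
# Hypothetical example values; real entries come from the LEMUR NN Dataset
prm_names = "learning rate, batch size, dropout, momentum"
entry = {
    "metric": "accuracy",
    "task": "image classification",
    "dataset": "CIFAR-10",
    "transform_code": "transforms.Compose([transforms.ToTensor()])",
    "epoch": 10,
    "nn_code": "class Net(nn.Module): ...",  # full model source in practice
}

prompt = (
    f"Generate only the values (do not provide any explanation) of the hyperparameters "
    f"({prm_names}) of a given model: {entry['metric']} for the task: {entry['task']} "
    f"on dataset: {entry['dataset']}, with transformation: {entry['transform_code']}, "
    f"so that the model achieves the HIGHEST accuracy with number of training epochs = {entry['epoch']}. "
    f"Code of that model: {entry['nn_code']}"
)
```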

## Model Details
- Developed by: Roman Kochnev (ABrain)
- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Model type: Causal Language Model (Transformer-based)
- Language(s) (NLP): Primarily English
- License: MIT

## Model Sources
Repository: ABrain/HPGPT-DeepSeek-R1-Distill-Qwen-7B-R