Update README.md

---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
library_name: peft
---
A large language model used in the <a href='https://github.com/ABrain-One/NN-GPT'>NNGPT</a> project for generating training hyperparameters for neural networks from the <a href='https://github.com/ABrain-One/NN-Dataset'>LEMUR NN Dataset</a>.

# Model Card for DeepSeek-R1-Distill-Qwen-7B-R
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Adapter repository listed under Model Sources below
model_path = "ABrain/DeepSeek-R1-Distill-Qwen-7B-R"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```
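Once loaded, the model is queried with a description of the target network. The prompt wording below is an illustrative assumption (the actual NNGPT prompt template lives in the NN-GPT repository), and the generation calls are commented out because they require downloading the checkpoint:

```python
# Hypothetical prompt builder; the real NNGPT template may differ.
def build_prompt(model_code: str) -> str:
    return (
        "Suggest training hyperparameters (learning rate, batch size, "
        "momentum, epochs) for the following neural network:\n"
        + model_code
    )

prompt = build_prompt("class Net(nn.Module): ...")

# Generation (requires the ~7B checkpoint loaded above):
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=256)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```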

## Model Details

The fine-tuning of the DeepSeek-R1-Distill-Qwen-7B model was conducted to explore the feasibility of using large language models (LLMs) for hyperparameter optimization in deep learning. The goal was to assess whether LLMs can effectively predict optimal hyperparameters for various neural network architectures, providing a competitive alternative to traditional optimization methods such as Optuna.

- Developed by: Roman Kochnev / ABrain
- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
- Model type: Causal Language Model (Transformer-based)
- Language(s) (NLP): Primarily English
- License: MIT
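Because the model emits hyperparameters as free text, a downstream step has to parse them into numeric values. A minimal sketch, assuming a simple `name: value` response format (the actual NNGPT output format may differ):

```python
import re

# Hypothetical parser: extracts "name: value" or "name = value"
# hyperparameter pairs from a model response.
def parse_hyperparameters(text: str) -> dict:
    params = {}
    for name, value in re.findall(r"(\w[\w ]*?)\s*[:=]\s*([0-9.eE+-]+)", text):
        params[name.strip().lower()] = float(value)
    return params

response = "learning rate: 0.01\nbatch size: 64\nmomentum: 0.9"
print(parse_hyperparameters(response))
# → {'learning rate': 0.01, 'batch size': 64.0, 'momentum': 0.9}
```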

## Model Sources

- Repository: ABrain/DeepSeek-R1-Distill-Qwen-7B-R