# Update README.md

`README.md` (changed):
**Previous version** (only the lines changed by this commit; everything else is identical to the updated version):

- `base_model: deepseek-ai/DeepSeek-` (left truncated in the old file)
- The "Model Details" section opened with this paragraph, removed by the commit: "The fine-tuning of the deepseek-ai/deepseek-coder-1.3b-base model was conducted to explore the feasibility of using large language models (LLMs) for hyperparameter optimization in deep learning. The goal was to assess whether LLMs can effectively predict optimal hyperparameters for various neural network architectures, providing a competitive alternative to traditional optimization methods like Optuna."
- `- Finetuned from model: deepseek-ai/deepseek-` (left truncated)
**Updated version:**

```yaml
---
license: mit
base_model: deepseek-ai/DeepSeek-Coder-1.3b-Base-R
library_name: peft
---
```

# DeepSeek-Coder-1.3b-Base-R

The DeepSeek-Coder-1.3b-Base model has been fine-tuned **to predict hyperparameters for neural network models**. Leveraging the power of large language models (LLMs), this version can analyze neural network architectures and generate optimal hyperparameter configurations (such as learning rate, batch size, dropout, and momentum) for a given task. This approach offers a competitive alternative to traditional optimization methods such as the Optuna framework.

This large language model is used in the [NNGPT](https://github.com/ABrain-One/NN-GPT) project to generate training hyperparameters for neural networks from the [LEMUR NN Dataset](https://github.com/ABrain-One/NN-Dataset).

# How to Use
This repository provides a **fine-tuned version** of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) using the [PEFT](https://github.com/huggingface/peft) library with LoRA. The final model is **merged** so it can be loaded in one step via:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ABrain/DeepSeek-Coder-1.3b-Base-R"  # repository listed under Model Sources
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
```
# Prompt Example

```python
"""
Generate only the values (do not provide any explanation) of the hyperparameters ({prm_names}) of a given model:
{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, with transformation: {entry['transform_code']},
so that the model achieves accuracy = {entry['accuracy']} with number of training epochs = {entry['epoch']}.
Code of that model: {entry['nn_code']}
"""
```
Replace placeholders such as `{prm_names}`, `{entry['task']}`, `{entry['dataset']}`, etc., with your actual values.
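As a minimal sketch, the template can be filled with ordinary f-string formatting; the `entry` values below are illustrative stand-ins, not real records from the LEMUR NN Dataset:

```python
# Illustrative stand-in values; a real `entry` would come from the LEMUR NN Dataset.
prm_names = "learning rate, batch size, dropout, momentum"
entry = {
    "metric": "accuracy",
    "task": "image classification",
    "dataset": "CIFAR-10",
    "transform_code": "transforms.Compose([transforms.ToTensor()])",
    "accuracy": 0.92,
    "epoch": 50,
    "nn_code": "class Net(nn.Module): ...",
}

# Fill the prompt template from above with the concrete values.
prompt = (
    f"Generate only the values (do not provide any explanation) of the "
    f"hyperparameters ({prm_names}) of a given model:\n"
    f"{entry['metric']} for the task: {entry['task']} on dataset: {entry['dataset']}, "
    f"with transformation: {entry['transform_code']},\n"
    f"so that the model achieves accuracy = {entry['accuracy']} "
    f"with number of training epochs = {entry['epoch']}.\n"
    f"Code of that model: {entry['nn_code']}"
)

# The filled prompt is then passed to the merged model loaded above, e.g.:
# inputs = tokenizer(prompt, return_tensors="pt")
# outputs = model.generate(**inputs, max_new_tokens=64)
# print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```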
## Model Details

- Developed by: Roman Kochnev / ABrain
- Finetuned from model: deepseek-ai/deepseek-coder-1.3b-base
- Model type: Causal Language Model (Transformer-based)
- Language(s) (NLP): Primarily English (or multilingual, if applicable)
- License: MIT

## Model Sources

Repository: ABrain/DeepSeek-Coder-1.3b-Base-R
|