|
|
--- |
|
|
base_model: microsoft/Phi-3-mini-4k-instruct |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- base_model:adapter:microsoft/Phi-3-mini-4k-instruct |
|
|
- lora |
|
|
- transformers |
|
|
license: apache-2.0 |
|
|
datasets: |
|
|
- teknium/OpenHermes-2.5 |
|
|
- Magpie-Align/Magpie-Phi3-Pro-300K-Filtered |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# Model Card for MicroAtlas-V1
|
|
|
|
|
MicroAtlas-V1 is a general-purpose model trained on both the teknium/OpenHermes-2.5 and Magpie-Align/Magpie-Phi3-Pro-300K-Filtered datasets, designed to provide speed, efficiency, and intelligence while remaining relatively small.
|
|
|
|
|
The .gguf version of this model has only 3.15M parameters, making it extremely small.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Model Details |
|
|
**OpenHermes run:**

- Epochs: 1
- Batch size: 8
- Gradient accumulation steps: 1
- Learning rate: 5e-5
- LoRA r: 16
- LoRA alpha: 32
- Warmup steps: 300
- Eval steps: 500
- Target modules: attention layers only
|
|
|
|
|
**Magpie run:**

- Epochs: 1
- Batch size: 16
- Gradient accumulation steps: 1
- Learning rate: 1e-4
- LoRA r: 16
- LoRA alpha: 32
- Warmup steps: 150
- Eval steps: 500
- Target modules: gate, up, and down projection layers
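For reference, the settings above map naturally onto a PEFT `LoraConfig` plus `transformers.TrainingArguments`. The sketch below is a reconstruction under assumptions, not the original training script; in particular, the Phi-3 module names (`qkv_proj`, `o_proj`, and the fused `gate_up_proj` with `down_proj`) and the output directories are assumed.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Stage 1 (OpenHermes): LoRA on the attention projections only.
openhermes_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention module names
    task_type="CAUSAL_LM",
)
openhermes_args = TrainingArguments(
    output_dir="phi3-openhermes-lora",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=1,
    learning_rate=5e-5,
    warmup_steps=300,
    eval_strategy="steps",
    eval_steps=500,
    fp16=True,  # fp16 mixed precision (see Training Hyperparameters)
)

# Stage 2 (Magpie): LoRA on the MLP projections (gate/up are fused in Phi-3).
magpie_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["gate_up_proj", "down_proj"],  # assumed Phi-3 MLP module names
    task_type="CAUSAL_LM",
)
magpie_args = TrainingArguments(
    output_dir="phi3-magpie-lora",  # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=150,
    eval_strategy="steps",
    eval_steps=500,
)
```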
|
|
|
|
|
### Model Description |
|
|
|
|
|
This model excels at producing bullet-point-formatted responses.
|
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Turtle170 (anonymous)
|
|
- **Language(s) (NLP):** English |
|
|
- **License:** apache-2.0 |
|
|
- **Finetuned from model:** microsoft/Phi-3-mini-4k-instruct, with the turtle170/Phi-3-Mini-OpenHermes-V1 adapter applied
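Since the Magpie run starts from the OpenHermes-V1 adapter, one plausible way to reproduce that stacking with PEFT is sketched below (an assumption based on the repo id listed above, not a confirmed script):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, apply the first-stage (OpenHermes) adapter,
# and merge it so the second-stage (Magpie) LoRA run trains on top.
base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = PeftModel.from_pretrained(base, "turtle170/Phi-3-Mini-OpenHermes-V1")
model = model.merge_and_unload()  # fold the adapter weights into the base model
```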
|
|
|
|
|
|
|
|
|
|
|
### Direct Use |
|
|
|
|
|
For direct use, the easiest method is to download the .gguf file from turtle170/Phi-3-Mini-OpenHermes-Magpie-V1-F16-GGUF and load it into llama.cpp or Ollama.
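As a minimal local-inference sketch using llama-cpp-python (the filename below is a placeholder for whatever file the GGUF repo actually ships):

```python
from llama_cpp import Llama

# Load the downloaded GGUF file (filename is hypothetical)
llm = Llama(model_path="Phi-3-Mini-OpenHermes-Magpie-V1-F16.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three bullet points on LoRA."}]
)
print(out["choices"][0]["message"]["content"])
```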
|
|
|
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
|
|
Users of this model must adhere to the **Microsoft Phi-3** terms of use, and you are solely responsible for any misuse of this model, per Sections 7 and 8 of the Apache-2.0 license.
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
Because this model is built on a small base model and was exposed to only two 50k-example datasets, you should not expect too much from it.
That said, it is smart for its size.
|
|
|
|
|
### Recommendations |
|
|
|
|
|
|
|
|
|
|
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
|
|
|
|
|
|
|
|
## Training Details |
|
|
See the per-run hyperparameters listed under Model Details above.
|
|
|
|
|
### Training Data |
|
|
The teknium/OpenHermes-2.5 dataset and the Magpie-Align/Magpie-Phi3-Pro-300K-Filtered dataset.
|
|
|
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
Both stages were supervised LoRA fine-tuning runs; see the hyperparameters under Model Details above.
|
|
|
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
|
|
- **Training regime:** The OpenHermes run used fp16 mixed precision, while the Magpie run used full fp32 precision.
|
|
|
|
|
#### Speeds, Sizes, Times |
|
|
The Magpie adapter is roughly 100-200 MB.
|
|
|
|
|
|
|
|
## Evaluation |
|
|
|
|
|
The evaluation strategy was per-epoch, and the resulting evaluation loss was 0.4203.
|
|
|
|
|
|
|
|
#### Metrics

The tracked metric was evaluation loss. The Magpie-run hyperparameters were chosen as follows:

- 1 epoch: fast, while preventing overfitting.
- Batch size 16: helps squeeze out every bit of intelligence.
- Gradient accumulation 1: fast, without crashing the run.
- Learning rate 1e-4: helps avoid breaking the intelligence stored during the Hermes run.
- LoRA r 16: helps the model absorb the harder examples in the Magpie run.
- LoRA alpha 32: self-explanatory; alpha = 2 × r.
- Warmup steps 150: fast, and since the starting loss was already around 0.4, little warmup was needed.
- Eval steps 1500: the loss fluctuated between 0.4 and 0.6, and evaluation takes time, so I limited it to two evaluations per run (at ~50k examples and batch size 16, one epoch is roughly 3,100 steps, so evaluating every 1,500 steps yields two evaluations).
|
|
|
|
|
|
|
|
### Results |
|
|
|
|
|
- Eval loss: 0.4
- Avg. train loss: 0.4
|
|
|
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
|
|
|
- **Hardware Type:** 2x NVIDIA Tesla T4s |
|
|
- **Hours used:** 12 |
|
|
- **Cloud Provider:** Kaggle |
|
|
- **Compute Region:** asia-east1 |
|
|
- **Carbon Emitted:** 0.47 kg CO2eq
|
|
|
|
|
|
|
|
### Model Architecture and Objective |
|
|
The model is microsoft/Phi-3-mini-4k-instruct with LoRA adapters applied; the objective was to provide a smart model while keeping its size small.
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- PEFT 0.17.1 |