---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:microsoft/Phi-3-mini-4k-instruct
- lora
- transformers
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- Magpie-Align/Magpie-Phi3-Pro-300K-Filtered
language:
- en
---
# Model Card for MicroAtlas-V1
MicroAtlas-V1 is a general-purpose model trained on both the teknium/OpenHermes-2.5 dataset and the Magpie-Align/Magpie-Phi3-Pro-300K-Filtered dataset,
designed to provide speed, efficiency, and intelligence while remaining relatively small.
The .gguf version of this model has only 3.15M parameters, making it extremely small.
## Model Details
**OpenHermes dataset:**
- 1 epoch
- 8 batch size
- 1 gradient accumulation
- 5e-5 LR
- 16 LoRA r
- 32 LoRA alpha
- 300 warmup steps
- 500 eval steps
- Trained only on attention layers.

**Magpie dataset:**
- 1 epoch
- 16 batch size
- 1 gradient accumulation
- 1e-4 LR
- 16 LoRA r
- 32 LoRA alpha
- 150 warmup steps
- 500 eval steps
- Trained on the gate, up, and down projection layers (see the configuration sketch after this list).
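The following is a minimal sketch, assuming standard peft usage, of what the two runs' LoRA configurations might look like. The Phi-3 module names (`qkv_proj`, `o_proj`, `gate_up_proj`, `down_proj`) are inferred from the base model's architecture, not taken from the author's training script.

```python
# Hedged sketch of the two LoRA configurations described above; module
# names are an assumption based on Phi-3's fused projection layers.
from peft import LoraConfig

# OpenHermes run: attention layers only
openhermes_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],         # attention projections
    task_type="CAUSAL_LM",
)

# Magpie run: gate, up, and down (MLP) layers
magpie_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["gate_up_proj", "down_proj"],  # MLP projections
    task_type="CAUSAL_LM",
)
```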
### Model Description
This model excels at producing bullet-point formatting.
- **Developed by:** Turtle170 (anonymous)
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** Phi-3-Mini-4k-Instruct with turtle170/Phi-3-Mini-OpenHermes-V1 adapters
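As a hedged sketch, the adapter can be loaded on top of the base model with transformers and peft roughly as follows; the adapter repo id shown is hypothetical, inferred from this card's naming.

```python
# Minimal sketch of loading the adapter on the base model. The adapter
# repo id below is hypothetical (inferred from this card's naming).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = PeftModel.from_pretrained(base, "turtle170/Phi-3-Mini-OpenHermes-Magpie-V1")
```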
### Direct Use
For direct use, the easiest method is to download the .gguf file from turtle170/Phi-3-Mini-OpenHermes-Magpie-V1-F16-GGUF and load it into llama.cpp or Ollama.
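For example, via the llama-cpp-python bindings (a hedged sketch; the local file name is hypothetical and depends on what you download):

```python
# Minimal sketch using llama-cpp-python; model_path is a hypothetical
# local file name for the downloaded .gguf.
from llama_cpp import Llama

llm = Llama(
    model_path="Phi-3-Mini-OpenHermes-Magpie-V1-F16.gguf",
    n_ctx=4096,  # matches the base model's 4k context window
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize LoRA in three bullet points."}]
)
print(out["choices"][0]["message"]["content"])
```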
### Out-of-Scope Use
Users of this model need only adhere to the **Microsoft Phi-3** Terms of Use,
and users are solely responsible for any misuse of this model, per Sections 7 and 8 of
the Apache-2.0 license.
## Bias, Risks, and Limitations
Because this model builds on a small base model and was exposed to only two 50k-example datasets,
you should not expect much from it.
However, this model is smart for its size.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
## Training Details
See the hyperparameters listed under Model Details above.
### Training Data
The teknium/OpenHermes-2.5 dataset and the Magpie-Align/Magpie-Phi3-Pro-300K-Filtered dataset.
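As a hedged sketch, both datasets can be pulled with the `datasets` library; any filtering or subsetting the author applied is not documented here.

```python
# Minimal sketch of loading the two training datasets; any subsetting
# or prompt formatting the author applied is not documented here.
from datasets import load_dataset

hermes = load_dataset("teknium/OpenHermes-2.5", split="train")
magpie = load_dataset("Magpie-Align/Magpie-Phi3-Pro-300K-Filtered", split="train")
```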
### Training Procedure
#### Training Hyperparameters
- **Training regime:** The OpenHermes run used fp16 mixed precision, while the Magpie run used fp32 precision.
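A minimal sketch, assuming a standard transformers Trainer setup, of arguments matching the Magpie-run settings listed in this card; the output directory is hypothetical, and how the batch of 16 was split across the two GPUs is not documented.

```python
# Hedged sketch of TrainingArguments matching the Magpie-run settings;
# output_dir is hypothetical.
from transformers import TrainingArguments

magpie_args = TrainingArguments(
    output_dir="phi3-magpie-lora",    # hypothetical path
    num_train_epochs=1,
    per_device_train_batch_size=16,   # card lists 16; per-GPU split not documented
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=150,
    eval_strategy="steps",
    eval_steps=500,
    fp16=False,                       # card says this run used fp32
)
```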
#### Speeds, Sizes, Times
The Magpie adapter is about 100–200 MB.
## Evaluation
Evaluation was performed per epoch, and the resulting loss was 0.4203.
#### Metrics
- **1 epoch** --> fast, while preventing overfitting.
- **16 batch size** --> helps squeeze out every bit of intelligence.
- **1 gradient accumulation** --> fast, without destabilizing the model.
- **1e-4 LR** --> helps prevent breaking the intelligence stored during the Hermes run.
- **16 LoRA r** --> helps the model understand the harder examples in the Magpie run.
- **32 LoRA alpha** --> self-explanatory: alpha = LoRA r × 2.
- **150 warmup steps** --> fast, since the starting loss was already around 0.4.
- **1500 eval steps** --> the loss fluctuated between 0.4 and 0.6, and evaluation wastes time, so I chose to run it only twice per run.
### Results
- Eval loss: 0.4
- Avg. train loss: 0.4
## Environmental Impact
- **Hardware Type:** 2x NVIDIA Tesla T4s
- **Hours used:** 12
- **Cloud Provider:** Kaggle
- **Compute Region:** asia-east1
- **Carbon Emitted:** 0.47 kg
### Model Architecture and Objective
LoRA adapters on the decoder-only Phi-3-mini transformer; the objective is to provide a smart model while keeping the size small.
### Framework versions
- PEFT 0.17.1