---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:microsoft/Phi-3-mini-4k-instruct
- lora
- transformers
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- Magpie-Align/Magpie-Phi3-Pro-300K-Filtered
language:
- en
---

# Model Card for MicroAtlas-V1

MicroAtlas-V1 is a general-purpose model trained on both the teknium/OpenHermes-2.5 dataset and the Magpie-Align/Magpie-Phi3-Pro-300K-Filtered dataset, designed to provide speed, efficiency, and intelligence while remaining relatively small. The .gguf version of this model has only 3.15 M parameters, making it extremely small.

## Model Details

| Hyperparameter | OpenHermes run | Magpie run |
|---|---|---|
| Epochs | 1 | 1 |
| Batch size | 8 | 16 |
| Gradient accumulation | 1 | 1 |
| Learning rate | 5e-5 | 1e-4 |
| LoRA r | 16 | 16 |
| LoRA alpha | 32 | 32 |
| Warmup steps | 300 | 150 |
| Eval steps | 500 | 500 |
| Layers trained | Attention layers only | Gate, Up, and Down (MLP) layers |

### Model Description

This model excels at producing bullet-point formatting.

- **Developed by:** Turtle170 (anonymous)
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** Phi-3-Mini-4k-Instruct with turtle170/Phi-3-Mini-OpenHermes-V1 adapters

### Direct Use

For direct use, the easiest method is to download the .gguf file from turtle170/Phi-3-Mini-OpenHermes-Magpie-V1-F16-GGUF and load it into llama.cpp or Ollama (hedged usage sketches are included at the end of this card).

### Out-of-Scope Use

Users of this model need only adhere to the **Microsoft Phi-3** terms of use. You are solely responsible for any misuse of this model, in accordance with Sections 7 and 8 of the Apache-2.0 license.

## Bias, Risks, and Limitations

This model was fine-tuned from a small base model and exposed to only two datasets of roughly 50k examples each, so you should not expect too much from it. However, the model is capable for its size.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

Stated above under Model Details.

### Training Data

The teknium/OpenHermes-2.5 dataset and the Magpie-Align/Magpie-Phi3-Pro-300K-Filtered dataset.

### Training Procedure

#### Training Hyperparameters

- **Training regime:** The OpenHermes run used fp16 mixed precision, while the Magpie run used full fp32 precision. The remaining hyperparameters are listed under Model Details; a hedged LoraConfig sketch reproducing them is included at the end of this card.

#### Speeds, Sizes, Times

The Magpie adapter is about 100-200 MB.

## Evaluation

Evaluation was run on a per-epoch basis; the resulting evaluation loss was 0.4203.

#### Metrics

The rationale for the hyperparameters (referring to the Magpie run) was:

- **1 epoch:** fast, while preventing overfitting.
- **Batch size 16:** helps squeeze out every bit of intelligence.
- **Gradient accumulation 1:** fast, without destabilizing the model.
- **Learning rate 1e-4:** helps avoid breaking the intelligence gained during the OpenHermes run.
- **LoRA r 16:** helps the model understand the harder examples in the Magpie run.
- **LoRA alpha 32:** self-explanatory; alpha = 2 x LoRA r.
- **150 warmup steps:** fast, and acceptable since the starting loss was already around 0.4.
- **Eval every 1500 steps:** the loss fluctuated between 0.4 and 0.6 and evaluation takes time, so I chose to evaluate only twice per run.

### Results

- Eval loss: 0.4
- Avg. train loss: 0.4

## Environmental Impact

- **Hardware Type:** 2x NVIDIA Tesla T4
- **Hours used:** 12
- **Cloud Provider:** Kaggle
- **Compute Region:** asia-east1
- **Carbon Emitted:** 0.47 kg CO₂eq

### Model Architecture and Objective

The architecture is Phi-3-mini-4k-instruct with LoRA adapters; the objective was to provide a capable model while keeping the size small.

### Framework versions

- PEFT 0.17.1
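
## How to Get Started with the Model

As noted under Direct Use, the simplest path is to run the GGUF file with llama.cpp or Ollama. Below is a minimal sketch using the llama-cpp-python bindings; it assumes you have already downloaded the .gguf file from turtle170/Phi-3-Mini-OpenHermes-Magpie-V1-F16-GGUF, and the local filename, sampling settings, and prompt are placeholders rather than values from the model author.

```python
# Minimal sketch, not an official example. Assumes llama-cpp-python is installed
# and the GGUF file has been downloaded locally; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./phi-3-mini-openhermes-magpie-v1-f16.gguf",  # placeholder path
    n_ctx=4096,        # context window of Phi-3-mini-4k-instruct
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Summarize the benefits of small language models in bullet points."}
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```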
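Since this repository is a PEFT (LoRA) adapter, you can also apply it directly on top of the base model with transformers and peft. The sketch below assumes a standard adapter layout; `turtle170/MicroAtlas-V1` is a placeholder for this repository's id, not a confirmed path.

```python
# Minimal sketch of loading the LoRA adapter onto the base model with PEFT.
# The adapter repo id below is a placeholder and may differ from the actual repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # may be required depending on your transformers version
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

# Attach the adapter weights (placeholder repo id).
model = PeftModel.from_pretrained(base, "turtle170/MicroAtlas-V1")

inputs = tokenizer("List three uses of LoRA fine-tuning:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```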
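For reference, the following is a hedged LoraConfig sketch reproducing the training hyperparameters listed under Model Details. The `target_modules` names are assumptions based on the fused projection names in the Hugging Face Phi-3 implementation and were not taken from the original training code.

```python
# Hedged reconstruction of the two LoRA configurations described in this card.
# Module names are assumptions for the Phi-3 architecture, not the author's exact setup.
from peft import LoraConfig

# OpenHermes run: attention layers only, r=16, alpha=32
openhermes_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)

# Magpie run: Gate/Up/Down (MLP) layers, r=16, alpha=32
magpie_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["gate_up_proj", "down_proj"],  # assumed MLP projection names
    task_type="CAUSAL_LM",
)
```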