# Model Card for Model ID

Phi-3-Mini-OpenHermes-Magpie-V1 is a general-purpose model trained on both the teknium/OpenHermes-2.5 dataset and the Magpie-Align/Phi3-Pro-300K-Filtered dataset, designed to provide speed, efficiency, and intelligence while remaining relatively small.

## Model Details

OpenHermes dataset:

- 1 epoch
- 8 batch size
- 1 gradient accumulation
- 5e-5 LR
- 16 LoRA r
- 32 LoRA alpha
- 300 warmup steps
- 500 eval steps
- Trained only on the attention layers.

Magpie dataset:

- 1 epoch
- 16 batch size
- 1 gradient accumulation
- 1e-4 LR
- 16 LoRA r
- 32 LoRA alpha
- 150 warmup steps
- 500 eval steps
- Trained on the gate, up, and down layers.

These settings map directly onto a PEFT LoRA configuration; see the sketch after this list.

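A minimal sketch, assuming the runs used the `peft` library and Phi-3's module naming (`qkv_proj`/`o_proj` for attention, `gate_up_proj`/`down_proj` for the MLP; the module names are an assumption, not taken from a published training script):

```python
from peft import LoraConfig

# Sketch of the OpenHermes run: LoRA on the attention projections only.
openhermes_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention modules
    task_type="CAUSAL_LM",
)

# Sketch of the Magpie run: LoRA on the gate/up/down MLP projections
# (Phi-3 fuses the gate and up projections into a single gate_up_proj module).
magpie_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["gate_up_proj", "down_proj"],  # assumed Phi-3 MLP modules
    task_type="CAUSAL_LM",
)
```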

### Model Description

This model excels at producing bullet-point formatting.

- **License:** apache-2.0
- **Finetuned from model:** Phi-3-Mini-4k-Instruct with turtle170/Phi-3-Mini-OpenHermes-V1 adapters

### Direct Use

For direct use, the easiest method is to download the .gguf file and load it into llama.cpp or Ollama, as sketched below.

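A minimal sketch using the llama-cpp-python bindings, assuming the quantized weights are published in this repository (the repo id and filename below are placeholders; check the repository's file list for the actual .gguf name):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized weights (placeholder repo id and filename).
model_path = hf_hub_download(
    repo_id="turtle170/Phi-3-Mini-OpenHermes-Magpie-V1",
    filename="phi-3-mini-openhermes-magpie-v1.gguf",
)

# Load the model with the base model's 4k context window.
llm = Llama(model_path=model_path, n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize LoRA in three bullet points."}]
)
print(out["choices"][0]["message"]["content"])
```
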
### Out-of-Scope Use

Users of this model need only adhere to the **Microsoft Phi-3** terms of use. You are solely responsible for any misuse of this model, per Sections 7 and 8 of the Apache-2.0 license.

## Bias, Risks, and Limitations

Because this model was fine-tuned from a small base model and exposed to only two 50k-example datasets, you should not expect too much from it. However, it is smart for its size.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## Training Details

The hyperparameters are stated above under Model Details.

### Training Data

The teknium/OpenHermes-2.5 dataset and the Magpie-Align/Phi3-Pro-300K-Filtered dataset. Both can be loaded as sketched below.

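Both datasets are public on the Hugging Face Hub. A minimal loading sketch (the ~50k subsetting mentioned under Bias, Risks, and Limitations is an assumption about how the subsets were drawn; the exact selection isn't documented here):

```python
from datasets import load_dataset

# Both training sets are hosted on the Hugging Face Hub.
openhermes = load_dataset("teknium/OpenHermes-2.5", split="train")
magpie = load_dataset("Magpie-Align/Phi3-Pro-300K-Filtered", split="train")

# One way to draw a ~50k-example subset (selection method assumed):
openhermes_50k = openhermes.shuffle(seed=42).select(range(50_000))
```
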
### Training Procedure
#### Training Hyperparameters

- **Training regime:** The OpenHermes run used fp16 mixed precision, while the Magpie run used full fp32 precision. A sketch of how these settings map onto `transformers` training arguments follows.

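A minimal sketch of the Magpie run, assuming the Hugging Face `Trainer` was used (the training script isn't published; the values come from the Model Details section and the output path is a placeholder):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="phi3-mini-magpie-lora",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-4,
    warmup_steps=150,
    eval_strategy="epoch",  # the card says evaluation was epoch-based
    fp16=False,             # fp32 run; the OpenHermes run would set fp16=True
)
```
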
#### Speeds, Sizes, Times

The Magpie adapter is about 100-200 MB.

## Evaluation

The evaluation strategy was epoch-based, and the final evaluation loss was 0.4203.

#### Metrics

- 1 epoch --> fast, while preventing overfitting.
- 16 batch size --> helps squeeze out every bit of intelligence.
- 1 gradient accumulation --> fast, while not destabilizing training.
- 1e-4 LR --> helps avoid breaking the intelligence gained in the Hermes run.
- 16 LoRA r --> helps the model learn the harder examples in the Magpie run.
- 32 LoRA alpha --> self-explanatory: alpha = LoRA r x 2 (see the scaling note after this list).
- 150 warmup steps --> fast, and reasonable since the starting loss was already 0.4.
- 1500 eval steps --> the loss fluctuated between 0.4 and 0.6, and evaluation wastes time, so I chose to evaluate only twice per run.

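For background on the alpha = 2 x r choice (standard LoRA behavior, not specific to this model): LoRA scales the low-rank update by alpha / r, so these values double the adapter's contribution.

```latex
% Standard LoRA update with rank r and scaling factor alpha:
\Delta W = \frac{\alpha}{r} \, B A,
\qquad \frac{\alpha}{r} = \frac{32}{16} = 2
```
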
### Results

- Eval loss: 0.4
- Avg. train loss: 0.4

## Environmental Impact

- **Hardware Type:** 2x NVIDIA Tesla T4
- **Hours used:** 12
- **Cloud Provider:** Kaggle
- **Compute Region:** asia-east1
- **Carbon Emitted:** 0.47 kg CO2eq

The figure is consistent with a back-of-the-envelope estimate; see the sketch after this list.

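A sanity-check sketch in the spirit of the Machine Learning Impact calculator, assuming the T4's 70 W TDP and a rough grid carbon intensity of 0.28 kg CO2eq/kWh (both constants are assumptions, not measured values):

```python
# Back-of-the-envelope carbon estimate.
T4_TDP_KW = 0.070        # NVIDIA Tesla T4 TDP: 70 W
NUM_GPUS = 2
HOURS = 12
CARBON_INTENSITY = 0.28  # kg CO2eq per kWh (assumed regional figure)

energy_kwh = T4_TDP_KW * NUM_GPUS * HOURS      # 1.68 kWh
emissions_kg = energy_kwh * CARBON_INTENSITY   # ~0.47 kg CO2eq
print(f"{energy_kwh:.2f} kWh -> {emissions_kg:.2f} kg CO2eq")
```
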
### Model Architecture and Objective

The architecture is Phi-3-Mini-4k-Instruct with LoRA adapters; the objective is to provide a smart model while keeping its size small.

### Framework versions
- PEFT 0.17.1