---
library_name: peft
---

# Model Card for Mistral-sci-phi
|
|
This model is a fine-tuned version of Mistral-7B, trained with the PEFT library using INT4 quantization to keep memory use and model size low.
|
|
## Model Details


### Model Description


Mistral-sci-phi is fine-tuned from the Mistral-7B base model on the "emrgnt-cmplxty/sciphi-textbooks-are-all-you-need" dataset from the Hugging Face Hub. Parameter-efficient fine-tuning on quantized base weights keeps the memory footprint low, making the model practical to run for a range of NLP tasks.
|
|
- **Developed by:** Arturo de Pablo
- **Trained by:** IZX, Hyper88
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **License:** [More Information Needed]
- **Finetuned from model:** mistralai/Mistral-7B-v0.1
|
|
### Model Sources


- **Repository:** [hyper88/mistral-sci-phi-7B](https://huggingface.co/hyper88/mistral-sci-phi-7B)
|
|
|
|
## Uses


### Direct Use


The model can be used directly for text generation and related NLP tasks, for example as sketched below.
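
A minimal sketch of direct use. It assumes a recent `transformers` release with `peft` installed, which can resolve a PEFT adapter repository directly and pull in the base model it was trained on:

```python
from transformers import pipeline

# Assumes hyper88/mistral-sci-phi-7B hosts PEFT adapter weights; recent
# transformers versions (with peft installed) fetch the base model automatically.
generator = pipeline(
    "text-generation",
    model="hyper88/mistral-sci-phi-7B",
    device_map="auto",
)
print(generator("The water cycle begins when", max_new_tokens=60)[0]["generated_text"])
```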
|
|
### Downstream Use


The model can also be integrated as a component of larger systems, for example as the generation backend of a chat assistant or a retrieval-augmented pipeline.
|
|
### Out-of-Scope Use


The model was trained on English, textbook-style scientific text; it should not be relied on for other languages, for domains far from its training data, or for high-stakes applications without additional validation.
|
|
## Bias, Risks, and Limitations


The model inherits the biases and limitations of the base Mistral-7B model. Users should take these into account when deploying it.
|
|
### Recommendations


Users should evaluate the model's performance and biases in their specific use case and make adjustments as necessary.
|
|
## How to Get Started with the Model


The model can be loaded for inference with the Hugging Face Transformers and PEFT libraries, as in the sketch below.
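
A minimal loading sketch, assuming the adapter weights at hyper88/mistral-sci-phi-7B were trained on top of mistralai/Mistral-7B-v0.1:

```python
# Sketch: load the base model, then attach the fine-tuned PEFT adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "hyper88/mistral-sci-phi-7B"  # assumed adapter location

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

inputs = tokenizer("Photosynthesis converts", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```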
|
|
## Training Details


### Training Data


The model was trained on the "emrgnt-cmplxty/sciphi-textbooks-are-all-you-need" dataset available on the Hugging Face Hub.
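
The dataset can be pulled from the Hub with the `datasets` library (the `train` split name below is an assumption):

```python
# Sketch: load the training corpus from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("emrgnt-cmplxty/sciphi-textbooks-are-all-you-need", split="train")
print(dataset[0])  # inspect one record
```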
|
|
### Training Procedure


The model was fine-tuned with the base weights quantized to INT4, which reduces GPU memory use during training while the PEFT adapter weights are updated at higher precision.
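
For illustration, a QLoRA-style setup consistent with this description; the specific quantization and LoRA settings below are assumptions, not the documented configuration:

```python
# Illustrative QLoRA-style setup: 4-bit base model plus LoRA adapters.
# All specific values here are assumptions, not the documented configuration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```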
|
|
#### Training Hyperparameters


- Learning rate: 2e-4
- Batch size: 12
- Epochs: 3
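
Expressed as a `transformers` `TrainingArguments` sketch, with the documented values filled in and everything else left as an assumption:

```python
# Trainer configuration matching the documented hyperparameters; everything
# except the learning rate, batch size, and epoch count is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-sci-phi",
    learning_rate=2e-4,
    per_device_train_batch_size=12,
    num_train_epochs=3,
)
```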
|
|
## Evaluation


### Testing Data, Factors & Metrics


[More Information Needed]


### Results


[More Information Needed]
|
|
## Environmental Impact


Parameter-efficient fine-tuning of a quantized base model requires far less compute than full fine-tuning, which keeps the training footprint small. Detailed emissions figures are not available.
|
|
## Technical Specifications


### Model Architecture and Objective


The model uses the Mistral-7B decoder-only transformer architecture and is fine-tuned with a causal language modeling objective.
|
|
### Compute Infrastructure


Sponsored by izx.ai


#### Software


- PEFT 0.6.0.dev0
|
|
## More Information


For more details, visit the [model repository](https://huggingface.co/hyper88/mistral-sci-phi-7B).
|
|
## Model Card Authors


- [Arturo de Pablo](https://www.linkedin.com/in/arde88/)
|
|
## Model Card Contact


[Discord](https://discord.gg/KGCeKP4ng9)
|
|