Update README.md

README.md CHANGED

@@ -17,8 +17,6 @@ model-index:
 license: llama3
 language:
 - en
-datasets:
-- berkeley-nest/Nectar
 widget:
 - example_title: OpenBioLLM-8B
   messages:

@@ -86,13 +84,13 @@ OpenBioLLM-8B is an advanced open source language model designed specifically fo
 
 🎓 **Superior Performance**: With 8 billion parameters, OpenBioLLM-8B outperforms other open-source biomedical language models of similar scale. It has also demonstrated better results than larger proprietary and open-source models such as GPT-3.5, Gemini, and Meditron-70B on biomedical benchmarks.
 
-🧠 **Advanced Training Techniques**: OpenBioLLM-8B builds upon the powerful foundations of the **Meta-Llama-3-8B** and [Meta-Llama-3-8B](meta-llama/Meta-Llama-3-8B) models. It incorporates the
+🧠 **Advanced Training Techniques**: OpenBioLLM-8B builds upon the powerful foundation of the [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model. It incorporates the DPO dataset and fine-tuning recipe along with a custom diverse medical instruction dataset. Key components of the training pipeline include:
 
 <div align="center">
 <img width="1200px" src="https://cdn-uploads.huggingface.co/production/uploads/5f3fe13d79c1ba4c353d0c19/oPchsJsEpQoGcGXVbh7YS.png">
 </div>
 
-- **Policy Optimization**: [
+- **Policy Optimization**: [Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)](https://arxiv.org/abs/2305.18290)
 - **Ranking Dataset**: [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar)
 - **Fine-tuning dataset**: Custom Medical Instruct dataset (we plan to release a sample training dataset in our upcoming paper; please stay updated)
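The training-pipeline list above names DPO as the policy-optimization step. As a rough illustration only (not OpenBioLLM's actual training code), the per-pair DPO objective from the linked paper can be sketched in plain Python; the log-probability values used below are hypothetical placeholders:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair (Rafailov et al., 2023).

    Each argument is the summed token log-probability of the chosen or
    rejected completion under the trainable policy or the frozen
    reference model.
    """
    # Implicit rewards: beta-scaled log-ratio of policy to reference
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin: minimizing this pushes
    # the policy to rank the chosen completion above the rejected one
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs where the policy already favors the chosen
# completion relative to the reference, so the loss falls below log(2)
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0)
```

In practice this is computed over batches of pairs (e.g. from a ranking dataset such as Nectar), with `beta` controlling how far the policy may drift from the reference model.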