Update README.md
README.md CHANGED
@@ -17,6 +17,13 @@ language:
Infinity-Instruct-3M-0613-Llama3-70B is an open-source supervised instruction-tuned model trained without reinforcement learning from human feedback (RLHF). It is fine-tuned only on [Infinity-Instruct-3M and Infinity-Instruct-0613](https://huggingface.co/datasets/BAAI/Infinity-Instruct) and shows favorable results on AlpacaEval 2.0 compared to GPT4-0613.
## **Training Details**
<p align="center">
@@ -53,7 +60,7 @@ Thanks to [FlagScale](https://github.com/FlagOpen/FlagScale), we could concatena
*denotes that the model is fine-tuned without reinforcement learning from human feedback (RLHF).
- We evaluate Infinity-Instruct-3M-0613-Llama3-70B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge the model's answers. AlpacaEval2.0 displays a high agreement rate with the human-annotated benchmark Chatbot Arena. The results show that InfInstruct-3M-0613-Llama3-70B achieved 31.2 on AlpacaEval2.0, higher than the 30.4 of GPT4-0613 Turbo, although it does not yet use RLHF.
## **How to use**
Infinity-Instruct-3M-0613-Llama3-70B is an open-source supervised instruction-tuned model trained without reinforcement learning from human feedback (RLHF). It is fine-tuned only on [Infinity-Instruct-3M and Infinity-Instruct-0613](https://huggingface.co/datasets/BAAI/Infinity-Instruct) and shows favorable results on AlpacaEval 2.0 compared to GPT4-0613.
+ ## **News**
+ - 🔥🔥🔥[2024/06/28] We release the model weights of [InfInstruct-Llama3-70B 0613](https://huggingface.co/BAAI/Infinity-Instruct-3M-0613-Llama3-70B). Without RLHF, it shows favorable results on AlpacaEval 2.0 compared to GPT4-0613.
+
+ - 🔥🔥🔥[2024/06/21] We release the model weights of [InfInstruct-Mistral-7B 0613](https://huggingface.co/BAAI/Infinity-Instruct-3M-0613-Mistral-7B). Without RLHF, it shows favorable results on AlpacaEval 2.0 compared to Mixtral 8x7B v0.1, Gemini Pro, and GPT-3.5.
+
+ - 🔥🔥🔥[2024/06/13] We share the intermediate results of our data construction process (corresponding to [InfInstruct-3M](https://huggingface.co/datasets/BAAI/Infinity-Instruct) in the table below). Our ongoing efforts focus on risk assessment and data generation. The finalized version with 10 million instructions is scheduled for release in late June.
+
## **Training Details**
<p align="center">
*denotes that the model is fine-tuned without reinforcement learning from human feedback (RLHF).
+ We evaluate Infinity-Instruct-3M-0613-Llama3-70B on the two most popular instruction-following benchmarks. MT-Bench is a set of challenging multi-turn questions covering code, math, and routine dialogue. AlpacaEval2.0 is based on the AlpacaFarm evaluation set. Both benchmarks use GPT-4 to judge the model's answers. AlpacaEval2.0 displays a high agreement rate with the human-annotated benchmark Chatbot Arena. The results show that InfInstruct-3M-0613-Llama3-70B achieved 31.2 on AlpacaEval2.0, higher than the 30.4 of GPT4-0613 Turbo, although it does not yet use RLHF.
## **How to use**