Update README.md
README.md
CHANGED
@@ -41,8 +41,6 @@ The instruction tuning data is a mix of publicly available language and code dat
 The summary of the instruction tuning data is as follows:
 
 <!-- <center><img src="data_table.jpg" alt="Instruction Data"/></center> -->
-
-## CrystalChat DataMix
 | Subset | Tokens (Million) |
 | ----------- | ----------- |
 | [OASST1-guanaco](https://huggingface.co/datasets/openaccess-ai-collective/oasst1-guanaco-extended-sharegpt) | 4.46 |
@@ -61,6 +59,8 @@ The summary of the instruction tuning data is as follows:
 
 The HTML Instruction dataset was curated by LLM360 and will be made available shortly.
 
+For more details, check out the [data table](https://huggingface.co/LLM360/CrystalChat/blob/main/data_table.jpg).
+
 # Instruction Format
 
 We've added some new special tokens to the CrystalCoder tokenizer to support the instruction tuning.