Commit f7fcbfb
Parent(s): 220be08
Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ datasets:
 > [Announcement tweet](https://twitter.com/dvilasuero/status/1643234487386374148?s=20)
 
 A cross-lingual SetFit model to **detect bad instructions from Alpaca Datasets** and other instruction-following datasets.
 
-`GarbageCollector` can greatly speed up the validation of
+`GarbageCollector` can greatly speed up the validation of instruction-datasets across many languages, flagging examples that need to be fixed or simply discarded.
 
 Data quality is key for LLMs, but open-source LLMs are being built with data of "unknown" quality. This model can help practitioners to find and fix frequent issues (e.g., the model hallucinating stock prices, describing non-existing images, etc.)
 