---
license: mit
language:
- af
- az
- be
- bg
- bn
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gl
- gu
- he
- hi
- hu
- hy
- id
- is
- it
- ka
- kk
- ky
- la
- lt
- lv
- mg
- mk
- ml
- mt
- nl
- pa
- pl
- pt
- ro
- ru
- sk
- sq
- sv
- ta
- te
- th
- tr
- uk
- yi
- yo
datasets:
- benjamin/compoundpiece
---

This CompoundPiece model was trained only on the Stage 1 training data, i.e. with self-supervised training on hyphenated and non-hyphenated words scraped from the web. For details, see [CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models](https://arxiv.org/abs/2305.14214).

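To make the Stage 1 idea concrete, here is a minimal sketch of how self-supervised training pairs could be derived from scraped words. It assumes that the hyphen in a hyphenated word marks a constituent boundary, so the input is the word with hyphens removed and the target is the original word, while non-hyphenated words are kept unchanged as examples without a boundary. The function name `make_stage1_pair` and the sample words are hypothetical and only illustrate the idea; this is not the authors' preprocessing code, and the filtering described in the paper is omitted.

```python
def make_stage1_pair(word: str, hyphen: str = "-") -> tuple[str, str]:
    """Build one (input, target) pair from a scraped word.

    Hypothetical helper: the input is the word with hyphens stripped,
    the target is the original word with its hyphens intact.
    """
    return word.replace(hyphen, ""), word


# Illustrative sample of scraped words (assumed, not taken from the dataset).
words = ["high-school", "bookshelf", "well-known"]
pairs = [make_stage1_pair(w) for w in words]

for source, target in pairs:
    print(f"{source} -> {target}")
```

For a hyphenated word the model has to restore the hyphen; for a non-hyphenated word the input and target are identical.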
# Citation

```
@article{minixhofer2023compoundpiece,
  title={CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models},
  author={Minixhofer, Benjamin and Pfeiffer, Jonas and Vuli{\'c}, Ivan},
  journal={arXiv preprint arXiv:2305.14214},
  year={2023}
}
```

# License

MIT