---
tags:
- adapter-transformers
- adapterhub:am/wikipedia-amharic-20240320
- xlm-roberta-base
datasets:
- wikipedia
pipeline_tag: fill-mask
---

# Adapter `solwol/xml-roberta-base-adapter-amharic` for xlm-roberta-base

An [adapter](https://adapterhub.ml) for the `xlm-roberta-base` model, trained on the [am/wikipedia-amharic-20240320](https://adapterhub.ml/explore/am/wikipedia-amharic-20240320/) dataset. It includes a prediction head for masked language modeling.

This adapter was created for use with the **[Adapters](https://github.com/Adapter-Hub/adapters)** library.

## Usage

First, install `transformers` and `adapters`:

```
pip install -U transformers adapters
```
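Before loading the adapter, it can help to confirm that both packages are visible in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is an illustrative assumption and is not part of `transformers` or `adapters`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages this model card asks you to install.
    for name in ("transformers", "adapters"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

If either line reports `not installed`, rerun the `pip install` command above in the same environment.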

Now, the adapter can be loaded and activated like this:

```python
from adapters import AutoAdapterModel

# Load the base model, then fetch the adapter from the Hugging Face Hub and activate it.
model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
adapter_name = model.load_adapter("solwol/xml-roberta-base-adapter-amharic", source="hf", set_active=True)
```

Next, run the fill-mask task:

```python
from transformers import AutoTokenizer, FillMaskPipeline

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
fillmask = FillMaskPipeline(model=model, tokenizer=tokenizer)

inputs = ["แแแซแ แ แฒแต <mask> แญแแ",
          "แจแขแตแฎแตแซ แแ <mask> แ แฒแต แ แ แฃ แแ",
          "แฌแแซ แจ แขแตแฎแตแซ แ แแณแ <mask> แ แแท แแต",
          "แ แผ แแแแญ แจแขแตแฎแตแซ <mask> แแ แฉ"]

outputs = fillmask(inputs)
# Predictions for the first input, highest score first:
outputs[0]

[{'score': 0.4049586057662964,
  'token': 98040,
  'token_str': 'แ แแต',
  'sequence': 'แแแซแ แ แฒแต แ แแต แญแแ'},
 {'score': 0.21424812078475952,
  'token': 48425,
  'token_str': 'แแแ',
  'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'},
 {'score': 0.2039182484149933,
  'token': 25186,
  'token_str': 'แแแต',
  'sequence': 'แแแซแ แ แฒแต แแแต แญแแ'},
 {'score': 0.06508922576904297,
  'token': 17733,
  'token_str': 'แแ',
  'sequence': 'แแแซแ แ แฒแต แแ แญแแ'},
 {'score': 0.018085109069943428,
  'token': 38455,
  'token_str': 'แแแ',
  'sequence': 'แแแซแ แ แฒแต แแแ แญแแ'}]
```
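When several sentences are passed at once, the pipeline returns one list of candidate dicts per sentence, each with the `score`, `token`, `token_str`, and `sequence` keys shown above. As a rough sketch under that assumption, the hypothetical helper `top_predictions` below reduces each list to its best-scoring candidate. It is plain Python, not part of `transformers` or `adapters`, and the `min_score` threshold is an illustrative parameter.

```python
from typing import Any, List, Tuple


def top_predictions(outputs: List[Any], min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Return (token_str, score) of the best candidate for each input.

    Inputs whose best candidate scores below `min_score` yield ("", 0.0).
    """
    # A single input yields a flat list of dicts; wrap it so both shapes work.
    if outputs and isinstance(outputs[0], dict):
        outputs = [outputs]

    results = []
    for candidates in outputs:
        best = max(candidates, key=lambda c: c["score"])
        if best["score"] >= min_score:
            results.append((best["token_str"], best["score"]))
        else:
            results.append(("", 0.0))
    return results


if __name__ == "__main__":
    # Placeholder data in the pipeline's output shape (not real model output).
    demo = [
        [{"score": 0.7, "token": 1, "token_str": "a", "sequence": "x a y"},
         {"score": 0.2, "token": 2, "token_str": "b", "sequence": "x b y"}],
        [{"score": 0.1, "token": 3, "token_str": "c", "sequence": "z c"}],
    ]
    print(top_predictions(demo, min_score=0.5))
```

With the objects from the example above, `top_predictions(outputs)` would give one `(token_str, score)` pair per sentence in `inputs`.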

## Fine-tuning data

The adapter was trained on the Amharic Wikipedia dataset, snapshot date "20240320".