| --- |
| title: RUMLEM - Romansh Lemmatizer Demo |
| emoji: 💻 |
| colorFrom: purple |
| colorTo: yellow |
| sdk: gradio |
| sdk_version: 5.43.1 |
| app_file: app.py |
| pinned: false |
| license: mit |
| --- |
| |
| # Dictionary-Based Lemmatizer for Romansh Varieties: Demo |
|
|
| This demo showcases the functionality of the `rumlem` package, available at: |
|
|
| https://github.com/ZurichNLP/rumlem |
|
|
| The underlying Python package provides a basic dictionary-based lemmatizer for the Romansh language. |
| Given a Romansh text, the lemmatizer splits it into words and looks up each word in the [Pledari Grond](https://pledarigrond.ch/) dictionaries of the five primary Romansh idioms (Sursilvan, Sutsilvan, Surmiran, Puter and Vallader) as well as in the dictionary of the standard variety Rumantsch Grischun. |
|
|
| For example, if a Romansh text contains the word _lavuraiva_, the lemmatizer traces the word back to the Vallader and Puter dictionaries: |
|
|
| IMAGE_PLACEHOLDER |
| |
| Typical use cases for the lemmatizer include: |
| |
| - Accessing potential German translations (glosses) of Romansh words |
| - Automatically detecting the variety of a Romansh text, based on how many words are found in the respective dictionaries |
| |
| A limitation of the current version is that the lemmatizer does not disambiguate between multiple possible ways of lemmatizing a word. Specifically: |
| |
| 1. If a word has multiple dictionary entries, all the dictionary entries are returned, irrespective of the context in which the word occurs. |
| 2. If there are multiple ways of morphologically analysing a given word form, all possible analyses are returned. |
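
The two cases above can be illustrated with a minimal sketch. It is not the `rumlem` API: the `Analysis` class, the `TOY_INDEX` mapping, the `lookup` function and all entries (`form_a`, `lemma_1`, ...) are hypothetical placeholders, chosen only to show that every candidate is returned without any context-based filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """One candidate analysis of a word form (hypothetical structure)."""
    lemma: str
    idiom: str
    gloss: str       # German translation
    morphology: str  # morphological annotation


# Hypothetical toy index: surface form -> every matching analysis.
# All entries are placeholders, not real dictionary data.
TOY_INDEX = {
    # Case 1: one word form with two different dictionary entries.
    "form_a": [
        Analysis("lemma_1", "Vallader", "gloss_1", "analysis_1"),
        Analysis("lemma_2", "Vallader", "gloss_2", "analysis_2"),
    ],
    # Case 2: one dictionary entry, two possible morphological analyses.
    "form_b": [
        Analysis("lemma_3", "Vallader", "gloss_3", "analysis_3"),
        Analysis("lemma_3", "Vallader", "gloss_3", "analysis_4"),
    ],
}


def lookup(token, index=TOY_INDEX):
    """Return all candidate analyses for a token, without disambiguation."""
    return list(index.get(token, []))
```

In this sketch, `lookup("form_a")` yields both entries and `lookup("form_b")` yields both analyses; choosing among them would require context that the lookup does not consult.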
| |
| ## Demo Interface |
| |
| The text input field is in the top left corner of the demo interface. Clicking the "Analyze" button applies the lemmatizer to the text: the text is split into tokens, and each token is looked up to find its lemmas. |
| |
| The idiom scores in the top right corner are calculated as the number of tokens that have a lemma in a particular idiom's dictionary divided by the number of tokens in the sentence. |
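
The score calculation described above can be sketched as follows. This is a minimal illustration, not the code used by the demo: the function names `idiom_scores` and `detected_idiom`, the input format (one set of idioms per token) and the inclusion of Rumantsch Grischun among the scored varieties are assumptions.

```python
from collections import Counter

# Assumed list of scored varieties (five idioms plus the standard variety).
IDIOMS = [
    "Rumantsch Grischun",
    "Sursilvan",
    "Sutsilvan",
    "Surmiran",
    "Puter",
    "Vallader",
]


def idiom_scores(token_idioms):
    """Compute one score per idiom.

    token_idioms: list with one set per token, holding the idioms in whose
    dictionary that token has at least one lemma (hypothetical input format).
    Score = tokens with a lemma in the idiom / total number of tokens.
    """
    n_tokens = len(token_idioms)
    if n_tokens == 0:
        return {idiom: 0.0 for idiom in IDIOMS}
    # Each token counts at most once per idiom.
    counts = Counter(idiom for found in token_idioms for idiom in set(found))
    return {idiom: counts[idiom] / n_tokens for idiom in IDIOMS}


def detected_idiom(scores):
    """Pick the highest-scoring idiom; ties go to the first one listed."""
    return max(scores, key=scores.get)


# Four tokens; the first mirrors a form found in both Vallader and Puter.
example = [{"Vallader", "Puter"}, {"Vallader"}, set(), {"Vallader", "Sursilvan"}]
scores = idiom_scores(example)
print(scores["Vallader"], detected_idiom(scores))
```

Here three of the four tokens have a lemma in the Vallader dictionary, so Vallader scores 0.75 and is selected. How the demo itself resolves ties is not specified; the tie-breaking rule in this sketch is an arbitrary choice.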
| |
| Underneath these two fields, the table displays the analysis of each token in the detected (i.e., the dark blue) idiom. This includes the lemma(s), if present, and a set of German translations as well as morphological annotations. |
| |
| At the bottom of the page, a couple of example sentences in each idiom are provided. |
| |
| ## Acknowledgements and Data Rights |
| |
| We thank the **Swiss Federal Office of Culture (Bundesamt für Kultur BAK)** for its support. |
| |
| This demo incorporates dictionary data from the [Pledari Grond](https://pledarigrond.ch/) project. |
| |
| - The dictionaries for Rumantsch Grischun, Surmiran, Sursilvan and Sutsilvan are openly licensed. © **Lia Rumantscha** 1980 – 2025 |
| - The dictionaries for Vallader and Puter are kindly provided by [**Uniun dals Grischs**](https://www.udg.ch/dicziunari) and may only be used in the context of this lemmatizer. © Uniun dals Grischs. All rights reserved. |
| |