---
title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0
---

# Lesbian Greek Morphosyntactic Parser

A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).

## Overview

This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:

- **Part-of-speech tagging**
- **Morphological analysis**
- **Dependency parsing**
- **Lemmatization**
- **CoNLL-U format output**

## Model Details

The underlying model is based on:

- **Stanza v1.7.0+** as the base pipeline
- **Greek BERT** ([nlpaueb/bert-base-greek-uncased-v1](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)) for enhanced representations
- **UD_Greek-Lesbian treebank** for training (540 sentences)

### Training Data Sources

**Oral Data** (collected 2023-2024):

- Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos

**Written Sources**:

- Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
- Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
- Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
- Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
- Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*

## Features

### 📊 **CoNLL-U Output**

Standard Universal Dependencies format for interoperability with linguistic tools
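
Downstream tools usually read the CoNLL-U output back into structured records. The following is a minimal sketch of that step using only the Python standard library; `CONLLU_FIELDS`, `parse_conllu`, and `SAMPLE` are illustrative names, and the sample sentence is hand-written for demonstration rather than actual output of the model.

```python
from typing import Dict, List

# The ten standard CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts.

    Comment lines (starting with '#') are skipped. Multiword-token and
    empty-node rows, if present, are kept as ordinary rows.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample for illustration only (assumption, not model output).
SAMPLE = (
    "# text = Το παιδί κάθεται.\n"
    "1\tΤο\tο\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tπαιδί\tπαιδί\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tκάθεται\tκάθομαι\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["id"], token["form"], token["upos"], token["deprel"])
```

The copyable output from the Space could be pasted into a string and passed to `parse_conllu` in the same way.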

### 📈 **Interactive Data Table**

Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
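
In CoNLL-U, morphological features are packed into one `FEATS` string such as `Case=Nom|Number=Sing`, with `_` meaning "no features". A table view can be easier to filter when that string is unpacked into separate keys. The helper below is a small sketch of that unpacking; `parse_feats` is a hypothetical name and is not taken from this Space's `app.py`.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS string into a {feature: value} dict."""
    if not feats or feats == "_":
        # "_" is the CoNLL-U placeholder for an empty field.
        return {}
    # Features are separated by "|" and each one is written as Name=Value.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


if __name__ == "__main__":
    print(parse_feats("Case=Nom|Gender=Neut|Number=Sing"))
```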

### 🔗 **Dependency Visualization**

Text-based visualization showing syntactic relationships between words
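
One simple way to render dependencies as text is to print each word with its relation label and its head. The sketch below illustrates the idea under that assumption; it is not necessarily the layout this Space produces, and `render_dependencies` and the sample tokens are hypothetical.

```python
from typing import List, Tuple

# (id, form, head, deprel); a head of 0 marks the root of the sentence.
Token = Tuple[int, str, int, str]


def render_dependencies(tokens: List[Token]) -> List[str]:
    """Return one 'dependent --relation--> head' line per token."""
    forms = {tid: form for tid, form, _, _ in tokens}
    forms[0] = "ROOT"  # artificial root node
    return [
        f"{form} --{deprel}--> {forms.get(head, '?')}"
        for _, form, head, deprel in tokens
    ]


if __name__ == "__main__":
    # Hand-written analysis for illustration only (assumption, not model output).
    sample = [
        (1, "Το", 2, "det"),
        (2, "παιδί", 3, "nsubj"),
        (3, "κάθεται", 0, "root"),
    ]
    print("\n".join(render_dependencies(sample)))
```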

### 🏛️ **Dialectal Specialization**

Optimized specifically for the Lesbian dialect of Greek

## Usage

1. Enter your Lesbian Greek text in the input field
2. Click "Parse Lesbian Greek Text" or press Enter
3. View results in three formats:
   - Raw CoNLL-U output (copyable)
   - Interactive data table
   - Dependency structure visualization

## Example Texts

The interface includes example texts based on the dialectal sources used in training:

- `Το παιδί κάθεται στο σπίτι.`
- `Η μάνα μαγειρεύει στην κουζίνα.`
- `Το νερό τρέχει απ' τη βρύση.`
- `Οι παππούδες λένε παλιές ιστορίες.`

## Limitations

- **Experimental model**: Trained on limited data (540 sentences), so it should be treated as experimental
- **Domain-specific**: Optimized for dialectal content similar to the training sources
- **Research purposes**: Intended for research use; further fine-tuning is needed for production use

## Citation

If you use this tool or the underlying model, please cite:

```bibtex
@inproceedings{bompolas2025crossing,
  title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
  author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
  booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
  year={2025},
  publisher={Association for Computational Linguistics}
}
```

## Related Resources

- 🤗 [Lesbian Greek Morphosyntactic Model](https://huggingface.co/sbompolas/Lesbian-Greek-Morphosyntactic-Model)
- 📚 [UD_Greek-Lesbian Treebank](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
- 🔧 [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
- 🇬🇷 [Greek BERT Model](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)

## Technical Details

### Dependencies

- `gradio>=4.0.0` - Web interface
- `stanza>=1.7.0` - NLP pipeline
- `pandas>=1.5.0` - Data handling
- `torch>=1.9.0` - Neural network backend
- `transformers>=4.20.0` - BERT integration
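
When running outside the Space, it can help to confirm that the local environment meets these minimums before launching the app. The following sketch does so with the standard library only; `MINIMUMS`, `version_tuple`, and `check_requirements` are illustrative names, and the version comparison is deliberately simplified (it looks only at the leading numeric parts and ignores pre-release tags).

```python
from importlib import metadata
from itertools import takewhile
from typing import Dict, Tuple

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Reduce a version string to its first three numeric components."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums: Dict[str, str] = MINIMUMS) -> Dict[str, str]:
    """Report 'ok', 'missing', or 'too old' for each required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```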

### File Structure

```
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # This documentation
```

## Development

To run locally:

```bash
git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py
```

## Support

For issues related to:

- **The model**: Contact the original authors or open an issue on the model repository
- **This Space**: Open an issue in the Space's discussion tab
- **Stanza**: Refer to the [Stanza documentation](https://stanfordnlp.github.io/stanza/)

## License

Please refer to the original model's license terms and the individual component licenses (Stanza, Greek BERT, etc.).