title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0
Lesbian Greek Morphosyntactic Parser
A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).
Overview
This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:
- Part-of-speech tagging
- Morphological analysis
- Dependency parsing
- Lemmatization
- CoNLL-U format output
Model Details
The underlying model is based on:
- Stanza v1.7.0+ as the base pipeline
- Greek BERT (nlpaueb/bert-base-greek-uncased-v1) for enhanced representations
- UD_Greek-Lesbian treebank for training (540 sentences)
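As a rough illustration of how these components could be wired together, the sketch below builds keyword arguments for `stanza.Pipeline` from a local model directory. The file names in `MODEL_FILES` and the set of processors are assumptions made for illustration; the published model may ship different files, and the Space's actual loading code may differ.

```python
from pathlib import Path

# Hypothetical file names: the real model files may be named differently.
MODEL_FILES = {
    "tokenize": "tokenizer.pt",
    "pos": "tagger.pt",
    "lemma": "lemmatizer.pt",
    "depparse": "parser.pt",
}


def build_pipeline_kwargs(model_dir, use_gpu=False):
    """Build keyword arguments for stanza.Pipeline from a local model directory."""
    model_dir = Path(model_dir)
    kwargs = {
        "lang": "el",  # Stanza language code for Greek
        "processors": ",".join(MODEL_FILES),
        "use_gpu": use_gpu,
    }
    # Stanza accepts one "<processor>_model_path" argument per processor.
    for processor, filename in MODEL_FILES.items():
        kwargs[f"{processor}_model_path"] = str(model_dir / filename)
    return kwargs


def load_pipeline(model_dir):
    """Create the pipeline; Stanza is imported lazily so the helper above works without it."""
    import stanza

    return stanza.Pipeline(**build_pipeline_kwargs(model_dir))
```

Calling `load_pipeline("models")` would then require Stanza, the model files, and (for a BERT-based model) the `transformers` package to be available locally.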
Training Data Sources
Oral Data (collected 2023-2024):
- Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos
Written Sources:
- Papanis, D. & Papanis, G. D. (2004). Lexiko tou Agiasotikou Glosikou Idiomatos
- Tsokarou-Mitsioni, E. (1998). Palies Istories ap' tn Agiasiou
- Tsokarou-Mitsioni, E. (2019). Prosfygiá
- Anagnostopoulou, M. A. (2021). Thematiko Lexiko tis Lesviakis Dialektou
- Anagnostou, V. T. (2014). Tsi sta th'ka mas: Komodia sta k'stariot'ka
Features
📊 CoNLL-U Output
Standard Universal Dependencies format for interoperability with linguistic tools
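To make the format concrete, here is a minimal sketch that serializes token annotations into the ten tab-separated CoNLL-U columns. It assumes input shaped like the nested list of word dictionaries that Stanza's `Document.to_dict()` returns (keys such as `id`, `text`, `lemma`, `upos`, `head`, `deprel`); the annotation in `example` is hand-written for illustration and is not model output. The function name `to_conllu` is hypothetical.

```python
# The ten CoNLL-U columns, named with the keys used in Stanza-style word dicts.
CONLLU_FIELDS = (
    "id", "text", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def to_conllu(sentences):
    """Serialize a list of sentences (each a list of word dicts) as CoNLL-U text."""
    blocks = []
    for sentence in sentences:
        lines = []
        for word in sentence:
            columns = []
            for field in CONLLU_FIELDS:
                value = word.get(field)
                # CoNLL-U marks unspecified values with an underscore.
                columns.append("_" if value is None or value == "" else str(value))
            lines.append("\t".join(columns))
        blocks.append("\n".join(lines))
    # Every sentence, including the last, is followed by a blank line.
    return "".join(block + "\n\n" for block in blocks)


# Hand-written illustrative annotation (not produced by the model).
example = [[
    {"id": 1, "text": "Το", "lemma": "ο", "upos": "DET", "head": 2, "deprel": "det"},
    {"id": 2, "text": "παιδί", "lemma": "παιδί", "upos": "NOUN", "head": 3, "deprel": "nsubj"},
    {"id": 3, "text": "κάθεται", "lemma": "κάθομαι", "upos": "VERB", "head": 0, "deprel": "root"},
]]

print(to_conllu(example), end="")
```

Multi-word tokens and sentence-level comment lines are deliberately left out of this sketch.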
📈 Interactive Data Table
Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
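A table like this can be derived directly from CoNLL-U text. The sketch below is one possible approach, not necessarily what `app.py` does: it turns each token line into a dictionary keyed by the standard column names and adds a hypothetical `SENT` field holding the sentence number. The sample annotation is hand-written for illustration.

```python
COLUMNS = (
    "ID", "FORM", "LEMMA", "UPOS", "XPOS",
    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC",
)


def conllu_to_rows(conllu_text):
    """Convert CoNLL-U text into a list of row dicts, one per token."""
    rows = []
    sentence = 1
    seen_token = False
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if seen_token:
                sentence += 1
                seen_token = False
            continue
        if line.startswith("#"):
            continue  # skip sentence-level comments
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines
        row = {"SENT": sentence}
        row.update(zip(COLUMNS, fields))
        rows.append(row)
        seen_token = True
    return rows


# Hand-written illustrative annotation (not produced by the model).
SAMPLE = (
    "# sent_id = 1\n"
    "1\tΤο\tο\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tπαιδί\tπαιδί\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tκάθεται\tκάθομαι\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "1\tΗ\tο\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tμάνα\tμάνα\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tμαγειρεύει\tμαγειρεύω\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

rows = conllu_to_rows(SAMPLE)
```

Because each row is a plain dictionary, the list can be passed to `pandas.DataFrame(rows)` to obtain a table suitable for display.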
🔗 Dependency Visualization
Text-based visualization showing syntactic relationships between words
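One simple way to produce such a text view is to print each word with an arrow pointing to its head. The sketch below assumes one sentence given as a list of dictionaries with `ID`, `FORM`, `HEAD`, and `DEPREL` string fields, as in CoNLL-U; the function name, the arrow layout, and the hand-written sample are illustrative assumptions, and the Space's own rendering may look different.

```python
def render_dependencies(rows):
    """Render one sentence as 'dependent --relation--> head' lines."""
    forms = {row["ID"]: row["FORM"] for row in rows}
    lines = []
    for row in rows:
        # HEAD "0" means the word attaches to the artificial root.
        head = "ROOT" if row["HEAD"] == "0" else forms.get(row["HEAD"], "?")
        lines.append(f'{row["FORM"]} --{row["DEPREL"]}--> {head}')
    return "\n".join(lines)


# Hand-written illustrative annotation (not produced by the model).
sentence = [
    {"ID": "1", "FORM": "Το", "HEAD": "2", "DEPREL": "det"},
    {"ID": "2", "FORM": "παιδί", "HEAD": "3", "DEPREL": "nsubj"},
    {"ID": "3", "FORM": "κάθεται", "HEAD": "0", "DEPREL": "root"},
]

print(render_dependencies(sentence))
# Το --det--> παιδί
# παιδί --nsubj--> κάθεται
# κάθεται --root--> ROOT
```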
🏛️ Dialectal Specialization
Optimized specifically for the Lesbian dialect of Greek
Usage
- Enter your Lesbian Greek text in the input field
- Click "Parse Lesbian Greek Text" or press Enter
- View results in three formats:
  - Raw CoNLL-U output (copyable)
  - Interactive data table
  - Dependency structure visualization
Example Texts
The interface includes example texts based on the dialectal sources used in training:
Το παιδί κάθεται στο σπίτι.
Η μάνα μαγειρεύει στην κουζίνα.
Το νερό τρέχει απ' τη βρύση.
Οι παππούδες λένε παλιές ιστορίες.
Limitations
- Experimental model: the training data is limited (540 sentences)
- Domain-specific: optimized for dialectal content similar to the training sources
- Research use: further fine-tuning is needed before production use
Citation
If you use this tool or the underlying model, please cite:
@inproceedings{bompolas2025crossing,
title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
year={2025},
publisher={Association for Computational Linguistics}
}
Related Resources
- 🤗 Lesbian Greek Morphosyntactic Model
- 📚 UD_Greek-Lesbian Treebank
- 🔧 Stanza Documentation
- 🇬🇷 Greek BERT Model
Technical Details
Dependencies
- gradio>=4.0.0: Web interface
- stanza>=1.7.0: NLP pipeline
- pandas>=1.5.0: Data handling
- torch>=1.9.0: Neural network backend
- transformers>=4.20.0: BERT integration
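Before running the app locally, it can help to confirm that the installed packages meet these minimums. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the version comparison is deliberately simplified (it compares the first three numeric components and ignores pre-release tags).

```python
import re
from importlib import metadata

# Minimum versions as listed in the dependency list.
MINIMUM_VERSIONS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def version_tuple(version):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.0' compares equal to '4.0.0'
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a status string per package: 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[package] = "ok"
        else:
            report[package] = f"too old ({installed} < {minimum})"
    return report
```

Calling `check_requirements()` returns a dictionary that can be printed or logged at startup; installing from `requirements.txt` remains the authoritative way to satisfy the dependencies.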
File Structure
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
└── README.md # This documentation
Development
To run locally:
git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py
Support
For issues related to:
- The model: Contact the original authors or open an issue on the model repository
- This Space: Open an issue in the Space's discussion tab
- Stanza: Refer to the Stanza documentation
License
Please refer to the original model's license terms and the individual component licenses (Stanza, Greek BERT, etc.).