---
title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0
---
# Lesbian Greek Morphosyntactic Parser
A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).
## Overview
This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:
- **Part-of-speech tagging**
- **Morphological analysis**
- **Dependency parsing**
- **Lemmatization**
- **CoNLL-U format output**
## Model Details
The underlying model is based on:
- **Stanza v1.7.0+** as the base pipeline
- **Greek BERT** ([nlpaueb/bert-base-greek-uncased-v1](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)) for enhanced representations
- **UD_Greek-Lesbian treebank** for training (540 sentences)
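As a rough illustration of how these components might fit together, the sketch below assembles keyword arguments for `stanza.Pipeline` from a local model directory. The directory layout and file names (`tokenizer.pt`, `tagger.pt`, `lemmatizer.pt`, `parser.pt`) are hypothetical, and the processor list is an assumption; the actual Space may load its models differently, and further arguments (for example pretrain paths) may be needed depending on how the models were trained.

```python
from pathlib import Path


def build_pipeline_config(model_dir: str) -> dict:
    """Assemble keyword arguments for stanza.Pipeline.

    All file names below are hypothetical placeholders; substitute the
    names of the model files you actually have.
    """
    root = Path(model_dir)
    return {
        "lang": "el",  # Greek language code
        "processors": "tokenize,pos,lemma,depparse",  # assumed processor set
        "tokenize_model_path": str(root / "tokenizer.pt"),
        "pos_model_path": str(root / "tagger.pt"),
        "lemma_model_path": str(root / "lemmatizer.pt"),
        "depparse_model_path": str(root / "parser.pt"),
        "download_method": None,  # use local files only, skip Stanza downloads
        "use_gpu": False,
    }


def load_pipeline(model_dir: str):
    """Create the pipeline; stanza is imported lazily so the config
    helper above can be used without stanza installed."""
    import stanza

    return stanza.Pipeline(**build_pipeline_config(model_dir))
```

Calling `load_pipeline("models")` would then return a pipeline object that can be applied to a string of text, provided the model files exist at the assumed paths.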
### Training Data Sources
**Oral Data** (collected 2023-2024):
- Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos
**Written Sources**:
- Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
- Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
- Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
- Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
- Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*
## Features
### 📊 **CoNLL-U Output**
Standard Universal Dependencies format for interoperability with linguistic tools
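For readers who want to post-process the copied output, here is a minimal sketch of reading CoNLL-U text into per-sentence token records using only the standard library. The sample annotation is hand-written for illustration and is not output from the model; the function name `parse_conllu` is likewise just a placeholder.

```python
# The ten tab-separated columns defined by the CoNLL-U format.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> list:
    """Return a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


# Hand-written illustrative annotation (not model output).
SAMPLE = "\n".join([
    "# text = Το παιδί κάθεται.",
    "\t".join(["1", "Το", "ο", "DET", "_", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "παιδί", "παιδί", "NOUN", "_", "_", "3", "nsubj", "_", "_"]),
    "\t".join(["3", "κάθεται", "κάθομαι", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"]),
    "",
])

for token in parse_conllu(SAMPLE)[0]:
    print(token["form"], token["upos"], token["deprel"])
```

This sketch does not treat multiword-token ranges (IDs such as `1-2`) or empty nodes specially; they are kept as ordinary rows.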
### 📈 **Interactive Data Table**
Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
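One way such a table row could be built is to split the morphological features string (the CoNLL-U `FEATS` column, written as `Name=Value` pairs joined by `|`) into separate columns. The helper names and the example feature values below are illustrative assumptions, not the Space's actual code or model output.

```python
def parse_feats(feats: str) -> dict:
    """Split a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats in ("", "_"):  # "_" marks an empty field in CoNLL-U
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def to_row(form: str, upos: str, feats: str) -> dict:
    """Flatten one token into a single table row with one key per feature."""
    row = {"form": form, "upos": upos}
    row.update(parse_feats(feats))
    return row


# Hand-written example values for illustration.
row = to_row("παιδί", "NOUN", "Case=Nom|Gender=Neut|Number=Sing")
print(row)
```

A list of such rows can be passed directly to `pandas.DataFrame(rows)` if a tabular view is wanted.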
### 🔗 **Dependency Visualization**
Text-based visualization showing syntactic relationships between words
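A text-based view of this kind can be approximated by printing one arrow per token from dependent to head. The sketch below is an assumption about the general idea, not the Space's actual rendering; the function name and the sample analysis are hand-written for illustration.

```python
def render_dependencies(tokens: list) -> str:
    """Render (id, form, head, deprel) tuples as 'dependent --rel--> head' lines."""
    forms = {tid: form for tid, form, _, _ in tokens}
    forms[0] = "ROOT"  # head 0 is the artificial root in Universal Dependencies
    lines = []
    for _, form, head, deprel in tokens:
        lines.append(f"{form} --{deprel}--> {forms[head]}")
    return "\n".join(lines)


# Hand-written illustrative analysis (not model output).
tokens = [
    (1, "Το", 2, "det"),
    (2, "παιδί", 3, "nsubj"),
    (3, "κάθεται", 0, "root"),
    (4, ".", 3, "punct"),
]
print(render_dependencies(tokens))
```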
### 🏛️ **Dialectal Specialization**
Optimized specifically for the Lesbian dialect of Greek
## Usage
1. Enter your Lesbian Greek text in the input field
2. Click "Parse Lesbian Greek Text" or press Enter
3. View results in three formats:
   - Raw CoNLL-U output (copyable)
   - Interactive data table
   - Dependency structure visualization
## Example Texts
The interface includes example texts based on the dialectal sources used in training:
- `Το παιδί κάθεται στο σπίτι.`
- `Η μάνα μαγειρεύει στην κουζίνα.`
- `Το νερό τρέχει απ' τη βρύση.`
- `Οι παππούδες λένε παλιές ιστορίες.`
## Limitations
- **Experimental model**: The model was trained on only 540 sentences, so its accuracy may be limited
- **Domain-specific**: Optimized for dialectal content similar to the training sources
- **Research purposes**: Intended for research use; further fine-tuning is needed before production use
## Citation
If you use this tool or the underlying model, please cite:
```bibtex
@inproceedings{bompolas2025crossing,
title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
year={2025},
publisher={Association for Computational Linguistics}
}
```
## Related Resources
- 🤗 [Lesbian Greek Morphosyntactic Model](https://huggingface.co/sbompolas/Lesbian-Greek-Morphosyntactic-Model)
- 📚 [UD_Greek-Lesbian Treebank](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
- 🔧 [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
- 🇬🇷 [Greek BERT Model](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
## Technical Details
### Dependencies
- `gradio>=4.0.0` - Web interface
- `stanza>=1.7.0` - NLP pipeline
- `pandas>=1.5.0` - Data handling
- `torch>=1.9.0` - Neural network backend
- `transformers>=4.20.0` - BERT integration
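When setting up a local environment, a quick check of installed versions against these minimums can save a failed launch. The following is a rough sketch using only the standard library; the version comparison is deliberately simplified (it looks only at the first three numeric components), and a library such as `packaging` would be more robust.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def release_tuple(version: str) -> tuple:
    """Approximate a version as (major, minor, patch), ignoring local tags."""
    public = version.split("+", 1)[0]  # drop local parts such as "+cu118"
    nums = [int(n) for n in re.findall(r"\d+", public)[:3]]
    nums += [0] * (3 - len(nums))  # pad short versions like "4" to (4, 0, 0)
    return tuple(nums)


def meets_minimum(installed: str, minimum: str) -> bool:
    return release_tuple(installed) >= release_tuple(minimum)


def check_environment() -> dict:
    """Map each package name to 'ok', 'missing', or 'too old (<version>)'."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


for name, status in check_environment().items():
    print(f"{name}: {status}")
```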
### File Structure
```
├── app.py # Main Gradio application
├── requirements.txt # Python dependencies
└── README.md # This documentation
```
## Development
To run locally:
```bash
git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py
```
## Support
For issues related to:
- **The model**: Contact the original authors or open an issue on the model repository
- **This Space**: Open an issue in the Space's discussion tab
- **Stanza**: Refer to the [Stanza documentation](https://stanfordnlp.github.io/stanza/)
## License
This Space's metadata declares a CC BY 4.0 license (`license: cc-by-4.0`). For the underlying model and the individual components (Stanza, Greek BERT, etc.), please refer to their respective license terms.