---
title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0
---
# Lesbian Greek Morphosyntactic Parser
A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).
## Overview
This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:
- **Part-of-speech tagging**
- **Morphological analysis**
- **Dependency parsing**
- **Lemmatization**
- **CoNLL-U format output**
## Model Details
The underlying model is based on:
- **Stanza v1.7.0+** as the base pipeline
- **Greek BERT** ([nlpaueb/bert-base-greek-uncased-v1](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)) for enhanced representations
- **UD_Greek-Lesbian treebank** for training (540 sentences)
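
As a rough illustration of how these components fit together, the sketch below builds keyword arguments for `stanza.Pipeline` from a local model folder. The file names (`tokenizer.pt`, `pos.pt`, `lemmatizer.pt`, `depparse.pt`) and the `models/` folder are hypothetical placeholders, not the actual layout of the model repository, and the helper names are introduced here for illustration only. Models trained with pretrained embeddings or Greek BERT may need additional paths or the `transformers` package at load time.

```python
from pathlib import Path


def build_pipeline_kwargs(model_dir: str) -> dict:
    """Collect keyword arguments for stanza.Pipeline from a local model folder."""
    root = Path(model_dir)
    return {
        "lang": "el",  # Greek language code
        "processors": "tokenize,pos,lemma,depparse",
        # Hypothetical file names: adjust to the files the model repository ships.
        "tokenize_model_path": str(root / "tokenizer.pt"),
        "pos_model_path": str(root / "pos.pt"),
        "lemma_model_path": str(root / "lemmatizer.pt"),
        "depparse_model_path": str(root / "depparse.pt"),
        "download_method": None,  # rely on the local files only
        "use_gpu": False,
    }


def load_pipeline(model_dir: str):
    """Create the pipeline; Stanza is imported lazily so the helper above runs without it."""
    import stanza

    return stanza.Pipeline(**build_pipeline_kwargs(model_dir))


# Usage (assumes the model files exist in ./models):
#   nlp = load_pipeline("models")
#   doc = nlp("Το παιδί κάθεται στο σπίτι.")
#   print("{:C}".format(doc))  # CoNLL-U rendering of the parsed document
```

Keeping the configuration in a plain dictionary makes it easy to inspect or log which model files the pipeline will load before any heavy imports happen.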
### Training Data Sources
**Oral Data** (collected 2023-2024):
- Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos
**Written Sources**:
- Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
- Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
- Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
- Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
- Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*
## Features
### 📊 **CoNLL-U Output**
Standard Universal Dependencies format for interoperability with linguistic tools
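
For readers who want to post-process the copied output, here is a minimal sketch of reading CoNLL-U text with the standard library. Each token line has ten tab-separated columns, comment lines start with `#`, and a blank line ends a sentence. The function name `parse_conllu` is made up for this example, and the sample rows are hand-written for illustration; they are not actual output of the parser.

```python
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each a list of token dictionaries."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment such as "# text = ..."
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample for illustration only (not real parser output).
rows = [
    ["1", "Το", "ο", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "παιδί", "παιδί", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "κάθεται", "κάθομαι", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
sample = "# text = Το παιδί κάθεται.\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

sentences = parse_conllu(sample)
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```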
### 📈 **Interactive Data Table**
Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
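
In CoNLL-U, the morphology column packs features as `Name=Value` pairs joined by `|`, with `_` meaning no features. The sketch below shows one way to unpack that column into separate table cells; `parse_feats` and `token_to_row` are hypothetical helper names, and the sample token is hand-written rather than taken from the parser.

```python
def parse_feats(feats):
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if feats in ("", "_"):
        return {}
    result = {}
    for item in feats.split("|"):
        name, _, value = item.partition("=")
        result[name] = value
    return result


def token_to_row(token):
    """Flatten one token into a table row with one key per morphological feature."""
    row = {key: value for key, value in token.items() if key != "feats"}
    row.update(parse_feats(token.get("feats", "_")))
    return row


# Hand-written token for illustration only.
token = {
    "id": "2",
    "form": "παιδί",
    "upos": "NOUN",
    "feats": "Case=Nom|Gender=Neut|Number=Sing",
    "head": "3",
    "deprel": "nsubj",
}
row = token_to_row(token)
print(row)
```

A list of such rows can be passed directly to `pandas.DataFrame(...)` to obtain a table with one column per feature.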
### 🔗 **Dependency Visualization**
Text-based visualization showing syntactic relationships between words
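
The exact layout the Space uses is not described here, so the following is only one possible text rendering: each line shows an arc from the head to its dependent, labeled with the dependency relation. The function name `render_dependencies` and the sample tokens are illustrative assumptions.

```python
def render_dependencies(tokens):
    """Render each dependency as 'HEAD --relation--> DEPENDENT', one per line."""
    forms = {tok["id"]: tok["form"] for tok in tokens}
    lines = []
    for tok in tokens:
        # Head "0" marks the syntactic root of the sentence.
        head = "ROOT" if tok["head"] == "0" else forms.get(tok["head"], "?")
        lines.append(f'{head} --{tok["deprel"]}--> {tok["form"]}')
    return "\n".join(lines)


# Hand-written tokens for illustration only (not real parser output).
tokens = [
    {"id": "1", "form": "Το", "head": "2", "deprel": "det"},
    {"id": "2", "form": "παιδί", "head": "3", "deprel": "nsubj"},
    {"id": "3", "form": "κάθεται", "head": "0", "deprel": "root"},
]
rendered = render_dependencies(tokens)
print(rendered)
```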
### 🏛️ **Dialectal Specialization**
Optimized specifically for the Lesbian dialect of Greek
## Usage
1. Enter your Lesbian Greek text in the input field
2. Click "Parse Lesbian Greek Text" or press Enter
3. View results in three formats:
   - Raw CoNLL-U output (copyable)
   - Interactive data table
   - Dependency structure visualization
## Example Texts
The interface includes example texts based on the dialectal sources used in training:
- `Το παιδί κάθεται στο σπίτι.`
- `Η μάνα μαγειρεύει στην κουζίνα.`
- `Το νερό τρέχει απ' τη βρύση.`
- `Οι παππούδες λένε παλιές ιστορίες.`
## Limitations
- **Experimental model**: Trained on only 540 sentences, so accuracy may be limited
- **Domain-specific**: Optimized for dialectal content similar to the training sources
- **Research purposes**: Intended for research use; further fine-tuning is needed for production use
## Citation
If you use this tool or the underlying model, please cite:
```bibtex
@inproceedings{bompolas2025crossing,
title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
year={2025},
publisher={Association for Computational Linguistics}
}
```
## Related Resources
- 🤗 [Lesbian Greek Morphosyntactic Model](https://huggingface.co/sbompolas/Lesbian-Greek-Morphosyntactic-Model)
- 📚 [UD_Greek-Lesbian Treebank](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
- 🔧 [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
- 🇬🇷 [Greek BERT Model](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
## Technical Details
### Dependencies
- `gradio>=4.0.0` - Web interface
- `stanza>=1.7.0` - NLP pipeline
- `pandas>=1.5.0` - Data handling
- `torch>=1.9.0` - Neural network backend
- `transformers>=4.20.0` - BERT integration
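
To confirm that a local environment satisfies these minimums, a small check along the following lines may help. It is a simplified sketch: the comparison looks only at the first three numeric components of each version and ignores pre-release or local tags, and the helper names are invented for this example.

```python
import re
from importlib import metadata

# Minimum versions as listed above.
MINIMUMS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def version_tuple(version):
    """Reduce '2.1.0+cu118' to (2, 1, 0); missing components count as 0."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_environment():
    """Report 'ok', 'missing', or 'too old (...)' for each required package."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```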
### File Structure
```
├── app.py            # Main Gradio application
├── requirements.txt  # Python dependencies
└── README.md         # This documentation
```
## Development
To run locally:
```bash
git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py
```
## Support
For issues related to:
- **The model**: Contact the original authors or open an issue on the model repository
- **This Space**: Open an issue in the Space's discussion tab
- **Stanza**: Refer to the [Stanza documentation](https://stanfordnlp.github.io/stanza/)
## License
This Space's metadata declares the CC BY 4.0 license. For the underlying model and the individual components (Stanza, Greek BERT, etc.), please refer to their respective license terms.