---
title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0
---

# Lesbian Greek Morphosyntactic Parser

A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).

## Overview

This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:

- **Part-of-speech tagging**
- **Morphological analysis**
- **Dependency parsing**
- **Lemmatization**
- **CoNLL-U format output**

## Model Details

The underlying model is based on:
- **Stanza v1.7.0+** as the base pipeline
- **Greek BERT** ([nlpaueb/bert-base-greek-uncased-v1](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)) for enhanced representations
- **UD_Greek-Lesbian treebank** for training (540 sentences)
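
As a rough illustration of how a Stanza pipeline can be pointed at locally stored model files, the sketch below builds a keyword-argument dictionary for `stanza.Pipeline`. The directory layout and file names (`tokenizer.pt`, `pos.pt`, and so on) are hypothetical and are not taken from the model repository, and the processor list is an assumption; check the model card for the actual files.

```python
from pathlib import Path


def build_pipeline_config(model_dir):
    """Return keyword arguments for stanza.Pipeline using local model files.

    The file names below are hypothetical placeholders, not the real
    names used by the Lesbian Greek model.
    """
    model_dir = Path(model_dir)
    return {
        "lang": "el",  # Greek language code
        "processors": "tokenize,pos,lemma,depparse",  # assumed processor list
        "tokenize_model_path": str(model_dir / "tokenizer.pt"),
        "pos_model_path": str(model_dir / "pos.pt"),
        "lemma_model_path": str(model_dir / "lemmatizer.pt"),
        "depparse_model_path": str(model_dir / "parser.pt"),
        "download_method": None,  # do not try to download default models
    }


config = build_pipeline_config("models")

# With stanza installed and the files in place, the pipeline could then
# be created like this (left commented out so the sketch runs anywhere):
# import stanza
# nlp = stanza.Pipeline(**config)
# doc = nlp("Το παιδί κάθεται στο σπίτι.")
```

Keeping the configuration in a plain dictionary makes it easy to inspect or log the paths before the models are loaded.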

### Training Data Sources

**Oral Data** (collected 2023-2024):
- Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos

**Written Sources**:
- Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
- Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
- Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
- Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
- Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*

## Features

### 📊 **CoNLL-U Output**
Standard Universal Dependencies format for interoperability with linguistic tools
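
For readers who want to post-process the CoNLL-U output, here is a minimal sketch of reading it with the Python standard library. The ten column names follow the Universal Dependencies specification; the sample sentence and its annotations are hand-written for illustration and are not output from the model.

```python
# The ten CoNLL-U columns, in order, as defined by Universal Dependencies.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # skip comment lines such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue  # ignore malformed rows
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample (not model output), built with explicit tabs.
SAMPLE_ROWS = [
    ["1", "Το", "ο", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "παιδί", "παιδί", "NOUN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "κάθεται", "κάθομαι", "VERB", "_", "_", "0", "root", "_", "_"],
    ["4", ".", ".", "PUNCT", "_", "_", "3", "punct", "_", "_"],
]
SAMPLE = "# text = Το παιδί κάθεται.\n" + "\n".join(
    "\t".join(row) for row in SAMPLE_ROWS
) + "\n"

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

All values are kept as strings, as in the raw format; convert `id` and `head` to integers only if multiword-token ranges such as `1-2` are not expected.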

### 📈 **Interactive Data Table**
Browse parsed tokens with all linguistic features (POS, morphology, dependencies)

### 🔗 **Dependency Visualization**
Text-based visualization showing syntactic relationships between words
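
One simple way to render such a text view is to print each token with an arrow to its head. The sketch below is an illustration only and does not reproduce the Space's actual rendering; the function name `render_dependencies` and the hand-written sample tokens are hypothetical.

```python
def render_dependencies(tokens):
    """Render (id, form, head, deprel) tuples as 'form --deprel--> head' lines."""
    forms = {token_id: form for token_id, form, _, _ in tokens}
    lines = []
    for token_id, form, head, deprel in tokens:
        # Head 0 is the artificial root of the sentence in CoNLL-U.
        head_form = "ROOT" if head == 0 else forms[head]
        lines.append(f"{form} --{deprel}--> {head_form}")
    return "\n".join(lines)


# Hand-written sample annotations (not model output).
tokens = [
    (1, "Το", 2, "det"),
    (2, "παιδί", 3, "nsubj"),
    (3, "κάθεται", 0, "root"),
    (4, ".", 3, "punct"),
]

rendered = render_dependencies(tokens)
print(rendered)
```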

### 🏛️ **Dialectal Specialization**
Optimized specifically for the Lesbian dialect of Greek

## Usage

1. Enter your Lesbian Greek text in the input field
2. Click "Parse Lesbian Greek Text" or press Enter
3. View results in three formats:
   - Raw CoNLL-U output (copyable)
   - Interactive data table
   - Dependency structure visualization

## Example Texts

The interface includes example texts based on the dialectal sources used in training:

- `Το παιδί κάθεται στο σπίτι.`
- `Η μάνα μαγειρεύει στην κουζίνα.`
- `Το νερό τρέχει απ' τη βρύση.`
- `Οι παππούδες λένε παλιές ιστορίες.`

## Limitations

- **Experimental model**: The model was trained on limited data (540 sentences) and should be treated as experimental
- **Domain-specific**: Optimized for dialectal content similar to the training sources
- **Research purposes**: Intended for research; further fine-tuning is needed for production use

## Citation

If you use this tool or the underlying model, please cite:

```bibtex
@inproceedings{bompolas2025crossing,
    title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
    author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
    booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
    year={2025},
    publisher={Association for Computational Linguistics}
}
```

## Related Resources

- 🤗 [Lesbian Greek Morphosyntactic Model](https://huggingface.co/sbompolas/Lesbian-Greek-Morphosyntactic-Model)
- 📚 [UD_Greek-Lesbian Treebank](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
- 🔧 [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
- 🇬🇷 [Greek BERT Model](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)

## Technical Details

### Dependencies
- `gradio>=4.0.0` - Web interface
- `stanza>=1.7.0` - NLP pipeline
- `pandas>=1.5.0` - Data handling
- `torch>=1.9.0` - Neural network backend
- `transformers>=4.20.0` - BERT integration

### File Structure
```
├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # This documentation
```

## Development

To run locally:

```bash
git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py
```

## Support

For issues related to:
- **The model**: Contact the original authors or open an issue on the model repository
- **This Space**: Open an issue in the Space's discussion tab
- **Stanza**: Refer to the [Stanza documentation](https://stanfordnlp.github.io/stanza/)

## License

This Space is marked as `cc-by-4.0` in its metadata. Please also refer to the original model's license terms and the individual component licenses (Stanza, Greek BERT, etc.).