metadata
title: Lesbian Greek Morphosyntactic Parser
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: cc-by-4.0

Lesbian Greek Morphosyntactic Parser

A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).

Overview

This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:

  • Part-of-speech tagging
  • Morphological analysis
  • Dependency parsing
  • Lemmatization
  • CoNLL-U format output

Model Details

The underlying model is based on:

Training Data Sources

Oral Data (collected 2023-2024):

  • Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos

Written Sources:

  • Papanis, D. & Papanis, G. D. (2004). Lexiko tou Agiasotikou Glosikou Idiomatos
  • Tsokarou-Mitsioni, E. (1998). Palies Istories ap' tn Agiasiou
  • Tsokarou-Mitsioni, E. (2019). Prosfygiá
  • Anagnostopoulou, M. A. (2021). Thematiko Lexiko tis Lesviakis Dialektou
  • Anagnostou, V. T. (2014). Tsi sta th'ka mas: Komodia sta k'stariot'ka

Features

📊 CoNLL-U Output

Standard Universal Dependencies format for interoperability with linguistic tools
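
As a rough illustration of what that interoperability looks like, CoNLL-U text copied from the Space can be read back with a few lines of Python. This is a minimal sketch, not code from `app.py`: the `parse_conllu` helper is hypothetical, and the sample annotation is hand-written for illustration rather than actual model output.

```python
# The ten tab-separated columns defined by the CoNLL-U format.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]

# Hand-written illustrative sample (NOT actual model output).
SAMPLE = "\n".join([
    "# text = Το παιδί κάθεται.",
    "1\tΤο\tο\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tπαιδί\tπαιδί\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tκάθεται\tκάθομαι\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skipped here.
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

The sketch ignores multiword-token ranges and empty nodes; a dedicated CoNLL-U library is a safer choice for anything beyond quick inspection.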

📈 Interactive Data Table

Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
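
One way such a table could be assembled is by expanding the CoNLL-U FEATS column into one column per morphological feature. The sketch below is an assumption about the general approach, not the Space's actual implementation; `parse_feats` and `to_rows` are hypothetical helpers, and the token annotations are hand-written examples.

```python
def parse_feats(feats):
    """Split a FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    if feats in ("_", ""):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hand-written illustrative tokens (NOT actual model output).
tokens = [
    {"id": "1", "form": "Το", "upos": "DET",
     "feats": "Case=Nom|Definite=Def|Gender=Neut|Number=Sing|PronType=Art"},
    {"id": "2", "form": "παιδί", "upos": "NOUN",
     "feats": "Case=Nom|Gender=Neut|Number=Sing"},
    {"id": "3", "form": ".", "upos": "PUNCT", "feats": "_"},
]


def to_rows(tokens):
    """Build one flat row per token, with one column per feature name."""
    feature_names = sorted(
        {name for t in tokens for name in parse_feats(t["feats"])}
    )
    rows = []
    for t in tokens:
        feats = parse_feats(t["feats"])
        row = {"id": t["id"], "form": t["form"], "upos": t["upos"]}
        # Tokens lacking a feature get an empty cell for that column.
        row.update({name: feats.get(name, "") for name in feature_names})
        rows.append(row)
    return rows


rows = to_rows(tokens)
for row in rows:
    print(row)
```

A list of dicts in this shape can be passed directly to `pandas.DataFrame(rows)` if a DataFrame is preferred.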

🔗 Dependency Visualization

Text-based visualization showing syntactic relationships between words
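
The exact layout used by the Space is not documented here, so the following is only one plausible text rendering: each token is printed with an arrow to its head, labelled with the dependency relation. The `render_dependencies` function and the sample tokens are hypothetical and hand-written for illustration.

```python
# Hand-written illustrative tokens (NOT actual model output).
# "head" is the id of the governing token; 0 marks the sentence root.
tokens = [
    {"id": 1, "form": "Το", "head": 2, "deprel": "det"},
    {"id": 2, "form": "παιδί", "head": 3, "deprel": "nsubj"},
    {"id": 3, "form": "κάθεται", "head": 0, "deprel": "root"},
    {"id": 4, "form": ".", "head": 3, "deprel": "punct"},
]


def render_dependencies(tokens):
    """Return one 'dependent --relation--> head' line per token."""
    forms = {t["id"]: t["form"] for t in tokens}
    lines = []
    for t in tokens:
        head = "ROOT" if t["head"] == 0 else forms[t["head"]]
        lines.append(f'{t["form"]} --{t["deprel"]}--> {head}')
    return lines


print("\n".join(render_dependencies(tokens)))
```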

🏛️ Dialectal Specialization

Optimized specifically for the Lesbian dialect of Greek

Usage

  1. Enter your Lesbian Greek text in the input field
  2. Click "Parse Lesbian Greek Text" or press Enter
  3. View results in three formats:
    • Raw CoNLL-U output (copyable)
    • Interactive data table
    • Dependency structure visualization

Example Texts

The interface includes example texts based on the dialectal sources used in training:

  • Το παιδί κάθεται στο σπίτι.
  • Η μάνα μαγειρεύει στην κουζίνα.
  • Το νερό τρέχει απ' τη βρύση.
  • Οι παππούδες λένε παλιές ιστορίες.

Limitations

  • Experimental model: The training data is limited (540 sentences), so results should be treated as experimental
  • Domain-specific: Optimized for dialectal content similar to the training sources
  • Research use: Intended for research purposes; further fine-tuning is needed before production use

Citation

If you use this tool or the underlying model, please cite:

@inproceedings{bompolas2025crossing,
    title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
    author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
    booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
    year={2025},
    publisher={Association for Computational Linguistics}
}

Related Resources

Technical Details

Dependencies

  • gradio>=4.0.0 - Web interface
  • stanza>=1.7.0 - NLP pipeline
  • pandas>=1.5.0 - Data handling
  • torch>=1.9.0 - Neural network backend
  • transformers>=4.20.0 - BERT integration
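
To confirm that a local environment meets these minimums, a small check along the following lines may help. It is a sketch using only the standard library; `version_tuple` and `check_requirements` are hypothetical helpers, and the version comparison is deliberately simplified (it looks only at the leading numeric components and ignores pre-release ordering, which `packaging.version` handles more robustly).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as listed above.
MINIMUM_VERSIONS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def version_tuple(text):
    """Turn '1.7.0' or '2.1.0+cu121' into a tuple of ints such as (1, 7, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '4' compares equal to '4.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for package, status in check_requirements().items():
    print(f"{package}: {status}")
```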

File Structure

├── app.py              # Main Gradio application
├── requirements.txt    # Python dependencies
└── README.md           # This documentation

Development

To run locally:

git clone <this-space>
cd <space-directory>
pip install -r requirements.txt
python app.py

Support

For issues related to:

  • The model: Contact the original authors or open an issue on the model repository
  • This Space: Open an issue in the Space's discussion tab
  • Stanza: Refer to the Stanza documentation

License

Please refer to the original model's license terms and the individual component licenses (Stanza, Greek BERT, etc.).