Committed by sbompolas
Commit fce7c00 (verified), 1 parent: 52414b6

Update README.md

Files changed (1)
  1. README.md +117 -40
README.md CHANGED
@@ -10,55 +10,132 @@ pinned: false
10
  license: cc-by-4.0
11
  ---
12
 
13
- # Greek Text Parser with spaCy displaCy
14
 
15
- This Hugging Face Space provides a web interface for parsing Greek text using spaCy and visualizing the results in CoNLL-U format or as interactive dependency trees.
16
 
17
  ## Features
18
 
19
- - **Greek Language Processing**: Uses spaCy's Greek language model (`el_core_news_sm`)
20
- - **CoNLL-U Output**: Generate standard CoNLL-U format for dependency parsing
21
- - **Visual Dependencies**: Interactive dependency tree visualization using displaCy
22
- - **Download Options**: Save HTML visualizations for offline viewing
23
 
24
  ## Usage
25
 
26
- 1. Enter Greek text in the input field
27
- 2. Choose your output format:
28
- - **HTML**: Interactive dependency visualization
29
- - **CoNLL-U**: Standard dependency parsing format
30
- 3. Click "Parse Text" to process
31
-
32
- ## CoNLL-U Format
33
-
34
- The CoNLL-U format includes:
35
- - Token ID
36
- - Word form
37
- - Lemma
38
- - Universal POS tag
39
- - Language-specific POS tag
40
- - Morphological features
41
- - Head token ID
42
- - Dependency relation
43
- - Enhanced dependencies
44
- - Miscellaneous annotations
45
-
46
- ## Examples
47
-
48
- Try these Greek sentences:
49
- - `Ο γάτος κοιμάται στον καναπέ.` (The cat sleeps on the sofa)
50
- - `Η Μαρία διαβάζει ένα βιβλίο στη βιβλιοθήκη.` (Maria reads a book in the library)
51
- - `Τα παιδιά παίζουν ποδόσφαιρο στην αυλή.` (The children play football in the yard)
52
 
53
  ## Technical Details
54
 
55
- - Built with Gradio for the web interface
56
- - Uses spaCy 3.4+ with Greek language support
57
- - Generates standard CoNLL-U format output
58
- - Provides downloadable HTML visualizations
59
 
60
- ## Requirements
61
 
62
- - Python 3.8+
63
- - spaCy with Greek model
64
- - Gradio for web interface
 
10
  license: cc-by-4.0
11
  ---
12
 
13
+ # Lesbian Greek Morphosyntactic Parser
14
 
15
+ A Hugging Face Space for parsing dialectal Greek text from the island of Lesbos using the Lesbian Greek Morphosyntactic Model developed by Bompolas et al. (2025).
16
+
17
+ ## Overview
18
+
19
+ This interactive parser provides morphosyntactic analysis for the Lesbian dialect of Greek, offering:
20
+
21
+ - **Part-of-speech tagging**
22
+ - **Morphological analysis**
23
+ - **Dependency parsing**
24
+ - **Lemmatization**
25
+ - **CoNLL-U format output**
26
+
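The morphological analysis listed above is carried in the FEATS column of CoNLL-U as `Name=Value` pairs joined by `|`, with `_` meaning "no features". The following is a minimal sketch of how such a string could be unpacked; the helper name `parse_feats` and the sample feature string are illustrative assumptions, not code from this Space.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a CoNLL-U FEATS string such as 'Case=Nom|Number=Sing' into a dict.

    Hypothetical helper for illustration; an underscore or empty string
    means the token carries no morphological features.
    """
    if feats in ("", "_"):
        return {}
    # Each pair has the shape Name=Value; split on the first '=' only.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Hand-written example value, not output of the model.
features = parse_feats("Case=Nom|Gender=Neut|Number=Sing")
print(features)
```

A dictionary like this makes it easy to filter tokens by a single feature, for example all tokens whose `Case` is `Nom`.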
27
+ ## Model Details
28
+
29
+ The underlying model is based on:
30
+ - **Stanza v1.7.0+** as the base pipeline
31
+ - **Greek BERT** ([nlpaueb/bert-base-greek-uncased-v1](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)) for enhanced representations
32
+ - **UD_Greek-Lesbian treebank** for training (540 sentences)
33
+
34
+ ### Training Data Sources
35
+
36
+ **Oral Data** (collected 2023-2024):
37
+ - Speakers from Agra, Chidira, Eressos, Pterounta, Mesotopos, and Parakoila villages on Lesbos
38
+
39
+ **Written Sources**:
40
+ - Papanis, D. & Papanis, G. D. (2004). *Lexiko tou Agiasotikou Glosikou Idiomatos*
41
+ - Tsokarou-Mitsioni, E. (1998). *Palies Istories ap' tn Agiasiou*
42
+ - Tsokarou-Mitsioni, E. (2019). *Prosfygiá*
43
+ - Anagnostopoulou, M. A. (2021). *Thematiko Lexiko tis Lesviakis Dialektou*
44
+ - Anagnostou, V. T. (2014). *Tsi sta th'ka mas: Komodia sta k'stariot'ka*
45
 
46
  ## Features
47
 
48
+ ### 📊 **CoNLL-U Output**
49
+ Standard Universal Dependencies format for interoperability with linguistic tools
50
+
51
+ ### 📈 **Interactive Data Table**
52
+ Browse parsed tokens with all linguistic features (POS, morphology, dependencies)
53
+
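A table of parsed tokens can be derived from CoNLL-U text by splitting each token line into its ten tab-separated columns. The sketch below shows one way to do that in plain Python. The function name `parse_conllu` and the sample annotation are assumptions made for illustration; the Space's own `app.py` may build its table differently.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Return one dict per token line of a CoNLL-U string (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Skip blank sentence separators and '# ...' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(
                f"expected 10 tab-separated columns, got {len(columns)}")
        rows.append(dict(zip(CONLLU_FIELDS, columns)))
    return rows


# Hand-written sample annotation for illustration only; not model output.
SAMPLE = (
    "# text = Το παιδί κάθεται.\n"
    "1\tΤο\tο\tDET\t_\tCase=Nom|Gender=Neut|Number=Sing\t2\tdet\t_\t_\n"
    "2\tπαιδί\tπαιδί\tNOUN\t_\tCase=Nom|Gender=Neut|Number=Sing\t3\tnsubj\t_\t_\n"
    "3\tκάθεται\tκάθομαι\tVERB\t_\tNumber=Sing|Person=3\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
)

rows = parse_conllu(SAMPLE)
for row in rows:
    print(row["id"], row["form"], row["upos"], row["head"], row["deprel"])
```

Because each row is a plain dict, the list can be passed directly to `pandas.DataFrame(rows)` if a DataFrame is preferred for display.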
54
+ ### 🔗 **Dependency Visualization**
55
+ Text-based visualization showing syntactic relationships between words
56
+
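As a rough idea of what a text-based dependency view can look like, the sketch below prints one `dependent --relation--> head` line per token. This is an assumed format chosen for illustration; the Space's actual rendering is not specified here, and the token tuples are hand-written rather than model output.

```python
from typing import List, Tuple

# (id, form, head, deprel); head 0 marks the syntactic root.
Token = Tuple[int, str, int, str]


def render_dependencies(tokens: List[Token]) -> List[str]:
    """Render each token as 'form --deprel--> head form' (hypothetical format)."""
    forms = {tok_id: form for tok_id, form, _, _ in tokens}
    forms[0] = "ROOT"  # artificial node that governs the root token
    return [f"{form} --{deprel}--> {forms[head]}"
            for _, form, head, deprel in tokens]


# Hand-written example analysis for illustration only.
TOKENS: List[Token] = [
    (1, "Το", 2, "det"),
    (2, "παιδί", 3, "nsubj"),
    (3, "κάθεται", 0, "root"),
    (4, ".", 3, "punct"),
]

for line in render_dependencies(TOKENS):
    print(line)
```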
57
+ ### 🏛️ **Dialectal Specialization**
58
+ Optimized specifically for the Lesbian dialect of Greek
59
 
60
  ## Usage
61
 
62
+ 1. Enter your Lesbian Greek text in the input field
63
+ 2. Click "Parse Lesbian Greek Text" or press Enter
64
+ 3. View results in three formats:
65
+ - Raw CoNLL-U output (copyable)
66
+ - Interactive data table
67
+ - Dependency structure visualization
68
+
69
+ ## Example Texts
70
+
71
+ The interface includes example texts based on the dialectal sources used in training:
72
+
73
+ - `Το παιδί κάθεται στο σπίτι.`
74
+ - `Η μάνα μαγειρεύει στην κουζίνα.`
75
+ - `Το νερό τρέχει απ' τη βρύση.`
76
+ - `Οι παππούδες λένε παλιές ιστορίες.`
77
+
78
+ ## Limitations
79
+
80
+ - **Experimental model**: Trained on limited data (540 sentences), so results should be treated as experimental
81
+ - **Domain-specific**: Optimized for dialectal content similar to training sources
82
+ - **Research purposes**: Intended for research; further fine-tuning is needed before production use
83
+
84
+ ## Citation
85
+
86
+ If you use this tool or the underlying model, please cite:
87
+
88
+ ```bibtex
89
+ @inproceedings{bompolas2025crossing,
90
+ title={Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek},
91
+ author={Bompolas, Stavros and Markantonatou, Stella and Ralli, Angela and Anastasopoulos, Antonios},
92
+ booktitle={Proceedings of the 8th Universal Dependencies Workshop (UDW, SyntaxFest 2025)},
93
+ year={2025},
94
+ publisher={Association for Computational Linguistics}
95
+ }
96
+ ```
97
+
98
+ ## Related Resources
99
+
100
+ - 🤗 [Lesbian Greek Morphosyntactic Model](https://huggingface.co/sbompolas/Lesbian-Greek-Morphosyntactic-Model)
101
+ - 📚 [UD_Greek-Lesbian Treebank](https://github.com/UniversalDependencies/UD_Greek-Lesbian)
102
+ - 🔧 [Stanza Documentation](https://stanfordnlp.github.io/stanza/)
103
+ - 🇬🇷 [Greek BERT Model](https://huggingface.co/nlpaueb/bert-base-greek-uncased-v1)
104
 
105
  ## Technical Details
106
 
107
+ ### Dependencies
108
+ - `gradio>=4.0.0` - Web interface
109
+ - `stanza>=1.7.0` - NLP pipeline
110
+ - `pandas>=1.5.0` - Data handling
111
+ - `torch>=1.9.0` - Neural network backend
112
+ - `transformers>=4.20.0` - BERT integration
113
+
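Before running the app locally, it can help to confirm that the installed packages satisfy the minimum versions listed above. The following sketch uses only the standard library. The helper names are hypothetical, and the comparison looks only at the leading numeric part of each version, so pre-release tags and local suffixes such as `+cu118` are ignored.

```python
import re
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "gradio": "4.0.0",
    "stanza": "1.7.0",
    "pandas": "1.5.0",
    "torch": "1.9.0",
    "transformers": "4.20.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric components, padding the shorter version with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each package name to a problem description, or None if it is fine."""
    report: Dict[str, Optional[str]] = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = None if ok else f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, problem in check_environment(MINIMUMS).items():
        print(f"{name}: {problem or 'ok'}")
```

For anything beyond a quick sanity check, `pip install -r requirements.txt` remains the authoritative way to resolve these constraints.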
114
+ ### File Structure
115
+ ```
116
+ ├── app.py # Main Gradio application
117
+ ├── requirements.txt # Python dependencies
118
+ └── README.md # This documentation
119
+ ```
120
+
121
+ ## Development
122
+
123
+ To run locally:
124
+
125
+ ```bash
126
+ git clone <this-space>
127
+ cd <space-directory>
128
+ pip install -r requirements.txt
129
+ python app.py
130
+ ```
131
+
132
+ ## Support
133
+
134
+ For issues related to:
135
+ - **The model**: Contact the original authors or open an issue on the model repository
136
+ - **This Space**: Open an issue in the Space's discussion tab
137
+ - **Stanza**: Refer to the [Stanza documentation](https://stanfordnlp.github.io/stanza/)
138
 
139
+ ## License
140
 
141
+ Please refer to the original model's license terms and the individual component licenses (Stanza, Greek BERT, etc.).