sdk: static
pinned: false
---
# 🏛️ What is GlossAPI?

**GlossAPI** is an open-source infrastructure developed by the **Open Technologies Alliance (GFOSS)** that transforms raw Greek text from public consultations, science, education, literature, and culture into clean, well-documented, AI-ready data.

Because the Greek language remains underrepresented in large-scale AI datasets, GlossAPI provides the tools and workflows needed to create high-quality linguistic resources that are **openly accessible** and **fully reproducible**.

---

## 🚀 Infrastructure & Pipeline

The project builds a foundational infrastructure for Greek Natural Language Processing (NLP) by combining a robust, modular processing pipeline with a strong commitment to open standards.

The pipeline supports multiple file formats while preserving structure and metadata, and it covers every stage of document processing:

* **📥 Automated Downloading:** Systematic retrieval of source documents.
* **📄 Text Extraction:** Parsing raw files into readable text.
* **✂️ Section Segmentation:** Splitting the extracted text into its logical sections.
* **🏷️ Classification & Annotation:** Enriching the data with labels for machine learning.
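
The four stages listed above can be pictured as small functions chained over a single document. The sketch below is a minimal, hypothetical illustration using only the Python standard library. The functions `download`, `extract`, `segment`, `annotate` and `run_pipeline`, the in-memory `STORE`, the `#`-heading convention and the keyword-based labels are all assumptions made for this example; they are not GlossAPI's actual API, which is documented in the GitHub repository.

```python
# Hypothetical sketch of a four-stage pipeline (download -> extract ->
# segment -> annotate). All names and rules are illustrative assumptions,
# not GlossAPI's actual API.
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    text: str
    labels: list[str] = field(default_factory=list)


@dataclass
class Document:
    source: str
    raw: str
    text: str = ""
    sections: list[Section] = field(default_factory=list)


# Hypothetical in-memory store standing in for remote sources.
STORE = {
    "doc-1": "# Εισαγωγή\n<p>Το σχολείο   ξεκινά.</p>\n"
             "# Μέθοδος\n<p>Η <b>έρευνα</b> συνεχίζεται.</p>",
}

# Hypothetical keyword rules used as a stand-in for a real classifier.
KEYWORDS = {
    "education": ("σχολείο", "μάθημα"),
    "science": ("έρευνα", "πείραμα"),
}


def download(source: str, store: dict[str, str]) -> Document:
    # Retrieval step: a dictionary lookup instead of a network fetch.
    return Document(source=source, raw=store[source])


def extract(doc: Document) -> Document:
    # Drop HTML-like tags, then collapse runs of spaces and tabs.
    no_tags = re.sub(r"<[^>]+>", "", doc.raw)
    doc.text = re.sub(r"[ \t]+", " ", no_tags).strip()
    return doc


def segment(doc: Document) -> Document:
    # Assumption: lines starting with '#' are section headings.
    # Headings with no body text are dropped.
    sections: list[Section] = []
    title, lines = "(untitled)", []
    for line in doc.text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            if lines:
                sections.append(Section(title, " ".join(lines)))
            title, lines = line.lstrip("#").strip(), []
        elif line:
            lines.append(line)
    if lines:
        sections.append(Section(title, " ".join(lines)))
    doc.sections = sections
    return doc


def annotate(doc: Document) -> Document:
    # Attach every label whose keywords occur in the section text.
    for sec in doc.sections:
        lowered = sec.text.lower()
        sec.labels = [label for label, words in KEYWORDS.items()
                      if any(word in lowered for word in words)]
    return doc


def run_pipeline(source: str, store: dict[str, str]) -> Document:
    # Chain the stages in the order listed in the pipeline description.
    return annotate(segment(extract(download(source, store))))


if __name__ == "__main__":
    for sec in run_pipeline("doc-1", STORE).sections:
        print(f"{sec.title}: {sec.text} {sec.labels}")
```

Keeping each stage as a separate function reflects the modular design described above: one stage (for example, a trained classifier in place of the keyword rule) could be swapped out without touching the others.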

## 🌍 Impact & Usage

High-quality datasets produced by GlossAPI are available here on Hugging Face and support:

* Research and Education
* Digital Humanities
* NLP Applications
* Development of Greek Large Language Models (LLMs)

GlossAPI is also used in European projects to improve the understanding and processing of the Greek language in real-world contexts.

---

## 🤝 Community & Ecosystem

GlossAPI is more than a tool: **it is a community**. Researchers, developers, linguists, and students collaborate in an open, participatory, and ethically aligned ecosystem for Greek language technology.

Whether you are training models, building smarter search systems, or exploring Greek digital heritage, GlossAPI provides the foundations for building scalable, transparent, and socially responsible AI applications.

> **Open Source Commitment:** All datasets are released under **Creative Commons** licenses, and the source code is openly available on GitHub.

---

## 🔗 Links and Contact

| Platform | Link |
| :--- | :--- |
| 🌐 **Website** | [glossapi.gr](https://glossapi.gr) |
| 💠 **Blog** | [blog.glossapi.gr](https://blog.glossapi.gr/) |
| 🤗 **Dataset Repository** | [huggingface.co/glossAPI](https://huggingface.co/glossAPI) |
| 💻 **Code & Documentation** | [github.com/eellak/glossAPI](https://github.com/eellak/glossAPI) |
| 🖼️ **Join our team** | [Become part of GlossAPI](https://blog.glossapi.gr/en/become-part-of-glossapi/) |
| 📧 **Contact** | [glossapi.team@eellak.gr](mailto:glossapi.team@eellak.gr) |