nikosteskos committed on
Commit 039119f · verified · 1 Parent(s): 844359e

Update README.md

  1. README.md +44 -10
README.md CHANGED
@@ -6,20 +6,54 @@ colorTo: yellow
  sdk: static
  pinned: false
  ---
+ # 🏛️ What is GlossAPI?
 
- # **GlossAPI**
+ **GlossAPI** is an open-source infrastructure developed by **Open Technologies Alliance (GFOSS)** to transform raw Greek text from public consultation, science, education, literature, and culture into clean, well-documented, AI-ready data.
 
- GlossAPI is a project by [GFOSS Open Technologies Alliance](https://gfoss.eu), focused on building foundational infrastructure for Greek Natural Language Processing. Our work centers on the **creation of high-quality, open-access datasets** and the development of a robust, modular **processing pipeline** tailored for academic and domain-specific documents.
+ As the Greek language remains underrepresented in large-scale AI datasets, GlossAPI provides the tools and workflows needed to create high-quality linguistic resources that are **openly accessible** and **fully reproducible**.
 
- We aim to lay the groundwork for **open, collaborative, and reproducible NLP research** in the Greek language, supporting researchers, students, and developers in the digital humanities, computational linguistics, and AI communities.
+ ---
+
+ ## 🚀 Infrastructure & Pipeline
+
+ The project builds a foundational infrastructure for Greek Natural Language Processing (NLP) by combining a robust, modular processing pipeline with a strong commitment to open standards.
+
+ The pipeline supports multiple file formats while preserving structure and metadata, covering every stage of document processing:
+
+ * **📥 Automated Downloading:** Systematic retrieval of texts.
+ * **📄 Text Extraction:** Parsing raw files into readable text.
+ * **✂️ Section Segmentation:** Structuring content logically.
+ * **🏷️ Classification & Annotation:** Enriching data for machine learning.
+
+ ## 🌍 Impact & Usage
+
+ High-quality datasets produced by GlossAPI are currently available here on Hugging Face, enabling:
+ * Research and Education
+ * Digital Humanities
+ * NLP Applications
+ * Development of Greek Language Models (LLMs)
 
- Our pipeline covers every stage of document processing—from **automated downloading and text extraction**, to **section segmentation, classification**, and **annotation**. It supports documents in multiple formats and includes dedicated tools for Greek-language content, preserving structure and metadata throughout.
+ GlossAPI is also utilized in European projects to improve the understanding and processing of the Greek language in real-world contexts.
 
- GlossAPI contributes to the long-term vision of a sustainable, open ecosystem for Greek NLP by:
- - Publishing open-source tools and datasets under permissive licenses
- - Promoting interoperability and data transparency
- - Encouraging community contributions and reuse
+ ---
+
+ ## 🤝 Community & Ecosystem
+
+ Beyond a tool, **GlossAPI is a community**. Researchers, developers, linguists, and students collaborate in an open, participatory, and ethically aligned ecosystem for Greek language technology.
+
+ Whether you are training models, building smarter search systems, or exploring Greek digital heritage, GlossAPI provides the foundations to build scalable, transparent, and socially responsible AI applications.
+
+ > **Open Source Commitment:** All datasets are released under **Creative Commons** licenses, and the source code is openly available on GitHub.
+
+ ---
 
- 📂 All datasets are released under **Creative Commons licenses**, and our source code is publicly available on [GitHub](https://github.com/eellak/glossapi).
+ ## 🔗 Links and Contact
 
- 📬 Contact: glossapi.team@eellak.gr
+ | Platform | Link |
+ | :--- | :--- |
+ | 🌐 **Website** | [glossapi.gr](https://glossapi.gr) |
+ | 💠 **Blog** | [blog.glossapi.gr](https://blog.glossapi.gr/) |
+ | 🤗 **Dataset Repository** | [huggingface.co/glossAPI](https://huggingface.co/glossAPI) |
+ | 💻 **Code & Documentation** | [github.com/eellak/glossAPI](https://github.com/eellak/glossAPI) |
+ | 🖼️ **Join our team** | [Become part of GlossAPI](https://blog.glossapi.gr/en/become-part-of-glossapi/) |
+ | 📧 **Contact** | [glossapi.team@eellak.gr](mailto:glossapi.team@eellak.gr) |