---
title: Word Enc De
emoji: ⚡
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: cc-by-sa-4.0
---
# 🏛️ Word Encyclopedia: German Linguistics Hub

A consolidated Gradio application that combines several NLP tools into a single web interface for deep (mostly German) linguistic analysis. It offers both broad, non-contextual analysis of individual words and deep, contextual analysis of full sentences.

[License: CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
## 🌟 Features

### 📖 Word Encyclopedia (DE) - Non-Contextual Analysis

The flagship tool for comprehensive word analysis. Enter a single German word and discover all its possible grammatical roles.

**Key Features:**

- **Multi-Engine Architecture**: Uses `HanTa` (primary) with automatic fallback to `spacy-iwnlp`
- **Artifact-Free**: Cross-validates all grammatical roles against OdeNet to eliminate false inflections
- **Comprehensive Coverage**: Handles ambiguous words (e.g., "Lauf" as noun vs. "laufen" as verb, "See" as masculine/feminine)
- **Rich Data**:
  - Complete inflection tables (declension, conjugation) via `pattern.de`
  - Morpheme analysis via `HanTa`
  - Semantic senses from `OdeNet` (German WordNet)
  - Conceptual relations from `ConceptNet`

**Example:** Input "Lauf" → finds both the noun ("der Lauf" - the run) and the verb ("laufen" - to run) with complete inflections.
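The multi-engine fallback described above can be sketched as a simple analyzer chain. The engine functions below are hypothetical stand-ins for the real `HanTa` and `spacy-iwnlp` calls; only the chaining logic reflects the design:

```python
from typing import Callable, Optional

def analyze_with_fallback(word: str,
                          engines: list[Callable[[str], Optional[dict]]]) -> Optional[dict]:
    """Try each analyzer in order; return the first usable result."""
    for engine in engines:
        try:
            result = engine(word)
            if result:  # skip engines that return nothing for this word
                return result
        except Exception:
            continue  # a crashed engine just hands over to the next one
    return None

# Hypothetical stand-ins for the real HanTa / spacy-iwnlp calls:
def hanta_analyze(word):
    raise RuntimeError("model file missing")

def iwnlp_analyze(word):
    return {"lemma": word.lower(), "engine": "iwnlp"}

analysis = analyze_with_fallback("Lauf", [hanta_analyze, iwnlp_analyze])
```

Because the primary engine raises here, the chain silently falls through to the fallback, mirroring the automatic `HanTa` → `spacy-iwnlp` handover.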
### 🚀 Comprehensive Analyzer (DE) - Contextual Analysis

Deep sentence-level analysis with context-aware semantic ranking.

**Key Features:**

- Lemma-by-lemma analysis of entire sentences
- Context-aware semantic ranking using spaCy sentence vectors
- Subject-verb agreement (SVA) validation
- Grammar checking via LanguageTool
- Filters semantic senses by relevance to the sentence context

**Example:** In "Der schnelle Hund läuft", the "fast" senses of "schnell" are ranked higher than the "quick" senses based on context.
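The ranking idea can be illustrated with plain vectors: each candidate sense gets an embedding (in the app, derived from spaCy; here, made-up toy numbers) and is scored by cosine similarity against the sentence vector. This is a minimal sketch of the scoring step, not the app's actual code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_senses(sentence_vec, senses):
    """Sort (label, vector) pairs by similarity to the sentence vector."""
    return sorted(senses, key=lambda s: cosine(sentence_vec, s[1]), reverse=True)

# Toy vectors standing in for spaCy sentence/sense embeddings:
sentence = [0.9, 0.1, 0.0]
senses = [("schnell: quick (brief)", [0.1, 0.9, 0.0]),
          ("schnell: fast (speed)",  [0.8, 0.2, 0.0])]
best_sense = rank_senses(sentence, senses)[0][0]
```

With these toy numbers the "fast (speed)" sense aligns better with the sentence vector, so it ranks first.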
### 🔬 spaCy Analyzer - Multi-lingual

Direct access to morpho-syntactic parsing for multiple languages.

**Supported Languages:**

- German (`de_core_news_md`)
- English (`en_core_web_md`)
- Spanish (`es_core_news_md`)
- Ancient Greek (7 greCy models: PROIEL/Perseus, TRF/LG/SM variants)

**Outputs:**

- Dependency parsing visualization
- Named Entity Recognition (NER)
- Morphological analysis tables
- JSON export
### ✅ Grammar Check (DE)

Professional-grade grammar and style checking powered by `language-tool-python`.

### 📚 Inflections (DE)

Direct access to complete German inflection generation via `pattern.de`.

### 📖 Thesaurus (DE)

Query interface for OdeNet (German WordNet) with:

- Synonyms, antonyms
- Hypernyms, hyponyms
- Holonyms, meronyms
- Multiple-sense disambiguation
### 🌐 ConceptNet

Direct API access to the ConceptNet 5 knowledge graph, with a robust parser that filters out self-referential results.
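The self-referential filtering can be sketched on a raw API response. The `start`/`end`/`@id` fields follow the shape of real ConceptNet 5 edges; the sample data and the helper function are illustrative, not the app's actual parser:

```python
def filter_edges(term_uri: str, edges: list[dict]) -> list[dict]:
    """Keep only edges relating the query term to a *different* concept."""
    kept = []
    for edge in edges:
        start = edge["start"]["@id"]
        end = edge["end"]["@id"]
        if start == end:
            continue  # degenerate self-loop, e.g. lauf -> lauf
        if term_uri not in (start, end):
            continue  # edge doesn't involve the query term at all
        kept.append(edge)
    return kept

# Made-up sample in the shape of a ConceptNet 5 response:
edges = [
    {"start": {"@id": "/c/de/lauf"}, "end": {"@id": "/c/de/lauf"},
     "rel": {"@id": "/r/RelatedTo"}},
    {"start": {"@id": "/c/de/lauf"}, "end": {"@id": "/c/de/rennen"},
     "rel": {"@id": "/r/RelatedTo"}},
]
useful = filter_edges("/c/de/lauf", edges)
```

Only the `lauf → rennen` edge survives; the self-loop is dropped before display.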
## 🏗️ Architecture

```
┌───────────────────────────────────────────────────┐
│                 Gradio Interface                  │
└───────────────────────────────────────────────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
┌───────▼───────┐ ┌───────▼───────┐ ┌───────▼───────┐
│     Word      │ │ Comprehensive │ │     spaCy     │
│ Encyclopedia  │ │   Analyzer    │ │   Analyzer    │
│ (Non-Context) │ │ (Contextual)  │ │(Multi-lingual)│
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘
        │                 │                 │
   ┌────▼────┐       ┌────▼────┐       ┌────▼────┐
   │  HanTa  │       │  spaCy  │       │  spaCy  │
   │(Primary)│       │Sentence │       │ Models  │
   └────┬────┘       │ Vectors │       └─────────┘
        │            └────┬────┘
  ┌─────▼─────┐           │
  │  spaCy-   │           │
  │   IWNLP   │           │
  │(Fallback) │           │
  └─────┬─────┘           │
        │                 │
        └───────────┬─────┘
                    │
      ┌─────────────┼─────────────┐
      │             │             │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│  Pattern  │ │  OdeNet   │ │ConceptNet │
│    .de    │ │ (WordNet) │ │    API    │
└───────────┘ └───────────┘ └───────────┘
```
## 🔌 API Endpoints

The application exposes the following Gradio API endpoints:

- `/api/get_morphology` - spaCy analysis
- `/api/check_grammar` - Grammar checking
- `/api/get_thesaurus` - OdeNet queries
- `/api/get_all_inflections` - pattern.de inflections
- `/api/get_conceptnet` - ConceptNet queries
- `/api/comprehensive_analysis` - Full contextual analysis
- `/api/analyze_word` - Word encyclopedia

Access them via the Gradio Client:

```python
from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict("Lauf", 3, api_name="/analyze_word")
```
## 🐛 Troubleshooting

### "HanTa model file missing"

```bash
pip uninstall HanTa
pip install HanTa --no-cache-dir
```

### "LanguageTool failed to initialize"

Ensure Java is installed (required by LanguageTool):

```bash
java -version  # Should show Java 8+
```

### "OdeNet worker failed"

Check your internet connection. The app downloads the OdeNet data on first run.

### spaCy model not found

```bash
python -m spacy download de_core_news_md --force
```
## ⚖️ License and Attribution

This application is licensed under the **Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0)** due to the ShareAlike requirements of its core data dependencies.

### Why CC BY-SA 4.0?

- **OdeNet data**: CC BY-SA 4.0 (German WordNet)
- **ConceptNet 5 data**: CC BY-SA 4.0

The ShareAlike clause requires derivative works to use the same license.

### Your Obligations

Under CC BY-SA 4.0, you must:

1. **Give credit**: Cite this work and all dependencies
2. **Indicate changes**: Note any modifications you make
3. **ShareAlike**: Distribute derivative works under CC BY-SA 4.0
4. **No additional restrictions**: Do not add DRM or extra legal terms
## 🙏 Acknowledgments

This project would not be possible without:

- **[HanTa](https://github.com/wartaal/HanTa)** - High-accuracy morphological analysis
- **[spaCy](https://spacy.io/)** - Industrial-strength NLP framework
- **[IWNLP](https://www.iwnlp.com/)** - Comprehensive German lemmatization
- **[OdeNet](https://github.com/hdaSprachtechnologie/odenet)** - Open German WordNet
- **[pattern.de](https://github.com/clips/pattern)** - German linguistics tools
- **[ConceptNet](https://conceptnet.io/)** - Multilingual knowledge graph
- **[LanguageTool](https://languagetool.org/)** - Grammar checking
- **[greCy](https://github.com/CrispStrobe/greCy)** - Ancient Greek NLP models