title: Word Enc De
emoji: ⚡
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: cc-by-sa-4.0
🏛️ Word Encyclopedia: German Linguistics Hub
A consolidated Gradio application that combines several NLP tools in a single web interface for in-depth linguistic analysis, primarily of German. It supports both broad, non-contextual analysis of individual words and deep, contextual analysis of full sentences.
🌟 Features
📖 Word Encyclopedia (DE) - Non-Contextual Analysis
The flagship tool for comprehensive word analysis. Enter a single German word and discover all its possible grammatical roles.
Key Features:
- Multi-Engine Architecture: Uses HanTa (primary) with automatic fallback to spacy-iwnlp
- Artifact-Free: Cross-validates all grammatical roles with OdeNet to eliminate false inflections
- Comprehensive Coverage: Handles ambiguous words (e.g., "Lauf" as noun vs "laufen" as verb, "See" as masculine "der See" (lake) vs feminine "die See" (sea))
- Rich Data:
  - Complete inflection tables (declension, conjugation) via pattern.de
  - Morpheme analysis via HanTa
  - Semantic senses from OdeNet (German WordNet)
  - Conceptual relations from ConceptNet
Example: Input "Lauf" → Finds both noun ("der Lauf" - the run) and verb ("laufen" - to run) with complete inflections.
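The primary/fallback lookup and the cross-validation step described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `toy_primary`, `toy_fallback`, and `KNOWN_LEMMAS` are hypothetical stand-ins for HanTa, spacy-iwnlp, and an OdeNet lookup, and the `(lemma, pos)` reading format is an assumption.

```python
from typing import Callable, Optional, Set, Tuple

Reading = Tuple[str, str]  # (lemma, part of speech); assumed format
Analyzer = Callable[[str], Optional[Set[Reading]]]


def analyze_with_fallback(word: str, primary: Analyzer, fallback: Analyzer):
    """Try the primary engine first; use the fallback if it fails or finds nothing."""
    try:
        readings = primary(word)
    except Exception:
        readings = None
    if readings:
        return readings, "primary"
    return fallback(word) or set(), "fallback"


def cross_validate(readings: Set[Reading], known_lemmas: Set[Reading]) -> Set[Reading]:
    """Keep only readings whose (lowercased lemma, pos) pair is attested in the lexicon."""
    return {(lemma, pos) for lemma, pos in readings
            if (lemma.lower(), pos) in known_lemmas}


# Hypothetical toy engines and lexicon, for illustration only.
def toy_primary(word: str) -> Optional[Set[Reading]]:
    table = {"Lauf": {("Lauf", "NOUN"), ("laufen", "VERB"), ("Lauf", "ADJ")}}
    return table.get(word)


def toy_fallback(word: str) -> Optional[Set[Reading]]:
    table = {"Hund": {("Hund", "NOUN")}}
    return table.get(word)


KNOWN_LEMMAS = {("lauf", "NOUN"), ("laufen", "VERB"), ("hund", "NOUN")}

if __name__ == "__main__":
    readings, engine = analyze_with_fallback("Lauf", toy_primary, toy_fallback)
    print(engine, sorted(cross_validate(readings, KNOWN_LEMMAS)))
```

In this toy run the spurious adjective reading of "Lauf" is dropped because the lexicon does not attest it, which is the kind of artifact the cross-validation step is meant to catch.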
🚀 Comprehensive Analyzer (DE) - Contextual Analysis
Deep sentence-level analysis with context-aware semantic ranking.
Key Features:
- Lemma-by-lemma analysis of entire sentences
- Context-aware semantic ranking using spaCy sentence vectors
- Subject-Verb-Agreement (SVA) validation
- Grammar checking via LanguageTool
- Filters semantic senses by relevance to sentence context
Example: In "Der schnelle Hund läuft", ranks "fast" senses of "schnell" higher than "quick" based on context.
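One plausible way to implement the context-aware ranking is cosine similarity between a sentence vector and one vector per sense gloss. The sketch below uses hand-written toy vectors; in the app the vectors would presumably come from spaCy (for example `doc.vector`), and `rank_senses` and its `min_score` threshold are hypothetical names chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_senses(context_vec: Sequence[float],
                senses: Dict[str, Sequence[float]],
                min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Score each sense against the context, drop weak ones, sort best first."""
    scored = [(gloss, cosine(context_vec, vec)) for gloss, vec in senses.items()]
    kept = [(gloss, score) for gloss, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional vectors, invented for this example.
    context = [0.9, 0.1, 0.0]
    senses = {
        "moving at high speed": [1.0, 0.0, 0.0],
        "done in a short time": [0.2, 1.0, 0.0],
        "unrelated sense": [0.0, 0.0, 1.0],
    }
    for gloss, score in rank_senses(context, senses, min_score=0.1):
        print(f"{score:.3f}  {gloss}")
```

The threshold doubles as the relevance filter: senses that score below `min_score` are removed rather than merely ranked last.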
🔬 spaCy Analyzer - Multi-lingual
Direct access to morpho-syntactic parsing for multiple languages.
Supported Languages:
- German (de_core_news_md)
- English (en_core_web_md)
- Spanish (es_core_news_md)
- Ancient Greek (7 greCy models: PROIEL/Perseus, TRF/LG/SM variants)
Outputs:
- Dependency parsing visualization
- Named Entity Recognition (NER)
- Morphological analysis tables
- JSON export
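As a rough illustration of how the morphological table and JSON export could be produced, the sketch below reads standard spaCy token attributes (`text`, `lemma_`, `pos_`, `dep_`, `head`, `morph`) and entity attributes (`text`, `label_`). The function names and the output layout are assumptions, not the app's actual schema, and `analyze` needs spaCy plus the named model installed.

```python
import json


def doc_to_rows(doc):
    """Turn a spaCy Doc (or any iterable of token-like objects) into table rows."""
    return [
        {
            "text": tok.text,
            "lemma": tok.lemma_,
            "pos": tok.pos_,
            "dep": tok.dep_,
            "head": tok.head.text,
            "morph": str(tok.morph),
        }
        for tok in doc
    ]


def analyze(text, model_name="de_core_news_md"):
    """Parse text with a spaCy pipeline and collect tokens and named entities."""
    import spacy  # imported lazily so doc_to_rows stays usable without spaCy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return {
        "tokens": doc_to_rows(doc),
        "entities": [{"text": ent.text, "label": ent.label_} for ent in doc.ents],
    }


def to_json(result):
    """Serialize the analysis, keeping umlauts and other non-ASCII text readable."""
    return json.dumps(result, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(to_json(analyze("Der schnelle Hund läuft.")))
```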
✅ Grammar Check (DE)
Professional-grade grammar and style checking powered by language-tool-python.
📚 Inflections (DE)
Direct access to complete German inflection generation via pattern.de.
📖 Thesaurus (DE)
Query interface for OdeNet (German WordNet) with:
- Synonyms, antonyms
- Hypernyms, hyponyms
- Holonyms, meronyms
- Multiple sense disambiguation
🌐 ConceptNet
Direct API access to the ConceptNet 5 knowledge graph, with a robust parser that filters out self-referential results.
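The self-reference filter might look something like the sketch below. It assumes the public ConceptNet 5 REST API at `api.conceptnet.io`, whose JSON responses carry an `edges` list with `start`, `end`, `rel`, and `weight` fields; `fetch_edges` and `parse_edges` are hypothetical helper names, and the app's real parser may apply further rules.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.conceptnet.io"


def fetch_edges(word, lang="de", limit=50):
    """Download the raw edges for a concept (requires network access)."""
    url = f"{API_ROOT}/c/{lang}/{quote(word.lower())}?limit={limit}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp).get("edges", [])


def parse_edges(edges):
    """Return (start, relation, end, weight) tuples, skipping self-referential edges."""
    relations = []
    for edge in edges:
        start = edge.get("start", {}).get("label", "")
        end = edge.get("end", {}).get("label", "")
        rel = edge.get("rel", {}).get("label", "")
        # An edge that relates a term to itself carries no information.
        if not start or not end or start.lower() == end.lower():
            continue
        relations.append((start, rel, end, edge.get("weight", 0.0)))
    # Strongest relations first.
    return sorted(relations, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for start, rel, end, weight in parse_edges(fetch_edges("Lauf")):
        print(f"{start} --{rel}--> {end} ({weight:.2f})")
```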
🏗️ Architecture
┌──────────────────────────────────────────────────────┐
│                   Gradio Interface                   │
└──────────────────────────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
┌───────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐
│      Word      │ │ Comprehensive  │ │     spaCy      │
│  Encyclopedia  │ │    Analyzer    │ │    Analyzer    │
│ (Non-Context)  │ │  (Contextual)  │ │(Multi-lingual) │
└───────┬────────┘ └───────┬────────┘ └───────┬────────┘
        │                  │                  │
   ┌────▼─────┐       ┌────▼─────┐       ┌────▼─────┐
   │  HanTa   │       │  spaCy   │       │  spaCy   │
   │(Primary) │       │ Sentence │       │  Models  │
   └────┬─────┘       │ Vectors  │       └──────────┘
        │             └────┬─────┘
   ┌────▼─────┐            │
   │  spaCy-  │            │
   │  IWNLP   │            │
   │(Fallback)│            │
   └────┬─────┘            │
        │                  │
        └─────────┬────────┘
                  │
     ┌────────────┼────────────┐
     │            │            │
┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐
│ Pattern  │ │  OdeNet  │ │ConceptNet│
│   .de    │ │(WordNet) │ │   API    │
└──────────┘ └──────────┘ └──────────┘
🔌 API Endpoints
The application exposes Gradio API endpoints:
- /api/get_morphology - spaCy analysis
- /api/check_grammar - Grammar checking
- /api/get_thesaurus - OdeNet queries
- /api/get_all_inflections - Pattern.de inflections
- /api/get_conceptnet - ConceptNet queries
- /api/comprehensive_analysis - Full contextual analysis
- /api/analyze_word - Word encyclopedia
Access via Gradio Client:
from gradio_client import Client
client = Client("http://localhost:7860")
result = client.predict("Lauf", 3, api_name="/analyze_word")
🐛 Troubleshooting
"HanTa model file missing"
pip uninstall HanTa
pip install HanTa --no-cache-dir
"LanguageTool failed to initialize"
Ensure Java is installed (required by LanguageTool):
java -version # Should show Java 8+
"OdeNet worker failed"
Check your internet connection. The app downloads OdeNet data on first run.
"spaCy model not found"
python -m spacy download de_core_news_md --force
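Before starting the app, a quick preflight check can surface most of the problems above in one pass. The sketch below uses only the standard library; `preflight` is a hypothetical helper, and the default module names are assumptions based on the usual import names of the dependencies listed in this README.

```python
import importlib.util
import shutil


def preflight(modules=("HanTa", "spacy", "language_tool_python", "de_core_news_md")):
    """Report whether Java is on PATH and whether each module can be located."""
    report = {"java": shutil.which("java") is not None}
    for name in modules:
        try:
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-installed packages.
            report[name] = False
    return report


if __name__ == "__main__":
    for component, ok in preflight().items():
        print(f"{'OK     ' if ok else 'MISSING'} {component}")
```

This only checks that a component can be found, not that it works: a present `java` binary may still be too old for LanguageTool, so `java -version` remains worth running.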
⚖️ License and Attribution
This application is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) due to the ShareAlike requirements of its core data dependencies.
Why CC BY-SA 4.0?
- OdeNet Data: CC BY-SA 4.0 (German WordNet)
- ConceptNet 5 Data: CC BY-SA 4.0
The ShareAlike clause requires derivative works to use the same license.
Your Obligations
Under CC BY-SA 4.0, you must:
- Give credit: Cite this work and all dependencies
- Indicate changes: Note any modifications you make
- Share-Alike: Distribute derivative works under CC BY-SA 4.0
- No additional restrictions: Cannot add DRM or extra legal terms
🙏 Acknowledgments
This project would not be possible without:
- HanTa - High-accuracy morphological analysis
- spaCy - Industrial-strength NLP framework
- IWNLP - Comprehensive German lemmatization
- OdeNet - Open German WordNet
- pattern.de - German linguistics tools
- ConceptNet - Multilingual knowledge graph
- LanguageTool - Grammar checking
- greCy - Ancient Greek NLP models