title: WiktionaryDE
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: cc-by-sa-3.0
WiktionaryDE - German Linguistics Hub
A multi-tool for German linguistic analysis that combines German Wiktionary database queries with several morphological engines and semantic knowledge bases in a single interface.
Overview
This Space aggregates multiple German NLP tools and databases to provide:
- Deep morphological analysis of German words
- Contextual sentence analysis with semantic ranking
- Full inflection tables (declensions and conjugations)
- Thesaurus and semantic relation discovery
- Grammar and spelling checking
Tools & Data Sources
Core Databases
- Wiktionary Database: 3.7GB cstr/de-wiktionary-sqlite-normalized database providing ground truth for lemmas, inflected forms, definitions, examples, and pronunciation
- OdeNet (WordNet): German thesaurus for synonyms, antonyms, hypernyms, etc.
- ConceptNet: Multilingual knowledge graph for semantic relations
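As a rough illustration of how a normalized SQLite dictionary can map an inflected form back to its lemma, the sketch below builds a tiny in-memory database. The table and column names (lemmas, forms, lemma_id, form, pos) and the helper functions are assumptions made for this example; they are not taken from the actual cstr/de-wiktionary-sqlite-normalized schema.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a HYPOTHETICAL two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lemmas (id INTEGER PRIMARY KEY, lemma TEXT, pos TEXT);
        CREATE TABLE forms (lemma_id INTEGER REFERENCES lemmas(id), form TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO lemmas VALUES (?, ?, ?)",
        [(1, "Lauf", "noun"), (2, "laufen", "verb")],
    )
    conn.executemany(
        "INSERT INTO forms VALUES (?, ?)",
        [(1, "Lauf"), (1, "Läufe"), (2, "lauf"), (2, "läuft")],
    )
    return conn


def lookup(conn: sqlite3.Connection, word: str) -> list:
    """Return every (lemma, pos) pair whose inflected forms include `word`.

    COLLATE NOCASE folds ASCII letters only, so "Lauf" also matches "lauf",
    while umlauts are still compared exactly.
    """
    query = (
        "SELECT l.lemma, l.pos FROM forms AS f "
        "JOIN lemmas AS l ON l.id = f.lemma_id "
        "WHERE f.form = ? COLLATE NOCASE ORDER BY l.id"
    )
    return conn.execute(query, (word,)).fetchall()


conn = build_toy_db()
noun_and_verb = lookup(conn, "Lauf")
verb_only = lookup(conn, "läuft")
```

With this toy data, looking up "Lauf" yields both the noun and the verb reading, which is the kind of ambiguity the Word Encyclopedia is meant to surface.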
Morphological Engines
- DWDSmor: High-precision FST-based analyzer from zentrum-lexikographie/dwdsmor-open
- HanTa: Hanover Tagger for robust morphological analysis and lemmatization
- spaCy-IWNLP: de_core_news_md combined with IWNLP for spaCy-based analysis
- Pattern.de: Full inflection table generation
Additional Tools
- LanguageTool: German grammar and spelling checks
Main Features
1. Word Encyclopedia (DE)
The primary non-contextual tool for analyzing single words.
What it does:
- Finds all possible analyses (e.g., "Lauf" as noun vs. "lauf" as verb)
- Aggregates data from all engines and databases
- Cross-validates results to filter out artifacts
- Provides complete morphological, semantic, and inflectional information
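One simple way to picture the cross-validation step is a voting filter: an analysis is kept only when at least two engines agree on it. The sketch below illustrates that idea under assumed data shapes. It is not the Space's actual logic, and cross_validate, the min_votes threshold, and the (lemma, pos) tuples are hypothetical.

```python
from collections import Counter


def cross_validate(engine_results, min_votes=2):
    """Keep analyses reported by at least `min_votes` engines.

    `engine_results` maps an engine name to a set of (lemma, pos) tuples.
    Because each engine contributes a set, it votes at most once per analysis.
    """
    votes = Counter()
    for analyses in engine_results.values():
        votes.update(analyses)
    return sorted(a for a, count in votes.items() if count >= min_votes)


# Made-up outputs for the input "Lauf"; only the shape matters here.
results = {
    "wiktionary": {("Lauf", "NOUN"), ("laufen", "VERB")},
    "dwdsmor": {("Lauf", "NOUN"), ("laufen", "VERB")},
    "hanta": {("Lauf", "NOUN"), ("Laufe", "NOUN")},
}
confirmed = cross_validate(results)
```

Here the reading ("Laufe", "NOUN") is reported by a single engine only and is therefore dropped as a likely artifact.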
Engine Options:
- Wiktionary (Default): Most accurate, database-driven
- DWDSmor: High-precision formal grammar
- HanTa: Robust tagger-based
- IWNLP: spaCy-based analysis
The engine selector automatically falls back to other engines if no result is found.
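The fallback behaviour can be sketched as a loop that tries the selected engine first and then the remaining ones in order. This is a minimal sketch, not the Space's code: analyze_with_fallback and the two stub engines are hypothetical stand-ins for the real analyzers.

```python
from typing import Callable, Dict, Optional, Tuple

Analyzer = Callable[[str], Optional[dict]]


def analyze_with_fallback(
    word: str, engines: Dict[str, Analyzer], preferred: str
) -> Optional[Tuple[str, dict]]:
    """Try the preferred engine, then the others in insertion order.

    Returns (engine_name, result) for the first non-empty result,
    or None if every engine comes back empty.
    """
    order = [preferred] + [name for name in engines if name != preferred]
    for name in order:
        result = engines[name](word)
        if result:
            return name, result
    return None


def wiktionary_stub(word: str) -> Optional[dict]:
    # Hypothetical stub: pretends the database only knows one form.
    return {"lemma": "laufen"} if word == "läuft" else None


def hanta_stub(word: str) -> Optional[dict]:
    # Hypothetical stub: always produces a guess.
    return {"lemma": word.lower()}


engines = {"wiktionary": wiktionary_stub, "hanta": hanta_stub}
hit = analyze_with_fallback("läuft", engines, preferred="wiktionary")
fallback = analyze_with_fallback("Xyz", engines, preferred="wiktionary")
```

Returning the engine name alongside the result lets the caller show which engine actually answered.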
2. Comprehensive Analyzer (DE)
Full sentence analysis with contextual disambiguation.
Features:
- Uses spaCy to parse sentences and extract lemmas
- Runs full Word Encyclopedia analysis on each lemma
- Contextual Ranking: Uses sentence similarity to rank semantic senses by relevance to the full sentence
- Provides integrated analysis of all words in context
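To show the idea behind contextual ranking, the sketch below scores each candidate sense by how similar its gloss is to the sentence. It uses a plain bag-of-words cosine similarity as a stand-in; the Space presumably uses a proper sentence-similarity model, which is not reproduced here. The function names and the example glosses are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_senses(sentence: str, glosses: list) -> list:
    """Order sense glosses from most to least similar to the sentence."""
    context = bag_of_words(sentence)
    return sorted(
        glosses, key=lambda g: cosine(context, bag_of_words(g)), reverse=True
    )


sentence = "Ich sitze auf der Bank im Park"
glosses = [
    "Geldinstitut, das Kredite vergibt",
    "Sitzgelegenheit für mehrere Personen im Park",
]
ranked = rank_senses(sentence, glosses)
```

In this toy case the "bench" sense of "Bank" ranks first because its gloss shares words with the sentence, while the "financial institution" gloss shares none.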
3. Individual Engine Tabs
Direct access to raw outputs from:
- Wiktionary
- DWDSmor
- HanTa
- IWNLP
Useful for comparing individual engine outputs.
4. Component Tools
Raw access to specialized tools:
- spaCy: Dependency parsing and NER
- Grammar: LanguageTool checking
- Inflections: Pattern.de inflection tables
- Thesaurus: OdeNet relations
- ConceptNet: Semantic knowledge graph
Technical Details
- SDK: Gradio 5.49.1 (the sdk_version set in the Space configuration)
- Database Size: 3.7GB (Wiktionary sqlite)
- Processing: Multi-engine pipeline with intelligent fallback
- Quality Control (basic): Cross-validation between engines to filter artifacts
License
The code for this Gradio interface is licensed under CC-BY-SA-3.0.
The underlying models and data sources retain their original licenses:
- Wiktionary: CC-BY-SA
- DWDSmor: Open license (zentrum-lexikographie)
- HanTa: Various open licenses
- spaCy models: MIT License
- OdeNet: CC-BY-SA
- ConceptNet: CC-BY-SA
Note: This is a simple educational tool and a work in progress. Many results may be inconsistent or faulty.