title: Word Enc De
emoji: 
colorFrom: red
colorTo: red
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: cc-by-sa-4.0

🏛️ Word Encyclopedia: German Linguistics Hub

A consolidated Gradio application that combines several NLP tools into a single web interface for in-depth linguistic analysis, mostly of German. It offers both broad non-contextual analysis of individual words and deep contextual analysis of full sentences.

License: CC BY-SA 4.0

🌟 Features

📖 Word Encyclopedia (DE) - Non-Contextual Analysis

The flagship tool for comprehensive word analysis. Enter a single German word and discover all its possible grammatical roles.

Key Features:

  • Multi-Engine Architecture: Uses HanTa (primary) with automatic fallback to spacy-iwnlp
  • Artifact-Free: Cross-validates all grammatical roles against OdeNet to filter out spurious inflections
  • Comprehensive Coverage: Handles ambiguous words (e.g., "Lauf" as noun vs "laufen" as verb, "See" as masculine "der See" (lake) vs feminine "die See" (sea))
  • Rich Data:
    • Complete inflection tables (declension, conjugation) via pattern.de
    • Morpheme analysis via HanTa
    • Semantic senses from OdeNet (German WordNet)
    • Conceptual relations from ConceptNet

Example: Input "Lauf" → Finds both noun ("der Lauf" - the run) and verb ("laufen" - to run) with complete inflections.
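The fallback and cross-validation behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: `analyze_with_fallback`, the two stub engines, and the `known_lemmas` set (standing in for an OdeNet lookup) are hypothetical names chosen for this example.

```python
from typing import Callable, Iterable

# A reading is a (lemma, part_of_speech) pair, e.g. ("laufen", "VERB").
Reading = tuple[str, str]
Engine = Callable[[str], list[Reading]]


def analyze_with_fallback(
    word: str,
    engines: Iterable[Engine],
    known_lemmas: set[str],
) -> list[Reading]:
    """Try each engine in order and keep only readings with a validated lemma.

    `known_lemmas` is an assumed stand-in for a lexicon lookup. An engine that
    raises, or yields no validated reading, falls through to the next engine.
    """
    for engine in engines:
        try:
            readings = engine(word)
        except Exception:
            continue  # engine unavailable or failed: try the next one
        validated = [r for r in readings if r[0] in known_lemmas]
        if validated:
            return validated
    return []


# Hypothetical stand-ins for the primary and fallback analyzers.
def primary_stub(word: str) -> list[Reading]:
    raise RuntimeError("model file missing")


def fallback_stub(word: str) -> list[Reading]:
    table = {"Lauf": [("Lauf", "NOUN"), ("laufen", "VERB"), ("Laufe", "NOUN")]}
    return table.get(word, [])


result = analyze_with_fallback("Lauf", [primary_stub, fallback_stub], {"Lauf", "laufen"})
print(result)
```

In this toy run the primary stub fails, the fallback stub answers, and the unvalidated reading "Laufe" is dropped because it is not in the lexicon set.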

🚀 Comprehensive Analyzer (DE) - Contextual Analysis

Deep sentence-level analysis with context-aware semantic ranking.

Key Features:

  • Lemma-by-lemma analysis of entire sentences
  • Context-aware semantic ranking using spaCy sentence vectors
  • Subject-Verb-Agreement (SVA) validation
  • Grammar checking via LanguageTool
  • Filters semantic senses by relevance to sentence context

Example: In "Der schnelle Hund läuft", the "fast" senses of "schnell" are ranked above the "quick" senses because they are closer to the sentence context.
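One common way to rank senses against a sentence is cosine similarity between vectors. The sketch below assumes that approach and uses toy vectors. `rank_senses`, the sense identifiers, and the numbers are illustrative assumptions, not the app's implementation. With spaCy, the context vector could come from `nlp(sentence).vector`.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_senses(context_vec, sense_vecs):
    """Return (sense_id, score) pairs sorted by similarity to the context, best first."""
    scored = [(sense, cosine(context_vec, vec)) for sense, vec in sense_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors; real model vectors are much longer.
context = [1.0, 0.0, 1.0]
senses = {
    "sense_a": [1.0, 0.0, 0.9],  # points in nearly the same direction as the context
    "sense_b": [0.0, 1.0, 0.0],  # orthogonal to the context
}
ranking = rank_senses(context, senses)
print(ranking[0][0])
```

Senses whose score falls below some threshold could then be filtered out. The cut-off value is a design choice this README does not specify.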

🔬 spaCy Analyzer - Multi-lingual

Direct access to morpho-syntactic parsing for multiple languages.

Supported Languages:

  • German (de_core_news_md)
  • English (en_core_web_md)
  • Spanish (es_core_news_md)
  • Ancient Greek (7 greCy models: PROIEL/Perseus, TRF/LG/SM variants)

Outputs:

  • Dependency parsing visualization
  • Named Entity Recognition (NER)
  • Morphological analysis tables
  • JSON export
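A morphological table and JSON export like the ones listed above can be built from standard spaCy token attributes (`text`, `lemma_`, `pos_`, `dep_`, `morph`). The sketch below is a hedged illustration: the helper names `token_rows` and `to_json` are made up for this example, and a stand-in token replaces a real parsed document so it runs without a model installed.

```python
import json
from types import SimpleNamespace


def token_rows(tokens):
    """Build one dict per token from spaCy-style attributes."""
    return [
        {
            "text": t.text,
            "lemma": t.lemma_,
            "pos": t.pos_,
            "dep": t.dep_,
            "morph": str(t.morph),
        }
        for t in tokens
    ]


def to_json(tokens) -> str:
    """Serialize the token table; ensure_ascii=False keeps umlauts readable."""
    return json.dumps(token_rows(tokens), ensure_ascii=False, indent=2)


# Stand-in token so the sketch runs without spaCy.
# With spaCy: tokens = spacy.load("de_core_news_md")("Der Hund läuft")
demo = [
    SimpleNamespace(
        text="läuft", lemma_="laufen", pos_="VERB",
        dep_="ROOT", morph="Number=Sing|Person=3",
    )
]
print(to_json(demo))
```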

✅ Grammar Check (DE)

Grammar and style checking with LanguageTool, accessed through the language-tool-python wrapper.

📚 Inflections (DE)

Direct access to complete German inflection generation via pattern.de.

📖 Thesaurus (DE)

Query interface for OdeNet (German WordNet) with:

  • Synonyms, antonyms
  • Hypernyms, hyponyms
  • Holonyms, meronyms
  • Multiple sense disambiguation

🌐 ConceptNet

Direct API access to the ConceptNet 5 knowledge graph, with a robust parser that filters out self-referential results.
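As a rough illustration of that filtering step, the sketch below walks the `edges` list of a ConceptNet-style JSON response and skips edges whose start and end nodes are identical. Reading "self-referential" this way is an assumption, `filter_edges` is a hypothetical helper, and the sample response is hand-written rather than fetched from the API.

```python
def filter_edges(response: dict) -> list[tuple[str, str, str]]:
    """Return (start_label, relation, end_label) triples, skipping edges
    whose start and end point at the same node."""
    triples = []
    for edge in response.get("edges", []):
        start, end = edge["start"], edge["end"]
        if start["@id"] == end["@id"]:
            continue  # self-referential: the term is related to itself
        triples.append((start["label"], edge["rel"]["label"], end["label"]))
    return triples


# Hand-written sample in the shape of a ConceptNet 5 response.
sample = {
    "edges": [
        {
            "start": {"@id": "/c/de/lauf", "label": "Lauf"},
            "rel": {"@id": "/r/RelatedTo", "label": "RelatedTo"},
            "end": {"@id": "/c/de/rennen", "label": "Rennen"},
        },
        {
            "start": {"@id": "/c/de/lauf", "label": "Lauf"},
            "rel": {"@id": "/r/Synonym", "label": "Synonym"},
            "end": {"@id": "/c/de/lauf", "label": "Lauf"},
        },
    ]
}
triples = filter_edges(sample)
print(triples)
```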

🏗️ Architecture

┌─────────────────────────────────────────────────────┐
│                  Gradio Interface                    │
└─────────────────────────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
┌───────▼────────┐ ┌─────▼──────┐ ┌──────▼───────┐
│ Word           │ │Comprehensive│ │   spaCy      │
│ Encyclopedia   │ │  Analyzer   │ │  Analyzer    │
│ (Non-Context)  │ │ (Contextual)│ │(Multi-lingual)│
└───────┬────────┘ └─────┬──────┘ └──────┬───────┘
        │                │                │
   ┌────▼────┐      ┌────▼────┐     ┌────▼────┐
   │  HanTa  │      │ spaCy   │     │ spaCy   │
   │(Primary)│      │Sentence │     │ Models  │
   └────┬────┘      │ Vectors │     └─────────┘
        │           └────┬────┘
   ┌────▼────┐           │
   │spaCy-   │           │
   │ IWNLP   │           │
   │(Fallback)│          │
   └────┬────┘           │
        │                │
        └────────┬───────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
┌───▼────┐  ┌───▼────┐  ┌───▼────┐
│Pattern │  │ OdeNet │  │ConceptNet│
│  .de   │  │(WordNet)│  │   API   │
└────────┘  └────────┘  └─────────┘

🔌 API Endpoints

The application exposes Gradio API endpoints:

  • /api/get_morphology - spaCy analysis
  • /api/check_grammar - Grammar checking
  • /api/get_thesaurus - OdeNet queries
  • /api/get_all_inflections - Pattern.de inflections
  • /api/get_conceptnet - ConceptNet queries
  • /api/comprehensive_analysis - Full contextual analysis
  • /api/analyze_word - Word encyclopedia

Access via Gradio Client (pass the endpoint name without the /api prefix as api_name):

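from gradio_client import Client

client = Client("http://localhost:7860")  # local instance on Gradio's default port
client.view_api()  # prints each endpoint with its parameters
# The meaning of the second argument (3) is not documented here; check view_api().
result = client.predict("Lauf", 3, api_name="/analyze_word")
print(result)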

🐛 Troubleshooting

"HanTa model file missing"

pip uninstall HanTa
pip install HanTa --no-cache-dir

"LanguageTool failed to initialize"

Ensure Java is installed (required by LanguageTool):

java -version  # Should show Java 8+

"OdeNet worker failed"

Check internet connection. The app downloads OdeNet data on first run.

"spaCy model not found"

python -m spacy download de_core_news_md --force
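The checks above can also be run in one go. The following is a small sketch, assuming the packages import under the names `HanTa`, `spacy`, `de_core_news_md`, and `language_tool_python`. `check_environment` is a hypothetical helper and not part of the app.

```python
import importlib.util
import shutil


def check_environment() -> dict[str, bool]:
    """Report which prerequisites from the troubleshooting list are present."""
    modules = ["HanTa", "spacy", "de_core_news_md", "language_tool_python"]
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # LanguageTool needs a Java runtime on the PATH.
    status["java"] = shutil.which("java") is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:22} {'ok' if ok else 'MISSING'}")
```

This only confirms that the modules and the `java` executable can be found. It does not verify versions or that model files are intact.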

⚖️ License and Attribution

This application is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) due to the ShareAlike requirements of its core data dependencies.

Why CC BY-SA 4.0?

  • OdeNet Data: CC BY-SA 4.0 (German WordNet)
  • ConceptNet 5 Data: CC BY-SA 4.0

The ShareAlike clause requires derivative works to use the same license.

Your Obligations

Under CC BY-SA 4.0, you must:

  1. Give credit: Cite this work and all dependencies
  2. Indicate changes: Note any modifications you make
  3. Share-Alike: Distribute derivative works under CC BY-SA 4.0
  4. No additional restrictions: Do not apply legal terms or technological measures (such as DRM) that restrict others from doing anything the license permits

🙏 Acknowledgments

This project would not be possible without:

  • HanTa - High-accuracy morphological analysis
  • spaCy - Industrial-strength NLP framework
  • IWNLP - Comprehensive German lemmatization
  • OdeNet - Open German WordNet
  • pattern.de - German linguistics tools
  • ConceptNet - Multilingual knowledge graph
  • LanguageTool - Grammar checking
  • greCy - Ancient Greek NLP models