---
title: Grammify
emoji: ⚡
colorFrom: gray
colorTo: blue
sdk: streamlit
app_file: app.py
pinned: false
license: apache-2.0
sdk_version: 1.51.0
---
# Grammify - Intelligent Grammar Correction System

## AI-Powered Grammar Error Detection Using Transformer Models

<div align="center">

![Python](https://img.shields.io/badge/Python-3.8+-blue?logo=python)
![Streamlit](https://img.shields.io/badge/Streamlit-1.51.0-FF4B4B?logo=streamlit)
![FastAPI](https://img.shields.io/badge/FastAPI-Latest-009688?logo=fastapi)
![Transformers](https://img.shields.io/badge/🤗%20Transformers-Seq2Seq-yellow)

[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Abdullahrasheed45/Grammify)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![Model](https://img.shields.io/badge/Model-T5--based-orange)]()

**NLP Application Project** | Deployed on Hugging Face Spaces

</div>

---

> An intelligent grammar correction application leveraging state-of-the-art Seq2Seq transformer models to detect and correct grammatical errors with real-time visual feedback and detailed linguistic error analysis.

## Overview

Grammify implements an advanced grammar correction system designed to enhance written communication across professional, academic, and personal contexts. Built on the Gramformer library and powered by a custom T5-based model, the system processes natural language input through a transformer architecture to identify and correct diverse grammatical errors with high accuracy and contextual awareness.

**Technical Context:** Full-stack NLP application integrating FastAPI microservices, a Streamlit frontend, and Hugging Face Transformers for production-grade grammar correction.

---

## Key Features

### Transformer-Based Architecture
- **Seq2Seq Deep Learning:** T5-based encoder-decoder architecture treats grammatical correction as sequence-to-sequence translation
- **Production Deployment:** FastAPI inference server with uvicorn workers for concurrent request handling
- **Real-time Processing:** ~2-3 second inference latency per sentence

### Grammar Error Coverage
The system corrects 15+ grammatical error types with high linguistic precision:

| Error Type | Description | Example Correction |
|------------|-------------|--------------------|
| Subject-Verb Agreement | Verb conjugation matching the subject | "Matt like fish" → "Matt likes fish" |
| Verb Tense Consistency | Temporal coherence in narratives | "I walk to the store and I bought milk" → "I walked to the store and bought milk" |
| Article Usage | Determiner selection (a/an/the) | Missing or incorrect articles |
| Pronoun Errors | Possessive vs. contraction | "They're house" → "Their house" |
| Preposition Selection | Contextual preposition choice | "Feel free reach out" → "Feel free to reach out" |
| Word Form | Part-of-speech corrections | "Life is shortest" → "Life is short" |
| Auxiliary Verbs | Modal and helping verb errors | "what be the reason" → "what is the reason" |
| Gerund/Infinitive | Verb form following verbs | "everyone leave" → "everyone leaving" |
| Pronoun Case | Subject/object pronoun usage | "How is you?" → "How are you?" |
| Punctuation | Apostrophes, commas, periods | "Its going to rain" → "It's going to rain" |

### Interactive Visualization
- **Color-Coded Annotations:** Visual highlighting distinguishes error types
  - **Red (Deletion):** Words/characters to remove
  - **Green (Addition):** Missing words/characters
  - **Yellow (Change):** Word replacements or modifications
- **Detailed Edit Tables:** Structured breakdown of each grammatical correction with token positions
- **Linguistic Error Classification:** ERRANT-based error type identification (morphology, syntax, orthography)
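
As a sketch of how an edit operation could map to a highlight color (the function and the exact color values are illustrative, not the app's actual code):

```python
# Map one ERRANT-style edit to a highlight color for annotated display.
# An empty corrected span means a deletion; an empty original span means
# an addition; otherwise the edit is a change.

def highlight_color(original_span: str, corrected_span: str) -> str:
    """Return the annotation color for one edit."""
    if corrected_span == "":
        return "red"      # deletion: remove the original words
    if original_span == "":
        return "green"    # addition: insert the missing words
    return "yellow"       # change: replace the original words

print(highlight_color("like", "likes"))  # yellow
```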

---

## System Performance

### Model Specifications
```
Model Architecture:  T5-based Seq2Seq Transformer
Model Tag:           Custom fine-tuned model
Tokenizer:           AutoTokenizer (SentencePiece)
Maximum Sequence:    128 tokens
Sampling Strategy:   Top-k (50) + Top-p (0.95)
Temperature:         1.0 (diverse generation)
Device:              CPU (GPU compatible)
Inference Latency:   ~2-3 seconds per sentence
Model Size:          ~220MB (full precision)
```

### Generation Parameters
```
Generation Configuration:
├── do_sample: True              # Stochastic sampling enabled
├── max_length: 128              # Maximum output tokens
├── top_k: 50                    # Top-k sampling threshold
├── top_p: 0.95                  # Nucleus sampling probability
├── early_stopping: True         # Stop at first EOS token
└── num_return_sequences: 1      # Single best candidate
```
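
To make the sampling parameters concrete, here is a minimal pure-Python sketch of how top-k and top-p (nucleus) filtering narrow a next-token distribution before sampling. The toy distribution and function are illustrative; in the app this is handled internally by `model.generate`.

```python
# Illustrative top-k + top-p filtering over a toy next-token distribution.
# Keep the k most probable tokens, then keep the smallest prefix of them
# whose cumulative probability reaches p; renormalize what survives.

def filter_top_k_top_p(probs: dict, k: int = 50, p: float = 0.95) -> dict:
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:          # nucleus reached: stop adding tokens
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

toy = {"is": 0.60, "be": 0.25, "was": 0.10, "are": 0.05}
print(filter_top_k_top_p(toy, k=3, p=0.90))
```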

### System Architecture Performance
| Component | Performance Metric |
|-----------|--------------------|
| FastAPI Server | Multi-worker uvicorn deployment |
| Startup Time | ~15-20 seconds (model loading) |
| Concurrent Requests | Handles 2+ simultaneous corrections |
| Port Configuration | 8080 (inference server) |
| Health Check | Socket-based port availability monitoring |
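
The socket-based health check can be sketched as follows (the function name and timeout are illustrative; the app's actual check may differ):

```python
# Probe whether the inference server's port accepts TCP connections.
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Poll until the FastAPI server on port 8080 is ready to serve requests.
if port_is_open("127.0.0.1", 8080):
    print("inference server is up")
```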

---

## Technical Architecture

### Seq2Seq Transformer Pipeline

```
Input Text: "what be the reason for everyone leave the company"
        ↓
Preprocessing: Add task prefix → "gec: what be the reason..."
        ↓
Tokenization: SentencePiece encoding → Token IDs
        ↓
T5 Encoder: Contextualized embeddings (512 dimensions)
        ↓
T5 Decoder: Autoregressive token-by-token generation
        ↓
Sampling: Top-k (50) + Top-p (0.95) filtering
        ↓
Detokenization: Token IDs → "what is the reason for everyone leaving the company"
        ↓
Post-processing: Remove special tokens, strip whitespace
        ↓
Output: Corrected sentence + confidence score
```

**Key Technical Design:**
- **Task Prefix:** `"gec: "` signals the grammar error correction task to the T5 model
- **Encoder-Decoder:** Bidirectional attention in the encoder, causal attention in the decoder
- **Sampling Strategy:** Balances diversity (top-p) and quality (top-k) for natural corrections
- **Early Stopping:** Terminates generation at the first end-of-sequence token for efficiency

### Error Analysis Pipeline

```
Original Sentence  → spaCy Tokenization
        ↓
Corrected Sentence → spaCy Tokenization
        ↓
ERRANT Alignment
        ↓
Edit Extraction & Classification
        ↓
┌──────────────┬──────────────┐
│  Highlights  │  Edit Table  │
│   (Visual)   │  (Tabular)   │
└──────────────┴──────────────┘
```

**ERRANT Framework Integration:**
- **Parse Trees:** spaCy dependency parsing for syntactic structure
- **Token Alignment:** Levenshtein-based sequence alignment
- **Edit Operations:** Insertions, deletions, substitutions, and transpositions
- **Linguistic Classification:** Maps edits to an error taxonomy (VERB:TENSE, DET, PREP, etc.)
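
The alignment step can be illustrated with Python's standard-library `difflib`, which yields the same kinds of edit operations. This is a simplified stand-in: ERRANT's actual aligner uses a linguistically weighted Levenshtein algorithm over spaCy tokens.

```python
# Extract insert/delete/replace operations between token sequences,
# approximating the alignment stage of the error-analysis pipeline.
from difflib import SequenceMatcher

def extract_edits(original: str, corrected: str) -> list:
    orig_tokens = original.split()
    corr_tokens = corrected.split()
    matcher = SequenceMatcher(a=orig_tokens, b=corr_tokens)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only spans that actually changed
            edits.append((op, " ".join(orig_tokens[i1:i2]),
                          " ".join(corr_tokens[j1:j2])))
    return edits

print(extract_edits("Matt like fish", "Matt likes fish"))
# → [('replace', 'like', 'likes')]
```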

### System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                     Streamlit Frontend                      │
│  • Interactive text input interface                         │
│  • Pre-loaded example selector                              │
│  • Visual error highlighting display                        │
│  • Expandable edit table components                         │
└─────────────────┬───────────────────────────────────────────┘
                  │ HTTP POST
┌─────────────────▼───────────────────────────────────────────┐
│                  FastAPI Inference Server                   │
│  • uvicorn ASGI server (port 8080)                          │
│  • Multi-worker request handling                            │
│  • Health check and monitoring                              │
└─────────────────┬───────────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────────┐
│                 Grammar Correction Engine                   │
│   ┌─────────────────────┬─────────────────────┐             │
│   │   T5 Transformer    │   ERRANT Analyzer   │             │
│   │  • Tokenization     │  • spaCy NLP        │             │
│   │  • Seq2Seq Gen      │  • Error taxonomy   │             │
│   └─────────────────────┴─────────────────────┘             │
└─────────────────────────────────────────────────────────────┘
```

**Microservices Design:**
- **Frontend Layer (Streamlit):** User interaction and visualization
- **API Layer (FastAPI):** Stateless request processing
- **Model Layer (Transformers):** Core correction logic
- **Analysis Layer (ERRANT):** Linguistic error identification

---

## Installation

### Prerequisites
```
Python 3.8+
4GB RAM minimum
Internet connection (initial model download)
```

### Backend Setup (FastAPI + Transformers)

```bash
# Clone repository
git clone https://huggingface.co/spaces/Abdullahrasheed45/Grammify
cd Grammify

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

# Download spaCy language model
python -m spacy download en_core_web_sm

# Start the Streamlit application; the FastAPI inference server
# launches automatically on first run at http://0.0.0.0:8080
streamlit run app.py
# Application available at http://localhost:8501
```

### Docker Deployment (Optional)

```bash
# Build Docker image
docker build -t grammify:latest .

# Run container
docker run -p 8501:8501 -p 8080:8080 grammify:latest
```

### Hugging Face Spaces Deployment

Configure the Space metadata at the top of `README.md`:

```yaml
---
title: Grammify
emoji: ⚡
colorFrom: gray
colorTo: blue
sdk: streamlit
app_file: app.py
pinned: false
license: apache-2.0
sdk_version: 1.51.0
---
```

Then push to the Hugging Face Hub:

```bash
git push https://huggingface.co/spaces/YOUR_USERNAME/Grammify main
```

---

## Usage

### Interactive Web Application

The system provides a Streamlit-based interface with the following workflow:

**Basic Correction:**
1. **Choose Example** - Select from 14 pre-loaded grammatical error examples
2. **Custom Input** - Enter your own sentence in the text input field
3. **Automatic Processing** - Correction triggers on non-empty input
4. **View Results** - Corrected text displayed in a success banner
5. **Analyze Errors** - Expand "Show highlights" for color-coded annotations
6. **Inspect Edits** - Expand "Show edits" for a detailed error breakdown

**Example Workflow:**

```
# Input
"Matt like fish"

# Output (Success Banner)
"Matt likes fish"

# Highlights (Expandable)
Matt [like → likes (VERB:SVA)] fish

# Edit Table (Expandable)
| Type     | Original | Pos | Corrected | Pos |
|----------|----------|-----|-----------|-----|
| VERB:SVA | like     | 1-2 | likes     | 1-2 |
```

### API Integration

For programmatic access, use the FastAPI endpoint:

```python
import requests

# Make correction request (server listens on port 8080)
response = requests.get(
    "http://localhost:8080/correct",
    params={"input_sentence": "They're house is on fire"},
)

# Parse response
result = response.json()
corrected_text = result["scored_corrected_sentence"][0]
confidence = result["scored_corrected_sentence"][1]

print(f"Corrected: {corrected_text}")
# Output: "Corrected: Their house is on fire"
```

### Python Library Integration

```python
# Direct model usage (without server)
from gramformer import Gramformer

# Initialize the correction model (models=1 selects the corrector)
gf = Gramformer(models=1, use_gpu=False)

# Correct sentence
corrections = gf.correct(
    "Feel free reach out to me",
    max_candidates=1,
)

for corrected in corrections:
    print(corrected)
# Output: "Feel free to reach out to me"
```

---

## Technical Implementation

### File Structure
```
Grammify/
├── app.py               # Main Streamlit application
├── InferenceServer.py   # FastAPI inference server
├── requirements.txt     # Python dependencies
├── .gitattributes       # Git LFS configuration
└── README.md            # This documentation
```

### Core Dependencies

**requirements.txt Analysis:**

```
# NLP & Deep Learning
transformers         # Hugging Face model hub
torch                # PyTorch backend
sentencepiece        # Tokenization

# Web Frameworks
streamlit            # Interactive frontend
fastapi              # API server
uvicorn              # ASGI server

# Grammar Analysis
spacy                # Linguistic processing
errant               # Error annotation toolkit
nltk>=3.6            # Natural language toolkit

# Utilities
st-annotated-text    # Visual highlighting
bs4                  # HTML parsing for annotations
pandas               # Edit table generation
protobuf>=3.19.0     # Model serialization
requests             # HTTP client
```

### Key Code Components

#### 1. InferenceServer.py - Core Correction Logic

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Model initialization
correction_model_tag = "custom_grammar_model"
correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
correction_model = AutoModelForSeq2SeqLM.from_pretrained(correction_model_tag)

# Correction function
def correct(input_sentence, max_candidates=1):
    correction_prefix = "gec: "
    input_sentence = correction_prefix + input_sentence
    input_ids = correction_tokenizer.encode(input_sentence, return_tensors='pt')

    preds = correction_model.generate(
        input_ids,
        do_sample=True,
        max_length=128,
        top_k=50,
        top_p=0.95,
        early_stopping=True,
        num_return_sequences=max_candidates,
    )

    # Deduplicate candidates (sampling can repeat outputs)
    corrected = set()
    for pred in preds:
        corrected.add(correction_tokenizer.decode(pred, skip_special_tokens=True).strip())

    # Sets are unordered and not indexable, so take an arbitrary element
    return (next(iter(corrected)), 0)  # Corrected sentence, dummy confidence
```

#### 2. app.py - Error Analysis Pipeline

```python
# ERRANT-based edit extraction
import errant
import spacy

# Initialize annotator
nlp = spacy.load("en_core_web_sm")
annotator = errant.load('en', nlp)

# Extract edits
orig = annotator.parse("Matt like fish")
cor = annotator.parse("Matt likes fish")
edits = annotator.annotate(orig, cor)

# Generate visual highlights and edit tables
for edit in edits:
    print(f"{edit.o_str} → {edit.c_str} ({edit.type})")
```
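
The edit table shown in the UI can be assembled from extracted edits with pandas; a minimal sketch with illustrative edit tuples (not the app's exact code):

```python
# Build the expandable edit table from (type, original, corrected,
# original-position, corrected-position) tuples produced upstream.
import pandas as pd

edits = [("VERB:SVA", "like", "likes", "1-2", "1-2")]  # illustrative data

edit_table = pd.DataFrame(
    edits,
    columns=["Type", "Original", "Corrected", "Orig Pos", "Corr Pos"],
)
print(edit_table.to_string(index=False))
```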

---

## Applications

### Professional Writing
- Email composition and review
- Business document proofreading
- Report and proposal refinement
- Professional communication enhancement

### Academic Support
- Essay and paper proofreading
- Research document editing
- Thesis and dissertation review
- Assignment quality improvement

### Content Creation
- Blog post editing
- Social media content refinement
- Marketing copy correction
- Documentation writing assistance

### Language Learning
- Grammar error identification for ESL students
- Writing practice feedback
- Language proficiency development
- Real-time correction for learners

---

## Limitations

The system has several constraints and areas for future improvement:

1. **Context Window:** Limited to 128 tokens per sentence; longer texts require segmentation
2. **Domain Specificity:** Trained primarily on general English; may underperform on highly technical or specialized vocabulary
3. **Stylistic Preservation:** Focuses on grammatical correctness rather than maintaining authorial voice or stylistic choices
4. **Confidence Scoring:** Current implementation provides binary correction without probabilistic confidence metrics
5. **Multi-Sentence Context:** Processes sentences independently; may miss inter-sentence coherence issues
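
One way to work around the 128-token window is to segment longer text into sentences and correct each independently. This is a simplified sketch: the regex splitter and the `correct` stub are illustrative stand-ins for spaCy sentence segmentation and the actual inference call.

```python
# Segment long input into sentences, correct each within the 128-token
# window, and rejoin the results.
import re

def correct(sentence: str) -> str:
    """Stub for the inference call; echoes its input here."""
    return sentence

def correct_long_text(text: str) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(correct(s) for s in sentences if s)

print(correct_long_text("I walk to the store. I bought milk."))
```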

---

## Future Directions

### Technical Enhancements
- Integration of larger T5 models (T5-large, T5-3B) for improved accuracy
- Multi-sentence context processing for discourse-level corrections
- Confidence score implementation using model perplexity
- GPU acceleration for faster inference
- Batch processing API for document-level corrections

### Feature Expansion
- Style-aware corrections (formal vs. informal)
- Domain-specific fine-tuning (legal, medical, technical writing)
- Multi-language support beyond English
- Browser extension for real-time writing assistance
- Mobile application development

### Model Optimization
- Knowledge distillation for a smaller deployment footprint
- Quantization-aware training for edge deployment
- Adaptive inference based on error density
- Custom fine-tuning on user-specific writing patterns

---

## Contributing

Contributions are welcome in the following areas:

**Technical Development:**
- Model architecture improvements and optimization
- Additional error type coverage and linguistic analysis
- Performance benchmarking and optimization
- Cross-platform deployment (mobile, browser extensions)

**Dataset Contributions:**
- Domain-specific grammar error corpora
- Multi-language grammar correction datasets
- Stylistic variation examples
- Real-world writing samples for evaluation

**Documentation:**
- Tutorial content and usage examples
- API documentation expansion
- Multi-language documentation
- Educational resources for grammar learning

---

## Acknowledgments

This project leverages several open-source tools and resources:

- **Gramformer Library** for the foundational grammar correction framework
- **Hugging Face Transformers** for model infrastructure and deployment
- **ERRANT Toolkit** (Bryant et al.) for error annotation and classification
- **spaCy Team** for linguistic processing capabilities
- **T5 Model Authors** (Google Research) for the transformer architecture
- **Hugging Face Spaces** for hosting and deployment infrastructure

---

## License

This project is released under the Apache License 2.0. See the LICENSE file for details.

---

## Contact

**Developer:** Muhammad Abdullah Rasheed

[![Website](https://img.shields.io/badge/Website-techvibes360.com-blue?logo=google-chrome)](https://techvibes360.com)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-Profile-0077B5?logo=linkedin)](https://www.linkedin.com/in/abdullahrasheed-/)
[![Email](https://img.shields.io/badge/Email-Contact-red?logo=gmail)](mailto:abdullahrasheed45@gmail.com)
[![Hugging Face](https://img.shields.io/badge/🤗-Profile-yellow)](https://huggingface.co/Abdullahrasheed45)

For technical questions, collaboration opportunities, or NLP application discussions, please reach out via the channels above.

---

<div align="center">

**Enhancing written communication through accessible AI technology**

*"Clear communication begins with correct grammar"*

</div>

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference