---
title: Grammify
emoji: ⚡
colorFrom: gray
colorTo: blue
sdk: streamlit
app_file: app.py
pinned: false
license: apache-2.0
sdk_version: 1.51.0
---

# Grammify - Intelligent Grammar Correction System

## AI-Powered Grammar Error Detection Using Transformer Models
![Python](https://img.shields.io/badge/Python-3776AB?style=for-the-badge&logo=python&logoColor=white) ![FastAPI](https://img.shields.io/badge/FastAPI-009688?style=for-the-badge&logo=FastAPI&logoColor=white) ![Streamlit](https://img.shields.io/badge/Streamlit-FF4B4B?style=for-the-badge&logo=Streamlit&logoColor=white) ![Transformers](https://img.shields.io/badge/🤗%20Transformers-FFD21E?style=for-the-badge) [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow?style=for-the-badge)](https://huggingface.co/spaces/Abdullahrasheed45/Grammify) [![Apache License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0) [![Model](https://img.shields.io/badge/Model-T5--Based-orange?style=for-the-badge)]()

**NLP Application Project** | Deployed on Hugging Face Spaces
---

> An intelligent grammar correction application leveraging state-of-the-art Seq2Seq transformer models to detect and correct grammatical errors with real-time visual feedback and detailed linguistic error analysis.

## Overview

Grammify implements an advanced grammar correction system designed to enhance written communication across professional, academic, and personal contexts. Built on the Gramformer library and powered by a custom T5-based model, the system processes natural language input through a transformer architecture to identify and correct diverse grammatical errors with high accuracy and contextual awareness.

**Technical Context:** Full-stack NLP application integrating FastAPI microservices, a Streamlit frontend, and Hugging Face Transformers for production-grade grammar correction.

---

## Key Features

### Transformer-Based Architecture

- **Seq2Seq Deep Learning:** T5-based encoder-decoder architecture processes grammatical correction as sequence-to-sequence translation
- **Production Deployment:** FastAPI inference server with uvicorn workers for concurrent request handling
- **Real-time Processing:** ~2-3 second inference latency per sentence

### Grammar Error Coverage

The system corrects 15+ grammatical error types with high linguistic precision:

| Error Type | Description | Example Correction |
|------------|-------------|--------------------|
| Subject-Verb Agreement | Verb conjugation matching subject | "Matt like fish" → "Matt likes fish" |
| Verb Tense Consistency | Temporal coherence in narratives | "I walk to the store and I bought milk" → "I walked to the store and bought milk" |
| Article Usage | Determiner selection (a/an/the) | Missing or incorrect articles |
| Pronoun Errors | Possessive vs. contraction | "They're house" → "Their house" |
| Preposition Selection | Contextual preposition choice | "Feel free reach out" → "Feel free to reach out" |
| Word Form | Part-of-speech corrections | "Life is shortest" → "Life is short" |
| Auxiliary Verbs | Modal and helping verb errors | "what be the reason" → "what is the reason" |
| Gerund/Infinitive | Verb form following verbs | "everyone leave" → "everyone leaving" |
| Pronoun Case | Subject/object pronoun usage | "How is you?" → "How are you?" |
| Punctuation | Apostrophes, commas, periods | "Its going to rain" → "It's going to rain" |

### Interactive Visualization

- **Color-Coded Annotations:** Visual highlighting system distinguishes error types
  - **Red (Deletion):** Words/characters to remove
  - **Green (Addition):** Missing words/characters
  - **Yellow (Change):** Word replacements or modifications
- **Detailed Edit Tables:** Structured breakdown of each grammatical correction with token positions
- **Linguistic Error Classification:** ERRANT-based error type identification (morphology, syntax, orthography)

---

## System Performance

### Model Specifications

```
Model Architecture: T5-based Seq2Seq Transformer
Model Tag:          Custom fine-tuned model
Tokenizer:          AutoTokenizer (SentencePiece)
Maximum Sequence:   128 tokens
Sampling Strategy:  Top-k (50) + Top-p (0.95)
Temperature:        1.0 (diverse generation)
Device:             CPU (GPU compatible)
Inference Latency:  ~2-3 seconds per sentence
Model Size:         ~220MB (full precision)
```

### Generation Parameters

```python
Generation Configuration:
├── do_sample: True            # Stochastic sampling enabled
├── max_length: 128            # Maximum output tokens
├── top_k: 50                  # Top-k sampling threshold
├── top_p: 0.95                # Nucleus sampling probability
├── early_stopping: True       # Stop at first EOS token
└── num_return_sequences: 1    # Single best candidate
```

### System Architecture Performance

| Component | Performance Metric |
|-----------|--------------------|
| FastAPI Server | Multi-worker uvicorn deployment |
| Startup Time | ~15-20 seconds (model loading) |
| Concurrent Requests | Handles 2+ simultaneous corrections |
| Port Configuration | 8080 (inference server) |
| Health Check | Socket-based port availability monitoring |

---

## Technical Architecture

### Seq2Seq Transformer Pipeline

```python
Input Text: "what be the reason for everyone leave the company"
        ↓
Preprocessing: Add task prefix → "gec: what be the reason..."
        ↓
Tokenization: SentencePiece encoding → Token IDs
        ↓
T5 Encoder: Contextualized embeddings (512 dimensions)
        ↓
T5 Decoder: Autoregressive token-by-token generation
        ↓
Sampling: Top-k (50) + Top-p (0.95) filtering
        ↓
Detokenization: Token IDs → "what is the reason for everyone leaving the company"
        ↓
Post-processing: Remove special tokens, strip whitespace
        ↓
Output: Corrected sentence + confidence score
```

**Key Technical Design:**
- **Task Prefix:** `"gec: "` signals the grammar error correction task to the T5 model
- **Encoder-Decoder:** Bidirectional attention in the encoder, causal attention in the decoder
- **Sampling Strategy:** Balances diversity (top-p) and quality (top-k) for natural corrections
- **Early Stopping:** Terminates generation at the first end-of-sequence token for efficiency

### Error Analysis Pipeline

```python
Original Sentence  → spaCy Tokenization
        ↓
Corrected Sentence → spaCy Tokenization
        ↓
ERRANT Alignment
        ↓
Edit Extraction & Classification
        ↓
┌──────────────┬──────────────┐
│  Highlights  │  Edit Table  │
│   (Visual)   │  (Tabular)   │
└──────────────┴──────────────┘
```

**ERRANT Framework Integration:**
- **Parse Trees:** spaCy dependency parsing for syntactic structure
- **Token Alignment:** Levenshtein-based sequence alignment
- **Edit Operations:** Insertions, deletions, substitutions, and transpositions
- **Linguistic Classification:** Maps edits to error taxonomy (VERB:TENSE, DET, PREP, etc.)
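To make the alignment step concrete, here is a minimal, self-contained sketch of token-level edit extraction using Python's standard `difflib`. ERRANT itself uses a linguistically weighted Levenshtein aligner over spaCy tokens and then classifies each edit; `difflib` and the `extract_edits` helper stand in here purely for illustration:

```python
from difflib import SequenceMatcher

def extract_edits(original: str, corrected: str):
    """Return (op, original_span, corrected_span) tuples for each edit."""
    orig_toks = original.split()
    cor_toks = corrected.split()
    edits = []
    # get_opcodes() yields (op, i1, i2, j1, j2) spans aligning a → b
    for op, i1, i2, j1, j2 in SequenceMatcher(a=orig_toks, b=cor_toks).get_opcodes():
        if op == "equal":
            continue  # unchanged tokens carry no edit
        edits.append((op, " ".join(orig_toks[i1:i2]), " ".join(cor_toks[j1:j2])))
    return edits

print(extract_edits("Matt like fish", "Matt likes fish"))
# [('replace', 'like', 'likes')]
```

ERRANT's extra value over this raw alignment is the linguistic classification step: mapping each `replace`/`insert`/`delete` span onto error types such as `VERB:SVA` or `DET` using part-of-speech and lemma information.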
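The socket-based health check noted in the performance table can be sketched as a simple port poll that waits for the inference server to finish loading the model. This is an illustrative snippet, not the actual `InferenceServer.py` startup code; the function name `wait_for_port` is my own:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # server is up and listening
        except OSError:
            time.sleep(0.5)  # server still loading the model; retry
    return False
```

A frontend process can call `wait_for_port("localhost", 8080)` before issuing correction requests, which accounts for the ~15-20 second model-loading window at startup.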
### System Architecture

```
┌─────────────────────────────────────────────┐
│             Streamlit Frontend              │
│  • Interactive text input interface         │
│  • Pre-loaded example selector              │
│  • Visual error highlighting display        │
│  • Expandable edit table components         │
└─────────────────┬───────────────────────────┘
                  │ HTTP POST
┌─────────────────▼───────────────────────────┐
│           FastAPI Inference Server          │
│  • uvicorn ASGI server (port 8080)          │
│  • Multi-worker request handling            │
│  • Health check and monitoring              │
└─────────────────┬───────────────────────────┘
                  │
┌─────────────────▼───────────────────────────┐
│          Grammar Correction Engine          │
│  ┌──────────────────┬────────────────────┐  │
│  │  T5 Transformer  │  ERRANT Analyzer   │  │
│  │  • Tokenization  │  • spaCy NLP       │  │
│  │  • Seq2Seq Gen   │  • Error taxonomy  │  │
│  └──────────────────┴────────────────────┘  │
└─────────────────────────────────────────────┘
```

**Microservices Design:**
- **Frontend Layer (Streamlit):** User interaction and visualization
- **API Layer (FastAPI):** Stateless request processing
- **Model Layer (Transformers):** Core correction logic
- **Analysis Layer (ERRANT):** Linguistic error identification

---

## Installation

### Prerequisites

```bash
Python 3.8+
4GB RAM minimum
Internet connection (initial model download)
```

### Backend Setup (FastAPI + Transformers)

```bash
# Clone repository
git clone https://huggingface.co/spaces/Abdullahrasheed45/Grammify
cd Grammify

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

# Download spaCy language model
python -m spacy download en_core_web_sm

# Start FastAPI inference server (automatic on first run)
# Server launches at http://0.0.0.0:8080

# Start Streamlit application
streamlit run app.py
# Application available at http://localhost:8501
```
### Docker Deployment (Optional)

```bash
# Build Docker image
docker build -t grammify:latest .

# Run container
docker run -p 8501:8501 -p 8080:8080 grammify:latest
```

### Hugging Face Spaces Deployment

```bash
# Configure space metadata in README.md
---
title: Grammify
emoji: ⚡
colorFrom: gray
colorTo: blue
sdk: streamlit
app_file: app.py
pinned: false
license: apache-2.0
sdk_version: 1.51.0
---

# Push to Hugging Face Hub
git push https://huggingface.co/spaces/YOUR_USERNAME/Grammify main
```

---

## Usage

### Interactive Web Application

The system provides a Streamlit-based interface with the following workflow:

**Basic Correction:**
1. **Choose Example** - Select from 14 pre-loaded grammatical error examples
2. **Custom Input** - Enter your own sentence in the text input field
3. **Automatic Processing** - Correction triggers on non-empty input
4. **View Results** - Corrected text displayed in success banner
5. **Analyze Errors** - Expand "Show highlights" for color-coded annotations
6. **Inspect Edits** - Expand "Show edits" for detailed error breakdown

**Example Workflow:**

```python
# Input
"Matt like fish"

# Output (Success Banner)
"Matt likes fish"

# Highlights (Expandable)
Matt [like → likes (VERB:SVA)] fish

# Edit Table (Expandable)
| Type     | Original | Pos | Corrected | Pos |
|----------|----------|-----|-----------|-----|
| VERB:SVA | like     | 1-2 | likes     | 1-2 |
```

### API Integration

For programmatic access, use the FastAPI endpoint:

```python
import requests

# Make correction request (connect via localhost; 0.0.0.0 is the
# server's bind address, not an address clients should dial)
response = requests.get(
    "http://localhost:8080/correct",
    params={"input_sentence": "They're house is on fire"}
)

# Parse response
result = response.json()
corrected_text = result["scored_corrected_sentence"][0]
confidence = result["scored_corrected_sentence"][1]

print(f"Corrected: {corrected_text}")
# Output: "Their house is on fire"
```

### Python Library Integration

```python
# Direct model usage (without server)
from gramformer import Gramformer

# Initialize model
gf = Gramformer(models=1,
                use_gpu=False)

# Correct sentence
corrections = gf.correct(
    "Feel free reach out to me",
    max_candidates=1
)

for corrected in corrections:
    print(corrected)
# Output: "Feel free to reach out to me"
```

---

## Technical Implementation

### File Structure

```
Grammify/
├── app.py               # Main Streamlit application
├── InferenceServer.py   # FastAPI inference server
├── requirements.txt     # Python dependencies
├── .gitattributes       # Git LFS configuration
└── README.md            # This documentation
```

### Core Dependencies

**requirements.txt Analysis:**

```python
# NLP & Deep Learning
transformers         # Hugging Face model hub
torch                # PyTorch backend
sentencepiece        # Tokenization

# Web Frameworks
streamlit            # Interactive frontend
fastapi              # API server
uvicorn              # ASGI server

# Grammar Analysis
spacy                # Linguistic processing
errant               # Error annotation toolkit
nltk>=3.6            # Natural language toolkit

# Utilities
st-annotated-text    # Visual highlighting
bs4                  # HTML parsing for annotations
pandas               # Edit table generation
protobuf>=3.19.0     # Model serialization
requests             # HTTP client
```

### Key Code Components

#### 1. InferenceServer.py - Core Correction Logic

```python
# Model initialization
correction_model_tag = "custom_grammar_model"
correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
correction_model = AutoModelForSeq2SeqLM.from_pretrained(correction_model_tag)

# Correction function
def correct(input_sentence, max_candidates=1):
    correction_prefix = "gec: "
    input_sentence = correction_prefix + input_sentence
    input_ids = correction_tokenizer.encode(input_sentence, return_tensors='pt')
    preds = correction_model.generate(
        input_ids,
        do_sample=True,
        max_length=128,
        top_k=50,
        top_p=0.95,
        early_stopping=True,
        num_return_sequences=max_candidates
    )
    corrected = set()
    for pred in preds:
        corrected.add(correction_tokenizer.decode(pred, skip_special_tokens=True).strip())
    # Sets are unordered and not subscriptable, so take an arbitrary candidate
    return (next(iter(corrected)), 0)  # Corrected sentence, dummy confidence
```
#### 2. app.py - Error Analysis Pipeline

```python
# ERRANT-based edit extraction
import errant
import spacy

# Initialize annotator
nlp = spacy.load("en_core_web_sm")
annotator = errant.load('en', nlp)

# Extract edits
orig = annotator.parse("Matt like fish")
cor = annotator.parse("Matt likes fish")
edits = annotator.annotate(orig, cor)

# Generate visual highlights and edit tables
for edit in edits:
    print(f"{edit.o_str} → {edit.c_str} ({edit.type})")
```

---

## Applications

### Professional Writing
- Email composition and review
- Business document proofreading
- Report and proposal refinement
- Professional communication enhancement

### Academic Support
- Essay and paper proofreading
- Research document editing
- Thesis and dissertation review
- Assignment quality improvement

### Content Creation
- Blog post editing
- Social media content refinement
- Marketing copy correction
- Documentation writing assistance

### Language Learning
- Grammar error identification for ESL students
- Writing practice feedback
- Language proficiency development
- Real-time correction for learners

---

## Limitations

The system has several constraints and areas for future improvement:

1. **Context Window:** Limited to 128 tokens per sentence; longer texts require segmentation
2. **Domain Specificity:** Trained primarily on general English; may underperform on highly technical or specialized vocabulary
3. **Stylistic Preservation:** Focuses on grammatical correctness rather than maintaining authorial voice or stylistic choices
4. **Confidence Scoring:** Current implementation provides binary correction without probabilistic confidence metrics
5. **Multi-Sentence Context:** Processes sentences independently; may miss inter-sentence coherence issues

---

## Future Directions

### Technical Enhancements
- Integration of larger T5 models (T5-large, T5-3B) for improved accuracy
- Multi-sentence context processing for discourse-level corrections
- Confidence score implementation using model perplexity
- GPU acceleration for faster inference
- Batch processing API for document-level corrections

### Feature Expansion
- Style-aware corrections (formal vs. informal)
- Domain-specific fine-tuning (legal, medical, technical writing)
- Multi-language support beyond English
- Browser extension for real-time writing assistance
- Mobile application development

### Model Optimization
- Knowledge distillation for smaller deployment footprint
- Quantization-aware training for edge deployment
- Adaptive inference based on error density
- Custom fine-tuning on user-specific writing patterns

---

## Contributing

Contributions are welcome in the following areas:

**Technical Development:**
- Model architecture improvements and optimization
- Additional error type coverage and linguistic analysis
- Performance benchmarking and optimization
- Cross-platform deployment (mobile, browser extensions)

**Dataset Contributions:**
- Domain-specific grammar error corpora
- Multi-language grammar correction datasets
- Stylistic variation examples
- Real-world writing samples for evaluation

**Documentation:**
- Tutorial content and usage examples
- API documentation expansion
- Multi-language documentation
- Educational resources for grammar learning

---

## Acknowledgments

This project leverages several open-source tools and resources:

- **Gramformer Library** for the foundational grammar correction framework
- **Hugging Face Transformers** for model infrastructure and deployment
- **ERRANT Toolkit** (Bryant et al.)
for error annotation and classification
- **spaCy Team** for linguistic processing capabilities
- **T5 Model Authors** (Google Research) for the transformer architecture
- **Hugging Face Spaces** for hosting and deployment infrastructure

---

## License

This project is released under the Apache License 2.0. See the LICENSE file for details.

---

## Contact

**Developer:** Muhammad Abdullah Rasheed

[![Portfolio](https://img.shields.io/badge/Portfolio-000000?style=for-the-badge&logo=About.me&logoColor=white)](https://techvibes360.com) [![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/abdullahrasheed-/) [![Email](https://img.shields.io/badge/Email-D14836?style=for-the-badge&logo=gmail&logoColor=white)](mailto:abdullahrasheed45@gmail.com) [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Profile-yellow?style=for-the-badge)](https://huggingface.co/Abdullahrasheed45)

For technical questions, collaboration opportunities, or NLP application discussions, please reach out via the channels above.

---
**Enhancing written communication through accessible AI technology**

*"Clear communication begins with correct grammar"*