---
language: en
license: apache-2.0
library_name: transformers.js
pipeline_tag: token-classification
tags:
- grammatical-error-correction
- gector
- onnx
- transformers.js
---
# GECToR Base 2020 (ONNX)
Quantized ONNX version of Grammarly's original GECToR model, for browser-based grammatical error correction with [Transformers.js](https://huggingface.co/docs/transformers.js).
## Original Model
- **Source**: [Grammarly GECToR](https://github.com/grammarly/gector)
- **Paper**: [GECToR – Grammatical Error Correction: Tag, Not Rewrite](https://arxiv.org/abs/2005.12592) (BEA Workshop 2020)
- **Architecture**: RoBERTa-Base + token classification head
- **Parameters**: ~125M
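A token classification head can be illustrated generically. A linear layer maps each token's hidden state (768-dimensional in RoBERTa-Base) to one score per edit tag, and the highest-scoring tag becomes the prediction. This is a toy sketch with made-up two-dimensional vectors and a hypothetical `classifyTokens` function. The original GECToR implementation may add further components, such as dropout or a separate error-detection output.

```javascript
// Toy token-classification head: the same linear layer is applied to every token.
// hiddenStates: one vector per token; weights: [numTags][hiddenSize]; bias: [numTags]
function classifyTokens(hiddenStates, weights, bias, labels) {
  return hiddenStates.map((h) => {
    // logits[k] = bias[k] + sum_j weights[k][j] * h[j]
    const logits = weights.map((row, k) =>
      row.reduce((sum, w, j) => sum + w * h[j], bias[k])
    );
    // Pick the highest-scoring tag (argmax; ties go to the first label).
    let best = 0;
    for (let k = 1; k < logits.length; k++) {
      if (logits[k] > logits[best]) best = k;
    }
    return labels[best];
  });
}

// Made-up 2-dimensional "hidden states" and three tags, for illustration only.
const labels = ['$KEEP', '$DELETE', '$REPLACE_went'];
const W = [[1, 0], [0, 1], [-1, -1]];
const b = [0, 0, 0.5];
console.log(classifyTokens([[2, 0], [0, 3], [-1, -1]], W, b, labels));
// → ['$KEEP', '$DELETE', '$REPLACE_went']
```

Taking the argmax of the raw scores gives the same tag as taking the argmax after a softmax, because softmax preserves the ordering of scores.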
## Conversion Details
- **Format**: ONNX
- **Quantization**: INT8 (dynamic quantization)
- **Size**: ~125MB
- **Converted by**: Manual export from PyTorch (AllenNLP format)
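Dynamic quantization stores the weights as 8-bit integers together with floating-point scale factors, and computes the quantization parameters for activations on the fly at inference time. At one byte per weight, ~125M parameters come to roughly 125MB, versus roughly 500MB at four bytes per FP32 weight, which matches the size listed above. The sketch below shows only the core weight arithmetic with a single symmetric scale per tensor. It is a simplified illustration: real tooling (for example ONNX Runtime's dynamic quantizer) may use per-channel scales, zero points, or UINT8 instead, and `quantizeInt8`/`dequantizeInt8` are hypothetical names.

```javascript
// Simplified symmetric per-tensor INT8 quantization (illustration only).
// Each float is divided by one shared scale and rounded into [-127, 127].
function quantizeInt8(values) {
  const maxAbs = values.reduce((m, v) => Math.max(m, Math.abs(v)), 0);
  // Avoid dividing by zero for an all-zero tensor.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const q = Int8Array.from(values, (v) => Math.round(v / scale));
  return { q, scale };
}

// Recover approximate floats; the rounding error per value is at most
// about scale / 2 (plus float32 rounding).
function dequantizeInt8({ q, scale }) {
  return Float32Array.from(q, (v) => v * scale);
}

const weights = [0.12, -0.5, 0.33, 0.0];
const quantized = quantizeInt8(weights);
const restored = dequantizeInt8(quantized);
console.log(quantized.q, restored);
```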
## How It Works
GECToR treats correction as token classification. Instead of generating corrected text, it predicts an edit operation for each token:
- `$KEEP` - Keep token unchanged
- `$DELETE` - Remove token
- `$REPLACE_word` - Replace with specific word
- `$APPEND_word` - Append word after token
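- `$TRANSFORM_*` - Apply a transformation (case, verb form, etc.)

The model is applied iteratively: after each pass, the predicted edits are applied and the corrected sentence is tagged again, until no more edits are predicted (typically 2-3 passes).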
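The tag semantics listed above can be sketched as a small function. This is a minimal illustration, not Grammarly's implementation. It assumes the predictions have already been aligned to one tag per whitespace-separated word, and `applyEdits` is a hypothetical name. The two `$TRANSFORM_CASE_*` tags are examples of the transformation family; check the exact names against the model's label list (`id2label` in `config.json`).

```javascript
// Minimal sketch: apply one GECToR-style edit tag to each word.
// Assumes tags are already aligned one-to-one with whitespace-separated words.
function applyEdits(words, tags) {
  if (words.length !== tags.length) {
    throw new Error('Expected exactly one tag per word');
  }
  const output = [];
  for (let i = 0; i < words.length; i++) {
    const word = words[i];
    const tag = tags[i];
    if (tag === '$DELETE') continue; // drop the word entirely
    if (tag.startsWith('$REPLACE_')) {
      output.push(tag.slice('$REPLACE_'.length)); // substitute the new word
    } else if (tag.startsWith('$APPEND_')) {
      output.push(word, tag.slice('$APPEND_'.length)); // keep word, add one after it
    } else if (tag === '$TRANSFORM_CASE_CAPITAL') {
      output.push(word.charAt(0).toUpperCase() + word.slice(1));
    } else if (tag === '$TRANSFORM_CASE_LOWER') {
      output.push(word.toLowerCase());
    } else {
      // $KEEP, and any tag this sketch does not handle, leaves the word unchanged.
      output.push(word);
    }
  }
  return output;
}

const words = ['she', 'go', 'to', 'school', 'yesterday', '.'];
const tags = ['$TRANSFORM_CASE_CAPITAL', '$REPLACE_went', '$KEEP', '$KEEP', '$KEEP', '$KEEP'];
console.log(applyEdits(words, tags).join(' ')); // prints "She went to school yesterday ."
```

Verb-form transformations would additionally need a lookup of word forms, which this sketch omits.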
## Usage with Transformers.js
```javascript
import { pipeline } from '@huggingface/transformers';

// Load the INT8-quantized ONNX weights (dtype 'q8').
// Replace YOUR_USERNAME with the account hosting this repository.
const classifier = await pipeline(
  'token-classification',
  'YOUR_USERNAME/gector-base-2020',
  { dtype: 'q8' }
);

const result = await classifier('He go to school yesterday.');
// Returns token predictions with edit tags
console.log(result);
```
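The iterative procedure from the How It Works section can be wrapped around the classifier roughly as follows. This is a sketch under assumptions: `correctIteratively`, `predictTags`, and `applyEdits` are hypothetical names, not Transformers.js APIs. Converting the pipeline's subword-level `result` into one tag per word is left to `predictTags`, and the five-pass cap is an arbitrary safety limit chosen here.

```javascript
// Sketch of iterative tagging: tag, apply edits, repeat until nothing changes.
// predictTags (hypothetical): async (words) => one edit tag per word,
//   e.g. by running the classifier above and aligning subwords to words.
// applyEdits (hypothetical): (words, tags) => new array of words.
async function correctIteratively(words, predictTags, applyEdits, maxIterations = 5) {
  let current = words;
  for (let pass = 0; pass < maxIterations; pass++) {
    const tags = await predictTags(current);
    // Stop as soon as the model proposes no edits.
    if (tags.every((tag) => tag === '$KEEP')) break;
    current = applyEdits(current, tags);
  }
  return current;
}

// Toy stand-ins so the sketch runs without the model.
const fakePredictTags = async (ws) =>
  ws.map((w) => (w === 'go' ? '$REPLACE_went' : '$KEEP'));
const fakeApplyEdits = (ws, tags) =>
  ws.map((w, i) => (tags[i].startsWith('$REPLACE_') ? tags[i].slice('$REPLACE_'.length) : w));

correctIteratively(['He', 'go', 'to', 'school', 'yesterday', '.'], fakePredictTags, fakeApplyEdits)
  .then((corrected) => console.log(corrected.join(' '))); // prints "He went to school yesterday ."
```

Passing the tagging and editing steps in as functions keeps the loop independent of how predictions are aligned to words.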
## Performance
Compared with the 2024 version, this model runs faster with slightly lower accuracy, which makes it a good balance between speed and quality.
## License
Apache 2.0, following the license of the original model.