---
language: en
license: apache-2.0
library_name: transformers.js
pipeline_tag: token-classification
tags:
- grammatical-error-correction
- gector
- onnx
- transformers.js
---

# GECToR Base 2020 (ONNX)

ONNX quantized version of the original GECToR model from Grammarly, for browser-based grammatical error correction with [Transformers.js](https://huggingface.co/docs/transformers.js).

## Original Model

- **Source**: [Grammarly GECToR](https://github.com/grammarly/gector)
- **Paper**: [GECToR – Grammatical Error Correction: Tag, Not Rewrite](https://arxiv.org/abs/2005.12592) (BEA Workshop 2020)
- **Architecture**: RoBERTa-Base + token classification head
- **Parameters**: ~125M

## Conversion Details

- **Format**: ONNX
- **Quantization**: INT8 (dynamic quantization)
- **Size**: ~125 MB
- **Conversion method**: manual export from the original PyTorch (AllenNLP) checkpoint

## How It Works

GECToR treats grammatical error correction as token classification. Instead of generating corrected text, it predicts an edit operation for each token:

- `$KEEP` - keep the token unchanged
- `$DELETE` - remove the token
- `$REPLACE_word` - replace the token with a specific word
- `$APPEND_word` - append a word after the token
- `$TRANSFORM_*` - apply a transformation (case, verb form, etc.)

The model runs iteratively (typically 2-3 passes) until no further edits are predicted.
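
To illustrate how the tags are applied, here is a minimal sketch of one edit pass. This is illustrative, not the official GECToR post-processing code: in particular, the real model resolves `$TRANSFORM_*` tags via lookup tables, which are omitted here.

```javascript
// Minimal sketch: apply one pass of GECToR-style edit tags to tokens.
// $TRANSFORM_* tags are left unapplied for simplicity.
function applyEdits(tokens, tags) {
  const out = [];
  tokens.forEach((token, i) => {
    const tag = tags[i] || '$KEEP';
    if (tag === '$DELETE') return;                   // drop the token
    if (tag.startsWith('$REPLACE_')) {
      out.push(tag.slice('$REPLACE_'.length));       // swap in the new word
    } else if (tag.startsWith('$APPEND_')) {
      out.push(token, tag.slice('$APPEND_'.length)); // keep, then insert after
    } else {
      out.push(token);                               // $KEEP (and unhandled $TRANSFORM_*)
    }
  });
  return out;
}

const tokens = ['He', 'go', 'to', 'school', 'yesterday', '.'];
const tags = ['$KEEP', '$REPLACE_went', '$KEEP', '$KEEP', '$KEEP', '$KEEP'];
const corrected = applyEdits(tokens, tags).join(' ');
// corrected === 'He went to school yesterday .'
```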

## Usage with Transformers.js

```javascript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'token-classification',
  'YOUR_USERNAME/gector-base-2020',
  { dtype: 'q8' }
);

const result = await classifier('He go to school yesterday.');
// Returns token predictions with edit tags
```
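
The iterative decoding can be sketched as a simple loop. In this sketch, `predictTags` is a toy stub standing in for a real model call (such as the pipeline above), so the loop itself is self-contained; only `$KEEP` and `$REPLACE_` tags are handled.

```javascript
// Sketch of GECToR's iterative decoding: re-tag and re-edit the sentence
// until a pass predicts only $KEEP (or a pass limit is reached).
// predictTags is a toy stub, not the real model.
function predictTags(tokens) {
  return tokens.map((t) => (t === 'go' ? '$REPLACE_went' : '$KEEP'));
}

function applyTags(tokens, tags) {
  // Only $KEEP/$REPLACE_ handled here; see the full tag set above.
  return tokens.map((t, i) =>
    tags[i].startsWith('$REPLACE_') ? tags[i].slice('$REPLACE_'.length) : t
  );
}

function correct(tokens, maxPasses = 3) {
  for (let pass = 0; pass < maxPasses; pass++) {
    const tags = predictTags(tokens);
    if (tags.every((tag) => tag === '$KEEP')) break; // converged: no edits left
    tokens = applyTags(tokens, tags);
  }
  return tokens;
}

const fixed = correct(['He', 'go', 'to', 'school', 'yesterday', '.']);
// fixed.join(' ') === 'He went to school yesterday .'
```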

## Performance

This 2020 model is faster than the 2024 version, with slightly lower accuracy, making it a good balance of speed and quality.

## License

Apache 2.0 (following the original model's license)