language: en
license: apache-2.0
library_name: transformers.js
pipeline_tag: token-classification
tags:
  - grammatical-error-correction
  - gector
  - onnx
  - transformers.js

GECToR Base 2020 (ONNX)

A quantized ONNX export of Grammarly's original GECToR model, intended for in-browser grammatical error correction with Transformers.js.

Original Model

Source: Grammarly GECToR
Paper: GECToR – Grammatical Error Correction: Tag, Not Rewrite (BEA Workshop 2020)
Architecture: RoBERTa-Base + token classification head
Parameters: ~125M

Conversion Details

Format: ONNX
Quantization: INT8 (dynamic quantization)
Size: ~125MB
Converted by: Manual export from PyTorch (AllenNLP format)

How It Works

GECToR uses a token classification approach. Instead of generating corrected text, it predicts an edit operation for each token:

$KEEP - Keep token unchanged
$DELETE - Remove token
$REPLACE_word - Replace with specific word
$APPEND_word - Append word after token
$TRANSFORM_* - Apply transformation (case, verb form, etc.)
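
To make the tag semantics above concrete, here is a minimal sketch of how one pass of tags could be applied to a token list. The function name `applyEdits` is hypothetical, the tags in the example are hand-written rather than real model output, and `$TRANSFORM_*` tags are passed through unchanged because their exact names depend on the model's label vocabulary.

```javascript
// Apply one pass of GECToR-style edit tags to a token list.
// Assumption: tokens[i] and tags[i] are aligned one-to-one.
function applyEdits(tokens, tags) {
  const out = [];
  tokens.forEach((token, i) => {
    const tag = tags[i];
    if (tag === '$DELETE') return; // drop the token
    if (tag.startsWith('$REPLACE_')) {
      out.push(tag.slice('$REPLACE_'.length)); // substitute the given word
    } else if (tag.startsWith('$APPEND_')) {
      out.push(token, tag.slice('$APPEND_'.length)); // keep token, add word after it
    } else {
      out.push(token); // $KEEP, or a tag this sketch does not handle
    }
  });
  return out;
}

// Hand-written tags for illustration only.
const tokens = ['He', 'go', 'to', 'school', 'yesterday', '.'];
const tags = ['$KEEP', '$REPLACE_went', '$KEEP', '$KEEP', '$KEEP', '$KEEP'];
console.log(applyEdits(tokens, tags).join(' ')); // prints "He went to school yesterday ."
```

Building a new array rather than editing in place keeps indices stable while deletions and appends change the sequence length.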

The model runs iteratively (typically 2-3 passes) until no more edits are predicted.
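
The iterative procedure can be sketched as a small loop. This is an illustration under assumptions, not the original implementation: `correctIteratively`, `predictTags`, and `applyTags` are hypothetical names, the two callbacks must be supplied by the caller, and the default of 3 passes simply mirrors the "2-3 passes" figure above.

```javascript
// Repeat predict-and-apply passes until a pass proposes no edits.
// predictTags and applyTags are hypothetical callbacks supplied by the caller:
//   predictTags(tokens)     -> array (or promise of an array) with one tag per token
//   applyTags(tokens, tags) -> new token array with the edits applied
async function correctIteratively(tokens, predictTags, applyTags, maxPasses = 3) {
  let current = tokens;
  for (let pass = 0; pass < maxPasses; pass++) {
    const tags = await predictTags(current);
    // Converged: every token is tagged $KEEP, so there is nothing left to edit.
    if (tags.every((tag) => tag === '$KEEP')) break;
    current = applyTags(current, tags);
  }
  return current;
}
```

The `maxPasses` cap bounds inference cost even if the model keeps proposing edits on every pass.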

Usage with Transformers.js

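import { pipeline } from '@huggingface/transformers';

// Load the INT8-quantized ONNX weights (dtype 'q8').
// Replace YOUR_USERNAME with the owner of this repository.
const classifier = await pipeline(
  'token-classification',
  'YOUR_USERNAME/gector-base-2020',
  { dtype: 'q8' }
);

const result = await classifier('He go to school yesterday.');
// Returns per-token predictions; each entry's `entity` field holds the predicted edit tag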
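
As a follow-up, the predictions can be reduced to the edits worth applying. The sketch below assumes each prediction has the `{ entity, score, index, word }` shape that the Transformers.js token-classification pipeline returns. The helper name `pendingEdits` and the 0.5 threshold are hypothetical, and the sample data is hand-written rather than real model output. Note that the pipeline reports predictions per subword token, so mapping them back to whole words may need extra handling.

```javascript
// Keep only confident predictions that actually change something.
// Assumption: each prediction looks like { entity, score, index, word }.
function pendingEdits(predictions, minScore = 0.5) {
  return predictions.filter((p) => p.entity !== '$KEEP' && p.score >= minScore);
}

// Hand-written sample in the assumed output shape, not real model output.
const sample = [
  { entity: '$KEEP', score: 0.99, index: 1, word: 'He' },
  { entity: '$REPLACE_went', score: 0.91, index: 2, word: ' go' },
  { entity: '$DELETE', score: 0.12, index: 3, word: ' to' },
];
console.log(pendingEdits(sample).map((p) => `${p.word.trim()} -> ${p.entity}`));
// prints [ 'go -> $REPLACE_went' ]
```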

Performance

This model is faster than the 2024 version, at the cost of slightly lower accuracy. It offers a good balance of speed and quality.

License

Apache 2.0 (following original model license)