Model Card for Model ID
Summary:
A fine-tuned Flan-T5 model that rewrites English text to conform to the IBM Style Guide by correcting three common writing mistakes: passive voice, nominalizations, and verbosity.
Model Details
Model Description
This model was fine-tuned from Google’s flan-t5-base checkpoint to act as an IBM Style Guide writing assistant. Given an input sentence or paragraph, it outputs a rewrite that uses active voice, reduces nominalizations, and removes unnecessary verbosity, in line with IBM’s style standards.
- Developed by: Gaurav Trivedi
- Model type: Seq2Seq (text-to-text)
- Language(s): English
- License: Apache-2.0
- Finetuned from: google/flan-t5-base
Uses
Direct Use
Use this model to rewrite or proofread English technical text to match IBM Style Guide rules. Example:
from transformers import pipeline

rewriter = pipeline(
    "text2text-generation",
    model="gtrivedi/ibm-style-guide-base",
    tokenizer="gtrivedi/ibm-style-guide-base",
)
output = rewriter(
    "The deployment logs were stored securely by the operations team.",
    max_length=64,
)
print(output[0]["generated_text"])
Out-of-Scope Use
- Non-English text
- Very long documents (>512 tokens) without chunking
- Creative or highly idiomatic rewriting beyond technical style guidelines
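For inputs longer than the 512-token limit noted above, one option is to split the text into sentence groups and rewrite each group separately. The following is a minimal sketch, not part of the model's own tooling: `chunk_text` and `count_words` are hypothetical helpers, the sentence splitter is a naive regex, and the word counter only stands in for a real tokenizer.

```python
import re
from typing import Callable, List


def chunk_text(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 512,
) -> List[str]:
    """Group sentences into chunks that stay within max_tokens where possible."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def count_words(s: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(s.split())


example_chunks = chunk_text(
    "One two three. Four five six. Seven eight.", count_words, max_tokens=6
)
print(example_chunks)
```

In practice the counter could be replaced with one based on the model's tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])`. A single sentence that exceeds the budget on its own is kept as one oversized chunk in this sketch and would still need truncation or manual splitting.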
Bias, Risks, and Limitations
The model enforces IBM style rules but may occasionally alter nuance or omit context. Users should always review critical outputs for meaning and accuracy.
Recommendations
- Validate important passages manually.
- Use alongside human editors in production workflows.
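One way to support manual validation is to flag rewrites that drift far from their input so an editor looks at those first. The sketch below is only an illustration: `needs_review` is a hypothetical helper, the word-level similarity from Python's `difflib` is a rough proxy for preserved meaning, and the threshold values are arbitrary assumptions that would need tuning.

```python
from difflib import SequenceMatcher


def needs_review(original: str, rewrite: str, min_similarity: float = 0.5) -> bool:
    """Return True when the rewrite shares too few words with the original."""
    # Compare lowercase word sequences; ratio() is 1.0 for identical sequences.
    ratio = SequenceMatcher(
        None, original.lower().split(), rewrite.lower().split()
    ).ratio()
    return ratio < min_similarity


# A rewrite that drops a trailing clause is flagged at a stricter threshold.
flagged = needs_review(
    "The team stored the logs securely after the audit finished",
    "The team stored the logs",
    min_similarity=0.7,
)
print(flagged)
```

A low score does not prove the meaning changed, and a high score does not prove it was preserved, so this kind of check can prioritize human review but not replace it.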
How to Get Started with the Model
Installation and Usage
Install the library and load the model:
pip install transformers
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained("gtrivedi/ibm-style-guide-base")
model = AutoModelForSeq2SeqLM.from_pretrained("gtrivedi/ibm-style-guide-base")
rewriter = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
print(
rewriter(
"When the report was reviewed by the team, no issues were found.",
max_length=128
)[0]["generated_text"]
)
Training Details
Training Data
A custom dataset of ~10,000 sentence pairs covering:
- Passive voice → active voice
- Nominalization reductions
- Verbosity simplifications
The full dataset lives in the data/ folder of this repo (train.csv & validation.csv).
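The CSV files can be read into (input, rewrite) pairs with the standard library. This is a sketch under assumptions: the card does not document the column names, so `source` and `target` are hypothetical, `read_pairs` is an illustrative helper, and the sample row is made up for the demonstration rather than taken from the dataset.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_pairs(
    lines: Iterable[str],
    source_col: str = "source",  # hypothetical column name
    target_col: str = "target",  # hypothetical column name
) -> List[Tuple[str, str]]:
    """Read (original, rewritten) sentence pairs from CSV text lines."""
    reader = csv.DictReader(lines)
    return [(row[source_col], row[target_col]) for row in reader]


# In-memory stand-in for data/train.csv (made-up row, for illustration only).
sample = io.StringIO(
    "source,target\n"
    '"The logs were stored by the team.","The team stored the logs."\n'
)
pairs = read_pairs(sample)
print(pairs)
```

To read the real file, open it with `open("data/train.csv", newline="", encoding="utf-8")` and pass the file object to `read_pairs`, adjusting the column names to whatever the header row actually contains.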
Training Procedure
Fine-tuned with the 🤗 Trainer API:
- Epochs: 3
- Batch size: 16
- Learning rate: 5 × 10⁻⁵
- Precision: fp16
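The hyperparameters above map onto keyword names accepted by `TrainingArguments` in Transformers, and they imply a rough number of optimizer steps. The sketch below is an estimate under assumptions not stated on the card: a single device, no gradient accumulation, and a training split of exactly 10,000 pairs (the card gives only an approximate total). `total_optimizer_steps` is a hypothetical helper.

```python
import math

# Values from the list above, keyed by the matching TrainingArguments names.
training_kwargs = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "fp16": True,
}


def total_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Estimate optimizer steps for one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# Assumed training-set size of 10,000 pairs (approximate on the card).
steps = total_optimizer_steps(
    10_000,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
)
print(steps)  # 625 steps per epoch * 3 epochs = 1875
```

If part of the ~10,000 pairs is reserved for validation, the real step count would be correspondingly lower.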
Evaluation
Test Data & Metrics
Held-out set of 500 examples:
- BLEU: 45.7
- ROUGE-L: 0.82
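For readers unfamiliar with ROUGE-L, it scores a candidate against a reference by the length of their longest common subsequence (LCS) of tokens. The sketch below is a simplified illustration on a 0 to 1 scale; the card does not say which implementation, tokenization, or variant (precision, recall, or F-measure) produced the reported score, so this code is not a way to reproduce it. The function names are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (simplified tokenization)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("a b c d", "a c")
print(round(score, 3))  # LCS = 2, precision = 1.0, recall = 0.5
```

Because the score rewards shared word order rather than meaning, a rewrite can score well while still changing nuance, which is one reason the manual review recommended earlier still matters.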
Environmental Impact
Estimated via the ML CO₂ Impact calculator:
- Hardware: NVIDIA V100
- Training time: ~2 hours
- Estimated CO₂: 2.1 kg CO₂eq
Citation
@misc{trivedi2025ibm,
title = {Fine-tuned Flan-T5 for IBM Style Guide},
author = {Gaurav Trivedi},
year = {2025},
howpublished = {\url{https://huggingface.co/gtrivedi/ibm-style-guide-base}}
}
Model Card Authors
Gaurav Trivedi (@gtrivedi)