---
license: cc-by-nc-4.0
tags:
  - bert
  - text-classification
  - disability
  - inclusive-language
  - academic-writing
datasets:
  - assets
library_name: transformers
language:
  - en
---

# Identifying Disability-Insensitive Language in Scholarly Works

Refer to the code repository and paper here: GitHub - Insensitive-Lang-Detection


## Overview

This is a fine-tuned BERT model designed to detect potentially insensitive or non-inclusive language relating to disability, specifically in academic and scholarly writing.

The model helps promote more inclusive and respectful communication, aligning with social models of disability and various international guidelines.


## Intended Use

- Academic editors and reviewers who want to check abstracts and papers for disability-insensitive language.
- Researchers studying accessibility, inclusive design, or language bias.
- Automated writing support tools focused on scholarly communication.

## Model Details

- Architecture: BERT-base (uncased)
- Fine-tuned on: sentences from ASSETS conference papers (1994–2024) and organizational documents (ADA National Network, UN guidelines).
- Labels:
  - 0: Not insensitive
  - 1: Insensitive
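At inference time, these two labels are recovered from the model's raw logits via a softmax and an argmax. A minimal, dependency-free sketch of that mapping (the label strings below are illustrative, not part of the released model config):

```python
import math

# Illustrative names for the two label ids; the model config itself uses 0 and 1.
ID2LABEL = {0: "not_insensitive", 1: "insensitive"}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Map a pair of logits to a label name and its probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]

label, confidence = classify([-1.2, 2.3])
print(label, round(confidence, 3))
```

The probability returned alongside the label can be used to set a stricter flagging threshold than a bare argmax.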

## Training Data

- Extracted and manually annotated sentences referencing disability-related terms.
- Augmented with sentences generated by OpenAI GPT-4o to balance underrepresented phrases.
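The extraction step above can be sketched as a keyword filter over sentence-split text. This is a sketch only: the term list below is illustrative, and the project's actual annotation vocabulary is not published in this card.

```python
import re

# Illustrative target phrases only, not the project's real term list.
DISABILITY_TERMS = ["wheelchair-bound", "suffers from", "handicapped", "the disabled"]

# One case-insensitive alternation over all terms, with regex metacharacters escaped.
TERM_PATTERN = re.compile(
    "|".join(re.escape(t) for t in DISABILITY_TERMS), re.IGNORECASE
)

def candidate_sentences(text):
    """Split a document into sentences and keep those mentioning a target term."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if TERM_PATTERN.search(s)]

doc = ("The participant was wheelchair-bound. She completed all tasks. "
       "He suffers from low vision.")
print(candidate_sentences(doc))
```

Sentences surfaced this way would still require manual annotation, as the card describes, since keyword matches alone do not determine insensitivity.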

## License

This model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

This means you are free to share and adapt the model for non-commercial purposes, as long as appropriate credit is given. Commercial use is not permitted without explicit permission.

For details, see CC BY-NC 4.0.


## How to Use

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("rrroby/insensitive-language-bert")
tokenizer = BertTokenizer.from_pretrained("rrroby/insensitive-language-bert")
model.eval()  # disable dropout for inference

text = "This participant was wheelchair-bound and..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)

with torch.no_grad():  # no gradients needed at inference time
    outputs = model(**inputs)

logits = outputs.logits
predicted_class = logits.argmax(-1).item()  # 0 = not insensitive, 1 = insensitive

print("Predicted class:", predicted_class)
```