Question Difficulty Classification Model
Introduction
This project classifies question-answer pairs by difficulty as easy, medium, or hard. You can pass the model a single question-answer pair separated by a comma, or a list of such pairs. I fine-tuned the pre-trained bert-base-cased model on the Question-Answer Dataset by Carnegie Mellon University for this task.
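The input format described above can be sketched as follows. The helper name `format_pair` is hypothetical and not part of this project; it only joins a question and its answer with a comma, and the sample pairs are made up for illustration.

```python
def format_pair(question, answer):
    """Join a question and its answer into one comma-separated string."""
    return f"{question.strip()},{answer.strip()}"


# A single pair and a list of pairs (hypothetical sample data).
single = format_pair("What is the capital of France?", "Paris")
batch = [
    format_pair("Is water a liquid at room temperature?", "yes"),
    format_pair("Who wrote Hamlet?", "William Shakespeare"),
]
print(single)
print(batch)
```

Either `single` or `batch` would then be the text handed to the preprocessing step shown in the usage section.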
Model Details
Model Description: This model is a fine-tuned checkpoint of bert-base-cased, which was pretrained on a large corpus of English data in a self-supervised fashion. This model reaches an accuracy of 95 on the dev set (for comparison, the bert-base-uncased version reaches an accuracy of 97).
- Developed by: Hugging Face
- Model Type: Text Classification
- Language(s): English
- License: Apache-2.0
- Parent Model: For more details about BERT, we encourage users to check out this model card.
- Resources for more information:
Dependencies
- Transformers
- TensorFlow
- Python 3.7.13
- NumPy
How to use the model
- Import Essential Libraries
from transformers import TFBertModel
from transformers import BertTokenizer
import tensorflow as tf
- Load the Model and Tokenizer
questionclassification_model = tf.keras.models.load_model(<path to the model>)
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
- Essential Functions
def prepare_data(input_text):
    # Accept a single "question,answer" string as well as a list of them.
    if isinstance(input_text, str):
        input_text = [input_text]
    # Tokenize into fixed-length (256 tokens) TensorFlow tensors.
    token = tokenizer.batch_encode_plus(
        input_text,
        max_length=256,
        truncation=True,
        padding='max_length',
        add_special_tokens=True,
        return_tensors='tf'
    )
    return {
        'input_ids': tf.cast(token['input_ids'], tf.float64),
        'attention_mask': tf.cast(token['attention_mask'], tf.float64)
    }

def make_prediction(model, processed_data, classes=('Easy', 'Medium', 'Hard')):
    # probs has one row per input and one column per class.
    probs = model.predict(processed_data)
    # Pick the most probable class for each input.
    outcls = [classes[i] for i in probs.argmax(axis=1)]
    return outcls, probs
- Make predictions on a list of question-answer pairs
input_text = ["What is gandhi commonly considered to be?,Father of the nation in india","What is the long-term warming of the planets overall temperature called?, Global Warming"]
processed_data = prepare_data(input_text)
result,prob = make_prediction(questionclassification_model, processed_data=processed_data)
for label, p in zip(result, prob):
    print(f"{label} : {max(p)}")
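To make the decoding step in `make_prediction` concrete, the following minimal sketch applies the same arg-max logic to hand-written probability rows, so it runs without TensorFlow or the trained model. The `decode` helper and the probability values are made up for illustration; real values come from `model.predict`.

```python
CLASSES = ("Easy", "Medium", "Hard")


def decode(probs, classes=CLASSES):
    """Return a (label, confidence) tuple for each row of class probabilities."""
    results = []
    for row in probs:
        # Index of the largest probability in this row.
        best = max(range(len(row)), key=row.__getitem__)
        results.append((classes[best], row[best]))
    return results


# Hypothetical model outputs for two question-answer pairs.
fake_probs = [
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
]
for label, confidence in decode(fake_probs):
    print(f"{label} : {confidence}")
```

The printed confidence is the probability of the winning class, which is what `max(p)` reports in the loop above.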
Risks, Limitations and Biases
- The model rarely predicts the easy category.
- 90% of the easy questions in the dataset are yes/no questions.
- Very few public datasets are available for question difficulty classification.
- Only subject-matter experts can create a reliable dataset for this task; otherwise, the model will produce wrong results.
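One way to check for the imbalance described above before training is to count the difficulty labels and measure how many easy questions have a yes/no answer. This is a rough sketch under assumptions: the `(question, answer, difficulty)` tuple layout, the `summarize` helper, and the sample rows are hypothetical and do not reflect the actual dataset files.

```python
from collections import Counter


def summarize(rows):
    """Count difficulty labels and the share of easy questions answered yes/no."""
    counts = Counter(difficulty for _, _, difficulty in rows)
    easy_answers = [answer for _, answer, difficulty in rows if difficulty == "easy"]
    yes_no = sum(
        1 for answer in easy_answers
        if answer.strip().lower().rstrip(".") in {"yes", "no"}
    )
    share = yes_no / len(easy_answers) if easy_answers else 0.0
    return counts, share


# Hypothetical rows for illustration only.
rows = [
    ("Is the kangaroo a marsupial?", "Yes", "easy"),
    ("Did Lincoln serve as president?", "yes.", "easy"),
    ("Where was Lincoln born?", "Kentucky", "easy"),
    ("Why do leopards climb trees?", "To protect their kill", "hard"),
]
counts, share = summarize(rows)
print(counts, f"{share:.0%} of easy questions are yes/no")
```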
Training
Training Data
I used the Question-Answer Dataset by Carnegie Mellon University for this task.
Training Procedure
Fine-tuning hyper-parameters
- learning_rate = 1e-5
- decay = 1e-6
- optimizer = Adam
- loss function = categorical cross-entropy
- max_length = 256
- num_train_epochs = 10
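To show what two of these settings mean numerically, here is a small sketch in plain Python. It assumes that `decay` refers to the time-based decay of legacy Keras optimizers, where the effective rate at step t is learning_rate / (1 + decay * t); the model card does not state this explicitly. It also computes categorical cross-entropy for a single one-hot label. The function names and the example prediction are illustrative.

```python
import math

LEARNING_RATE = 1e-5
DECAY = 1e-6


def decayed_lr(step, lr=LEARNING_RATE, decay=DECAY):
    """Time-based decay (assumed legacy Keras behaviour): lr / (1 + decay * step)."""
    return lr / (1.0 + decay * step)


def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


print(decayed_lr(0))          # the initial learning rate
print(decayed_lr(1_000_000))  # about half the initial rate after one million steps
# Target class "Medium" in (Easy, Medium, Hard) order, with a hypothetical prediction.
print(categorical_cross_entropy([0, 1, 0], [0.1, 0.7, 0.2]))
```

With a decay this small, the learning rate stays close to its initial value unless training runs for a very large number of steps.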