---
datasets:
- pietrolesci/gpt3_nli
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
library_name: transformers
tags:
- nli
- textclassification
- distilbert
pipeline_tag: text-classification
---

# Model Card for gulupgulup/distilbert_nli

This is a Natural Language Inference (NLI) model built by fine-tuning DistilBERT-base-uncased on the GPT-3 NLI dataset. It performs textual entailment classification: given two pieces of text (a premise and a hypothesis), it predicts the logical relationship between them.

## Model Details

### Model Description

What it does:

- Takes two text inputs: a premise (`text_a`) and a hypothesis (`text_b`)
- Classifies their relationship into one of three categories:
  - **Entailment**: The hypothesis logically follows from the premise
  - **Neutral**: The hypothesis is neither supported nor contradicted by the premise
  - **Contradiction**: The hypothesis contradicts the premise

Use Cases:

- Reading comprehension tasks
- Logical reasoning applications
- Question-answering systems
- Text coherence analysis
- Information verification tasks

**Architecture**: DistilBERT-based sequence classification model with 3 output classes. DistilBERT is a smaller, faster distillation of BERT, so the model favors efficiency while retaining most of BERT's natural language understanding capability.

Models of this type underpin applications that need to reason about logical relationships between text passages, such as fact-checking, automated reasoning, and reading comprehension systems.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "gulupgulup/distilbert_nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

### Usage Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "gulupgulup/distilbert_nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example premise and hypothesis
premise = "A person is riding a bicycle in the park."
hypothesis = "Someone is exercising outdoors."

# Tokenize the premise and hypothesis together as one sequence pair
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True, padding=True)

# Make prediction (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)

# Get the predicted label (label order as documented in this card)
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
predicted_label = id2label[predicted_class.item()]

print(f"Premise: {premise}")
print(f"Hypothesis: {hypothesis}")
print(f"Predicted relationship: {predicted_label}")
print(f"Confidence scores: {predictions.squeeze().tolist()}")
```

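
The confidence scores printed by the usage example are softmax probabilities over the model's three logits. The dependency-free sketch below shows that step for a batch of raw logits. `decode_predictions` is a hypothetical helper name and the logits are made up; the label order is the one given in this card, so it is worth checking against `model.config.id2label` before relying on it.

```python
import math

# Label order as listed in this model card (assumed to match the checkpoint).
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}


def softmax(logits):
    """Convert one row of logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_predictions(logit_rows, id2label=ID2LABEL):
    """Return a (label, probabilities) tuple for each row of logits."""
    results = []
    for row in logit_rows:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[best], probs))
    return results


if __name__ == "__main__":
    # Made-up logits for two premise-hypothesis pairs.
    for label, probs in decode_predictions([[2.0, 0.5, -1.0], [-0.3, 0.1, 1.7]]):
        print(label, [round(p, 3) for p in probs])
```

In practice the rows would come from `outputs.logits.tolist()` in the example above.
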
## Training Details

### Training Data

Dataset: [pietrolesci/gpt3_nli](https://huggingface.co/datasets/pietrolesci/gpt3_nli), a natural language inference dataset containing premise-hypothesis pairs with three-class labels (entailment, neutral, contradiction). Each example is a text pair (`text_a` and `text_b`) from which the model learns to determine the logical relationship between the premise and the hypothesis.

### Training Procedure

**Base Model**: DistilBERT-base-uncased fine-tuned for sequence classification with 3 output labels for natural language inference.

**Training Framework**: Hugging Face Transformers Trainer with Weights & Biases (wandb) integration for experiment tracking.

**Data Split**: The original training set was split into train (81%), validation (9%), and test (10%) sets using stratified sampling to maintain label distribution balance across splits.

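
The 81/9/10 proportions are what two successive 10% hold-outs would produce: 10% for test, then 10% of the remaining 90% for validation. The sketch below illustrates that arithmetic with a dependency-free stratified split. The two-stage procedure, the `stratified_split` helper, and the seed are assumptions made for illustration, not details confirmed by this card.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.10, val_frac=0.10, seed=42):
    """Split example indices into train/validation/test, label by label.

    test_frac is taken from the full data; val_frac is taken from what
    remains, so the defaults give roughly 81% / 9% / 10%.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_frac)
        n_val = round((len(indices) - n_test) * val_frac)
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return train, val, test


if __name__ == "__main__":
    # Toy data: 100 examples for each of the three classes.
    toy_labels = [0] * 100 + [1] * 100 + [2] * 100
    train, val, test = stratified_split(toy_labels)
    print(len(train), len(val), len(test))
```

With the `datasets` library, a comparable result could likely be obtained by calling `train_test_split(test_size=0.1, stratify_by_column="label")` twice, provided the label column is a `ClassLabel`.
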
#### Preprocessing

Text pairs are tokenized using DistilBERT's tokenizer with truncation and padding applied. The label column is cast to ClassLabel format with three categories: entailment, neutral, and contradiction.

**Data Handling**: Uses DataCollatorWithPadding for dynamic padding during training and tokenizes premise-hypothesis pairs jointly.

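
Dynamic padding means each batch is padded only to the length of its own longest sequence, not to a fixed global maximum. The pure-Python sketch below illustrates the idea. It is not the implementation of `DataCollatorWithPadding`; `pad_batch` is a hypothetical helper, and the token ids are made up apart from the usual BERT-style vocabulary ids (0 for `[PAD]`, 101 for `[CLS]`, 102 for `[SEP]`), which are assumed here.

```python
def pad_batch(batch, pad_token_id=0):
    """Pad tokenized examples to the longest sequence in this batch only."""
    longest = max(len(example["input_ids"]) for example in batch)
    padded = {"input_ids": [], "attention_mask": []}
    for example in batch:
        ids = list(example["input_ids"])
        n_pad = longest - len(ids)
        padded["input_ids"].append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        padded["attention_mask"].append([1] * len(ids) + [0] * n_pad)
    return padded


if __name__ == "__main__":
    # Two premise-hypothesis pairs of different lengths (made-up token ids).
    batch = [
        {"input_ids": [101, 2023, 102, 2008, 102]},
        {"input_ids": [101, 2023, 2003, 1037, 102, 2008, 2003, 102]},
    ]
    print(pad_batch(batch))
```
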
#### Training Hyperparameters

- **Learning Rate**: 1e-5
- **Batch Size**: 64 (both training and evaluation)
- **Number of Epochs**: 5
- **Weight Decay**: 0.01
- **Max Gradient Norm**: 1.0
- **Optimizer**: AdamW (default)
- **Evaluation Strategy**: Every epoch
- **Save Strategy**: Every epoch
- **Logging Steps**: 100
- **Best Model Selection**: Based on validation accuracy (higher is better)

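
As a rough guide, the hyperparameters above could map onto `TrainingArguments` keyword arguments as sketched below. This is a reconstruction, not the author's training script: `output_dir` is a hypothetical path, treating the batch size of 64 as per-device is an assumption, and `load_best_model_at_end` is inferred from the best-model selection entry. The optimizer is left at the Trainer default.

```python
# Hyperparameters from this card, expressed as TrainingArguments keyword arguments.
TRAINING_KWARGS = {
    "output_dir": "distilbert_nli_run",  # hypothetical path, not from the card
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 64,   # assumes the batch size of 64 is per device
    "per_device_eval_batch_size": 64,
    "num_train_epochs": 5,
    "weight_decay": 0.01,
    "max_grad_norm": 1.0,
    "eval_strategy": "epoch",            # named evaluation_strategy in older releases
    "save_strategy": "epoch",
    "logging_steps": 100,
    "load_best_model_at_end": True,      # assumed from "Best Model Selection"
    "metric_for_best_model": "accuracy",
    "greater_is_better": True,
    "report_to": "wandb",
}


def build_training_args(**overrides):
    """Create TrainingArguments; transformers is imported only when called."""
    from transformers import TrainingArguments

    return TrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

Calling `build_training_args()` requires `transformers` to be installed; the result would then be passed to a `Trainer` together with the model, datasets, and data collator.
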
## Evaluation

### Metrics

**Accuracy**: Primary evaluation metric measuring the percentage of correctly classified premise-hypothesis pairs across all three NLI categories.

**Precision** (Macro-averaged): Secondary metric calculating the average precision across all three classes (entailment, neutral, contradiction), giving equal weight to each class regardless of support. It shows how the model performs on each NLI relationship type, which matters when class distributions may be imbalanced.

Both metrics are computed using the evaluate library and rounded to 3 decimal places for reporting.
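
To make the two metrics concrete, the following dependency-free sketch computes accuracy and macro-averaged precision and rounds both to 3 decimal places. It only illustrates the definitions; the card states that the `evaluate` library was used, and the function names and toy labels here are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_precision(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class precision."""
    per_class = []
    for label in labels:
        # Reference labels of every example predicted as this class.
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        # A class that is never predicted contributes 0 (a convention choice).
        per_class.append(predicted.count(label) / len(predicted) if predicted else 0.0)
    return sum(per_class) / len(per_class)


def compute_metrics(y_true, y_pred):
    """Return both metrics rounded to 3 decimal places."""
    return {
        "accuracy": round(accuracy(y_true, y_pred), 3),
        "precision": round(macro_precision(y_true, y_pred), 3),
    }


if __name__ == "__main__":
    # Toy labels: 0 = entailment, 1 = neutral, 2 = contradiction.
    print(compute_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

With the `evaluate` library, the equivalent calls would likely be `evaluate.load("accuracy")` and `evaluate.load("precision")`, passing `average="macro"` when computing precision.
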