---
language: en
license: mit
tags:
- bert
- text-classification
- sentiment-analysis
- english
- imdb
datasets:
- imdb
metrics:
- accuracy
model-index:
- name: finetuned-bert-imdb
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: imdb
      name: IMDB
    metrics:
    - type: accuracy
      value: 0.802
      name: Test Accuracy
pipeline_tag: text-classification
---

# BERT Fine-tuned for IMDB Sentiment Analysis

This model is a fine-tuned version of `bert-base-uncased` on the IMDB movie reviews dataset for sentiment analysis (binary classification). It predicts whether a movie review is positive or negative.

## Model description

- **Model type:** BERT (bert-base-uncased)
- **Language:** English
- **Task:** Sentiment Analysis
- **Training Dataset:** IMDB Movie Reviews
- **License:** MIT

### Training Hyperparameters

The model was trained with the following parameters:

- Learning rate: 2e-5
- Batch size: 16
- Number of epochs: 3
- Weight decay: 0.01
- Maximum sequence length: 64
- Training samples: 2000 (balanced: 1000 positive, 1000 negative)
- Optimizer: AdamW

### Training Results

- **Accuracy on test set:** 80.2%
- **Training loss:** 0.381

## Intended uses & limitations

### Intended uses

This model is designed for:

- Sentiment analysis of movie reviews and similar text content
- Binary classification (positive/negative) of English text
- Research and educational purposes

### Limitations

- The model is trained on movie reviews and might not perform as well on other domains
- Limited to English-language text
- Maximum input length is 512 tokens; fine-tuning used reviews truncated to 64 tokens, so predictions on longer inputs may be less reliable
- May exhibit biases present in the training data

## How to use

Here's how to use the model with PyTorch:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("xanderIV/finetuned-bert-imdb")
tokenizer = AutoTokenizer.from_pretrained("xanderIV/finetuned-bert-imdb")

# Prepare your text
texts = [
    "This movie was fantastic! Great acting and amazing plot.",
    "Terrible waste of time. Poor acting and confusing story."
]

# Tokenize the input (pads the batch and truncates over-long texts)
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=1)
    labels = torch.argmax(predictions, dim=1)

# Process results: label 1 is positive, label 0 is negative
for text, pred, probs in zip(texts, labels, predictions):
    sentiment = "positive" if pred.item() == 1 else "negative"
    confidence = probs[pred].item() * 100
    print(f"\nText: {text}")
    print(f"Sentiment: {sentiment} (confidence: {confidence:.1f}%)")
```

### Example Outputs

```
Text: This movie was fantastic! Great acting and amazing plot.
Sentiment: positive (confidence: 97.7%)

Text: Terrible waste of time. Poor acting and confusing story.
Sentiment: negative (confidence: 98.4%)
```

## Training Data

The model was fine-tuned on a subset of the IMDB dataset:

- 2000 training examples (1000 positive, 1000 negative reviews)
- 500 test examples
- Reviews were truncated to 64 tokens to speed up training

## Evaluation Results

The model achieved the following results on the test set:

- Accuracy: 80.2%
- Loss: 0.482

## Bias & Limitations

This model may exhibit biases inherent to the IMDB dataset:

- Movie-specific vocabulary and expressions
- Cultural biases in movie reviews
- English-language bias
- Internet and entertainment domain bias

## Citation

If you use this model, please cite:

```bibtex
@misc{finetuned-bert-imdb,
  author = {xanderIV},
  title = {BERT Fine-tuned for IMDB Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/xanderIV/finetuned-bert-imdb}}
}
```
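
## Sketch: sampling a balanced subset

The Training Data section describes a balanced 2000-example training subset but does not say how the reviews were selected. The following is a minimal sketch of one way to draw such a subset, assuming seeded random sampling per class. The helper `balanced_subset`, its parameters, and the toy data are hypothetical and are not taken from the original training script.

```python
import random
from collections import defaultdict


def balanced_subset(examples, per_class, seed=42):
    """Return `per_class` randomly chosen examples for every label, shuffled.

    `examples` is any iterable of dicts with a "label" key, for example rows
    of the IMDB train split (0 = negative, 1 = positive).
    """
    # Group the examples by their label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"label {label} has only {len(pool)} examples")
        subset.extend(rng.sample(pool, per_class))

    rng.shuffle(subset)  # mix the classes instead of keeping them in blocks
    return subset


# Toy stand-in for the real data: 6 negative and 6 positive "reviews".
toy = [{"text": f"review {i}", "label": i % 2} for i in range(12)]
sample = balanced_subset(toy, per_class=3)
print(len(sample), sum(row["label"] for row in sample))  # prints "6 3"
```

With the `datasets` library, the rows could come from `load_dataset("imdb")["train"]`, and `per_class=1000` would yield a 2000-example subset with the class balance described above. Whether the 500 test examples were also balanced is not stated, so the sketch makes no assumption about them.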