# DistilBERT Quantized Model for Sentiment Analysis on Yelp Polarity Dataset

This repository hosts a quantized version of the DistilBERT model, fine-tuned for binary sentiment classification on the Yelp Polarity dataset. The model was optimized with post-training dynamic quantization to reduce its size and CPU inference cost. This makes it suitable for resource-constrained environments, while accuracy stays close to that of the full-precision model.
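The size reduction comes from storing `Linear` weights as 1-byte integers instead of 4-byte floats. The following is a minimal sketch that measures this effect, assuming a hypothetical toy model whose `Linear` layers match the dimensions of one DistilBERT feed-forward block (768 to 3072 to 768). It is not the full model, and `serialized_size_bytes` is a hypothetical helper.

```python
import io

import torch
import torch.nn as nn


def serialized_size_bytes(module: nn.Module) -> int:
    # Serialize the state_dict into memory and return the byte count.
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.getbuffer().nbytes


# Hypothetical toy model: Linear layers sized like one DistilBERT feed-forward block.
float_model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768)).eval()
int8_model = torch.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

fp32_bytes = serialized_size_bytes(float_model)
int8_bytes = serialized_size_bytes(int8_model)
print(f"FP32: {fp32_bytes / 1e6:.1f} MB, INT8: {int8_bytes / 1e6:.1f} MB, "
      f"ratio: {fp32_bytes / int8_bytes:.2f}x")
```

Biases stay in FP32 and serialization adds a small overhead, so the ratio lands slightly below 4x. Layers that are not `Linear` (such as embeddings) are untouched by this quantization mode.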
## Model Details

- **Model Architecture:** DistilBERT Base Uncased
- **Task:** Binary sentiment classification (positive / negative)
- **Dataset:** Yelp Polarity
- **Quantization:** Dynamic quantization (INT8 weights on `Linear` layers)
- **Fine-tuning Framework:** Hugging Face Transformers
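To illustrate what "INT8 weights" means, the following is a simplified sketch of per-tensor symmetric INT8 quantization, a common scheme for weights. One scale maps the float range onto the integers −128..127, and the zero point is fixed at 0. PyTorch's observers pick the scale with their own rules, so these constants are a simplification, not PyTorch's exact kernel. The helper names are hypothetical.

```python
def quantize_symmetric_int8(weights: list[float]) -> tuple[list[int], float]:
    # One scale for the whole tensor; the largest magnitude maps to 127.
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp into the INT8 range.
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    # Recover approximate float values from the integers.
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_symmetric_int8(weights)
print(q, scale)  # [50, -127, 2, 100] and a scale of about 0.01
print(dequantize(q, scale))
```

Each weight is stored in one byte instead of four. The rounding error per weight is at most half the scale, which is why accuracy usually drops only slightly.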
---

### Installation
```sh
pip install transformers datasets evaluate scikit-learn torch
```

### Inference

Load the fine-tuned (full-precision) model and tokenizer from `./results` and classify a few sample reviews:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load trained model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("./results")
tokenizer = AutoTokenizer.from_pretrained("./results")

# Set model to eval mode (disables dropout)
model.eval()

# 10 sample review texts
sample_texts = [
    "The food was absolutely wonderful!",
    "Terrible experience. I will never come back.",
    "Average service, but the food was decent.",
    "I loved the ambiance and the staff was super friendly!",
    "Worst food I've had in a long time.",
    "Highly recommend this place for a date night.",
    "The waiter was rude and the food was cold.",
    "Amazing pizza, will order again!",
    "They took too long to serve and it was overpriced.",
    "Best customer service and delicious desserts!",
]

# Predict and print results (label 1 = Positive, label 0 = Negative)
for text in sample_texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    sentiment = "Positive" if prediction == 1 else "Negative"
    print(f"Text: {text}\nPredicted Sentiment: {sentiment}\n")
```
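The inference loop runs one forward pass per review. The following is a minimal sketch that batches the reviews into a single pass and also reports a softmax confidence. It assumes the same 0 = Negative / 1 = Positive mapping. `predict_batch`, `logits_to_sentiment`, and `LABELS` are hypothetical helper names, not part of this repository.

```python
import torch

# Index 0 -> Negative, 1 -> Positive (same mapping as the inference loop).
LABELS = ["Negative", "Positive"]


def logits_to_sentiment(logits: torch.Tensor) -> list[tuple[str, float]]:
    # Softmax turns each row of logits into class probabilities.
    probs = torch.softmax(logits, dim=-1)
    confidence, index = probs.max(dim=-1)
    return [(LABELS[i], c) for i, c in zip(index.tolist(), confidence.tolist())]


def predict_batch(model, tokenizer, texts: list[str]) -> list[tuple[str, float]]:
    # Tokenize all texts at once; padding aligns them to the longest text in the batch.
    inputs = tokenizer(texts, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits_to_sentiment(logits)


# Post-processing on hand-written logits (no model needed):
print(logits_to_sentiment(torch.tensor([[2.0, 0.0], [-1.0, 1.0]])))
```

With the model and tokenizer loaded as in the inference snippet, `predict_batch(model, tokenizer, sample_texts)` would score all ten reviews in one forward pass.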
### Quantization

The fine-tuned model is quantized with PyTorch dynamic quantization, which stores the weights of every `torch.nn.Linear` layer as INT8. The quantized weights are then saved alongside the config and tokenizer:

```python
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model
model = AutoModelForSequenceClassification.from_pretrained("./results")
model.eval()

# Apply dynamic quantization: Linear weights are stored as INT8,
# activations are quantized on the fly at inference time
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8,
)

# Define the output path and create it if it doesn't exist
quantized_model_path = "./results/quantized_model"
os.makedirs(quantized_model_path, exist_ok=True)

# Save the quantized model weights
torch.save(quantized_model.state_dict(), os.path.join(quantized_model_path, "pytorch_model.bin"))

# Save the config and tokenizer next to the weights
model.config.save_pretrained(quantized_model_path)
tokenizer = AutoTokenizer.from_pretrained("./results")
tokenizer.save_pretrained(quantized_model_path)

print("Quantized model saved at:", quantized_model_path)
```
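The saved `pytorch_model.bin` holds quantized packed weights rather than float tensors. For that reason, loading the directory with `from_pretrained` is unlikely to restore the quantized model. One approach that should work, sketched here on a hypothetical toy model, has three steps: rebuild the float architecture, apply the same `quantize_dynamic` call so the module types match the saved keys, and then call `load_state_dict`. For the real model, the float skeleton would presumably come from `AutoConfig.from_pretrained(quantized_model_path)` plus `AutoModelForSequenceClassification.from_config(config)`. This is an untested assumption.

```python
import io

import torch
import torch.nn as nn


def build_float_model() -> nn.Module:
    # Hypothetical stand-in for the fine-tuned classifier (not DistilBERT itself).
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2)).eval()


def quantize(model: nn.Module) -> nn.Module:
    # Same quantize_dynamic call as in the quantization script.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


torch.manual_seed(0)
saved_model = quantize(build_float_model())

# Save the quantized state_dict; an in-memory buffer stands in for pytorch_model.bin.
buffer = io.BytesIO()
torch.save(saved_model.state_dict(), buffer)
buffer.seek(0)

# Rebuild the float skeleton and quantize it so its modules match the saved keys,
# then load the INT8 weights. weights_only=False is passed explicitly because newer
# PyTorch versions default to True, which may reject parts of a quantized
# state_dict; only do this for files you trust.
restored_model = quantize(build_float_model())
restored_model.load_state_dict(torch.load(buffer, weights_only=False))

x = torch.randn(4, 16)
with torch.no_grad():
    outputs_match = torch.allclose(saved_model(x), restored_model(x))
print("Outputs match:", outputs_match)
```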
### Performance Metrics

- **Accuracy:** Approximately 95% on the Yelp Polarity test subset
- **Precision, Recall, F1-score:** Computed during evaluation using scikit-learn
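The evaluation code itself is not included here. The following is a minimal sketch of how these metrics could be computed with scikit-learn in the `compute_metrics` hook format that the Hugging Face `Trainer` accepts: it receives `(logits, labels)` and returns a dict. It assumes label 1 (Positive) is the positive class. The function body is an assumption, not the author's code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    # Unpack (logits, labels) and take the highest-scoring class as the prediction.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # average="binary" reports precision/recall/F1 for the positive label (1).
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    return {
        "accuracy": float(accuracy_score(labels, preds)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


example_logits = np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3]])
example_labels = np.array([0, 1, 0, 0])
print(compute_metrics((example_logits, example_labels)))
# accuracy 0.75, precision 0.5, recall 1.0, f1 about 0.667
```

In this toy example, the third review is a false positive. That lowers precision to 0.5 while recall stays at 1.0.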
### Fine-Tuning Details

#### Dataset

- **Source:** Yelp Polarity (via Hugging Face Datasets)
- **Train samples used:** 50,000
- **Test samples used:** 10,000

#### Training

- **Number of epochs:** 3
- **Batch size:** 16
- **Evaluation strategy:** Per epoch
- **Learning rate:** 2e-5
- **Weight decay:** 0.01
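These settings fix the length of training. The following is a worked sketch of the step count and learning-rate schedule. It assumes a single device, no gradient accumulation, and the `Trainer` default schedule (linear decay to zero with no warmup). The README does not state the scheduler, so that part is an assumption.

```python
import math


def linear_lr(step: int, base_lr: float, total_steps: int) -> float:
    # Linear decay from base_lr at step 0 to 0 at the last step, no warmup.
    return base_lr * max(0.0, (total_steps - step) / total_steps)


train_samples = 50_000
batch_size = 16
epochs = 3
base_lr = 2e-5

steps_per_epoch = math.ceil(train_samples / batch_size)  # 3125
total_steps = steps_per_epoch * epochs                  # 9375

for step in (0, steps_per_epoch, 2 * steps_per_epoch, total_steps):
    print(step, f"{linear_lr(step, base_lr, total_steps):.3e}")
```

Under these assumptions, the learning rate is about 1.33e-5 after the first epoch and 6.67e-6 after the second.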
### Repository Structure

```text
.
├── results/               # Contains fine-tuned and quantized model files
│   ├── pytorch_model.bin  # Quantized model weights
│   ├── config.json        # Model config
│   └── tokenizer/         # Tokenizer files
├── logs/                  # Training logs
└── README.md              # Model documentation
```
### Limitations

- The model was trained only on Yelp reviews and may not generalize to other domains.
- Post-training quantization may cause a minor accuracy drop compared with the full-precision model.
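One way to gauge the effect of quantization is to measure how often the FP32 and INT8 models agree on the same inputs, in addition to comparing accuracy on the labeled test subset. The following is a minimal sketch on a hypothetical toy classifier with random weights. The resulting agreement number says nothing about the real DistilBERT model, and `prediction_agreement` is a hypothetical helper.

```python
import torch
import torch.nn as nn


def prediction_agreement(model_a: nn.Module, model_b: nn.Module, inputs: torch.Tensor) -> float:
    # Fraction of inputs for which both models predict the same class.
    with torch.no_grad():
        preds_a = model_a(inputs).argmax(dim=-1)
        preds_b = model_b(inputs).argmax(dim=-1)
    return (preds_a == preds_b).float().mean().item()


torch.manual_seed(0)
# Hypothetical toy classifier standing in for the real model.
float_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2)).eval()
int8_model = torch.quantization.quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

inputs = torch.randn(1_000, 64)
agreement = prediction_agreement(float_model, int8_model, inputs)
print(f"FP32 vs INT8 agreement: {agreement:.3f}")
```

For the real model, the same helper would be applied to tokenized test reviews. The inputs would then be passed as keyword arguments and the `.logits` field read instead of the raw output.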