kevinkyi/Homework2_Finetuning

@@ -1,71 +1,84 @@
 ---
 library_name: transformers
-license: apache-2.0
-base_model: distilbert-base-uncased
 tags:
-- generated_from_trainer
-metrics:
-- accuracy
-- precision
-- recall
-- f1
-model-index:
-- name: Homework2_Finetuning
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Homework2_Finetuning
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the None dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.0030
-- Accuracy: 1.0
-- Precision: 1.0
-- Recall: 1.0
-- F1: 1.0
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 3e-05
-- train_batch_size: 16
-- eval_batch_size: 16
-- seed: 42
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 5
-### Training results
-| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
-| 0.1158        | 1.0   | 55   | 0.0315          | 0.9909   | 0.9821    | 1.0    | 0.9910 |
-| 0.0043        | 2.0   | 110  | 0.0083          | 1.0      | 1.0       | 1.0    | 1.0    |
-| 0.002         | 3.0   | 165  | 0.0017          | 1.0      | 1.0       | 1.0    | 1.0    |
-| 0.0014        | 4.0   | 220  | 0.0012          | 1.0      | 1.0       | 1.0    | 1.0    |
-### Framework versions
-- Transformers 4.45.2
-- Pytorch 2.8.0+cu126
-- Datasets 2.21.0
-- Tokenizers 0.20.3

 ---
 library_name: transformers
+pipeline_tag: text-classification
+license: mit
 tags:
+  - distilbert
+  - sentiment
+  - football
+  - fine-tuning
+model_name: DistilBERT Football Sentiment (Positive vs Negative)
+language:
+  - en
 ---
+# DistilBERT Football Sentiment — Positive vs Negative
+## Purpose
+Fine-tune a compact transformer (DistilBERT) to classify short football-related comments as **positive (1)** or **negative (0)**. This supports a course assignment on text modeling and evaluation.
+## Dataset
+- **Source:** `james-kramer/football_news` on Hugging Face.
+- **Schema:** `text` (string), `label` (0/1).
+- **Task:** Binary sentiment classification (`0=negative`, `1=positive`).
+- **Splits:** Stratified **80/10/10** (train/val/test), created in the accompanying Colab notebook.
+- **Cleaning:** Strip text, drop empty/NA rows.
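The cleaning and splitting steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the notebook's actual split code is not shown in this card, the helper names `clean` and `stratified_split` are hypothetical, and the toy rows are placeholders rather than real dataset entries.

```python
import random
from collections import defaultdict


def clean(rows):
    """Strip text and drop rows whose text is missing or empty."""
    cleaned = []
    for text, label in rows:
        if text is None:
            continue
        text = text.strip()
        if text:
            cleaned.append((text, label))
    return cleaned


def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) rows into train/val/test, preserving label ratios."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data (hypothetical): 50 negative and 50 positive comments,
# plus one blank and one missing row that cleaning should drop.
rows = [(f" comment {i} ", i % 2) for i in range(100)] + [("   ", 0), (None, 1)]
train, val, test = stratified_split(clean(rows))
print(len(train), len(val), len(test))  # 80 10 10
```

Splitting each label group separately is what keeps the 0/1 ratio the same in all three splits; a plain random split would only match it on average.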
+## Preprocessing
+- **Tokenizer:** `distilbert-base-uncased` (uncased), `max_length=256`, truncation.
+- **Label mapping:** `{0: "negative", 1: "positive"}`.
+## Training Setup
+- **Base model:** `distilbert-base-uncased`
+- **Epochs:** 5
+- **Batch size:** 16
+- **Learning rate:** 3e-05
+- **Weight decay:** 0.01
+- **Warmup ratio:** 0.1
+- **Early stopping:** patience = 2 (monitor F1 on validation)
+- **Seed:** 42
+- **Hardware:** Google Colab (GPU)
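To make the warmup ratio and early-stopping settings concrete, here is a minimal simulation in plain Python. It is a sketch, not the training code: it assumes 55 optimizer steps per epoch (the figure in the earlier auto-generated card's training log), a linear schedule with warmup steps rounded up, and a simplified patience rule. The function names are hypothetical, and the fifth F1 value is invented only to show that training would stop before reaching it.

```python
import math

LEARNING_RATE = 3e-5
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5
STEPS_PER_EPOCH = 55  # assumption: taken from the earlier card's training log
PATIENCE = 2

total_steps = NUM_EPOCHS * STEPS_PER_EPOCH
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # rounding up is an assumption


def linear_lr(step):
    """Linear warmup to LEARNING_RATE, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return LEARNING_RATE * max(0.0, remaining)


def stopping_epoch(f1_per_epoch, patience=PATIENCE):
    """Return the 1-based epoch at which training stops.

    Training stops once validation F1 has failed to improve for
    `patience` consecutive evaluations.
    """
    best, stale = float("-inf"), 0
    for epoch, f1 in enumerate(f1_per_epoch, start=1):
        if f1 > best:
            best, stale = f1, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(f1_per_epoch)


# First four values come from the earlier training log; the fifth is hypothetical.
val_f1 = [0.9910, 1.0, 1.0, 1.0, 1.0]
print(total_steps, warmup_steps)  # 275 28
print(stopping_epoch(val_f1))     # 4
```

Under these assumptions, F1 peaks at epoch 2 and fails to improve at epochs 3 and 4, so a patience of 2 ends training at epoch 4 even though 5 epochs were configured.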
+## Metrics (Held-out Test)
+```json
+{
+  "eval_loss": 0.0029852271545678377,
+  "eval_accuracy": 1.0,
+  "eval_precision": 1.0,
+  "eval_recall": 1.0,
+  "eval_f1": 1.0,
+  "eval_runtime": 0.3123,
+  "eval_samples_per_second": 352.273,
+  "eval_steps_per_second": 22.417,
+  "epoch": 4.0
+}
+```
+## Confusion Matrix & Errors
+The Colab notebook includes confusion matrices for the validation and test splits, plus a short error analysis with hypotheses about likely failure modes (e.g., injury news phrased neutrally but labeled negative).
+|           | Pred 0 | Pred 1 |
+|-----------|-------:|-------:|
+| **True 0**|   55   |   0    |
+| **True 1**|   0    |   55   |
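The reported scores can be recomputed from the four cells of the matrix. The sketch below uses the standard definitions for a binary task with class 1 as the positive class; the helper name `binary_metrics` is hypothetical, and the second call is an illustrative what-if, not a reported result.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 for the positive class (label 1)."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Cells from the matrix above: True 0 / Pred 0 = 55, True 1 / Pred 1 = 55.
metrics = binary_metrics(tn=55, fp=0, fn=0, tp=55)
print(metrics)  # {'accuracy': 1.0, 'precision': 1.0, 'recall': 1.0, 'f1': 1.0}

# What-if: a single false positive on the same 110 examples.
one_fp = binary_metrics(tn=54, fp=1, fn=0, tp=55)
print({name: round(value, 4) for name, value in one_fp.items()})
```

With only 110 examples, one error moves accuracy by almost a full percentage point (109/110 ≈ 0.9909), so small differences between runs should be read with caution.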
+## Brief Error Analysis (Concrete Examples & Hypotheses)
+No misclassifications were observed in the held-out test split (confusion matrix = perfect).
+However, the test split is small (110 examples, per the confusion matrix), so this likely reflects **overfitting** rather than true robustness.
+## Limitations & Ethics
+- The small dataset and its labeling style can lead to unstable metrics; comments with a neutral or ambiguous tone are hard to classify.
+- Sports injury and team-management news may bias wording and labels.
+- For coursework only; not for production or sensitive decisions.
+## Reproducibility
+- Python: 3.12
+- Transformers: >=4.41
+- Datasets: >=2.19
+- Seed: 42
+## License
+- Code & weights: MIT (adjust per course guidelines)
+- Dataset: see the original dataset's license/terms
+## AI Assistance Disclosure
+- GenAI tools assisted with notebook structure and documentation; modeling choices and evaluation were implemented and verified by the author.