gr8monk3ys committed
Commit 2be4558 · verified · 1 Parent(s): ca0268b

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +146 -0
  2. inference.py +225 -0
  3. requirements.txt +7 -0
  4. train.py +453 -0
README.md ADDED
@@ -0,0 +1,146 @@
+ ---
+ license: mit
+ base_model: distilbert-base-uncased
+ tags:
+ - text-classification
+ - arxiv
+ - academic-papers
+ - distilbert
+ datasets:
+ - ccdv/arxiv-classification
+ metrics:
+ - accuracy
+ - f1
+ pipeline_tag: text-classification
+ ---
+
+ # Academic Paper Classifier
+
+ A DistilBERT model fine-tuned to classify academic paper abstracts into arXiv
+ subject categories. Given the abstract of a research paper, the model predicts
+ which area of computer science or statistics the paper belongs to.
+
+ ## Intended Use
+
+ This model is designed for:
+
+ - **Automated paper triage** -- quickly routing new submissions to the
+   appropriate reviewers or reading lists.
+ - **Literature search** -- filtering large collections of papers by
+   predicted subject area.
+ - **Research tooling** -- as a building block in larger academic-paper
+   analysis pipelines.
+
+ The model is **not** intended for high-stakes decisions such as publication
+ acceptance or funding allocation.
+
+ ## Labels
+
+ | Id | Label   | Description                       |
+ |----|---------|-----------------------------------|
+ | 0  | cs.AI   | Artificial Intelligence           |
+ | 1  | cs.CL   | Computation and Language (NLP)    |
+ | 2  | cs.CV   | Computer Vision                   |
+ | 3  | cs.LG   | Machine Learning                  |
+ | 4  | cs.NE   | Neural and Evolutionary Computing |
+ | 5  | cs.RO   | Robotics                          |
+ | 6  | math.ST | Statistics Theory                 |
+ | 7  | stat.ML | Machine Learning (Statistics)     |
+
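The table above is exactly what the training script stores in the model's `config.json` as `id2label`, so consumers can resolve predictions without hard-coding labels. A minimal sketch of that resolution (the dict literal mirrors the table; at runtime you would read `model.config.id2label` instead):

```python
# Mirror of the label table; at runtime this comes from model.config.id2label.
ID2LABEL = {
    0: "cs.AI", 1: "cs.CL", 2: "cs.CV", 3: "cs.LG",
    4: "cs.NE", 5: "cs.RO", 6: "math.ST", 7: "stat.ML",
}

def resolve(pred_id: int) -> str:
    """Map a raw argmax class id to its arXiv category string."""
    return ID2LABEL[pred_id]

print(resolve(1))  # cs.CL
```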
+ ## Training Procedure
+
+ ### Base Model
+
+ [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased) --
+ a distilled version of BERT that is 60% faster while retaining 97% of BERT's
+ language-understanding performance.
+
+ ### Dataset
+
+ [`ccdv/arxiv-classification`](https://huggingface.co/datasets/ccdv/arxiv-classification)
+ -- a curated collection of arXiv paper abstracts with subject category labels.
+
+ ### Hyperparameters
+
+ | Parameter               | Value              |
+ |-------------------------|--------------------|
+ | Learning rate           | 2e-5               |
+ | LR scheduler            | Linear with warmup |
+ | Warmup ratio            | 0.1                |
+ | Weight decay            | 0.01               |
+ | Epochs                  | 5                  |
+ | Batch size (train)      | 16                 |
+ | Batch size (eval)       | 32                 |
+ | Max sequence length     | 512                |
+ | Early stopping patience | 3                  |
+ | Seed                    | 42                 |
+
+ ### Metrics
+
+ The model is evaluated on accuracy, weighted F1, weighted precision, and
+ weighted recall. The best checkpoint is selected by weighted F1.
+
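For clarity, "weighted" averaging scores each class separately and then averages with each class weighted by its support (its true-label count), so frequent classes dominate the aggregate. A pure-Python sketch of that definition (matching scikit-learn's `average="weighted"`), using toy labels:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """F1 per class, averaged with each class weighted by its true-label count."""
    support = Counter(y_true)
    n = len(y_true)
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)
        precision = tp / pred_c if pred_c else 0.0
        recall = tp / support[c]
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += (support[c] / n) * f1  # weight by class support
    return total

print(round(weighted_f1([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 1, 2]), 4))  # 0.8175
```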
+ ## How to Use
+
+ ### With the `transformers` pipeline
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline(
+     "text-classification",
+     model="gr8monk3ys/paper-classifier-model",
+ )
+
+ abstract = (
+     "We introduce a new method for neural machine translation that uses "
+     "attention mechanisms to align source and target sentences, achieving "
+     "state-of-the-art results on WMT benchmarks."
+ )
+
+ result = classifier(abstract)
+ print(result)
+ # [{'label': 'cs.CL', 'score': 0.95}]
+ ```
+
+ ### With the included inference script
+
+ ```bash
+ python inference.py \
+     --model_path gr8monk3ys/paper-classifier-model \
+     --abstract "We propose a convolutional neural network for image recognition..."
+ ```
+
+ ### Training from scratch
+
+ ```bash
+ pip install -r requirements.txt
+
+ python train.py \
+     --num_train_epochs 5 \
+     --learning_rate 2e-5 \
+     --per_device_train_batch_size 16 \
+     --push_to_hub
+ ```
+
+ ## Limitations
+
+ - The model only covers a fixed set of 8 arXiv categories. Papers from other
+   fields will be forced into one of these buckets.
+ - Performance may degrade on abstracts that are unusually short, written in a
+   language other than English, or that span multiple subject areas.
+ - The model inherits any biases present in the DistilBERT base weights and in
+   the training dataset.
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{scaturchio2025paperclassifier,
+   title  = {Academic Paper Classifier},
+   author = {Lorenzo Scaturchio},
+   year   = {2025},
+   url    = {https://huggingface.co/gr8monk3ys/paper-classifier-model}
+ }
+ ```
inference.py ADDED
@@ -0,0 +1,225 @@
+ #!/usr/bin/env python3
+ """
+ Inference script for the Academic Paper Classifier.
+
+ Loads a fine-tuned DistilBERT model and predicts the arXiv category for a
+ given paper abstract. Returns the predicted category along with per-class
+ confidence scores.
+
+ Usage examples:
+     # Use a local model directory
+     python inference.py --model_path ./model --abstract "We propose a novel ..."
+
+     # Use a HuggingFace Hub model
+     python inference.py --model_path gr8monk3ys/paper-classifier-model \
+         --abstract "We propose a novel ..."
+
+     # Interactive mode (reads from stdin)
+     python inference.py --model_path ./model
+
+ Author: Lorenzo Scaturchio (gr8monk3ys)
+ License: MIT
+ """
+
+ import argparse
+ import json
+ import logging
+ import sys
+
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s [%(levelname)s] %(name)s - %(message)s",
+     handlers=[logging.StreamHandler(sys.stdout)],
+ )
+ logger = logging.getLogger(__name__)
+
+
+ # ---------------------------------------------------------------------------
+ # Classifier wrapper
+ # ---------------------------------------------------------------------------
+ class PaperClassifier:
+     """Thin wrapper around a fine-tuned sequence-classification model.
+
+     Parameters
+     ----------
+     model_path : str
+         Path to a local model directory **or** a HuggingFace Hub model id.
+     device : str | None
+         Target device (``"cpu"``, ``"cuda"``, ``"mps"``). If *None* the best
+         available device is selected automatically.
+     """
+
+     def __init__(self, model_path: str, device: str | None = None) -> None:
+         if device is None:
+             if torch.cuda.is_available():
+                 device = "cuda"
+             elif torch.backends.mps.is_available():
+                 device = "mps"
+             else:
+                 device = "cpu"
+         self.device = torch.device(device)
+
+         logger.info("Loading tokenizer from: %s", model_path)
+         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+         logger.info("Loading model from: %s", model_path)
+         self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
+         self.model.to(self.device)
+
+         # Read label mapping stored in the model config
+         self.id2label: dict[int, str] = self.model.config.id2label
+         logger.info("Labels: %s", list(self.id2label.values()))
+
+     @torch.no_grad()
+     def predict(self, abstract: str, top_k: int | None = None) -> dict:
+         """Classify a single paper abstract.
+
+         Parameters
+         ----------
+         abstract : str
+             The paper abstract to classify.
+         top_k : int | None
+             If given, only the *top_k* categories (by confidence) are returned
+             in ``scores``. Pass *None* to return all categories.
+
+         Returns
+         -------
+         dict
+             ``{"label": str, "confidence": float, "scores": {label: prob}}``
+         """
+         self.model.eval()
+
+         inputs = self.tokenizer(
+             abstract,
+             return_tensors="pt",
+             truncation=True,
+             padding=True,
+             max_length=512,
+         ).to(self.device)
+
+         logits = self.model(**inputs).logits
+         probs = torch.softmax(logits, dim=-1).squeeze(0).cpu().numpy()
+
+         sorted_indices = probs.argsort()[::-1]
+         if top_k is not None:
+             sorted_indices = sorted_indices[:top_k]
+
+         scores = {
+             self.id2label[int(idx)]: float(probs[idx]) for idx in sorted_indices
+         }
+
+         best_idx = int(probs.argmax())
+         return {
+             "label": self.id2label[best_idx],
+             "confidence": float(probs[best_idx]),
+             "scores": scores,
+         }
+
+
+ # ---------------------------------------------------------------------------
+ # CLI
+ # ---------------------------------------------------------------------------
+ def parse_args() -> argparse.Namespace:
+     parser = argparse.ArgumentParser(
+         description="Classify an academic paper abstract into an arXiv category."
+     )
+     parser.add_argument(
+         "--model_path",
+         type=str,
+         default="./model",
+         help="Path to the fine-tuned model directory or HF Hub id (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--abstract",
+         type=str,
+         default=None,
+         help="Paper abstract text. If omitted, the script enters interactive mode.",
+     )
+     parser.add_argument(
+         "--top_k",
+         type=int,
+         default=None,
+         help="Only show the top-k predictions (default: show all).",
+     )
+     parser.add_argument(
+         "--device",
+         type=str,
+         default=None,
+         choices=["cpu", "cuda", "mps"],
+         help="Device to run inference on (default: auto-detect).",
+     )
+     parser.add_argument(
+         "--json",
+         action="store_true",
+         default=False,
+         dest="output_json",
+         help="Output raw JSON instead of human-readable text.",
+     )
+     return parser.parse_args()
+
+
+ def _print_result(result: dict, output_json: bool) -> None:
+     """Pretty-print or JSON-dump a prediction result."""
+     if output_json:
+         print(json.dumps(result, indent=2))
+         return
+
+     print(f"\n Predicted category : {result['label']}")
+     print(f" Confidence         : {result['confidence']:.4f}")
+     print(" ---------------------------------")
+     for label, score in result["scores"].items():
+         bar = "#" * int(score * 40)
+         print(f" {label:<10s} {score:6.4f} {bar}")
+     print()
+
+
+ def main() -> None:
+     args = parse_args()
+     classifier = PaperClassifier(model_path=args.model_path, device=args.device)
+
+     if args.abstract is not None:
+         result = classifier.predict(args.abstract, top_k=args.top_k)
+         _print_result(result, args.output_json)
+         return
+
+     # Interactive mode
+     print("Academic Paper Classifier - Interactive Mode")
+     print("Enter a paper abstract (or 'quit' to exit).")
+     print("For multi-line input, end with an empty line.\n")
+
+     while True:
+         try:
+             lines: list[str] = []
+             prompt = "abstract> " if sys.stdin.isatty() else ""
+             while True:
+                 line = input(prompt)
+                 if line.strip().lower() == "quit":
+                     logger.info("Exiting.")
+                     return
+                 if line == "" and lines:
+                     break
+                 lines.append(line)
+                 prompt = "... " if sys.stdin.isatty() else ""
+
+             abstract = " ".join(lines).strip()
+             if not abstract:
+                 continue
+
+             result = classifier.predict(abstract, top_k=args.top_k)
+             _print_result(result, args.output_json)
+
+         except (EOFError, KeyboardInterrupt):
+             print()
+             logger.info("Exiting.")
+             return
+
+
+ if __name__ == "__main__":
+     main()
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ transformers>=4.41.0
+ datasets>=2.16.0
+ torch>=2.1.0
+ scikit-learn>=1.3.0
+ accelerate>=0.25.0
+ evaluate>=0.4.0
+ huggingface_hub>=0.20.0
train.py ADDED
@@ -0,0 +1,453 @@
+ #!/usr/bin/env python3
+ """
+ Fine-tune DistilBERT for academic paper abstract classification.
+
+ This script downloads arXiv paper abstracts, preprocesses them, and fine-tunes
+ a DistilBERT model for multi-class sequence classification. Supports pushing
+ the trained model to the HuggingFace Hub.
+
+ Author: Lorenzo Scaturchio (gr8monk3ys)
+ License: MIT
+ """
+
+ import argparse
+ import logging
+ import sys
+ from pathlib import Path
+
+ import evaluate
+ import numpy as np
+ import torch
+ from datasets import ClassLabel, DatasetDict, load_dataset
+ from transformers import (
+     AutoModelForSequenceClassification,
+     AutoTokenizer,
+     EarlyStoppingCallback,
+     Trainer,
+     TrainingArguments,
+     set_seed,
+ )
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s [%(levelname)s] %(name)s - %(message)s",
+     handlers=[logging.StreamHandler(sys.stdout)],
+ )
+ logger = logging.getLogger(__name__)
+
+ # ---------------------------------------------------------------------------
+ # Constants
+ # ---------------------------------------------------------------------------
+ MODEL_NAME = "distilbert-base-uncased"
+ DEFAULT_DATASET = "ccdv/arxiv-classification"
+ DEFAULT_OUTPUT_DIR = "./results"
+ DEFAULT_MODEL_DIR = "./model"
+
+ # Canonical label order so the id<->label mapping is deterministic.
+ LABEL_NAMES = [
+     "cs.AI",
+     "cs.CL",
+     "cs.CV",
+     "cs.LG",
+     "cs.NE",
+     "cs.RO",
+     "math.ST",
+     "stat.ML",
+ ]
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+ def parse_args() -> argparse.Namespace:
+     """Parse command-line arguments for training hyperparameters."""
+     parser = argparse.ArgumentParser(
+         description="Fine-tune DistilBERT on arXiv paper classification."
+     )
+
+     # Data
+     parser.add_argument(
+         "--dataset_name",
+         type=str,
+         default=DEFAULT_DATASET,
+         help="HuggingFace dataset identifier (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--max_length",
+         type=int,
+         default=512,
+         help="Maximum token length for the tokenizer (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--max_train_samples",
+         type=int,
+         default=None,
+         help="Cap the number of training samples (useful for debugging).",
+     )
+     parser.add_argument(
+         "--max_eval_samples",
+         type=int,
+         default=None,
+         help="Cap the number of evaluation samples (useful for debugging).",
+     )
+
+     # Training
+     parser.add_argument(
+         "--output_dir",
+         type=str,
+         default=DEFAULT_OUTPUT_DIR,
+         help="Directory for training checkpoints (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--model_dir",
+         type=str,
+         default=DEFAULT_MODEL_DIR,
+         help="Directory where the final model is saved (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--num_train_epochs",
+         type=int,
+         default=5,
+         help="Total number of training epochs (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--per_device_train_batch_size",
+         type=int,
+         default=16,
+         help="Batch size per device during training (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--per_device_eval_batch_size",
+         type=int,
+         default=32,
+         help="Batch size per device during evaluation (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--learning_rate",
+         type=float,
+         default=2e-5,
+         help="Peak learning rate (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--weight_decay",
+         type=float,
+         default=0.01,
+         help="Weight decay coefficient (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--warmup_ratio",
+         type=float,
+         default=0.1,
+         help="Fraction of total steps used for linear warmup (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--seed",
+         type=int,
+         default=42,
+         help="Random seed for reproducibility (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--early_stopping_patience",
+         type=int,
+         default=3,
+         help="Number of evaluations with no improvement before stopping (default: %(default)s).",
+     )
+     parser.add_argument(
+         "--fp16",
+         action="store_true",
+         default=False,
+         help="Use mixed-precision (FP16) training.",
+     )
+
+     # Hub
+     parser.add_argument(
+         "--push_to_hub",
+         action="store_true",
+         default=False,
+         help="Push the trained model to the HuggingFace Hub.",
+     )
+     parser.add_argument(
+         "--hub_model_id",
+         type=str,
+         default="gr8monk3ys/paper-classifier-model",
+         help="Repository id on the HuggingFace Hub (default: %(default)s).",
+     )
+
+     return parser.parse_args()
+
+
+ def build_label_mappings(label_names: list[str]) -> tuple[dict, dict]:
+     """Return (label2id, id2label) dicts for the given label names."""
+     label2id = {label: idx for idx, label in enumerate(label_names)}
+     id2label = {idx: label for idx, label in enumerate(label_names)}
+     return label2id, id2label
+
+
+ def load_and_prepare_dataset(
+     dataset_name: str,
+     label2id: dict[str, int],
+     max_train_samples: int | None = None,
+     max_eval_samples: int | None = None,
+ ) -> DatasetDict:
+     """Load the dataset and normalise the label column.
+
+     The function handles two common dataset layouts:
+       1. The dataset already has train / validation / test splits and a
+          numeric ``label`` column whose values match our ``label2id``.
+       2. The dataset has a string ``label`` column that needs mapping.
+
+     Returns a ``DatasetDict`` with ``train`` and ``validation`` splits.
+     """
+     logger.info("Loading dataset: %s", dataset_name)
+     raw = load_dataset(dataset_name, trust_remote_code=True)
+
+     # Determine the text and label column names --------------------------
+     sample_columns = list(next(iter(raw.values())).column_names)
+     text_col = None
+     for candidate in ("text", "abstract", "input", "sentence"):
+         if candidate in sample_columns:
+             text_col = candidate
+             break
+     if text_col is None:
+         # Fall back to the first string-typed column
+         text_col = sample_columns[0]
+     logger.info("Using text column: '%s'", text_col)
+
+     label_col = None
+     for candidate in ("label", "labels", "category", "class"):
+         if candidate in sample_columns:
+             label_col = candidate
+             break
+     if label_col is None:
+         label_col = sample_columns[-1]
+     logger.info("Using label column: '%s'", label_col)
+
+     # Rename columns so downstream code can rely on 'text' and 'label' ---
+     def _rename(example):
+         return {"text": str(example[text_col]), "label": example[label_col]}
+
+     raw = raw.map(_rename, remove_columns=sample_columns)
+
+     # If labels are strings, map them to ints using label2id -------------
+     sample_label = raw[list(raw.keys())[0]][0]["label"]
+     if isinstance(sample_label, str):
+         logger.info("Mapping string labels to integer ids.")
+
+         def _map_label(example):
+             lbl = example["label"]
+             if lbl in label2id:
+                 example["label"] = label2id[lbl]
+             else:
+                 example["label"] = -1  # will be filtered out
+             return example
+
+         raw = raw.map(_map_label)
+         raw = raw.filter(lambda ex: ex["label"] != -1)
+
+     # Ensure we have a ClassLabel feature --------------------------------
+     label_feature = ClassLabel(
+         num_classes=len(label2id), names=list(label2id.keys())
+     )
+     raw = raw.cast_column("label", label_feature)
+
+     # Build train / validation splits ------------------------------------
+     if "validation" not in raw and "test" in raw:
+         raw["validation"] = raw.pop("test")
+     elif "validation" not in raw:
+         split = raw["train"].train_test_split(
+             test_size=0.1, seed=42, stratify_by_column="label"
+         )
+         raw = DatasetDict({"train": split["train"], "validation": split["test"]})
+
+     # Subsample if requested ---------------------------------------------
+     if max_train_samples is not None:
+         raw["train"] = raw["train"].select(
+             range(min(max_train_samples, len(raw["train"])))
+         )
+     if max_eval_samples is not None:
+         raw["validation"] = raw["validation"].select(
+             range(min(max_eval_samples, len(raw["validation"])))
+         )
+
+     logger.info(
+         "Dataset sizes -> train: %d, validation: %d",
+         len(raw["train"]),
+         len(raw["validation"]),
+     )
+     return raw
+
+
+ def tokenize_dataset(
+     dataset: DatasetDict,
+     tokenizer: AutoTokenizer,
+     max_length: int,
+ ) -> DatasetDict:
+     """Tokenize the ``text`` column using the supplied tokenizer."""
+
+     def _tokenize(batch):
+         return tokenizer(
+             batch["text"],
+             padding="max_length",
+             truncation=True,
+             max_length=max_length,
+         )
+
+     logger.info("Tokenizing dataset (max_length=%d) ...", max_length)
+     tokenized = dataset.map(_tokenize, batched=True, desc="Tokenizing")
+     tokenized.set_format("torch", columns=["input_ids", "attention_mask", "label"])
+     return tokenized
+
+
+ def build_compute_metrics_fn():
+     """Return a ``compute_metrics`` callable for the HF Trainer.
+
+     Loads the ``accuracy``, ``f1``, ``precision`` and ``recall`` evaluate
+     metrics once at creation time to avoid repeated disk access.
+     """
+     acc_metric = evaluate.load("accuracy")
+     f1_metric = evaluate.load("f1")
+     prec_metric = evaluate.load("precision")
+     rec_metric = evaluate.load("recall")
+
+     def compute_metrics(eval_pred):
+         logits, labels = eval_pred
+         predictions = np.argmax(logits, axis=-1)
+         results = {}
+         results.update(acc_metric.compute(predictions=predictions, references=labels))
+         results.update(
+             f1_metric.compute(
+                 predictions=predictions, references=labels, average="weighted"
+             )
+         )
+         results.update(
+             prec_metric.compute(
+                 predictions=predictions, references=labels, average="weighted"
+             )
+         )
+         results.update(
+             rec_metric.compute(
+                 predictions=predictions, references=labels, average="weighted"
+             )
+         )
+         return results
+
+     return compute_metrics
+
+
+ # ---------------------------------------------------------------------------
+ # Main
+ # ---------------------------------------------------------------------------
+ def main() -> None:
+     args = parse_args()
+
+     # Reproducibility
+     set_seed(args.seed)
+     logger.info("Seed set to %d", args.seed)
+
+     # Device info
+     device = "cuda" if torch.cuda.is_available() else (
+         "mps" if torch.backends.mps.is_available() else "cpu"
+     )
+     logger.info("Using device: %s", device)
+
+     # Label mappings
+     label2id, id2label = build_label_mappings(LABEL_NAMES)
+     num_labels = len(LABEL_NAMES)
+     logger.info("Number of labels: %d", num_labels)
+
+     # Dataset
+     dataset = load_and_prepare_dataset(
+         dataset_name=args.dataset_name,
+         label2id=label2id,
+         max_train_samples=args.max_train_samples,
+         max_eval_samples=args.max_eval_samples,
+     )
+
+     # Tokenizer
+     logger.info("Loading tokenizer: %s", MODEL_NAME)
+     tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+     tokenized_dataset = tokenize_dataset(dataset, tokenizer, args.max_length)
+
+     # Model
+     logger.info("Loading model: %s", MODEL_NAME)
+     model = AutoModelForSequenceClassification.from_pretrained(
+         MODEL_NAME,
+         num_labels=num_labels,
+         id2label=id2label,
+         label2id=label2id,
+     )
+
+     # Training arguments
+     training_args = TrainingArguments(
+         output_dir=args.output_dir,
+         num_train_epochs=args.num_train_epochs,
+         per_device_train_batch_size=args.per_device_train_batch_size,
+         per_device_eval_batch_size=args.per_device_eval_batch_size,
+         learning_rate=args.learning_rate,
+         weight_decay=args.weight_decay,
+         warmup_ratio=args.warmup_ratio,
+         lr_scheduler_type="linear",
+         eval_strategy="epoch",
+         save_strategy="epoch",
+         logging_strategy="steps",
+         logging_steps=50,
+         save_total_limit=2,
+         load_best_model_at_end=True,
+         metric_for_best_model="f1",
+         greater_is_better=True,
+         fp16=args.fp16 and torch.cuda.is_available(),
+         report_to="none",
+         seed=args.seed,
+         push_to_hub=False,  # we push manually after training
+     )
+
+     # Trainer
+     trainer = Trainer(
+         model=model,
+         args=training_args,
+         train_dataset=tokenized_dataset["train"],
+         eval_dataset=tokenized_dataset["validation"],
+         tokenizer=tokenizer,
+         compute_metrics=build_compute_metrics_fn(),
+         callbacks=[
+             EarlyStoppingCallback(early_stopping_patience=args.early_stopping_patience),
+         ],
+     )
+
+     # Train
+     logger.info("Starting training ...")
+     train_result = trainer.train()
+     logger.info("Training complete.")
+
+     # Log final training metrics
+     metrics = train_result.metrics
+     trainer.log_metrics("train", metrics)
+     trainer.save_metrics("train", metrics)
+
+     # Evaluate
+     logger.info("Running final evaluation ...")
+     eval_metrics = trainer.evaluate()
+     trainer.log_metrics("eval", eval_metrics)
+     trainer.save_metrics("eval", eval_metrics)
+
+     # Save model + tokenizer
+     model_dir = Path(args.model_dir)
+     model_dir.mkdir(parents=True, exist_ok=True)
+     logger.info("Saving model to %s", model_dir)
+     trainer.save_model(str(model_dir))
+     tokenizer.save_pretrained(str(model_dir))
+
+     # Push to Hub
+     if args.push_to_hub:
+         logger.info("Pushing model to HuggingFace Hub: %s", args.hub_model_id)
+         try:
+             model.push_to_hub(args.hub_model_id)
+             tokenizer.push_to_hub(args.hub_model_id)
+             logger.info("Model pushed successfully.")
+         except Exception:
+             logger.exception("Failed to push model to Hub.")
+             sys.exit(1)
+
+     logger.info("All done.")
+
+
+ if __name__ == "__main__":
+     main()
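Because `LABEL_NAMES` fixes the enumeration order, `build_label_mappings` always produces the same id assignment, which is what keeps checkpoints, the stored `config.json`, and the README's label table in sync across runs. A quick standalone check of that contract, copying the function body verbatim:

```python
LABEL_NAMES = ["cs.AI", "cs.CL", "cs.CV", "cs.LG", "cs.NE", "cs.RO", "math.ST", "stat.ML"]

def build_label_mappings(label_names):
    """As in train.py: deterministic label<->id mappings from a fixed order."""
    label2id = {label: idx for idx, label in enumerate(label_names)}
    id2label = {idx: label for idx, label in enumerate(label_names)}
    return label2id, id2label

label2id, id2label = build_label_mappings(LABEL_NAMES)
# Round-trip: every label maps to an id that maps back to the same label.
assert all(id2label[label2id[name]] == name for name in LABEL_NAMES)
print(label2id["stat.ML"])  # 7
```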