cnn-imdb-512

WordCNN (Kim, 2014) trained on the IMDB sentiment classification dataset with max_seq_length=512.

Trained as a victim model for adversarial NLP research (TextBugger / TextFooler / DeepWordBug-style attacks). With a 512-token window, roughly 95–98% of IMDB reviews fit without truncation, compared with the 128-token limit typical of TextAttack baselines.
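The coverage figure can be checked with a few lines of Python. The sketch below is only an illustration: it assumes plain whitespace tokenization (the tokenizer behind the 95–98% figure is not stated here), and `fraction_untruncated` and the toy reviews are hypothetical names and data, not part of this repository.

```python
def fraction_untruncated(texts, max_len):
    """Return the share of texts whose whitespace token count is <= max_len."""
    if not texts:
        raise ValueError("texts must be non-empty")
    fits = sum(1 for text in texts if len(text.split()) <= max_len)
    return fits / len(texts)


# Toy stand-ins for reviews with 100, 300, 500, and 700 whitespace tokens.
toy_reviews = [" ".join(["word"] * n) for n in (100, 300, 500, 700)]

coverage_128 = fraction_untruncated(toy_reviews, 128)
coverage_512 = fraction_untruncated(toy_reviews, 512)
print(coverage_128, coverage_512)
```

Running the same function over the real IMDB train split would give the actual coverage at each length limit.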

Model Details

Architecture: WordCNN (Kim, 2014) with kernel sizes 3, 4, 5; 100 filters per kernel
Embeddings: GloVe 200d (pretrained)
Dropout: 0.3
Max sequence length: 512 tokens (words)
Task: Binary sentiment classification (positive / negative)
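To make the sizes listed above concrete, the following sketch works through the feature-map lengths and parameter counts they imply. It assumes unpadded, stride-1 convolutions, max-over-time pooling, and a single linear layer over the concatenated pooled features, as described by Kim (2014). The embedding table is excluded, and all names are hypothetical rather than taken from TextAttack.

```python
KERNEL_SIZES = (3, 4, 5)
NUM_FILTERS = 100
EMBED_DIM = 200
MAX_SEQ_LEN = 512
NUM_LABELS = 2


def conv_output_length(seq_len, kernel_size):
    """Positions produced by an unpadded, stride-1 convolution over seq_len tokens."""
    return seq_len - kernel_size + 1


def conv_param_count(kernel_size, embed_dim, num_filters):
    """Weights plus one bias per filter for a kernel spanning kernel_size tokens."""
    return kernel_size * embed_dim * num_filters + num_filters


# One feature map per kernel size; each shrinks by (kernel_size - 1) positions.
feature_map_lengths = {k: conv_output_length(MAX_SEQ_LEN, k) for k in KERNEL_SIZES}

# Max-over-time pooling keeps one value per filter, then the three groups are concatenated.
pooled_dim = NUM_FILTERS * len(KERNEL_SIZES)

conv_params = sum(conv_param_count(k, EMBED_DIM, NUM_FILTERS) for k in KERNEL_SIZES)
classifier_params = pooled_dim * NUM_LABELS + NUM_LABELS

print(feature_map_lengths, pooled_dim, conv_params, classifier_params)
```

Under these assumptions the classifier sees a 300-dimensional feature vector regardless of input length, which is why raising the window from 128 to 512 tokens does not change the number of trainable parameters.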

Training

Trained on the IMDB train split (25,000 examples) using TextAttack 0.3.x. Apart from the pretrained GloVe embeddings, the model was trained from scratch.

Hyperparameter	Value
Epochs	30 (early stopping after 5 epochs without improvement)
Batch size	64
Learning rate	1e-4
Weight decay	0.01
Warmup steps	500
Random seed	786
Hardware	NVIDIA RTX 3090 (24 GB)
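The early-stopping rule in the table (stop after 5 epochs without improvement) can be illustrated with a generic patience loop. This is a minimal sketch of the idea, not TextAttack's implementation. The function name is hypothetical and the score curve is made up for illustration, not taken from this model's training log.

```python
def best_epoch_with_patience(eval_scores, patience):
    """Return (best_epoch, stopped_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to beat the best score.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_since_best = 0
    for epoch, score in enumerate(eval_scores, start=1):
        if score > best_score:
            best_score, best_epoch, epochs_since_best = score, epoch, 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return best_epoch, epoch
    # Patience never ran out: training used every available epoch.
    return best_epoch, len(eval_scores)


# Made-up accuracy curve for illustration only.
toy_scores = [0.80, 0.84, 0.86, 0.85, 0.86, 0.85, 0.84, 0.85, 0.83, 0.82]
best_epoch, stopped_epoch = best_epoch_with_patience(toy_scores, patience=5)
print(best_epoch, stopped_epoch)
```

In this sketch a tie with the best score does not reset the counter, so only a strict improvement extends training.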

Training command:

textattack train --model-name-or-path cnn \
  --dataset imdb \
  --model-max-length 512 \
  --epochs 30 \
  --early-stopping-epochs 5 \
  --per-device-train-batch-size 64 \
  --learning-rate 1e-4 \
  --save-last \
  --output-dir ./models/cnn-imdb-512

Evaluation

Evaluated on the IMDB test split (25,000 examples) at the best epoch checkpoint:

Metric	Value
Accuracy	86.09%
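For reference, accuracy here means the fraction of test examples whose predicted label matches the gold label. The snippet below is a minimal sketch of that computation on made-up data. The `accuracy` helper and the toy lists are hypothetical and unrelated to the reported result.

```python
def accuracy(predictions, labels):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data: 1 = positive, 0 = negative.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_predictions = [1, 0, 0, 1, 0, 1, 1, 0]

toy_accuracy = accuracy(toy_predictions, toy_labels)
print(f"{toy_accuracy:.2%}")
```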

How to Use

This model uses the TextAttack custom format and requires the textattack library. TextAttack's from_pretrained does not currently resolve Hugging Face Hub IDs, so download the snapshot first via huggingface_hub and then pass the local path:

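from huggingface_hub import snapshot_download
from textattack.models.helpers import WordCNNForClassification

# Download the repository files to the local Hugging Face cache.
local_dir = snapshot_download(repo_id="jongador/cnn-imdb-512")
# Load the weights and config from the downloaded directory.
model = WordCNNForClassification.from_pretrained(local_dir)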
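Since the model is meant as an attack victim, it usually needs to be wrapped before TextAttack can query it. The sketch below is one possible way to do that, under stated assumptions: it uses TextAttack's `PyTorchModelWrapper`, it assumes the loaded model exposes its tokenizer as `model.tokenizer`, and it assumes the label order is index 0 = negative, index 1 = positive. The helper names `load_wrapped_victim` and `predict_labels` are hypothetical. Imports are deferred into the function so nothing is downloaded until it is called.

```python
LABELS = ("negative", "positive")  # assumed order: index 0 = negative, 1 = positive


def load_wrapped_victim(repo_id="jongador/cnn-imdb-512"):
    """Download the snapshot and wrap the model for TextAttack (imports deferred)."""
    from huggingface_hub import snapshot_download
    from textattack.models.helpers import WordCNNForClassification
    from textattack.models.wrappers import PyTorchModelWrapper

    local_dir = snapshot_download(repo_id=repo_id)
    model = WordCNNForClassification.from_pretrained(local_dir)
    model.eval()  # disable dropout for inference
    # Assumes the helper model exposes its GloVe tokenizer as `model.tokenizer`.
    return PyTorchModelWrapper(model, model.tokenizer)


def predict_labels(model_wrapper, texts):
    """Map each row of class scores returned by the wrapper to a label string."""
    scores = model_wrapper(texts)
    return [
        LABELS[max(range(len(row)), key=lambda i: float(row[i]))]
        for row in scores
    ]
```

A typical call would be `predict_labels(load_wrapped_victim(), ["A moving, beautifully shot film."])`. The same wrapper object can then be passed to an attack recipe's `build` method.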

References

Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. EMNLP.
Morris, J. et al. (2020). TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. EMNLP.

License

MIT
