3DCvT on LRW-1000

This repository provides the released checkpoint and evaluation artifacts for an unofficial PyTorch reproduction of:

A Lip Reading Method Based on 3D Convolutional Vision Transformer

Code repository:

Model Summary

  • Task: Chinese word-level lip reading
  • Dataset: LRW-1000
  • Number of classes: 1184 in this processed split
  • Framework: PyTorch
  • Architecture: 3D CNN + CvT + BiGRU

Released Files

  • best_model.pth: released checkpoint
  • sha256.txt: checksum for the checkpoint
  • logs/train.log: selected training log
  • results/per_class_acc_lrw1000_val.csv: per-class validation summary
  • plots/learning_curve.png: learning curve exported from training
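
Before loading the checkpoint, it can be checked against sha256.txt. The following is a minimal sketch, not part of the repository's code. It assumes sha256.txt follows the common `sha256sum` layout (hex digest first, then the file name); the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat for large checkpoints.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(checksum_path):
    """Read the first whitespace-separated token of the checksum file.

    Assumption: the file uses the `sha256sum` layout "<hex digest>  <filename>".
    """
    text = Path(checksum_path).read_text(encoding="utf-8")
    return text.split()[0].lower()


def verify_checkpoint(checkpoint_path, checksum_path):
    """Return True if the checkpoint's digest matches the recorded one."""
    return sha256_of_file(checkpoint_path) == expected_digest(checksum_path)
```

Calling `verify_checkpoint("best_model.pth", "sha256.txt")` from the download directory should return `True` for an intact file, provided the layout assumption holds.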

Training Setup

Training settings from the released run:

  • GPUs: 1
  • Per-step batch size: 128
  • Gradient accumulation: 2 steps
  • Effective batch size: 256 (128 × 2)
  • Epochs: 120
  • Optimizer: Adam
  • Weight decay: 1e-4
  • Learning rate: 6e-4
  • Warmup epochs: 5
  • Mixed precision: AMP enabled
  • torch.compile: disabled
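
The effective batch size follows from the per-step batch size and the accumulation steps. The sketch below is an illustration in plain Python, not the repository's training loop: it uses a hypothetical one-parameter linear model with synthetic data to show that summing two micro-batch gradients, each scaled by 1/2, gives the same gradient as one batch of 256.

```python
import math


def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size, accum_steps):
    """Sum micro-batch gradients, each scaled by 1/accum_steps."""
    total = 0.0
    for step in range(accum_steps):
        start = step * micro_batch_size
        stop = start + micro_batch_size
        total += mse_grad(w, xs[start:stop], ys[start:stop]) / accum_steps
    return total


PER_STEP_BATCH = 128  # from the training settings
ACCUM_STEPS = 2       # from the training settings
EFFECTIVE_BATCH = PER_STEP_BATCH * ACCUM_STEPS

# Synthetic data, purely illustrative.
xs = [0.01 * i for i in range(EFFECTIVE_BATCH)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, PER_STEP_BATCH, ACCUM_STEPS)
print(EFFECTIVE_BATCH, math.isclose(full, accum, rel_tol=1e-9))
```

The equivalence relies on equal-sized micro-batches and a mean-reduced loss; layers that compute batch statistics, such as batch normalization, still see 128 samples per step.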

Evaluation Result

Dataset     Split        Metric           Value
LRW-1000    Validation   Top-1 Accuracy   55.29%
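
For reference, top-1 accuracy is the fraction of samples whose highest-scoring class matches the ground-truth label. The sketch below shows that definition on toy data; it is not the repository's evaluation script, and the function name and inputs are hypothetical.

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Toy 3-class example; the released model has 1184 classes.
toy_logits = [
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
toy_labels = [1, 0, 1, 0]
toy_accuracy = top1_accuracy(toy_logits, toy_labels)  # 3 of 4 correct
```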

Intended Use

This checkpoint is intended for:

  • research reproduction
  • benchmark comparison
  • qualitative inference demos

It is not intended as a production-ready commercial lip-reading system.

Limitations

  • Accuracy depends on preprocessing inputs with the same pipeline used for training
  • This release does not include the raw LRW-1000 dataset
  • Users must obtain LRW-1000 themselves, under the dataset's own terms
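  • The generated vocabulary for this processed split has 1184 classes rather than the 1000 of the standard LRW-1000 benchmark, so the accuracy may not be directly comparable to published results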

Usage

Example inference command:

python inference.py \
  --dataset lrw1000 \
  --pkl_path /path/to/sample.pkl \
  --checkpoint /path/to/best_model.pth \
  --gpu 0

Notes

  • The checkpoint is released for reproducibility
  • Please use the matching code version when possible
  • Before release, the checkpoint and training log were named best_model_for_lrw1000.pth and train_lrw1000.log locally

Citation

If you use this release, please cite the original paper:

@article{wu2022lip,
  title={A Lip Reading Method Based on 3D Convolutional Vision Transformer},
  author={Wu, Jiafeng and others},
  journal={IEEE Access},
  year={2022}
}