3DCvT on LRW-1000

This repository provides the released checkpoint and evaluation artifacts for an unofficial PyTorch reproduction of:

A Lip Reading Method Based on 3D Convolutional Vision Transformer

Code repository:

https://github.com/DPInnovationWorks/3DCvT_LipReading

Model Summary

Task: Chinese word-level lip reading
Dataset: LRW-1000
Number of classes: 1184 in this processed split
Framework: PyTorch
Architecture: 3D CNN + CvT + BiGRU

Released Files

best_model.pth: released checkpoint
sha256.txt: checksum for the checkpoint
logs/train.log: selected training log
results/per_class_acc_lrw1000_val.csv: per-class validation summary
plots/learning_curve.png: learning curve exported from training
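To check a downloaded checkpoint against sha256.txt, here is a minimal sketch using Python's standard hashlib. It assumes sha256.txt holds the expected hex digest as its first whitespace-separated token, which is the layout `sha256sum` produces. The card does not document the file's exact layout, so adjust the parsing if it differs. The function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(ckpt_path="best_model.pth", checksum_path="sha256.txt"):
    """Return True if the checkpoint's digest matches the expected one.

    Assumption: the first whitespace-separated token in checksum_path is the
    expected hex digest (the format written by `sha256sum`).
    """
    expected = Path(checksum_path).read_text().split()[0].lower()
    return sha256_of_file(ckpt_path) == expected
```

Calling `verify_checkpoint()` from the download directory returns False if the checkpoint is truncated or corrupted. Chunked reading keeps memory use flat regardless of checkpoint size.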

Training Setup

Training settings from the released run:

GPUs: 1
Per-step batch size: 128
Gradient accumulation: 2
Effective batch size: 256
Epochs: 120
Optimizer: Adam
Weight decay: 1e-4
Learning rate: 6e-4
Warmup epochs: 5
Mixed precision: AMP enabled
torch.compile: disabled
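Below is a minimal plain-Python sketch of the settings above. It makes the batch-size arithmetic explicit: 1 GPU × 128 samples per step × 2 accumulation steps = 256 samples per optimizer update. The field values come from this card, but the class and function names are hypothetical. The card does not state the warmup shape or the schedule after warmup, so the linear-warmup-then-constant rule here is a placeholder assumption, not the released code's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters listed for the released run (values from this card)."""
    num_gpus: int = 1
    per_step_batch_size: int = 128
    grad_accum_steps: int = 2
    epochs: int = 120
    lr: float = 6e-4
    weight_decay: float = 1e-4
    warmup_epochs: int = 5

    @property
    def effective_batch_size(self) -> int:
        # Samples that contribute gradients to each optimizer update.
        return self.num_gpus * self.per_step_batch_size * self.grad_accum_steps


def warmup_lr(cfg: TrainConfig, step: int, steps_per_epoch: int) -> float:
    """Learning rate at a given optimizer step (counted after accumulation).

    Assumption: linear warmup to cfg.lr over cfg.warmup_epochs, then constant.
    The post-warmup schedule of the released run is not documented here.
    """
    warmup_steps = cfg.warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        return cfg.lr * (step + 1) / warmup_steps
    return cfg.lr
```

With gradient accumulation, `steps_per_epoch` should count optimizer updates, which is half the number of 128-sample micro-batches per epoch.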

Evaluation Result

Dataset	Split	Metric	Value
LRW-1000	Validation	Top-1 Accuracy	55.29%
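Top-1 accuracy counts a sample as correct when the highest-scoring class matches its label. It is averaged over all validation samples, so when class sizes differ it generally does not equal the mean of the per-class accuracies. The sketch below computes both from predicted and true class indices. The function names are illustrative, and the CSV column names are hypothetical; the released per_class_acc_lrw1000_val.csv may use a different layout.

```python
import csv
from collections import defaultdict


def top1_accuracy(preds, labels):
    """Fraction of samples whose predicted class index equals the label."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def per_class_accuracy(preds, labels):
    """Map each class index to its accuracy over samples with that label."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for p, y in zip(preds, labels):
        totals[y] += 1
        hits[y] += int(p == y)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


def write_per_class_csv(path, per_class):
    # Hypothetical column names; not necessarily those of the released CSV.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["class_index", "accuracy"])
        for c, acc in per_class.items():
            writer.writerow([c, f"{acc:.4f}"])
```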

Intended Use

This checkpoint is intended for:

research reproduction
benchmark comparison
qualitative inference demos

It is not intended as a production-ready commercial lip-reading system.

Limitations

Accuracy depends on using the same preprocessing pipeline as the code repository
This release does not include the raw LRW-1000 dataset
Users must obtain the dataset according to its own terms
This processed split uses 1184 classes in the generated vocabulary

Usage

Example inference command:

python inference.py \
  --dataset lrw1000 \
  --pkl_path /path/to/sample.pkl \
  --checkpoint /path/to/best_model.pth \
  --gpu 0
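The hypothetical wrapper below runs the same command over many preprocessed samples by calling inference.py once per .pkl file through subprocess. It uses only the four flags shown above. The function names are illustrative. It assumes it is run from the repository root so that inference.py resolves, and that the script reports its prediction on stdout.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_cmd(pkl_path, checkpoint, gpu=0, dataset="lrw1000",
                        script="inference.py"):
    """Assemble the inference.py argv using only the flags shown above."""
    return [
        sys.executable, script,
        "--dataset", dataset,
        "--pkl_path", str(pkl_path),
        "--checkpoint", str(checkpoint),
        "--gpu", str(gpu),
    ]


def run_on_directory(sample_dir, checkpoint, gpu=0):
    """Run inference.py once per .pkl file in sample_dir (sorted by name).

    Returns a dict mapping each file name to the script's stdout.
    Assumption: inference.py prints its prediction to stdout.
    """
    outputs = {}
    for pkl in sorted(Path(sample_dir).glob("*.pkl")):
        result = subprocess.run(build_inference_cmd(pkl, checkpoint, gpu),
                                capture_output=True, text=True, check=True)
        outputs[pkl.name] = result.stdout
    return outputs
```

Each call starts a fresh process and reloads the checkpoint, so this suits small demo batches rather than full-split evaluation. `check=True` stops at the first sample that makes the script exit with an error.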

Notes

The checkpoint is released for reproducibility
Please use the matching code version when possible
Before upload, the checkpoint and training log were named best_model_for_lrw1000.pth and train_lrw1000.log locally

Citation

If you use this release, please cite the original paper:

@article{wu2022lip,
  title={A Lip Reading Method Based on 3D Convolutional Vision Transformer},
  author={Wu, Jiafeng and others},
  journal={IEEE Access},
  year={2022}
}
