# 3DCvT on LRW
This repository provides the released checkpoint and evaluation artifacts for an unofficial PyTorch reproduction of:
A Lip Reading Method Based on 3D Convolutional Vision Transformer
Code repository:
## Model Summary
- Task: English word-level lip reading
- Dataset: LRW
- Number of classes: 500
- Framework: PyTorch
- Architecture: 3D CNN + CvT + BiGRU
## Released Files

- `best_model.pth`: released checkpoint
- `sha256.txt`: checksum for the checkpoint
- `logs/train.log`: selected training log
- `results/per_class_acc_lrw_val.csv`: per-class validation accuracy summary
- `plots/learning_curve.png`: learning curve exported from training
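Before using the checkpoint, it is worth verifying it against `sha256.txt`. A minimal sketch, assuming the checksum file follows the common `sha256sum` layout (`<hexdigest>  <filename>` on one line); adjust the parsing if the released file is formatted differently:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(checkpoint_path: str, checksum_path: str) -> bool:
    # Assumes the checksum file holds "<hexdigest>  <filename>" on its
    # first line, as produced by the `sha256sum` tool.
    with open(checksum_path) as f:
        expected = f.read().split()[0]
    return sha256_of(checkpoint_path) == expected
```

Usage would then be `verify("best_model.pth", "sha256.txt")`, run from the repository root.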
## Training Setup
Training settings from the released run:
- GPUs: 1 GPU
- Per-step batch size: 64
- Gradient accumulation: 4
- Effective batch size: 256
- Epochs: 150
- Optimizer: Adam
- Weight decay: 1e-4
- Learning rate: 6e-4
- Warmup epochs: 5
- Mixed precision: AMP enabled
- `torch.compile`: disabled
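Two of these settings are worth spelling out: the effective batch size of 256 comes from accumulating gradients over 4 micro-batches of 64, and the learning rate of 6e-4 is reached after 5 warmup epochs. A minimal sketch of the arithmetic; the linear warmup shape and the constant post-warmup rate are assumptions, since the release does not state the exact schedule:

```python
BASE_LR = 6e-4        # peak learning rate from the released run
WARMUP_EPOCHS = 5     # warmup epochs from the released run
PER_STEP_BATCH = 64   # per-step batch size
ACCUM_STEPS = 4       # gradient accumulation steps

# Each optimizer update averages gradients over ACCUM_STEPS micro-batches,
# so one update sees 64 * 4 = 256 samples.
EFFECTIVE_BATCH = PER_STEP_BATCH * ACCUM_STEPS


def lr_at_epoch(epoch: int) -> float:
    """Linear warmup to BASE_LR over WARMUP_EPOCHS, then constant.

    The constant tail is an assumption; the release only documents the
    peak rate and the warmup length, not the decay schedule.
    """
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    return BASE_LR
```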
## Evaluation Result
| Dataset | Split | Metric | Value |
|---|---|---|---|
| LRW | Validation | Top-1 Accuracy | 83.91% |
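For reference, top-1 accuracy here is simply the fraction of validation clips whose highest-scoring class (out of the 500 LRW words) matches the ground-truth label. A minimal, framework-free sketch of the metric:

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax class matches the label.

    `logits` is a list of per-class score lists (500 entries each for LRW);
    `labels` is the list of ground-truth class indices.
    """
    correct = sum(
        1 for scores, y in zip(logits, labels)
        if max(range(len(scores)), key=scores.__getitem__) == y
    )
    return correct / len(labels)
```

For example, `top1_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1])` returns `0.5`: the first sample's argmax matches its label, the second does not.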
## Intended Use
This checkpoint is intended for:
- research reproduction
- benchmark comparison
- qualitative inference demos
It is not intended as a production-ready commercial lip-reading system.
## Limitations
- Performance may vary across preprocessing pipelines
- This release does not include the raw LRW dataset
- Users must obtain the LRW dataset themselves, in accordance with its own terms of use
## Usage

Example inference command:

```bash
python inference.py \
  --dataset lrw \
  --video_path /path/to/sample.mp4 \
  --checkpoint /path/to/best_model.pth \
  --gpu 0
```
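For programmatic use, the checkpoint can also be loaded directly with PyTorch. A minimal sketch, assuming the file stores either a bare `state_dict` or a dict with a `"state_dict"` key; the actual layout of `best_model.pth` and the model class name below are assumptions:

```python
import torch


def load_checkpoint(path: str, device: str = "cpu"):
    """Load the released checkpoint onto `device` and return its state_dict.

    Handles both a bare state_dict and a wrapper dict with a
    "state_dict" key; the real layout of best_model.pth may differ.
    """
    ckpt = torch.load(path, map_location=device)
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt


# Hypothetical usage; `ThreeDCvT` is a placeholder for the repository's
# actual model class:
# model = ThreeDCvT(num_classes=500)
# model.load_state_dict(load_checkpoint("best_model.pth"))
# model.eval()
```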
## Notes

- The checkpoint is released for reproducibility
- Please use the matching code version when possible
- Local source artifact names were `best_model_for_lrw.pth` and `train_lrw.log`
## Citation

If you use this release, please cite the original paper:

```bibtex
@article{wu2022lip,
  title={A Lip Reading Method Based on 3D Convolutional Vision Transformer},
  author={Wu, Jiafeng and others},
  journal={IEEE Access},
  year={2022}
}
```