3DCvT on LRW-1000
This repository provides the released checkpoint and evaluation artifacts for an unofficial PyTorch reproduction of:
A Lip Reading Method Based on 3D Convolutional Vision Transformer
Code repository:
Model Summary
- Task: Chinese word-level lip reading
- Dataset: LRW-1000
- Number of classes: 1184 in this processed split
- Framework: PyTorch
- Architecture: 3D CNN + CvT + BiGRU
Released Files
- best_model.pth: released checkpoint
- sha256.txt: checksum for the checkpoint
- logs/train.log: selected training log
- results/per_class_acc_lrw1000_val.csv: per-class validation summary
- plots/learning_curve.png: learning curve exported from training
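Before running inference, it is worth confirming the downloaded checkpoint matches the released checksum in sha256.txt. A minimal verification sketch (the helper names here are illustrative, not part of the release):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checkpoint(checkpoint_path, expected_hex):
    """Compare the computed digest against the value from sha256.txt."""
    return sha256_of_file(checkpoint_path) == expected_hex.strip().lower()
```

Compare the printed digest against the hex string stored in sha256.txt; a mismatch indicates a corrupted or truncated download.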
Training Setup
Training settings from the released run:
- GPUs: 1 GPU
- Per-step batch size: 128
- Gradient accumulation: 2
- Effective batch size: 256
- Epochs: 120
- Optimizer: Adam
- Weight decay: 1e-4
- Learning rate: 6e-4
- Warmup epochs: 5
- Mixed precision: AMP enabled
- torch.compile: disabled
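The per-step batch size of 128 with 2 accumulation steps yields the effective batch size of 256. A minimal sketch of how such a loop is typically structured in PyTorch, combining gradient accumulation with AMP (this is an illustration of the settings above, not the repository's actual training code):

```python
import torch
from torch import nn

def train_epoch(model, loader, optimizer, accum_steps=2, device="cpu"):
    """One epoch with gradient accumulation and (on CUDA) mixed precision.

    Effective batch size = per-step batch size * accum_steps.
    """
    use_amp = device == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op on CPU
    criterion = nn.CrossEntropyLoss()
    model.train()
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader, start=1):
        with torch.autocast(device_type=device, enabled=use_amp):
            # Divide so accumulated gradients average over the effective batch.
            loss = criterion(model(x.to(device)), y.to(device)) / accum_steps
        scaler.scale(loss).backward()
        if step % accum_steps == 0:
            scaler.step(optimizer)   # optimizer steps once per accum_steps micro-batches
            scaler.update()
            optimizer.zero_grad()
```

With `accum_steps=2` and loader batches of 128, each optimizer step sees gradients averaged over 256 samples, matching the released run.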
Evaluation Result
| Dataset | Split | Metric | Value |
|---|---|---|---|
| LRW-1000 | Validation | Top-1 Accuracy | 55.29% |
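Top-1 accuracy here is the fraction of validation clips whose highest-scoring class matches the ground-truth label. A framework-agnostic sketch of the metric (illustrative only; the repository's evaluation script may compute it differently, e.g. batched in PyTorch):

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose argmax prediction equals the label.

    logits: list of per-class score lists, one row per sample.
    labels: list of ground-truth class indices.
    """
    correct = sum(
        1
        for row, y in zip(logits, labels)
        if max(range(len(row)), key=row.__getitem__) == y
    )
    return correct / len(labels)
```

The per-class breakdown in results/per_class_acc_lrw1000_val.csv applies the same idea restricted to samples of each class.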
Intended Use
This checkpoint is intended for:
- research reproduction
- benchmark comparison
- qualitative inference demos
It is not intended as a production-ready commercial lip-reading system.
Limitations
- Performance depends on using the matching preprocessing pipeline
- This release does not include the raw LRW-1000 dataset
- Users must obtain the dataset according to its own terms
- This processed split uses 1184 classes in the generated vocabulary
Usage
Example inference command:
python inference.py \
--dataset lrw1000 \
--pkl_path /path/to/sample.pkl \
--checkpoint /path/to/best_model.pth \
--gpu 0
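If you want to load the checkpoint directly rather than going through inference.py, a hedged sketch follows. The wrapper key name ("state_dict") is an assumption; inspect the file to confirm how this release actually stores the weights:

```python
import torch

def load_state_dict(path):
    """Load the released checkpoint on CPU and unwrap a possible container dict.

    The "state_dict" key is a common convention, not a guarantee of this
    release's layout; print ckpt.keys() to see what the file contains.
    """
    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        return ckpt["state_dict"]
    return ckpt
```

The returned mapping can then be passed to `model.load_state_dict(...)` for the matching 3D CNN + CvT + BiGRU model definition from the code repository.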
Notes
- The checkpoint is released for reproducibility
- Please use the matching code version when possible
- Local source artifact names were best_model_for_lrw1000.pth and train_lrw1000.log
Citation
If you use this release, please cite the original paper:
@article{wu2022lip,
title={A Lip Reading Method Based on 3D Convolutional Vision Transformer},
author={Wu, Jiafeng and others},
journal={IEEE Access},
year={2022}
}