Paper Review: Sen-1: Vietnamese Text Classification Model
Review Date: February 2, 2026
Reviewer Expertise: Vietnamese NLP, Text Classification
Review Type: ACL Rolling Review (ARR) Format
Paper Summary
This technical report describes Sen-1, a Vietnamese text classification model using TF-IDF vectorization combined with Linear SVM. The system is designed as a lightweight baseline compatible with the underthesea Vietnamese NLP toolkit API. The report documents the methodology, implementation details, and provides a demonstration release trained on a small sample dataset (60 training samples).
Summary of Strengths
Clear documentation and reproducibility (Sections 3, 5, 7): The report provides comprehensive implementation details including exact hyperparameters (Table in Section 3.2-3.3), model file structure (Section 5.3), and complete usage examples with code snippets. This level of documentation enables straightforward reproduction.
Practical resource contribution: The model is released on HuggingFace with Apache 2.0 license, providing the community with an accessible baseline. The lightweight design (~103 KB) makes it suitable for resource-constrained environments where transformer-based models are impractical.
Honest limitations disclosure (Section 8): The authors transparently acknowledge the current release's limitations including the minimal training data (60 samples), lack of word segmentation, and single-label constraint. This intellectual honesty helps users set appropriate expectations.
API design consideration (Section 5.2): The integration with the established underthesea ecosystem provides continuity for existing users and lowers the barrier to adoption.
Summary of Weaknesses
Insufficient experimental evaluation (Section 6.2): The reported results (40% validation accuracy, 34.67% F1) are based on only 60 training and 20 validation samples. While the authors note this limitation, the paper would benefit from at least one experiment on the full VNTC dataset or a meaningful subset to demonstrate the method's actual effectiveness. The "Expected Sen-1" results in Table 6.4 (~93-95%) are speculative projections, not empirical measurements.
Incomplete comparison with modern baselines: The related work (Section 2.2) mentions PhoBERT but provides no actual comparison. Given that PhoBERT-based models achieve 65.44% F1 on UIT-VSMEC and 87.35% on extended versions, and TextGraphFuseGAT achieves +4% over PhoBERT baselines, the paper lacks context for where TF-IDF+SVM fits in the current landscape.
Missing word segmentation integration: Vietnamese text classification typically benefits significantly from proper word segmentation (as noted in PhoBERT's use of VnCoreNLP's RDRSegmenter). The paper acknowledges this gap but doesn't provide quantitative analysis of the performance impact, despite underthesea having word segmentation capabilities.
Statistical rigor concerns: No confidence intervals, standard deviations, or statistical significance tests are reported. For the sample predictions (Table 6.3), only 5 cherry-picked examples are shown, all correct. This does not provide a representative view of model behavior.
Scores
Soundness: 2/5 (Poor)
The methodology is technically correct but the experimental evaluation is insufficient. The core claims about model effectiveness cannot be verified with the provided 60-sample experiments. The "expected" benchmark results are projections without empirical backing.
Excitement: 2/5 (Somewhat Boring)
TF-IDF + SVM for text classification is a well-established baseline from nearly two decades ago. While having a documented Vietnamese baseline is useful, the contribution is incremental. The paper does not introduce novel techniques or provide new insights about Vietnamese text classification.
Overall Assessment: 2/5 (Resubmit next cycle)
The paper provides a useful resource but requires substantial revisions: (1) actual benchmark results on VNTC or other standard datasets, (2) comparison with at least one modern baseline (even a simple fine-tuned PhoBERT), and (3) ablation studies on design choices (e.g., with/without word segmentation, n-gram ranges).
Reproducibility: 4/5 (Could mostly reproduce)
The paper provides sufficient detail for reproduction: hyperparameters, dependencies, model files, and code examples. Minor variations may occur due to scikit-learn version differences. Code is publicly available on HuggingFace.
Confidence: 4/5 (High)
I am familiar with Vietnamese NLP and text classification methods. I have reviewed the current state-of-the-art and can assess this work's positioning.
Detailed Comments
Technical Soundness
The TF-IDF + Linear SVM pipeline is correctly implemented and the mathematical formulations (Sections 3.2-3.4) are accurate. However:
Confidence score calculation (Section 3.4): Using the absolute value of the decision function, |f(x)|, discards the sign and hence the prediction direction. This is acceptable as a confidence proxy but should be stated explicitly.
No cross-validation: With such limited data (60 samples), k-fold cross-validation would provide more robust performance estimates than a single train/validation split.
Class imbalance not addressed: The VNTC dataset has significant class imbalance (5,219 vs 1,820 samples across categories). The report doesn't discuss handling strategies.
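The cross-validation and class-imbalance points above can be sketched in a few lines of scikit-learn. This is not the authors' code, and the toy documents below are synthetic stand-ins for the 60-sample demo data; the point is the pattern: k-fold scores give a mean and standard deviation instead of a single split, and `class_weight='balanced'` is one simple handle on skewed category sizes.

```python
# Hedged sketch (reviewer's own, not from the paper): TF-IDF + LinearSVC with
# stratified k-fold CV and class reweighting for imbalanced categories.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic stand-ins for short Vietnamese news snippets (diacritics omitted).
texts = (
    ["doi tuyen bong da thang tran dem qua"] * 10
    + ["gia vang tang manh tren thi truong chung khoan"] * 10
    + ["phat hien hanh tinh moi ngoai he mat troi"] * 10
)
labels = ["the_thao"] * 10 + ["kinh_doanh"] * 10 + ["khoa_hoc"] * 10

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=20000)),
    # class_weight='balanced' reweights underrepresented categories
    # (cf. VNTC's 5,219 vs 1,820 samples per class).
    ("svm", LinearSVC(class_weight="balanced", C=1.0)),
])

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the mean and standard deviation over folds, rather than one 60/20 split, would directly address the statistical-rigor concern above.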
Novelty and Contribution
This work is positioned as a "baseline" and "resource paper" rather than a methodological contribution. As such, the novelty expectations are lower. However:
- The TF-IDF + SVM approach has been used for Vietnamese text classification since Vu et al. (2007)
- The primary contribution is the packaged, documented implementation
- No new insights about Vietnamese text classification are provided
Clarity and Presentation
The report is well-organized and clearly written:
- Good use of tables and diagrams (Section 3.1 architecture diagram)
- Comprehensive appendices with label mappings and model card
- Code examples are helpful for practitioners
Minor issues:
- The Abstract could be more specific about intended use cases
- Section 9 (Future Work) is essentially a TODO list rather than research directions
Reproducibility Assessment
Positive aspects:
- All hyperparameters documented
- Model files available on HuggingFace
- Dependencies listed with version constraints
- Clear API documentation
Missing:
- Random seeds for reproducibility
- Training/validation split procedure details
- Exact preprocessing steps applied to text
Limitations and Ethics
The limitations section (Section 8) is present and covers major issues. Could be expanded:
- No discussion of potential biases in news classification
- No analysis of error types or failure modes
- No discussion of environmental impact (though minimal for this model)
Related Work Research
Papers Found
| Paper | Year | Method | Results | Relevance |
|---|---|---|---|---|
| PhoBERT | 2020 | RoBERTa for Vietnamese | SOTA on multiple tasks | Primary modern baseline missing |
| TextGraphFuseGAT | 2025 | PhoBERT + GAT | +4% F1 over PhoBERT | Shows current SOTA direction |
| SMTCE Benchmark | 2022 | Multiple BERTs | 65.44% F1 (VSMEC) | Comprehensive Vietnamese benchmark |
| ViSoBERT | 2023 | Social media BERT | Improved social text | Specialized Vietnamese model |
| Vu et al. | 2007 | SVM, N-gram | 97.1% (VNTC-10) | Original baseline cited |
Missing Citations
The following relevant works should be considered:
- Nguyen & Nguyen (2020) - PhoBERT paper: The primary Vietnamese pre-trained model that sets modern baselines
- Ho et al. (2020) - UIT-VSMEC: Standard emotion recognition corpus mentioned in future work but not cited
- Nguyen et al. (2022) - SMTCE benchmark: Comprehensive evaluation of Vietnamese text classification models
- VnCoreNLP - Word segmentation tool used by most Vietnamese NLP systems
SOTA Verification
- Claimed: SVM Multi achieves 93.4% on VNTC-10 (Vu et al. 2007)
- Actual: This result is from 2007. Modern PhoBERT-based approaches likely exceed this, though direct VNTC comparisons with transformers are limited in literature.
- Assessment: The cited baseline is accurate but dated. The "expected" 93-95% projection is reasonable but unverified.
Questions for Authors
Can you provide results on the full VNTC dataset (33,759 training samples) to validate the "expected" performance claims?
What is the rationale for not incorporating underthesea's word segmentation, given it's already a dependency of the ecosystem?
How does the model perform on out-of-domain text (e.g., social media vs. news articles)?
Were alternative TF-IDF configurations (e.g., different n-gram ranges, vocabulary sizes) explored? If so, what guided the final hyperparameter choices?
Minor Issues
Section 3.3: The SVM formulation shows the soft-margin SVM, but "loss: hinge" is listed as the default. LinearSVC uses squared hinge loss by default; clarify whether hinge loss is explicitly set.
Table in Section 4.1: Train/test split ratios vary significantly across categories (e.g., Doi song: 3,159/2,036 vs. Khoa hoc: 1,820/2,096). Is this the original VNTC split?
Section 6.4: The table header says "Benchmark Comparison" but Sen-1 results are "Expected" not measured.
Typo: Appendix B model size "~103 KB" - verify this is the actual size of the joblib files.
Reference 2: GitHub repository is not a citable publication; consider citing the RIVF'07 paper for the dataset.
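The loss-function discrepancy flagged above is easy to check directly; a minimal verification, assuming a recent scikit-learn:

```python
# Hedged check (reviewer's own): scikit-learn's LinearSVC defaults to
# squared hinge loss, so a report listing "loss: hinge" as the default
# is inaccurate unless the loss is passed explicitly.
from sklearn.svm import LinearSVC

default_clf = LinearSVC()
print(default_clf.loss)  # squared_hinge

# If the paper's "loss: hinge" is intended, it must be set explicitly:
hinge_clf = LinearSVC(loss="hinge")
print(hinge_clf.loss)  # hinge
```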
Suggestions for Improvement
Essential (for resubmission)
Train and evaluate on full VNTC: Report actual accuracy/F1 on the 33,759/50,373 train/test split to validate baseline claims.
Add at least one modern baseline: Fine-tune PhoBERT-base on VNTC and compare. This contextualizes the resource contribution.
Include ablation study: Test impact of (a) word segmentation, (b) n-gram range, (c) vocabulary size on performance.
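The n-gram and vocabulary-size arms of the requested ablation fit naturally into a grid search; a hedged sketch (reviewer's own, with synthetic stand-in data) follows. The word-segmentation arm would add, e.g., underthesea's segmenter as the vectorizer's preprocessor, but is omitted here to keep the sketch dependency-free.

```python
# Hedged ablation sketch (not the authors' code): sweep TF-IDF n-gram range
# and vocabulary size with GridSearchCV; report the best configuration.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for Vietnamese news snippets (diacritics omitted).
texts = (
    ["doi tuyen bong da thang tran dem qua"] * 10
    + ["gia vang tang manh tren thi truong chung khoan"] * 10
    + ["phat hien hanh tinh moi ngoai he mat troi"] * 10
)
labels = ["the_thao"] * 10 + ["kinh_doanh"] * 10 + ["khoa_hoc"] * 10

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],   # unigrams vs. uni+bigrams
    "tfidf__max_features": [5000, 20000],     # vocabulary-size ablation
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
search.fit(texts, labels)
print(search.best_params_)
```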
Recommended
Add error analysis: Examine which categories are confused, analyze failure cases to provide insights.
Report statistical measures: Use k-fold cross-validation with standard deviations.
Expand related work: Include post-2020 Vietnamese NLP advances (PhoBERT, ViSoBERT, SMTCE benchmark).
Test on additional datasets: UIT-VSMEC or UIT-VSFC would show generalization beyond news classification.
Optional
Add inference speed benchmarks: Quantify the speed advantage over transformer models.
Provide pre-trained model on full VNTC: This would make the resource immediately useful.
Include model interpretability analysis: Show top features per category to leverage SVM's interpretability advantage.
Final Recommendation
Decision: Major Revision Required
The paper provides a useful baseline resource for Vietnamese text classification but currently lacks sufficient experimental validation. The 60-sample demonstration is inadequate for assessing model quality. With experiments on the full VNTC dataset and comparison to at least one modern baseline, this could become a valuable resource paper for the Vietnamese NLP community.
The authors should:
- Train on full VNTC and report actual benchmark results
- Compare against PhoBERT baseline
- Include ablation studies
- Expand related work coverage
After these revisions, the paper would be suitable for a resource/system demonstration track at an *ACL venue.
References Used in This Review
- PhoBERT Pre-trained Models
- VNTC Dataset GitHub
- SMTCE Benchmark
- TextGraphFuseGAT
- UIT-VSMEC Dataset
- Vietnamese NLP Progress
- ViSoBERT EMNLP 2023
Review completed following ACL Rolling Review guidelines