---
language:
- en
tags:
- classification
- customs
- trade
- hscode
- product-classification
- pytorch
license: mit
pipeline_tag: text-classification
---

# HS Code Classifier (English)

A deep learning model for automatic classification of goods by Harmonized System (HS) codes based on English-language product descriptions. The model predicts HS codes at three levels of granularity: 2-digit (chapter), 4-digit (heading), and 6-digit (subheading).

---

## Overview

The Harmonized System is an internationally standardized nomenclature for the classification of traded products. Manual assignment of HS codes is time-consuming and error-prone. This model automates that process from plain English product text, providing multi-level predictions with confidence scores.

- **Task:** Multi-class text classification
- **Input:** English product description (free-form text)
- **Output:** HS code predictions at 2-, 4-, and 6-digit levels with confidence scores
- **Base model:** `bert-base-uncased`

---

## Performance

The model was trained for 25 epochs and evaluated on a held-out validation set. The results below reflect the best checkpoint selected during training.

| Level   | Granularity | Accuracy   |
|---------|-------------|------------|
| 2-digit | Chapter     | **97.74%** |
| 4-digit | Heading     | **97.50%** |
| 6-digit | Subheading  | **90.12%** |

Training and validation loss curves showed stable convergence without overfitting, aided by learning-rate scheduling and weight averaging over the final epochs.
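
The weight-averaging step mentioned above can be sketched in isolation. The snippet below is a minimal, framework-free illustration of averaging parameters element-wise across the final checkpoints; the flat parameter dicts, the parameter name, and the values are made-up stand-ins, not the project's actual checkpoint format.

```python
def average_checkpoints(checkpoints):
    """Element-wise average of parameter values across checkpoints."""
    n = len(checkpoints)
    return {
        name: [sum(vals) / n
               for vals in zip(*(ckpt[name] for ckpt in checkpoints))]
        for name in checkpoints[0]
    }

# Three hypothetical checkpoints saved over the final epochs
final_epochs = [
    {"classifier.weight": [0.10, 0.20]},
    {"classifier.weight": [0.20, 0.40]},
    {"classifier.weight": [0.30, 0.60]},
]

averaged = average_checkpoints(final_epochs)
print(averaged["classifier.weight"])  # approximately [0.2, 0.4]
```

In practice the same idea is applied to every tensor in the model's state dict; averaging the final checkpoints tends to smooth out the noise of the last optimization steps.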

---

## Training Details

| Parameter           | Value                  |
|---------------------|------------------------|
| Training started    | 2026-03-09             |
| Total epochs        | 25                     |
| Final training loss | 0.40                   |
| Hardware            | GPU                    |
| Framework           | PyTorch + Transformers |

---

## Usage

This model uses a custom PyTorch architecture, so loading it requires the model class definition from the original inference script. Below is a high-level usage example.

### Requirements

```bash
pip install torch transformers sentencepiece safetensors
```

### Loading and Running Inference

```python
import json

from transformers import AutoTokenizer

# Load the architecture configuration
with open("model/model_config.json") as f:
    config = json.load(f)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("model/tokenizer")

# Load the 6-digit label mapping and build the inverse map
with open("model/label2id_6.json") as f:
    label2id_6 = json.load(f)
id2label_6 = {v: k for k, v in label2id_6.items()}

# Full inference is handled by the provided script:
# python inference.py
```

For full inference, use the `inference.py` script included in the repository. It loads all model components, accepts a product description as input, and returns the top-5 HS code candidates with confidence scores at each level.
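
The script itself is not reproduced here, but the top-5 selection it performs can be sketched as a generic post-processing step: turn raw classifier logits into a softmax distribution and keep the k most probable labels. The logit values and the label map below are invented for illustration; they are not output of the real model.

```python
import math

def top_k(logits, id2label, k=5):
    """Softmax over raw scores, then return the k most probable labels."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical 6-digit label map and logits for one input
id2label_6 = {0: "851830", 1: "851890", 2: "852520", 3: "847130"}
logits = [3.1, 2.7, 1.6, -0.5]

for code, p in top_k(logits, id2label_6, k=3):
    print(f"{code}: {p:.4f}")
```

The same routine is run once per level (2-, 4-, and 6-digit heads) to produce the multi-level candidate list.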

### Output Format

```
Input: "wireless bluetooth headphones with noise cancellation"

Rank | Code   | Score    | Confidence
-----|--------|----------|------------------------
1    | 851830 | 4.12e-01 | 85.21 -> 91.43 -> 87.62
2    | 851890 | 2.87e-01 | 85.21 -> 91.43 -> 72.18
3    | 852520 | 1.03e-01 | ...
```

Each result shows the predicted 6-digit subheading with a chain of probabilities: chapter (2-digit) -> heading (4-digit) -> subheading (6-digit).
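
Two properties of this chain are worth spelling out. The parent codes are simply prefixes of the 6-digit code, and one plausible way to combine the three per-level percentages into a single joint score is to multiply them (an illustrative reading, not documented behavior of the model):

```python
def parent_codes(subheading):
    """Derive the chapter and heading from a 6-digit HS code by prefix."""
    return subheading[:2], subheading[:4]

def joint_confidence(chain_percent):
    """Product of per-level probabilities, given as percentages."""
    joint = 1.0
    for p in chain_percent:
        joint *= p / 100.0
    return joint

chapter, heading = parent_codes("851830")
print(chapter, heading)  # 85 8518

# Chain from the example output above: chapter -> heading -> subheading
print(round(joint_confidence([85.21, 91.43, 87.62]), 4))  # roughly 0.68
```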

---

## Model Files

| File                | Description                          |
|---------------------|--------------------------------------|
| `cascaded_best.pt`  | Trained model weights                |
| `model_config.json` | Model architecture configuration     |
| `label2id_2.json`   | Chapter-level (2-digit) label map    |
| `label2id_4.json`   | Heading-level (4-digit) label map    |
| `label2id_6.json`   | Subheading-level (6-digit) label map |
| `tokenizer/`        | Tokenizer files                      |
| `base_model/`       | Fine-tuned base transformer weights  |

---

## Limitations

- The model was trained on English-language product descriptions; other languages are not supported.
- Coverage is limited to HS codes present in the training data. Very rare or newly introduced subheadings may not be recognized.
- Confidence scores should be treated as relative rankings rather than calibrated probabilities.
- The model predicts from text alone; physical measurements, material composition, and country-specific tariff rulings are not taken into account.
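
If calibrated probabilities are needed downstream, post-hoc temperature scaling is a standard remedy (a general technique, not something shipped with this model): divide the logits by a temperature T fitted on a validation set before applying the softmax. A minimal sketch with invented logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature; T > 1 flattens, T < 1 sharpens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.1, 2.7, 1.6]
print(softmax(logits))                    # raw distribution
print(softmax(logits, temperature=2.0))   # flatter distribution after scaling
```

With T > 1 the top probability shrinks toward the others, which is the usual direction of correction for an overconfident classifier.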

---

## License

This model is released under the MIT License.

---

## Contact

Developed by **ENTUM-AI**. For questions or collaboration, contact us via the Hugging Face profile page.