---
language:
- en
tags:
- classification
- customs
- trade
- hscode
- product-classification
- pytorch
license: mit
pipeline_tag: text-classification
---

# HS Code Classifier (English)
A deep learning model for automatic classification of goods by Harmonized System (HS) codes based on English-language product descriptions. The model predicts HS codes at three levels of granularity: 2-digit (chapter), 4-digit (heading), and 6-digit (subheading).
---
## Overview
The Harmonized System is an internationally standardized nomenclature for the classification of traded products. Manual assignment of HS codes is time-consuming and error-prone. This model automates that process from plain English product text, providing multi-level predictions with confidence scores.
**Task:** Multi-class text classification
**Input:** English product description (free-form text)
**Output:** HS code predictions at 2-, 4-, and 6-digit levels with confidence scores
**Base model:** `bert-base-uncased`
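The three levels are nested: the first two digits of a 6-digit subheading identify the chapter, and the first four identify the heading. A quick illustration in Python, using `851830` (headphones and earphones):

```python
def split_hs_code(code6: str) -> dict:
    """Split a 6-digit HS code into its nested chapter/heading/subheading levels."""
    if len(code6) != 6 or not code6.isdigit():
        raise ValueError("expected a 6-digit numeric HS code")
    return {"chapter": code6[:2], "heading": code6[:4], "subheading": code6}

print(split_hs_code("851830"))
# {'chapter': '85', 'heading': '8518', 'subheading': '851830'}
```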
---
## Performance
The model was trained for 25 epochs and evaluated on a held-out validation set. The results below reflect the best checkpoint selected during training.
| Level | Granularity | Accuracy |
|-------------|---------------|------------|
| 2-digit | Chapter | **97.74%** |
| 4-digit | Heading | **97.50%** |
| 6-digit | Subheading | **90.12%** |
Training and validation loss progression confirmed stable convergence without overfitting, supported by learning rate scheduling and weight averaging over the final epochs.
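The weight averaging mentioned above is commonly implemented by averaging the parameters of the last few saved checkpoints, in the spirit of stochastic weight averaging. A minimal sketch of that idea (not the actual training code, whose details are not published here):

```python
import torch

def average_state_dicts(states):
    """Average parameter tensors across several checkpoint state dicts."""
    n = len(states)
    avg = {k: v.clone().float() for k, v in states[0].items()}
    for state in states[1:]:
        for k, v in state.items():
            avg[k] += v.float()
    return {k: v / n for k, v in avg.items()}
```

The averaged dict can then be loaded back into the model with `model.load_state_dict(...)`.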
---
## Training Details
| Parameter | Value |
|---------------------|------------------------|
| Training started | 2026-03-09 |
| Total epochs | 25 |
| Final training loss | 0.40 |
| Hardware | GPU |
| Framework | PyTorch + Transformers |
---
## Usage
This model uses a custom PyTorch architecture. Loading requires the class definition from the original inference script. Below is a high-level usage example.
### Requirements
```bash
pip install torch transformers sentencepiece safetensors
```
### Loading and Running Inference
```python
import json

import torch
from transformers import AutoTokenizer

# Load the architecture configuration
with open("model/model_config.json") as f:
    config = json.load(f)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("model/tokenizer")

# Load the 6-digit label map and build the reverse mapping
with open("model/label2id_6.json") as f:
    label2id_6 = json.load(f)
id2label_6 = {v: k for k, v in label2id_6.items()}

# The custom model class is defined in the inference script; run it directly:
# python inference.py
```
For full inference, use the `inference.py` script included in the repository. It loads all model components, accepts a product description as input, and returns the top-5 HS code candidates with confidence scores at each level.
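If you prefer to embed the model in your own code rather than call the script, the inference loop has roughly the shape sketched below. Note that `top_subheadings` is a hypothetical helper: the custom model class, its forward signature, and the maximum sequence length of 128 are assumptions, since the real architecture is defined in `inference.py`.

```python
import torch

def top_subheadings(model, tokenizer, id2label_6, text, k=5):
    """Return the top-k 6-digit HS code candidates for one product description."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits_6 = model(**enc)  # assumed: the custom head returns 6-digit logits
    probs = torch.softmax(logits_6, dim=-1).squeeze(0)
    scores, idx = probs.topk(min(k, probs.numel()))
    return [(id2label_6[i.item()], s.item()) for s, i in zip(scores, idx)]
```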
### Output Format
```
Input: "wireless bluetooth headphones with noise cancellation"
Rank | Code | Score | Confidence
-----|------------|-----------|------------------------
1 | 851830 | 4.12e-01 | 85.21 -> 91.43 -> 87.62
2 | 851890 | 2.87e-01 | 85.21 -> 91.43 -> 72.18
3 | 852520 | 1.03e-01 | ...
```
Each result shows the predicted 6-digit subheading with a chain of probabilities: chapter (2-digit) -> heading (4-digit) -> subheading (6-digit).
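One natural reading of the chain is as per-level probabilities that can be multiplied into a single joint score. The card does not state that its `Score` column is computed this way, so the following is purely illustrative:

```python
def chain_confidence(p_chapter, p_heading, p_subheading):
    """Multiply per-level percentage confidences into a single joint score."""
    return (p_chapter / 100) * (p_heading / 100) * (p_subheading / 100)

# Chain from the rank-1 row: 85.21 -> 91.43 -> 87.62
print(round(chain_confidence(85.21, 91.43, 87.62), 4))  # ≈ 0.6826
```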
---
## Model Files
| File                  | Description                          |
|-----------------------|--------------------------------------|
| `cascaded_best.pt`    | Trained model weights                |
| `model_config.json`   | Model architecture configuration     |
| `label2id_2.json`     | Chapter-level (2-digit) label map    |
| `label2id_4.json`     | Heading-level (4-digit) label map    |
| `label2id_6.json`     | Subheading-level (6-digit) label map |
| `tokenizer/`          | Tokenizer files                      |
| `base_model/`         | Fine-tuned base transformer weights  |
---
## Limitations
- The model was trained on English-language product descriptions. Other languages are not supported.
- Coverage is limited to HS codes present in the training data. Very rare or newly introduced subheadings may not be recognized.
- Confidence scores should be treated as relative rankings rather than calibrated probabilities.
- The model predicts based on text alone. Physical measurements, materials composition, or country-specific tariff rulings are not taken into account.
---
## License
This model is released under the MIT License.
---
## Contact
Developed by **ENTUM-AI**.
For questions or collaboration, contact us via the Hugging Face profile page.