---
language:
  - en
tags:
  - classification
  - customs
  - trade
  - hscode
  - product-classification
  - pytorch
license: mit
pipeline_tag: text-classification
---

![HS Code Classifier](hscode_class_entum.webp)

# HS Code Classifier (English)

A deep learning model for automatic classification of goods by Harmonized System (HS) codes based on English-language product descriptions. The model predicts HS codes at three levels of granularity: 2-digit (chapter), 4-digit (heading), and 6-digit (subheading).

---

## Overview

The Harmonized System is an internationally standardized nomenclature for the classification of traded products. Manual assignment of HS codes is time-consuming and error-prone. This model automates that process from plain English product text, providing multi-level predictions with confidence scores.

- **Task:** Multi-class text classification
- **Input:** English product description (free-form text)
- **Output:** HS code predictions at 2-, 4-, and 6-digit levels with confidence scores
- **Base model:** `bert-base-uncased`

---

## Performance

The model was trained for 25 epochs and evaluated on a held-out validation set. The results below reflect the best checkpoint selected during training.

| Level       | Granularity   | Accuracy   |
|-------------|---------------|------------|
| 2-digit     | Chapter       | **97.74%** |
| 4-digit     | Heading       | **97.50%** |
| 6-digit     | Subheading    | **90.12%** |

Training and validation loss progression confirmed stable convergence without overfitting, supported by learning rate scheduling and weight averaging over the final epochs.

---

## Training Details

| Parameter           | Value                  |
|---------------------|------------------------|
| Training started    | 2026-03-09             |
| Total epochs        | 25                     |
| Final training loss | 0.40                   |
| Hardware            | GPU                    |
| Framework           | PyTorch + Transformers |

---

## Usage

This model uses a custom PyTorch architecture. Loading requires the class definition from the original inference script. Below is a high-level usage example.

### Requirements

```bash
pip install torch transformers sentencepiece safetensors
```

### Loading and Running Inference

```python
import json

from transformers import AutoTokenizer

# Load model architecture configuration
with open("model/model_config.json") as f:
    config = json.load(f)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("model/tokenizer")

# Load label mappings (6-digit shown; the 2- and 4-digit maps load the same way)
with open("model/label2id_6.json") as f:
    label2id_6 = json.load(f)
id2label_6 = {v: k for k, v in label2id_6.items()}

# Run full inference using the provided script:
# python inference.py
```

For full inference, use the `inference.py` script included in the repository. It loads all model components, accepts a product description as input, and returns the top-5 HS code candidates with confidence scores at each level.
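The top-5 ranking step can be illustrated with a self-contained sketch: softmax over the raw logits of a classification head, then take the highest-probability labels. The logits and label set below are made-up placeholders for illustration, not output from the actual model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, id2label, k=5):
    """Return the k highest-probability labels with their scores."""
    ranked = sorted(enumerate(probs), key=lambda p: p[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]

# Placeholder logits and labels -- the real model has thousands of 6-digit classes
id2label = {0: "851830", 1: "851890", 2: "852520", 3: "847130", 4: "940360", 5: "420222"}
logits = [3.1, 2.7, 1.6, 0.4, -0.2, -1.0]

for code, prob in top_k(softmax(logits), id2label, k=5):
    print(f"{code}: {prob:.4f}")
```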

### Output Format

```
Input: "wireless bluetooth headphones with noise cancellation"

Rank | Code       | Score     | Confidence
-----|------------|-----------|------------------------
  1  | 851830     | 4.12e-01  | 85.21 -> 91.43 -> 87.62
  2  | 851890     | 2.87e-01  | 85.21 -> 91.43 -> 72.18
  3  | 852520     | 1.03e-01  | ...
```

Each result shows the predicted 6-digit subheading with a chain of probabilities: chapter (2-digit) -> heading (4-digit) -> subheading (6-digit).
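Because HS codes are hierarchical, the chapter and heading of any prediction can be recovered by prefix slicing: the first two digits of a subheading are its chapter, and the first four are its heading. A minimal helper (the function name is illustrative, not part of the repository):

```python
def split_hs_code(code6: str) -> dict:
    """Split a 6-digit HS subheading into its chapter and heading prefixes."""
    if len(code6) != 6 or not code6.isdigit():
        raise ValueError(f"expected a 6-digit HS code, got {code6!r}")
    return {
        "chapter": code6[:2],   # 2-digit level
        "heading": code6[:4],   # 4-digit level
        "subheading": code6,    # 6-digit level
    }

print(split_hs_code("851830"))
# {'chapter': '85', 'heading': '8518', 'subheading': '851830'}
```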

---

## Model Files

| File                  | Description                        |
|-----------------------|------------------------------------|
| `cascaded_best.pt`    | Trained model weights              |
| `model_config.json`   | Model architecture configuration   |
| `label2id_2.json`     | Chapter-level (2-digit) label map  |
| `label2id_4.json`     | Heading-level (4-digit) label map  |
| `label2id_6.json`     | Subheading-level (6-digit) label map |
| `tokenizer/`          | Tokenizer files                    |
| `base_model/`         | Fine-tuned base transformer weights |
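Since the three label maps describe one hierarchy, a quick sanity check after loading them is to confirm that every 6-digit label's 2- and 4-digit prefixes exist in the coarser maps. The dictionaries below are tiny placeholders; in practice they would come from the `label2id_*.json` files.

```python
def check_hierarchy(label2id_2, label2id_4, label2id_6):
    """Return any 6-digit labels whose chapter or heading prefix is missing."""
    missing = []
    for code in label2id_6:
        if code[:2] not in label2id_2 or code[:4] not in label2id_4:
            missing.append(code)
    return missing

# Placeholder maps for illustration
label2id_2 = {"85": 0}
label2id_4 = {"8518": 0, "8525": 1}
label2id_6 = {"851830": 0, "851890": 1, "852520": 2}

print(check_hierarchy(label2id_2, label2id_4, label2id_6))  # [] when consistent
```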

---

## Limitations

- The model was trained on English-language product descriptions. Other languages are not supported.
- Coverage is limited to HS codes present in the training data. Very rare or newly introduced subheadings may not be recognized.
- Confidence scores should be treated as relative rankings rather than calibrated probabilities.
- The model predicts based on text alone. Physical measurements, materials composition, or country-specific tariff rulings are not taken into account.

---

## License

This model is released under the MIT License.

---

## Contact

Developed by **ENTUM-AI**.
For questions or collaboration, contact us via the Hugging Face profile page.