HS Code Classifier (English)

A deep learning model for automatic classification of goods by Harmonized System (HS) codes based on English-language product descriptions. The model predicts HS codes at three levels of granularity: 2-digit (chapter), 4-digit (heading), and 6-digit (subheading).


Overview

The Harmonized System is an internationally standardized nomenclature for the classification of traded products. Manual assignment of HS codes is time-consuming and error-prone. This model automates that process from plain English product text, providing multi-level predictions with confidence scores.

Task: Multi-class text classification
Input: English product description (free-form text)
Output: HS code predictions at 2-, 4-, and 6-digit levels with confidence scores
Base model: bert-base-uncased
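The multi-level setup described above can be pictured as a shared encoder feeding three classification heads, one per granularity level. The sketch below is illustrative only: the actual architecture lives in the repository's inference script, and the class name, hidden size, and label counts here are assumptions.

```python
# Hypothetical sketch of a multi-level HS classifier: one shared text
# encoder (e.g. bert-base-uncased) with a linear head per code level.
# Names and dimensions are illustrative, not the repository's actual code.
import torch
import torch.nn as nn


class CascadedHSClassifier(nn.Module):
    def __init__(self, encoder, hidden_size, n_chapters, n_headings, n_subheadings):
        super().__init__()
        self.encoder = encoder                          # transformer backbone
        self.head_2 = nn.Linear(hidden_size, n_chapters)     # 2-digit chapter
        self.head_4 = nn.Linear(hidden_size, n_headings)     # 4-digit heading
        self.head_6 = nn.Linear(hidden_size, n_subheadings)  # 6-digit subheading

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.head_2(pooled), self.head_4(pooled), self.head_6(pooled)
```

Each head produces logits over its own label space, so the model can report a prediction and confidence at every level for a single input.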


Performance

The model was trained for 25 epochs and evaluated on a held-out validation set. The results below reflect the best checkpoint selected during training.

Level   | Granularity | Accuracy
--------|-------------|---------
2-digit | Chapter     | 97.74%
4-digit | Heading     | 97.50%
6-digit | Subheading  | 90.12%

Training and validation loss curves showed stable convergence with no sign of overfitting, aided by learning rate scheduling and weight averaging over the final epochs.
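Weight averaging over the final epochs can be sketched as an element-wise mean of the last few checkpoints' state dicts. The exact averaging scheme used in training is not documented here, so treat this as an assumption; the helper name is illustrative.

```python
# Illustrative sketch of checkpoint weight averaging (SWA-style):
# average the parameter tensors of several saved state_dicts.
# The actual scheme used during training is an assumption.
import torch


def average_state_dicts(state_dicts):
    """Element-wise mean of several state_dicts with identical keys."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged
```

The averaged dict can then be loaded back with `model.load_state_dict(...)` to produce the final evaluation checkpoint.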


Training Details

Parameter           | Value
--------------------|-----------------------
Training started    | 2026-03-09
Total epochs        | 25
Final training loss | 0.40
Hardware            | GPU
Framework           | PyTorch + Transformers

Usage

This model uses a custom PyTorch architecture. Loading requires the class definition from the original inference script. Below is a high-level usage example.

Requirements

pip install torch transformers sentencepiece safetensors

Loading and Running Inference

import json
from transformers import AutoTokenizer

# Load configuration
with open("model/model_config.json") as f:
    config = json.load(f)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("model/tokenizer")

# Load label mappings (6-digit shown; the 2- and 4-digit maps work the same way)
with open("model/label2id_6.json") as f:
    label2id_6 = json.load(f)
id2label_6 = {v: k for k, v in label2id_6.items()}

# Run full inference using the provided script:
# python inference.py

For full inference, use the inference.py script included in the repository. It loads all model components, accepts a product description as input, and returns the top-5 HS code candidates with confidence scores at each level.
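The final step of that pipeline, turning raw 6-digit logits into a ranked top-5 list, can be sketched as a softmax followed by `topk`. The function name is illustrative; `inference.py` may implement this differently.

```python
# Hypothetical post-processing sketch: convert 6-digit logits into the
# top-k candidate codes with softmax scores. Names are illustrative.
import torch
import torch.nn.functional as F


def top_k_codes(logits_6, id2label_6, k=5):
    """Return the k most likely 6-digit codes with their softmax scores."""
    probs = F.softmax(logits_6, dim=-1)
    scores, indices = probs.topk(k)
    return [(id2label_6[i.item()], s.item()) for i, s in zip(indices, scores)]
```

The returned scores sum to at most 1.0 across the full label space, which is why the `Score` column in the sample output below uses small scientific-notation values.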

Output Format

Input: "wireless bluetooth headphones with noise cancellation"

Rank | Code       | Score     | Confidence
-----|------------|-----------|------------------------
  1  | 851830     | 4.12e-01  | 85.21 -> 91.43 -> 87.62
  2  | 851890     | 2.87e-01  | 85.21 -> 91.43 -> 72.18
  3  | 852520     | 1.03e-01  | ...

Each result shows the predicted 6-digit subheading with a chain of probabilities: chapter (2-digit) -> heading (4-digit) -> subheading (6-digit).
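Because the 2- and 4-digit codes are prefixes of the 6-digit subheading, the confidence chain above can be assembled by slicing the predicted code and looking up each prefix in the corresponding level's probabilities. The per-level probability dicts here are illustrative stand-ins for the model's outputs.

```python
# Sketch of formatting the chapter -> heading -> subheading confidence
# chain for one predicted code. The probs2/probs4/probs6 dicts are
# hypothetical per-level probability lookups, not the script's real API.
def confidence_chain(code6, probs2, probs4, probs6):
    """Format the per-level confidences for a 6-digit code as percentages."""
    chapter, heading = code6[:2], code6[:4]  # prefixes of the full code
    return "{:.2f} -> {:.2f} -> {:.2f}".format(
        100 * probs2[chapter], 100 * probs4[heading], 100 * probs6[code6]
    )
```

For example, with chapter, heading, and subheading probabilities of 0.8521, 0.9143, and 0.8762 for code 851830, this yields the `85.21 -> 91.43 -> 87.62` chain shown in the sample output.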


Model Files

File              | Description
------------------|--------------------------------------
cascaded_best.pt  | Trained model weights
model_config.json | Model architecture configuration
label2id_2.json   | Chapter-level (2-digit) label map
label2id_4.json   | Heading-level (4-digit) label map
label2id_6.json   | Subheading-level (6-digit) label map
tokenizer/        | Tokenizer files
base_model/       | Fine-tuned base transformer weights

Limitations

  • The model was trained on English-language product descriptions. Other languages are not supported.
  • Coverage is limited to HS codes present in the training data. Very rare or newly introduced subheadings may not be recognized.
  • Confidence scores should be treated as relative rankings rather than calibrated probabilities.
  • The model predicts from text alone. Physical measurements, material composition, and country-specific tariff rulings are not taken into account.

License

This model is released under the MIT License.


Contact

Developed by ENTUM-AI. For questions or collaboration, contact us via the Hugging Face profile page.
