---
license: gemma
pipeline_tag: text-generation
language:
- en
- km
tags:
- customs
- hs-code
- classification
- cambodia
- gemma
- unsloth
- qlora
base_model:
- unsloth/gemma-4-E4B-it
---

# Gemma‑4 HS Code Classifier (Cambodia Customs)

A **Gemma‑4‑E4B‑it** model fine‑tuned with QLoRA to classify product descriptions into **8‑digit HS codes** and return corresponding Cambodian trade rates (Customs Duty, Special Tax, VAT, Excise Tax).

Built with **[Unsloth](https://github.com/unslothai/unsloth)** for fast, memory‑efficient fine‑tuning on a single T4 GPU.

---

## 🎯 What it does

Given a plain‑English product description, the model generates:

```text
HS Code: 61091000
Unit: PIECE
Customs Duty: 25%
Special Tax: 0%
VAT: 10%
Excise Tax: 0%
```

**⚠️ Important**: The rates in this output are generated by the model and **may be wrong**.  
For production, always take rates from the included **lookup table** (`hs_code_lookup.json`); see [Production use](#-production-use) below.

---

## 🚀 Quick start (in Colab or locally)

This repository contains **only the LoRA adapter**, not the full model.  
Loading it will automatically download the base model (`unsloth/gemma-4-E4B-it`) and apply the adapter in 4-bit.

```python
%%capture
# [Install] Run this section in its own notebook cell.
# `%%capture` is a cell magic, so it must be the very first line of that cell.
# Install everything needed for the T4 Colab environment
!pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate xformers peft trl triton unsloth
!pip install --no-deps --upgrade "torchao>=0.16.0"
!pip install --no-deps transformers==5.5.0 "tokenizers>=0.22.0,<=0.23.0"
!pip install torchcodec

# %% [Setup] (new cell, otherwise %%capture hides all later output)
import warnings
import torch

torch._dynamo.config.recompile_limit = 64

# Suppress the specific PyTorch size check warning from bitsandbytes
warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    message=".*_check_is_size will be removed in a future PyTorch release.*",
)

# %% [Load model]

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "Sothay/gemma4-hscode-classifier",   # LoRA adapter on Hugging Face
    load_in_4bit = True,                 # required – the adapter was trained in 4-bit
    max_seq_length = 1024,
)

# %% [Inference with the authoritative lookup table (recommended)]
import json, re
from huggingface_hub import hf_hub_download

# Fetch the lookup table from this repository (cached locally after the first call)
lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
with open(lookup_path, encoding="utf-8") as f:
    rate_lookup = json.load(f)

def predict_hs_code(description: str) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user",   "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    # Tokenize to a dict (input_ids, attention_mask), same as the raw-output function below
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    m = re.search(r"HS Code:\s*([0-9]{4,10})", text)
    code = m.group(1) if m else None
    if code and code in rate_lookup:
        return {"hs_code": code, "source": "lookup_table", **rate_lookup[code]}
    return {"hs_code": code, "source": "model_only_UNVERIFIED", "raw_output": text}

print(predict_hs_code("Men's cotton knitted T-shirt"))
```

---

## 🔍 Raw model output (debugging)

If you want to see exactly what the model generated (including the rates it predicted) without the lookup table, use the raw‑output function below.  
**Do not** use these rates in production – they are only for debugging or confidence evaluation.

```python
import re  # used by extract(); `model` and `tokenizer` come from the Quick start section

def predict_hs_code_raw(description: str, max_new_tokens: int = 100) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user",   "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")

    out = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True, do_sample=False)
    raw_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    def extract(pattern, text):
        m = re.search(pattern, text)
        return m.group(1).strip() if m else None

    return {
        "hs_code":   extract(r"HS Code:\s*([0-9.]+)", raw_text),
        "unit":      extract(r"Unit:\s*(.*)", raw_text),
        "cd_rate":   extract(r"Customs Duty:\s*([\d.]+)%?", raw_text),
        "st_rate":   extract(r"Special Tax:\s*([\d.]+)%?", raw_text),
        "vat_rate":  extract(r"VAT:\s*([\d.]+)%?", raw_text),
        "et_rate":   extract(r"Excise Tax:\s*([\d.]+)%?", raw_text),
        "raw_output": raw_text
    }

# Example
raw = predict_hs_code_raw("Men's cotton knitted T-shirt")
print(raw["raw_output"])
print(raw["hs_code"])   # model’s guess
```

---

## 🧠 Training details

- **Base model**: `unsloth/gemma-4-E4B-it` (4‑bit QLoRA)
- **Adapter rank**: r=16, alpha=16, targeting all language & attention layers
- **Gradient checkpointing**: Unsloth’s own implementation (avoids Gemma‑4 KV‑shared layer bug)
- **Dataset**: Custom Cambodian HS‑code dataset (`hs_code.csv`) with descriptions, codes, and official rates
  - Cleaned, deduplicated, split into 90/10 train/validation
  - Chat roles fixed to system/user/assistant (Gemma‑4 standard)
- **Training config**: 3 epochs, effective batch size 8, learning rate 2e‑4, linear schedule, eval & save every epoch, best model loaded
- **Hardware**: Google Colab T4 (16 GB) – peak memory ~10 GB thanks to QLoRA
- **Accuracy**: Measured as exact HS‑code match on held‑out examples; see the model page on Hugging Face for current numbers
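
The exact-match metric named in the last bullet can be sketched as follows. This is an illustration of the metric, not the evaluation script used for this model; `exact_match_accuracy` is a hypothetical helper, and the codes in the example are placeholders rather than real evaluation data.

```python
def exact_match_accuracy(predicted, reference):
    """Fraction of examples whose predicted HS code equals the reference code.

    A prediction of None (no code could be parsed) always counts as a miss.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if p is not None and p == r)
    return hits / len(reference)

# Placeholder codes for illustration only
predicted_codes = ["61091000", "87042110", None, "22030010"]
reference_codes = ["61091000", "87042129", "85171200", "22030010"]

print(exact_match_accuracy(predicted_codes, reference_codes))
# 0.5
```

Exact match is strict: a prediction that gets the 6-digit subheading right but the 8-digit national line wrong still counts as a miss.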

---

## ⚖️ Production use

> **Always use the lookup table – never trust the model’s generated rates.**

The model is a **classifier**: description → HS code.  
Rates are fetched deterministically from `hs_code_lookup.json`, a file extracted from the same official tariff data used during training.

Why?  
- A causal LM recalling a rate from memory will occasionally hallucinate – a customs tool with confident, wrong numbers is worse than one that says “I don’t know”.
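- The lookup table returns rates exactly as they appear in the source tariff data, so whenever the predicted HS code is correct the rates match that data. It cannot correct a wrong HS code, and it is only as current as the tariff data it was extracted from.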

The `hs_code_lookup.json` file is included in this repository and can be downloaded via:

```python
from huggingface_hub import hf_hub_download

# Returns the local path of the cached file
lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
```
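
Because the model's text may format a code with dots or spaces, it can help to normalize the code before the lookup and to return nothing when the code is unknown. The sketch below is one way to do that under two assumptions that are not confirmed by this repository: the table is a dict keyed by 8-digit strings, and each value is a dict of rates. `normalize_hs_code` and `lookup_rates` are hypothetical helper names, and the demo table reuses the sample output shown at the top of this card.

```python
import re

def normalize_hs_code(code):
    """Strip dots, spaces and hyphens; return an 8-digit string or None."""
    if code is None:
        return None
    digits = re.sub(r"[.\s-]", "", str(code))
    return digits if re.fullmatch(r"\d{8}", digits) else None

def lookup_rates(code, table):
    """Return the table entry for a code, or None if the code is invalid or unknown."""
    key = normalize_hs_code(code)
    if key is None or key not in table:
        return None  # caller should report "unknown" instead of guessing
    return {"hs_code": key, **table[key]}

# Stand-in for the loaded hs_code_lookup.json (assumed schema)
demo_table = {
    "61091000": {"unit": "PIECE", "cd_rate": 25, "st_rate": 0, "vat_rate": 10, "et_rate": 0},
}

print(lookup_rates("6109.10.00", demo_table))
print(lookup_rates("99999999", demo_table))
```

Returning `None` for an unknown code keeps the "I don't know" behaviour explicit rather than falling back to model-generated rates.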

---

## 📦 Files in this repository

| File | Description |
|------|-------------|
| `adapter_model.safetensors` | LoRA adapter weights (few MB) |
| `adapter_config.json` | Adapter configuration (references base model) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `hs_code_lookup.json` | Authoritative rate table for production inference |
| `README.md` | This file |

> **Note**: Only the adapter is stored here – the full Gemma‑4 base model is automatically fetched from Unsloth when you call `FastModel.from_pretrained`.  
> If you need a **merged 16‑bit model** (for vLLM, TGI, etc.), generate it locally with Unsloth:
> ```python
> model.save_pretrained_merged("merged_fp16", tokenizer, save_method="merged_16bit")
> ```

---

## 🦙 Ollama / llama.cpp (GGUF)

Export a quantized GGUF directly from the loaded model (base model plus adapter):

```python
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```

Then load the GGUF in Ollama through a `Modelfile` (see the [Ollama documentation](https://ollama.com)) and set `temperature` to 0 so that output is deterministic.
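
As a rough illustration, a `Modelfile` for the exported GGUF could be generated with a few lines of Python. `build_modelfile` is a hypothetical helper, and the GGUF path is a placeholder, since the actual file name written by the export step is not stated here. The sketch sets only `FROM`, two `PARAMETER` lines and `SYSTEM`; it does not define a chat `TEMPLATE`, which your Ollama setup may or may not need.

```python
SYSTEM_PROMPT = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)

def build_modelfile(gguf_path: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Return Modelfile text that points at a local GGUF and uses temperature 0."""
    lines = [
        f"FROM {gguf_path}",
        "PARAMETER temperature 0",   # greedy decoding, mirrors do_sample=False
        "PARAMETER num_predict 80",  # mirrors max_new_tokens=80 in the Quick start
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"

# Placeholder path: replace with the .gguf file produced by the export step
modelfile_text = build_modelfile("./gguf_model/model.gguf")
print(modelfile_text)
```

Saving the printed text as `Modelfile` and running `ollama create <name> -f Modelfile` would then register the model under a name of your choice.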

---

## 📊 Example predictions

| Description | Predicted HS Code | Unit | CD | ST | VAT | ET |
|-------------|-------------------|------|----|----|-----|----|
| Toyota Hilux pickup, diesel 2.8L | 87042110 | UNIT | 35% | 50% | 10% | 0% |
| iPhone 15 Pro Max 256GB | 85171200 | UNIT | 0% | 0% | 10% | 0% |
| Heineken beer 330ml can | 22030010 | LTR | 35% | 30% | 10% | 0% |

*(Rates from lookup table – not generated by the model.)*

---

## ⚠️ Limitations

- The model may output incorrect HS codes for ambiguous, misspelled, or region‑specific descriptions.
- It was trained on a fixed set of Cambodian HS codes; revisions after the training data cutoff are not covered.
- Duty rates can become outdated – always cross‑check with the latest official tariff schedule.
- The model is a classifier, **not** a legal authority. For binding decisions, consult a customs professional.

---

## 📝 License

This model is a derivative of **Gemma‑4‑E4B‑it** and is subject to the [Gemma license](https://ai.google.dev/gemma/terms).  
The HS‑code dataset and lookup table are the property of their respective owners.

---

## 🙏 Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) – made QLoRA + Gemma‑4 on a T4 effortless
- [Google DeepMind](https://deepmind.google) – for the Gemma family of models

---

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{gemma4-hscode-classifier,
  author = {Sothay},
  title = {Gemma‑4 HS Code Classifier (Cambodia Customs)},
  year = 2025,
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sothay/gemma4-hscode-classifier}}
}
```

---

**Author**: [Sothay](https://huggingface.co/Sothay)  
**Model card version**: 1.2