🍜 Vietnamese Food Classification — V10 CPU SVM Ensemble

An image classifier for Vietnamese dishes, built as an ensemble of three RBF-kernel SVMs on hand-crafted features. It runs entirely on CPU; no GPU is required.
Trained and evaluated on Google Colab with scikit-learn.

📌 Model information

Property	Value
Architecture	Ensemble of 3 × RBF SVM (`scikit-learn`)
Prediction method	Majority voting (hard) + probability averaging (soft)
Hyperparameters	C = 0.3, kernel = RBF, gamma = scale, class_weight = balanced
Dimensionality reduction	PCA retaining 92% of the variance → 653 components
Input image size	256 × 256 px
Number of classes	7
Model file	`v10_balanced_cpu.pkl` (~176 MB)
Framework	scikit-learn 1.4.2

🎯 Supported classes

No.	Dish	Images in dataset
0	Bánh chưng	354
1	Bánh mì	935
2	Bánh xèo	821
3	Bún bò Huế	1,071
4	Bún đậu mắm tôm	640
5	Chả giò	1,480
6	Cháo lòng	751

Total: 6,052 images (the sum of the per-class counts above), split into train / val / test at a 70 / 15 / 15 ratio (stratified).
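
A 70 / 15 / 15 stratified split can be produced with two calls to scikit-learn's `train_test_split`. The sketch below is an assumption about how such a split might be done, not the notebook's actual code; `stratified_split`, the seed, and the toy data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def stratified_split(X, y, seed=42):
    """Split into 70% train, 15% validation, 15% test, preserving class ratios."""
    # First set aside 30% of the data for validation + test.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Then cut that 30% in half: 15% validation, 15% test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

# Toy data (hypothetical): 200 samples, two classes with a 3:1 imbalance.
X_demo = np.arange(400).reshape(200, 2)
y_demo = np.array([0] * 150 + [1] * 50)
train, val, test = stratified_split(X_demo, y_demo)
print(len(train[0]), len(val[0]), len(test[0]))
```

Stratifying both calls keeps the minority classes (for example Bánh chưng) represented in every split.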

📊 Experimental results

Results from training on Google Colab (CPU runtime):

Split	Accuracy	F1-score (weighted)
Train	85.10%	85.06%
Validation	67.73%	67.64%
Test	70.15%	70.16%

Detailed per-class results on the test set:

Dish	Precision	Recall	F1-score	Support
Bánh chưng	0.56	0.57	0.56	53
Bánh mì	0.63	0.68	0.65	141
Bánh xèo	0.74	0.63	0.68	123
Bún bò Huế	0.74	0.65	0.70	161
Bún đậu mắm tôm	0.74	0.81	0.77	96
Chả giò	0.76	0.76	0.76	222
Cháo lòng	0.63	0.72	0.68	112
macro avg	0.69	0.69	0.69	908
weighted avg	0.71	0.70	0.70	908
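
The macro and weighted averages in the last two rows can be re-derived from the per-class rows. The short check below uses only the F1 and support values copied from the table; because the inputs are already rounded to two decimals, it reproduces the averages only to that precision.

```python
# Per-class F1 and support, copied from the test-set table above.
f1 = [0.56, 0.65, 0.68, 0.70, 0.77, 0.76, 0.68]
support = [53, 141, 123, 161, 96, 222, 112]

total = sum(support)
# Macro average: every class counts equally.
macro_f1 = sum(f1) / len(f1)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total

print(total, f"{macro_f1:.2f}", f"{weighted_f1:.2f}")  # 908 0.69 0.70
```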

🔧 Feature extraction

Each image is converted into a hand-crafted feature vector before being passed to the SVM:

Feature group	Description	Dimensions
RGB Histogram	32 bins × 3 channels (normalized)	96
HSV Histogram	32 bins × 3 channels (H: [0,180], S,V: [0,256])	96
LAB Histogram	24 bins × 3 channels	72
HOG	Image resized to 64×64, cell 8×8, block 16×16, 9 orientations	1,764
LBP	Image resized to 32×32, 8-neighbor, 26 bins	26
Canny Edge Histogram	Thresholds 50/150, 8 bins	8
Color Moments	Mean, std, skewness × 3 color channels	9
Color Ratios	R/(B+G), G/(R+B), B/(R+G)	3
SIFT	Up to 50 keypoints; mean and std of the descriptors concatenated, then truncated to 128 dimensions	128
Total		2,202
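
The dimension counts in the table can be checked with a little arithmetic. The sketch below recomputes the HOG length from the window, block, and cell sizes, using the 8×8 block stride that appears in the `cv2.HOGDescriptor` call in the usage section; `hog_length` is a helper name introduced here for illustration.

```python
def hog_length(win=64, block=16, stride=8, cell=8, bins=9):
    """Length of a HOG descriptor for a square window."""
    blocks_per_side = (win - block) // stride + 1   # 7 block positions per axis
    cells_per_block = (block // cell) ** 2          # 2 x 2 = 4 cells per block
    return blocks_per_side ** 2 * cells_per_block * bins

dims = {
    "rgb_hist": 32 * 3,
    "hsv_hist": 32 * 3,
    "lab_hist": 24 * 3,
    "hog": hog_length(),
    "lbp": 26,
    "canny": 8,
    "color_moments": 3 * 3,
    "color_ratios": 3,
    "sift": 128,
}
total_dims = sum(dims.values())
print(dims["hog"], total_dims)  # 1764 2202
```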

After feature extraction: StandardScaler → PCA (92% variance) → SVM.
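
As a rough illustration of this chain, the sketch below wires the three stages into a single scikit-learn pipeline using the hyperparameters listed under model information. It is an assumption-laden simplification: the released model keeps one shared scaler and PCA plus three separate SVMs, whereas this pipeline has a single SVM, and the random data is a stand-in for real features. `probability=True` is needed for `predict_proba` to be available on an `SVC`.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipeline = make_pipeline(
    StandardScaler(),
    # A float n_components keeps just enough components to reach 92% variance.
    PCA(n_components=0.92, svd_solver="full"),
    SVC(C=0.3, kernel="rbf", gamma="scale",
        class_weight="balanced", probability=True),
)

# Stand-in data (hypothetical): 120 samples, 40 features, 3 classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 40))
y_demo = np.arange(120) % 3
pipeline.fit(X_demo, y_demo)

pca_step = pipeline.named_steps["pca"]
kept_variance = pca_step.explained_variance_ratio_.sum()
print(pca_step.n_components_, round(float(kept_variance), 3))
```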

🛡️ Overfitting countermeasures

Each of the three SVMs in the ensemble is trained with:

Feature Noise (σ = 0.05): Gaussian noise is added to the feature vector
Feature Dropout (5%): a random 5% of the features are set to 0
Subsampling (85%): only 85% of the training data is used (random, with a different seed for each model)
class_weight='balanced': class weights are balanced according to class frequency
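
The four techniques above could be combined roughly as follows. This is a minimal sketch, not the notebook's code: the function name `train_ensemble`, the order of the steps, the seeds, and the per-entry dropout (each value zeroed independently with probability 5%) are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_ensemble(X, y, n_models=3, noise_std=0.05, drop_rate=0.05,
                   subsample=0.85, base_seed=0):
    """Train n_models SVMs, each on its own perturbed subsample of (X, y)."""
    models = []
    for k in range(n_models):
        rng = np.random.default_rng(base_seed + k)   # different seed per model
        # Subsampling: keep 85% of the rows, drawn without replacement.
        n_keep = round(subsample * len(X))
        idx = rng.choice(len(X), size=n_keep, replace=False)
        X_k, y_k = X[idx].astype(np.float64), y[idx]
        # Feature noise: add zero-mean Gaussian noise to every feature.
        X_k = X_k + rng.normal(0.0, noise_std, size=X_k.shape)
        # Feature dropout: zero out about 5% of the entries at random.
        X_k[rng.random(X_k.shape) < drop_rate] = 0.0
        model = SVC(C=0.3, kernel="rbf", gamma="scale",
                    class_weight="balanced", probability=True,
                    random_state=base_seed + k)
        models.append(model.fit(X_k, y_k))
    return models

# Stand-in data (hypothetical): 100 samples, 10 features, 2 classes.
demo_rng = np.random.default_rng(123)
X_demo = demo_rng.normal(size=(100, 10))
y_demo = np.arange(100) % 2
models = train_ensemble(X_demo, y_demo)
```

Perturbing each member differently is what makes the three SVMs disagree enough for voting to help.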

🚀 Usage

Download the model from Hugging Face

from huggingface_hub import hf_hub_download
import pickle

model_path = hf_hub_download(
    repo_id="jamus0702/vn_food_classification",
    filename="v10_balanced_cpu.pkl"
)

with open(model_path, "rb") as f:
    data = pickle.load(f)

label_encoder   = data["label_encoder"]
scaler          = data["scaler"]
pca             = data["pca"]
ensemble_models = data["ensemble_models"]   # list of 3 sklearn SVC
class_names     = data["class_names"]

Inference on a single image

import cv2
import numpy as np
from scipy.stats import skew, mode

IMG_SIZE = (256, 256)

def extract_sift_features(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create(nfeatures=50)
    _, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(128)
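    # Mean and std are concatenated (256 values) and then cut to the first 128,
    # so only the mean descriptor survives. Kept as-is to match the trained model.
    return np.concatenate([np.mean(descriptors, 0), np.std(descriptors, 0)])[:128]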

def calculate_lbp(img):
    h, w = img.shape
    lbp = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            c = img[i, j]
            code  = (img[i-1,j-1]>=c)<<7 | (img[i-1,j]>=c)<<6 | (img[i-1,j+1]>=c)<<5
            code |= (img[i,j+1]>=c)<<4   | (img[i+1,j+1]>=c)<<3 | (img[i+1,j]>=c)<<2
            code |= (img[i+1,j-1]>=c)<<1 | (img[i,j-1]>=c)<<0
            lbp[i-1, j-1] = code
    return lbp

def extract_features(image):
    features = []
    # RGB
    for i in range(3):
        h = cv2.normalize(cv2.calcHist([image],[i],None,[32],[0,256]), None).flatten()
        features.extend(h)
    # HSV
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    for i in range(3):
        rng = [0,180] if i==0 else [0,256]
        h = cv2.normalize(cv2.calcHist([hsv],[i],None,[32],rng), None).flatten()
        features.extend(h)
    # LAB
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
    for i in range(3):
        h = cv2.normalize(cv2.calcHist([lab],[i],None,[24],[0,256]), None).flatten()
        features.extend(h)
    # HOG
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hog_img = cv2.resize(gray, (64, 64))
    hog = cv2.HOGDescriptor((64,64),(16,16),(8,8),(8,8),9)
    features.extend(hog.compute(hog_img).flatten())
    # LBP
    lbp = calculate_lbp(cv2.resize(gray, (32, 32)))
    h = cv2.normalize(cv2.calcHist([lbp],[0],None,[26],[0,256]), None).flatten()
    features.extend(h)
    # Canny edges
    edges = cv2.Canny(gray, 50, 150)
    h = cv2.normalize(cv2.calcHist([edges],[0],None,[8],[0,256]), None).flatten()
    features.extend(h)
    # Color moments
    for i in range(3):
        ch = image[:,:,i].flatten()
        features += [np.mean(ch)/255, np.std(ch)/255, skew(ch)/10]
    # Color ratios
    b,g,r = cv2.split(image.astype(np.float32)+1e-6)
    features += [np.mean(r/(b+g)), np.mean(g/(r+b)), np.mean(b/(r+g))]
    # SIFT
    features.extend(extract_sift_features(image))
    return np.array(features, dtype=np.float32)

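# ── Run inference ──
img = cv2.imread("your_food_image.jpg")
if img is None:  # cv2.imread returns None instead of raising on a bad path
    raise FileNotFoundError("Could not read your_food_image.jpg")
img = cv2.resize(img, IMG_SIZE)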

feat = extract_features(img).reshape(1, -1)
feat_scaled = scaler.transform(feat)
feat_pca    = pca.transform(feat_scaled.astype(np.float32))

# Ensemble predict (majority vote + average proba)
all_preds = []
all_probs = []
for model in ensemble_models:
    all_preds.append(int(model.predict(feat_pca)[0]))
    all_probs.append(model.predict_proba(feat_pca)[0])

prediction    = int(mode(all_preds, keepdims=False)[0])
probabilities = np.mean(all_probs, axis=0)

predicted_class = label_encoder.inverse_transform([prediction])[0]
confidence      = probabilities[prediction] * 100

print(f"Dish: {predicted_class}  ({confidence:.1f}%)")
for name, prob in sorted(zip(class_names, probabilities), key=lambda x: -x[1]):
    print(f"  {name}: {prob*100:.1f}%")
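
With three voters and seven classes, all three models can disagree, in which case the hard vote has no majority and `scipy.stats.mode` simply returns one of the tied labels. One possible way to handle this, shown here as a sketch rather than the project's behaviour, is to break ties with the averaged probabilities. `ensemble_vote` is a hypothetical helper and assumes, like the code above, that class indices match the probability columns.

```python
import numpy as np

def ensemble_vote(preds, probs):
    """Hard vote over class indices; ties are broken by mean probability.

    preds: predicted class index from each model, shape (n_models,)
    probs: per-model class probabilities, shape (n_models, n_classes)
    """
    preds = np.asarray(preds)
    mean_probs = np.mean(probs, axis=0)
    counts = np.bincount(preds, minlength=len(mean_probs))
    tied = np.flatnonzero(counts == counts.max())   # classes with the most votes
    winner = int(tied[np.argmax(mean_probs[tied])])
    return winner, mean_probs

# Three models, three different votes (made-up numbers for illustration).
demo_preds = [2, 5, 0]
demo_probs = np.array([
    [0.10, 0.05, 0.40, 0.05, 0.05, 0.30, 0.05],
    [0.05, 0.05, 0.10, 0.05, 0.05, 0.65, 0.05],
    [0.35, 0.05, 0.10, 0.05, 0.10, 0.30, 0.05],
])
winner, mean_probs = ensemble_vote(demo_preds, demo_probs)
print(winner)  # 5
```

When two or more models agree, the function returns the same label as a plain majority vote.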

🗂️ Repository contents

jamus0702/vn_food_classification
└── v10_balanced_cpu.pkl   # Trained model (~176 MB)

Structure of the `.pkl` file

{
    "svm_model":       <sklearn.svm.SVC>,        # first model in the ensemble
    "ensemble_models": [SVC, SVC, SVC],           # 3 independent SVMs
    "label_encoder":   <LabelEncoder>,            # 7 dish labels
    "scaler":          <StandardScaler>,          # fitted on the training set
    "pca":             <PCA>,                     # 653 components, variance=0.92
    "class_names":     ["Banh chung", ...],       # list of class names
    "img_size":        (256, 256),
    "best_params":     {"C": 0.3, "gamma": "scale", "kernel": "rbf"},
    "c_value":         0.3
}
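
Since the file is a plain dictionary, a quick sanity check after loading can catch a truncated download or an unexpected layout early. The sketch below assumes the key names listed above; `check_bundle` is a hypothetical helper, and the dummy dictionary stands in for the real unpickled data.

```python
EXPECTED_KEYS = {
    "svm_model", "ensemble_models", "label_encoder", "scaler", "pca",
    "class_names", "img_size", "best_params", "c_value",
}

def check_bundle(data, n_models=3):
    """Raise if the unpickled dictionary does not have the documented layout."""
    missing = EXPECTED_KEYS - set(data)
    if missing:
        raise KeyError(f"Missing keys in model file: {sorted(missing)}")
    if len(data["ensemble_models"]) != n_models:
        raise ValueError(f"Expected {n_models} ensemble models")
    return True

# Dummy stand-in with the same keys (values are placeholders, not real models).
dummy = {key: None for key in EXPECTED_KEYS}
dummy["ensemble_models"] = ["svc_a", "svc_b", "svc_c"]
bundle_ok = check_bundle(dummy)
```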

📈 Training process

Step	Details
Environment	Google Colab, CPU runtime, Python 3.10
Data split	Stratified split: 70% train / 15% val / 15% test
Data augmentation	Horizontal flip, brightness +15%, brightness -15%; the augmentation rate is inversely proportional to the number of images in the class
Class balancing	Adaptive augmentation (no SMOTE) + `class_weight='balanced'`
Overfitting countermeasures	Feature noise (σ=0.05) + feature dropout (5%) + subsampling (85%)
Training time	~36 minutes (Colab CPU)
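
The augmentation row can be pictured with a few lines of NumPy. This is a hedged sketch only: the exact brightness operation and the rule that ties the augmentation rate to class size are not given in this card, so `augment` and `augment_fraction` (which uses smallest-class-count divided by class-count) are illustrative assumptions.

```python
import numpy as np

def augment(image):
    """Return a flipped, a brighter (+15%), and a darker (-15%) copy of a uint8 image."""
    flipped = image[:, ::-1].copy()                                     # horizontal flip
    brighter = np.clip(np.rint(image * 1.15), 0, 255).astype(np.uint8)
    darker = np.clip(np.rint(image * 0.85), 0, 255).astype(np.uint8)
    return [flipped, brighter, darker]

def augment_fraction(class_count, min_count):
    """Hypothetical rule: the share of images to augment is inversely
    proportional to class size (1.0 for the smallest class)."""
    return min(1.0, min_count / class_count)

# Tiny stand-in image: 2 x 3 pixels, 3 channels, all values 200 except one pixel.
demo = np.full((2, 3, 3), 200, dtype=np.uint8)
demo[0, 0] = 10
flipped, brighter, darker = augment(demo)
print(augment_fraction(354, 354), round(augment_fraction(1480, 354), 3))
```

Under this assumed rule the smallest class (354 images) would be fully augmented, while the largest (1,480 images) would have roughly a quarter of its images augmented.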

⚠️ Limitations

The model recognizes only 7 fixed dishes; out-of-distribution (OOD) images are still assigned one of these 7 labels, so the prediction is wrong.
Test accuracy is about 70%: suitable for academic or demo use, not ready for large-scale production.
Inference takes roughly 0.1–0.3 seconds per image on a typical CPU (because of SIFT and the hand-written LBP loop).
The model was trained with scikit-learn 1.4.2; load it with the same version to avoid unpickling errors.
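
One common, partial mitigation for the OOD limitation is to refuse to answer when the averaged probability of the top class is low. The sketch below illustrates the idea; `predict_or_reject` and the 0.5 threshold are assumptions, not part of the released code, and the threshold would need tuning on validation data. SVM probabilities are not designed for OOD detection, so some unrelated images will still pass.

```python
import numpy as np

def predict_or_reject(probabilities, class_names, threshold=0.5):
    """Return the top class name, or None when the model is not confident enough."""
    best = int(np.argmax(probabilities))
    if probabilities[best] < threshold:
        return None
    return class_names[best]

# Made-up probabilities over three classes for illustration.
names = ["Banh mi", "Banh xeo", "Cha gio"]
confident = predict_or_reject(np.array([0.2, 0.7, 0.1]), names)
uncertain = predict_or_reject(np.array([0.40, 0.35, 0.25]), names)
print(confident, uncertain)  # Banh xeo None
```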

📦 Environment requirements

scikit-learn==1.4.2
opencv-python-headless==4.9.0.80   # or opencv-python
numpy==1.26.4
scipy==1.13.0
huggingface_hub>=0.24.0
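
Because unpickling depends on matching library versions, it can help to compare the installed packages against these pins before loading the model. The sketch below uses the standard-library `importlib.metadata`; `find_mismatches` is a hypothetical helper, and the `get_version` parameter exists only so the function can be exercised without touching the real environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the requirements list above.
PINNED = {"scikit-learn": "1.4.2", "numpy": "1.26.4", "scipy": "1.13.0"}

def find_mismatches(pinned, get_version=version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None          # package is not installed at all
        if found != wanted:
            mismatches[name] = found
    return mismatches

if __name__ == "__main__":
    # An empty dict means the environment matches the pins.
    print(find_mismatches(PINNED))
```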

📜 License

The project is released under the MIT License.
Built by a group of students for a Machine Learning course; not intended for commercial use.

🔗 Links

Source code & backend: github.com/jj4002/vietnamese-food-classification
Training notebook: training/V10_CPU/v10_balanced_cpu.ipynb

Evaluation results (self-reported)

Test Accuracy: 0.702
Test F1 (weighted): 0.702