FPGA Pinout Fine-Tuning Kit - Spartan-6

Bộ công cụ để fine-tune LLM trên dữ liệu pinout Xilinx Spartan-6 FPGA.

📦 Files

File	Kích thước	Mục đích	Chạy mấy lần
`build_fpga_db.py`	4KB	Parse JSON spec -> `pin_database.json`	1 lần
`fpga_chatbot.py`	20KB	Chatbot tra cứu (chạy ngay, không cần train)	Luôn dùng
`build_training_data.py`	10KB	Generate `fpga_training_data.jsonl` (5,556 samples)	1 lần
`train_fpga_lora.py`	8KB	Fine-tune Qwen2.5-7B với LoRA	1 lần (train)
`fpga_chatbot_finetuned.py`	9KB	Chatbot dùng model đã fine-tune	Sau khi train

🚀 Quick Start (Không cần train)

# 1. Build database từ JSON spec
python build_fpga_db.py
# Output: pin_database.json (~1.3MB, ~7,000 pins)

# 2. Chạy chatbot ngay
python fpga_chatbot.py --test
python fpga_chatbot.py --gradio  # UI web

🧠 Fine-Tune LLM (Cần GPU)

Yêu cầu phần cứng

GPU	VRAM	Thời gian train	Mode
1x RTX 5090	24GB	~2-3 tiếng	bf16 LoRA
2x RTX 5090	48GB	~1.5-2 tiếng	bf16 LoRA multi-GPU
1x RTX 4090	24GB	~3-4 tiếng	4-bit QLoRA

Bước 1: Chuẩn bị data

# Đã có sẵn fpga_training_data.jsonl (5,556 samples)
# Hoặc tạo lại:
python build_training_data.py

Bước 2: Cài thư viện

pip install transformers datasets peft trl accelerate
# Optional cho 4-bit:
pip install bitsandbytes

Bước 3: Train

# 1x RTX 5090 (bf16, đủ VRAM)
python train_fpga_lora.py --gpu 1

# 2x RTX 5090 (nhanh hơn)
python train_fpga_lora.py --gpu 2

# Nếu VRAM không đủ, dùng 4-bit QLoRA
python train_fpga_lora.py --gpu 1 --4bit

# Train và push lên HuggingFace
python train_fpga_lora.py --gpu 1 --push-to-hub --hub-id your-username/fpga-lora

Tham số training

Tham số	Giá trị	Giải thích
LoRA rank	256	Cao để nhớ facts
Target modules	all-linear	Tất cả linear layers
Learning rate	2e-4	10x cao hơn full FT
Epochs	3	Đủ cho factual data
Batch size	1 x 8 grad accum	Effective batch = 8
Max seq length	2048	Đủ cho Q&A ngắn

Bước 4: Sử dụng model đã train

# Chatbot dùng model fine-tuned
python fpga_chatbot_finetuned.py --gradio

# Hoặc CLI
python fpga_chatbot_finetuned.py
> XC6SLX150T-2FGG484C pin D5

📊 Training Data (5,556 samples)

Format: Conversational (messages)

{
  "messages": [
    {"role": "user", "content": "What is pin D5 on XC6SLX150T-2FGG484C?"},
    {"role": "assistant", "content": "Pin D5 on XC6SLX150T-2FGG484C is **IO_L2N_0** (Bank 0)."}
  ]
}

Các loại câu hỏi trong dataset

Loại	Số lượng	Ví dụ
Pin lookup	~2,000	"Pin A3 trên XC6SLX9 là gì?"
Device info	~300	"LX150T hỗ trợ packages nào?"
Package list	~500	"CS(G)484 có bao nhiêu pin?"
Function search	~1,000	"GCLK pins trên LX150T?"
Part number parse	~200	"XC6SLX150T-2FGG484C là gì?"
Differential pairs	~200	"Differential pair IO_L1 trên LX45?"
Bank summary	~800	"Bank 0 có bao nhiêu pin?"
Tiếng Việt	~1,500	Các câu hỏi trên bằng tiếng Việt

🔍 So sánh: RAG vs Fine-tuned

Capability	RAG (fpga_chatbot.py)	Fine-tuned (train_fpga_lora.py)
Tra cứu pin	⚡ Nhanh, chính xác 100%	⚡ Nhanh, chính xác ~95-98%
Trả lời ngôn ngữ tự nhiên	❌ Chỉ keyword matching	✅ Hiểu "pin A1 là gì"
Suy luận	❌ Không	✅ "A1 và A2 là cặp differential?"
Tốc độ	~50ms	~1-2 giây (LLM generate)
Cần GPU	❌ CPU đủ	✅ Cần GPU inference
Cần train	❌ Không	✅ 2-3 tiếng
Update data mới	Dễ (edit JSON)	Khó (phải retrain)

🎯 Khuyến nghị

Use case	Giải pháp
Chỉ cần tra cứu nhanh, chính xác tuyệt đối	Dùng RAG (`fpga_chatbot.py`)
Muốn AI hiểu ngôn ngữ tự nhiên, trả lời như người	Fine-tune (`train_fpga_lora.py`)
Production, không lỗi được	Hybrid: Fine-tuned model + RAG fallback

🔧 Troubleshooting

OOM khi train

# Giảm batch size hoặc dùng 4-bit
python train_fpga_lora.py --gpu 1 --4bit

Model không trả lời đúng sau train

Tăng epochs lên 5
Tăng LoRA rank lên 512
Kiểm tra dataset có đúng format không

Chạy chậm

# Dùng vLLM cho inference nhanh
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model ./fpga-lora-model \
    --gpu-memory-utilization 0.9

📁 Repo

https://huggingface.co/trandangduc0/hdmt-rag-local

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'trandangduc0/hdmt-rag-local'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support