Instructions to use Rajasrl/VLSI-SLM-V1-CodeLlama-Full with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Rajasrl/VLSI-SLM-V1-CodeLlama-Full with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Rajasrl/VLSI-SLM-V1-CodeLlama-Full")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Rajasrl/VLSI-SLM-V1-CodeLlama-Full", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use Rajasrl/VLSI-SLM-V1-CodeLlama-Full with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Rajasrl/VLSI-SLM-V1-CodeLlama-Full"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full

SGLang

How to use Rajasrl/VLSI-SLM-V1-CodeLlama-Full with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Rajasrl/VLSI-SLM-V1-CodeLlama-Full" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Rajasrl/VLSI-SLM-V1-CodeLlama-Full" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Rajasrl/VLSI-SLM-V1-CodeLlama-Full with Docker Model Runner:
```
docker model run hf.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full
```

VLSI-SLM-V1-CodeLlama-Full / README.md

Rajasrl

Update README.md

daeb5f1 verified 14 days ago

preview code

raw

history blame contribute delete

15.8 kB

	---
	license: mit
	language:
	- en
	tags:
	- vlsi
	- verilog
	- systemverilog
	- code-generation
	- hardware-design
	- eda
	- rtl
	- fine-tuned
	- codellama
	- lora
	- edge-ai
	- jetson-orin
	base_model: codellama/CodeLlama-7b-Instruct-hf
	pipeline_tag: text-generation
	library_name: transformers
	model_type: llama
	---

	# VLSI-SLM V1 — CodeLlama Full Model

	> The first open-source, edge-trained, laptop-deployable Small Language Model specialized for VLSI design.

	A 7B parameter CodeLlama model fine-tuned on 30,354 curated VLSI examples — trained entirely on a NVIDIA Jetson Orin edge device with no cloud compute. Generates syntactically correct Verilog, explains VLSI concepts accurately, and runs offline on a 4GB laptop after quantization.

	---

	## Model Details

	\| Property \| Value \|
	\|----------\|-------\|
	\| Base Model \| CodeLlama-7B-Instruct \|
	\| Fine-tuning Method \| LoRA (r=32, α=64) \|
	\| Trainable Parameters \| 82,265,088 (1.21% of 6.82B) \|
	\| Training Hardware \| NVIDIA Jetson Orin 64GB (edge device) \|
	\| Training Time \| ~84 hours wall time \|
	\| Dataset Size \| 30,354 examples (train) / 1,681 (val) \|
	\| Training Epochs \| 3 \|
	\| Final Train Loss \| 0.0122 \|
	\| Best Val Loss \| 0.3892 (step 4000) \|
	\| Precision \| bfloat16 (no quantization during training) \|
	\| License \| MIT \|

	### LoRA Configuration
	```python
	LoraConfig(
	r=32,
	lora_alpha=64,
	target_modules=[
	"q_proj", "k_proj", "v_proj", "o_proj", # Attention
	"gate_proj", "up_proj", "down_proj", # MLP/FFN
	"embed_tokens", "lm_head", # Embeddings
	],
	lora_dropout=0.05,
	bias="none",
	task_type="CAUSAL_LM",
	)
	```

	---

	## Repository Contents

	```
	VLSI-SLM-V1-CodeLlama-Full/
	├── final_model/ ← Merged full model (~14GB, bf16 safetensors)
	├── final_adapter/ ← LoRA adapter only (~200MB)
	├── checkpoint-5000/ ← Training checkpoint
	├── checkpoint-5250/ ← Training checkpoint
	├── checkpoint-5500/ ← Training checkpoint
	├── checkpoint-5691/ ← Final training checkpoint
	├── evaluation/ ← Benchmark results and logs
	├── logs/ ← Full training logs
	├── baseline_pre_ft.json ← Base model responses (pre fine-tuning)
	├── best_checkpoint.txt ← Best validation checkpoint info
	├── heartbeat.json ← Last training state
	└── m4_config_v31.json ← Exact training hyperparameters
	```

	---

	## Evaluation Results

	Evaluated using a semantic scoring system (not rigid keyword matching) with `max_new_tokens=1024`.

	### Standard 50-Question VLSI Benchmark

	\| Metric \| Score \| Target \| Status \|
	\|--------\|-------\|--------\|--------\|
	\| Code Syntax Pass (iverilog) \| 60.0% \| 40–60% \| ✅ PASS \|
	\| Concept Accuracy \| 65.0% \| 85–90% \| 🟡 CLOSE \|
	\| Hallucination Rate \| 0.0% \| <5% \| ✅ PERFECT \|
	\| Code Block Formatting \| 95.0% \| — \| ✅ \|
	\| Debug Accuracy \| 60.0% \| — \| 🟡 \|
	\| Overall \| 72.0% \| — \| ✅ \|

	### Coding Stress Test (50 Progressive Questions)

	\| Difficulty \| Questions \| Pass Rate \| Examples \|
	\|-----------\|-----------\|-----------\|---------\|
	\| Easy \| 10 \| 100% \| AND gate, DFF, counter, decoder \|
	\| Medium \| 15 \| 87% \| FIFO, ALU, FSM, synchronizer \|
	\| Hard \| 13 \| 62% \| Async FIFO, AXI-Lite, SPI master \|
	\| Expert \| 12 \| 42% \| FP adder, MBIST, JTAG TAP controller \|

	The model handles all standard VLSI building blocks cleanly. Expert-level complex modules (1000+ tokens) show truncation artifacts — a known training data issue being addressed in V2.

	---

	## Quick Start

	### Load and Run Inference

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = "Rajasrl/VLSI-SLM-V1-CodeLlama-Full"

	tokenizer = AutoTokenizer.from_pretrained(f"{model_id}/final_model")
	model = AutoModelForCausalLM.from_pretrained(
	f"{model_id}/final_model",
	torch_dtype=torch.bfloat16,
	device_map="auto",
	)
	model.eval()

	def ask_vlsi(question: str, code_mode: bool = False) -> str:
	if code_mode:
	system = """You are a Senior VLSI RTL Engineer.
	Rules:
	1. Always wrap code in ```verilog blocks
	2. Use non-blocking assignments (<=) in sequential always blocks
	3. Use blocking assignments (=) in combinational always blocks
	4. Always include complete module with endmodule
	5. Never use reserved keywords as signal names"""
	else:
	system = "You are an expert VLSI engineer. Give accurate, technical answers."

	prompt = f"### System:\n{system}\n\n### Instruction:\n{question}\n\n### Response:\n"

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	with torch.no_grad():
	output = model.generate(
	**inputs,
	max_new_tokens=1024, # Important: use 1024+ for complete modules
	temperature=0.0 if code_mode else 0.1,
	do_sample=not code_mode,
	repetition_penalty=1.1,
	pad_token_id=tokenizer.eos_token_id,
	)

	response = tokenizer.decode(
	output[0][inputs["input_ids"].shape[1]:],
	skip_special_tokens=True
	)
	return response.strip()

	# Code generation (deterministic)
	print(ask_vlsi(
	"Write a parameterizable 8-bit synchronous counter with reset.",
	code_mode=True
	))

	# Concept explanation
	print(ask_vlsi(
	"Explain clock domain crossing and how to handle it safely.",
	code_mode=False
	))
	```

	### Run with Ollama (Recommended for Laptop Deployment)

	First quantize to GGUF:
	```bash
	# Install llama.cpp
	git clone https://github.com/ggerganov/llama.cpp
	cd llama.cpp && make -j4

	# Convert and quantize
	python convert_hf_to_gguf.py ./final_model --outtype f16 \
	--outfile vlsi_slm_v1_f16.gguf

	./llama-quantize vlsi_slm_v1_f16.gguf vlsi_slm_v1_Q4_K_M.gguf Q4_K_M
	# Output: ~4GB file, runs on any laptop
	```

	Create `Modelfile`:
	```
	FROM ./vlsi_slm_v1_Q4_K_M.gguf

	SYSTEM """You are an expert VLSI and Verilog engineer.
	For code: output only syntactically correct, synthesizable Verilog.
	Use non-blocking assignments (<=) in sequential always blocks.
	Always wrap code in ```verilog blocks.
	Always include endmodule.
	For concepts: give accurate, technical explanations."""

	PARAMETER temperature 0.1
	PARAMETER num_ctx 2048
	```

	```bash
	ollama create vlsi-slm-v1 -f Modelfile
	ollama run vlsi-slm-v1
	```

	---

	## What This Model Can Do ✅

	### Strong Capabilities (Easy–Medium complexity)

	Verilog Code Generation:
	- Flip-flops (D, T, JK) with synchronous/asynchronous reset
	- Counters (binary, Gray code, Johnson, LFSR)
	- Multiplexers, encoders, decoders
	- Shift registers (parameterizable width/depth)
	- State machines (Moore and Mealy FSM)
	- Synchronous SRAM and FIFO
	- Clock dividers and pulse generators
	- Debounce circuits
	- Two-flop CDC synchronizers
	- Basic AXI-Lite and handshake protocols
	- Simple UART, SPI, I2C controllers
	- Testbench templates

	VLSI Concept Explanations:
	- Clock Domain Crossing (CDC) and metastability
	- Setup time and hold time analysis
	- Power reduction: clock gating and power gating
	- Static Timing Analysis (STA) concepts
	- Scan chains and Design for Testability (DFT)
	- SRAM vs DRAM differences
	- Electromigration and IR drop
	- AXI, APB, AHB protocol rules
	- Blocking vs non-blocking assignments
	- Latch inference and how to avoid it

	### Partial Capabilities (Hard complexity)

	- Asynchronous FIFO with Gray code pointers (architecture correct, may miss endmodule)
	- Round-robin arbiters
	- Pipeline structures
	- SPI master/slave controllers
	- Branch predictors
	- Memory BIST controllers

	---

	## Known Limitations ⚠️

	### 1. Truncation Artifact (Primary Known Issue)
	Complex modules exceeding ~800 tokens of output may be cut off before `endmodule`. This is a training data artifact — the dataset was generated using free APIs with 1800-token output limits, and truncated examples leaked through. The model learned this truncation pattern as a behavior.

	Workaround: Always set `max_new_tokens=1024` or higher. If output is still truncated, append `\nendmodule` manually — the logic inside is typically correct.

	Fix in progress: V2 training uses strict `endmodule` validation gates in the data pipeline.

	### 2. Concept Accuracy Gap
	Concept accuracy is 65% vs the 85-90% target. Root cause: PDF textbooks were extracted page-by-page (not paragraph-by-paragraph), causing "semantic blur" where opposing concepts (e.g., Setup vs Hold timing) were mixed in the same training example.

	### 3. Submodule Hallucination
	Occasionally instantiates undefined submodules (`fa fa0(...)` style) when asked for gate-level designs. Best avoided by explicitly requesting "behavioral RTL" in your prompt.

	### 4. Not Trained for SoC-Level Design
	This model is optimized for block-level RTL (FIFOs, arbiters, FSMs, protocol controllers). It is not intended for full SoC or chip-level architecture. Expert-level questions (5-stage RISC pipeline, NoC routers, IEEE 754 FP units) are attempted but may be incomplete.

	### 5. Hardware Constraints on Base Hardware
	Trained on a 64GB Jetson Orin. The merged model requires ~15GB RAM. Use the GGUF Q4_K_M quantized version (~4GB) for laptop deployment.

	---

	## Training Details

	### Hardware
	This model was trained entirely on a NVIDIA Jetson Orin 64GB — an edge computing device, with no cloud GPUs used.

	```
	Device : NVIDIA Jetson Orin (64GB unified RAM)
	CUDA : 12.6 (ARM64)
	OS : Ubuntu 22.04
	PyTorch : 2.5.0a0 nv24.8
	Transformers: 4.44.0
	PEFT : 0.18.1
	TRL : 0.8.6
	```

	Important hardware note: bitsandbytes is not compatible with CUDA 12.6 on Jetson Orin ARM64. Training used pure bfloat16 with `adamw_torch` optimizer. If you attempt to run this model on similar ARM64 Jetson hardware, do not use bitsandbytes or NEFTune.

	### Training Configuration
	```python
	TrainingArguments(
	num_train_epochs=3,
	per_device_train_batch_size=1,
	gradient_accumulation_steps=16, # Effective batch = 16
	learning_rate=2e-5,
	lr_scheduler_type="cosine",
	warmup_ratio=0.03,
	bf16=True,
	fp16=False,
	gradient_checkpointing=True,
	optim="adamw_torch",
	max_grad_norm=1.0,
	save_steps=500,
	eval_steps=500,
	save_total_limit=4,
	group_by_length=True,
	)
	```

	### Thermal Management Innovation
	A custom thermal batching system was implemented:
	- Every 250 training steps: save checkpoint → 5-minute cooldown → resume
	- Table fan added for additional airflow
	- Result: GPU temperature maintained at 44–61°C throughout 84-hour run
	- 6 power outages during training — all recovered via atomic heartbeat checkpointing

	### Dataset
	```
	Source : Curated VLSI examples (code + concept + QA)
	Format : Alpaca instruction tuning
	Train : 30,354 examples
	Validation : 1,681 examples
	Test : 1,681 examples
	Categories : 75.8% code_generation, 23.0% concept, 1.2% QA
	Max seq length : 2048 tokens
	Decontamination : ✅ Zero benchmark leaks verified
	```

	---

	## Comparison: Base vs Fine-tuned

	\| Metric \| Base CodeLlama-7B \| VLSI-SLM V1 \|
	\|--------\|------------------\|-------------\|
	\| Verilog syntax knowledge \| General \| VLSI-specialized \|
	\| VLSI concept depth \| Surface-level \| Detailed and accurate \|
	\| Hallucination rate \| ~10% \| 0.0% \|
	\| Code syntax pass (iverilog) \| ~0% \| 60% \|
	\| Runs offline \| ✅ \| ✅ \|
	\| Deployable on laptop \| ✅ (4GB Q4) \| ✅ (4GB Q4) \|
	\| Cost \| Free \| Free \|

	---

	## Roadmap: What V2 Will Fix

	VLSI-SLM V2 is currently in development with the following improvements:

	\| Issue \| V1 Status \| V2 Fix \|
	\|-------\|-----------\|--------\|
	\| Truncated endmodule \| Present in complex modules \| Strict validation gate in data pipeline \|
	\| Concept accuracy 65% \| Below target \| Layout-aware PDF chunking (paragraph-level) \|
	\| Submodule hallucination \| Occasional \| Anti-submodule prompt in data generation \|
	\| Dataset quality \| Quantity-focused (30K) \| Quality-focused (12K clean) \|
	\| JSON data corruption \| Silent patching \| Strict drop-on-failure \|
	\| EOS alignment \| Not enforced \| EOS token after endmodule \|
	\| Concept/code ratio \| 23%/75% \| 50%/50% balanced \|

	Target V2 metrics:
	- Code Syntax Pass: 65–75%
	- Concept Accuracy: 85–90%
	- Hallucination Rate: <2%

	---

	## How to Contribute / Develop Further

	### 1. Improve the Dataset
	The biggest gains come from data quality, not model size.

	```python
	# The most impactful contribution: add validated Verilog examples
	# Requirements:
	# - Must compile with iverilog
	# - Must end with endmodule/endinterface/endpackage
	# - Must be self-contained (no undefined submodules)
	# - Alpaca format: {"instruction": ..., "input": "", "output": ...}

	# Validate before contributing:
	import subprocess
	result = subprocess.run(["iverilog", "-tnull", "your_file.v"],
	capture_output=True, text=True)
	assert result.returncode == 0, f"Syntax error: {result.stderr}"
	assert "endmodule" in open("your_file.v").read()
	```

	### 2. Fine-tune Further on Your Domain
	Use LoRA to specialize for your specific VLSI area:

	```python
	from peft import LoraConfig, get_peft_model, PeftModel

	# Load V1 as base for V2 fine-tuning
	model = AutoModelForCausalLM.from_pretrained(
	"Rajasrl/VLSI-SLM-V1-CodeLlama-Full/final_model",
	torch_dtype=torch.bfloat16,
	device_map="auto",
	)

	# Add new LoRA adapters for your domain
	# (FPGA-specific, ASIC timing, formal verification, etc.)
	lora_config = LoraConfig(r=16, lora_alpha=32, ...)
	model = get_peft_model(model, lora_config)
	```

	### 3. Extend to SystemVerilog / UVM
	The model has basic SV knowledge but was primarily trained on Verilog-2001.
	Adding UVM testbench examples and SystemVerilog assertions (SVA) would
	significantly improve verification use cases.

	### 4. Add Image Recognition
	A compelling future direction: multi-modal VLSI assistant that can:
	- Read handwritten schematic photos → generate Verilog
	- Analyze timing diagrams → identify violations
	- Recognize circuit board components → explain connections

	### 5. Build a Retrieval-Augmented Generation (RAG) Layer
	Connect the model to a vector database of VLSI standards (IEEE 1800,
	AMBA AXI spec, IEEE 1149.1 JTAG) for factually grounded answers.

	### 6. Evaluation Contributions
	Add more benchmark questions to `evaluation/` folder — especially:
	- Formal verification questions (SVA, PSL)
	- Physical design (placement, routing, DRC)
	- Analog/mixed-signal interfaces
	- RISC-V specific RTL patterns

	---

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@misc{vlsi-slm-v1-2026,
	title = {VLSI-SLM V1: An Edge-Trained Small Language Model for VLSI Design},
	author = {Rajasrl},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full}},
	note = {Fine-tuned CodeLlama-7B on NVIDIA Jetson Orin edge hardware.
	30,354 curated VLSI examples. Zero cloud compute.}
	}
	```

	---

	## The Story

	This model was trained by a final-year engineering student on borrowed edge
	hardware, with no cloud budget, no research lab, and no team. The training
	ran through 6 power outages, lightning storms, and thermal shutdowns — all
	recovered automatically.

	The goal was simple: build a VLSI assistant that works offline, costs
	nothing to run, and belongs to the community — not behind an API paywall.

	"I built an AI to teach me VLSI."

	---

	## License

	MIT License — free to use, modify, and distribute. See LICENSE for details.

	---

	Model trained: March 29 – April 3, 2026
	Uploaded to Hugging Face: May 2026
	Hardware: NVIDIA Jetson Orin 64GB (edge device, no cloud)