
---
license: mit
language:
- en
tags:
- vlsi
- verilog
- systemverilog
- code-generation
- hardware-design
- eda
- rtl
- fine-tuned
- codellama
- lora
- edge-ai
- jetson-orin
base_model: codellama/CodeLlama-7b-Instruct-hf
pipeline_tag: text-generation
library_name: transformers
model_type: llama
---

# VLSI-SLM V1: CodeLlama Full Model

> **The first open-source, edge-trained, laptop-deployable Small Language Model specialized for VLSI design.**

A 7B-parameter CodeLlama model fine-tuned on 30,354 curated VLSI examples, trained entirely on an NVIDIA Jetson Orin edge device with no cloud compute. It generates Verilog that passes an iverilog syntax check on 60% of benchmark prompts, explains VLSI concepts, and runs offline on a laptop once quantized to a ~4GB GGUF file.

---

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | CodeLlama-7B-Instruct |
| **Fine-tuning Method** | LoRA (r=32, α=64) |
| **Trainable Parameters** | 82,265,088 (1.21% of 6.82B) |
| **Training Hardware** | NVIDIA Jetson Orin 64GB (edge device) |
| **Training Time** | ~84 hours wall time |
| **Dataset Size** | 30,354 examples (train) / 1,681 (val) |
| **Training Epochs** | 3 |
| **Final Train Loss** | 0.0122 |
| **Best Val Loss** | 0.3892 (step 4000) |
| **Precision** | bfloat16 (no quantization during training) |
| **License** | MIT |

### LoRA Configuration

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP/FFN
        "embed_tokens", "lm_head",               # Embeddings and output head
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```
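
The trainable-parameter figure in the Model Details table can be cross-checked from this configuration. The sketch below is a back-of-the-envelope count, not the training code: the layer dimensions (hidden size 4096, MLP size 11008, 32 layers, vocabulary 32016) are assumed CodeLlama-7B values that this card does not state, and `lora_params` is a hypothetical helper. Under those assumptions the total comes out to 82,265,088, matching the table.

```python
# Assumed CodeLlama-7B dimensions (not stated in this model card).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32
VOCAB = 32016
RANK = 32  # r from the LoRA configuration above


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """A LoRA pair adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


# q_proj, k_proj, v_proj, o_proj are all HIDDEN x HIDDEN.
attention = 4 * lora_params(HIDDEN, HIDDEN)
# gate_proj and up_proj map HIDDEN -> INTERMEDIATE; down_proj maps back.
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)
per_layer = attention + mlp

# embed_tokens and lm_head each pair the vocabulary with the hidden size.
embeddings = lora_params(VOCAB, HIDDEN) + lora_params(HIDDEN, VOCAB)

total = NUM_LAYERS * per_layer + embeddings
print(f"{total:,}")  # 82,265,088
```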

---

## Repository Contents

```
VLSI-SLM-V1-CodeLlama-Full/
├── final_model/           ← Merged full model (~14GB, bf16 safetensors)
├── final_adapter/         ← LoRA adapter only (~200MB)
├── checkpoint-5000/       ← Training checkpoint
├── checkpoint-5250/       ← Training checkpoint
├── checkpoint-5500/       ← Training checkpoint
├── checkpoint-5691/       ← Final training checkpoint
├── evaluation/            ← Benchmark results and logs
├── logs/                  ← Full training logs
├── baseline_pre_ft.json   ← Base model responses (pre fine-tuning)
├── best_checkpoint.txt    ← Best validation checkpoint info
├── heartbeat.json         ← Last training state
└── m4_config_v31.json     ← Exact training hyperparameters
```

---

## Evaluation Results

Evaluated using a semantic scoring system (not rigid keyword matching) with `max_new_tokens=1024`.

### Standard 50-Question VLSI Benchmark

| Metric | Score | Target | Status |
|--------|-------|--------|--------|
| Code Syntax Pass (iverilog) | **60.0%** | 40–60% | ✅ PASS |
| Concept Accuracy | **65.0%** | 85–90% | 🟡 BELOW TARGET |
| Hallucination Rate | **0.0%** | <5% | ✅ PERFECT |
| Code Block Formatting | **95.0%** | – | ✅ |
| Debug Accuracy | **60.0%** | – | 🟡 |
| Overall | **72.0%** | – | ✅ |

### Coding Stress Test (50 Progressive Questions)

| Difficulty | Questions | Pass Rate | Examples |
|-----------|-----------|-----------|---------|
| Easy | 10 | **100%** | AND gate, DFF, counter, decoder |
| Medium | 15 | **87%** | FIFO, ALU, FSM, synchronizer |
| Hard | 13 | **62%** | Async FIFO, AXI-Lite, SPI master |
| Expert | 12 | **42%** | FP adder, MBIST, JTAG TAP controller |

**The model handles easy and medium building blocks reliably (100% and 87% pass rates). Expert-level complex modules (1000+ tokens) show truncation artifacts, a known training data issue being addressed in V2.**

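
The four tiers of the stress test can be folded into a single figure. This is a minimal sketch that assumes each percentage is a rounded fraction of that tier's question count; the per-tier pass counts it derives are inferred here, not reported in the card.

```python
# (question count, reported pass rate) per tier, copied from the stress-test table.
tiers = {
    "Easy": (10, 1.00),
    "Medium": (15, 0.87),
    "Hard": (13, 0.62),
    "Expert": (12, 0.42),
}

# Inferred pass counts: the nearest whole number of questions for each rate.
passed = {name: round(count * rate) for name, (count, rate) in tiers.items()}

total_questions = sum(count for count, _ in tiers.values())
total_passed = sum(passed.values())
overall = total_passed / total_questions

print(passed)
print(f"{total_passed}/{total_questions} = {overall:.0%}")  # 36/50 = 72%
```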

---

## Quick Start

### Load and Run Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Rajasrl/VLSI-SLM-V1-CodeLlama-Full"

# The merged weights live in the final_model/ subfolder of the repository.
tokenizer = AutoTokenizer.from_pretrained(model_id, subfolder="final_model")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()


def ask_vlsi(question: str, code_mode: bool = False) -> str:
    if code_mode:
        system = """You are a Senior VLSI RTL Engineer.
Rules:
1. Always wrap code in ```verilog blocks
2. Use non-blocking assignments (<=) in sequential always blocks
3. Use blocking assignments (=) in combinational always blocks
4. Always include complete module with endmodule
5. Never use reserved keywords as signal names"""
    else:
        system = "You are an expert VLSI engineer. Give accurate, technical answers."

    prompt = f"### System:\n{system}\n\n### Instruction:\n{question}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Greedy decoding for code, light sampling for explanations.
    # temperature is only passed when sampling; it has no effect otherwise.
    if code_mode:
        decoding = {"do_sample": False}
    else:
        decoding = {"do_sample": True, "temperature": 0.1}

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=1024,  # Important: use 1024+ for complete modules
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
            **decoding,
        )

    # Decode only the newly generated tokens, not the prompt.
    response = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    return response.strip()


# Code generation (deterministic)
print(ask_vlsi(
    "Write a parameterizable 8-bit synchronous counter with reset.",
    code_mode=True,
))

# Concept explanation
print(ask_vlsi(
    "Explain clock domain crossing and how to handle it safely.",
    code_mode=False,
))
```

### Run with Ollama (Recommended for Laptop Deployment)

First quantize to GGUF:

```bash
# Build llama.cpp (recent versions build with CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release -j 4
pip install -r requirements.txt  # dependencies for the conversion script

# Convert and quantize (./final_model is a local copy of the merged model)
python convert_hf_to_gguf.py ./final_model --outtype f16 \
  --outfile vlsi_slm_v1_f16.gguf
./build/bin/llama-quantize vlsi_slm_v1_f16.gguf vlsi_slm_v1_Q4_K_M.gguf Q4_K_M
# Output: ~4GB file, small enough for laptop deployment
```

Create `Modelfile`:

```
FROM ./vlsi_slm_v1_Q4_K_M.gguf
SYSTEM """You are an expert VLSI and Verilog engineer.
For code: output only syntactically correct, synthesizable Verilog.
Use non-blocking assignments (<=) in sequential always blocks.
Always wrap code in ```verilog blocks.
Always include endmodule.
For concepts: give accurate, technical explanations."""
PARAMETER temperature 0.1
PARAMETER num_ctx 2048
```

```bash
ollama create vlsi-slm-v1 -f Modelfile
ollama run vlsi-slm-v1
```

---

## What This Model Can Do ✅

### Strong Capabilities (Easy–Medium complexity)

**Verilog Code Generation:**

- Flip-flops (D, T, JK) with synchronous/asynchronous reset
- Counters (binary, Gray code, Johnson, LFSR)
- Multiplexers, encoders, decoders
- Shift registers (parameterizable width/depth)
- State machines (Moore and Mealy FSM)
- Synchronous SRAM and FIFO
- Clock dividers and pulse generators
- Debounce circuits
- Two-flop CDC synchronizers
- Basic AXI-Lite and handshake protocols
- Simple UART, SPI, I2C controllers
- Testbench templates

**VLSI Concept Explanations:**

- Clock Domain Crossing (CDC) and metastability
- Setup time and hold time analysis
- Power reduction: clock gating and power gating
- Static Timing Analysis (STA) concepts
- Scan chains and Design for Testability (DFT)
- SRAM vs DRAM differences
- Electromigration and IR drop
- AXI, APB, AHB protocol rules
- Blocking vs non-blocking assignments
- Latch inference and how to avoid it

### Partial Capabilities (Hard complexity)

- Asynchronous FIFO with Gray code pointers (architecture correct, may miss `endmodule`)
- Round-robin arbiters
- Pipeline structures
- SPI master/slave controllers
- Branch predictors
- Memory BIST controllers

---

## Known Limitations ⚠️

### 1. Truncation Artifact (Primary Known Issue)

Complex modules exceeding ~800 tokens of output may be cut off before `endmodule`. This is a **training data artifact**: the dataset was generated using free APIs with 1800-token output limits, and truncated examples leaked through. The model learned this truncation pattern as a behavior.

**Workaround:** Always set `max_new_tokens=1024` or higher. If output is still truncated, append `\nendmodule` manually; the logic inside is typically correct.

**Fix in progress:** V2 training uses strict `endmodule` validation gates in the data pipeline.

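
The manual workaround can be automated in post-processing. The helper below is a hypothetical sketch, not part of this repository: `ensure_endmodule` counts `module` and `endmodule` keywords outside comments and appends whatever is missing. It only closes the module; it cannot recover logic that was cut off mid-statement, so the result should still be checked with iverilog.

```python
import re


def ensure_endmodule(verilog: str) -> str:
    """Append one `endmodule` for each module declaration left unclosed."""
    # Ignore comments so commented-out keywords are not counted.
    stripped = re.sub(r"//[^\n]*|/\*.*?\*/", "", verilog, flags=re.DOTALL)
    opened = len(re.findall(r"\bmodule\b", stripped))
    closed = len(re.findall(r"\bendmodule\b", stripped))
    missing = max(opened - closed, 0)
    if missing == 0:
        return verilog
    return verilog.rstrip() + "\nendmodule" * missing + "\n"


truncated = (
    "module counter(input clk, output reg [7:0] q);\n"
    "  always @(posedge clk) q <= q + 1;"
)
print(ensure_endmodule(truncated))
```

Applying the helper to already complete code returns it unchanged, so it can sit at the end of any generation pipeline.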
### 2. Concept Accuracy Gap

Concept accuracy is 65% vs the 85–90% target. Root cause: PDF textbooks were extracted page-by-page (not paragraph-by-paragraph), causing "semantic blur" where opposing concepts (e.g., setup vs hold timing) were mixed in the same training example.

### 3. Submodule Hallucination

The model occasionally instantiates undefined submodules (`fa fa0(...)` style) when asked for gate-level designs. This is best avoided by explicitly requesting "behavioral RTL" in your prompt.

### 4. Not Trained for SoC-Level Design

This model is optimized for **block-level RTL** (FIFOs, arbiters, FSMs, protocol controllers). It is not intended for full SoC or chip-level architecture. Expert-level questions (5-stage RISC pipeline, NoC routers, IEEE 754 FP units) are attempted but may be incomplete.

### 5. Memory Requirements

The model was trained on a 64GB Jetson Orin. The merged bf16 model requires ~15GB RAM. For laptop deployment, quantize it to GGUF Q4_K_M (~4GB) as described in the Ollama section.

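
The two size figures follow from the parameter count. This is a rough estimate under stated assumptions: the merged weight count is taken as the 6.82B total from the Model Details table minus the LoRA parameters, and 4.85 bits per weight is an assumed approximate average for Q4_K_M, not a number from this card. Runtime memory is somewhat higher than the file size because of the KV cache and activations.

```python
TOTAL_PARAMS = 6.82e9      # base + LoRA, from the Model Details table
LORA_PARAMS = 82_265_088   # trainable parameters, from the same table

# Merging folds the LoRA updates into the base weights, so the adapter
# parameters do not add to the merged model's size.
merged_params = TOTAL_PARAMS - LORA_PARAMS

BYTES_BF16 = 2.0           # bfloat16 stores each weight in two bytes
BITS_Q4_K_M = 4.85         # assumed average bits per weight (approximate)


def size_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in gigabytes (10^9 bytes)."""
    return params * bytes_per_param / 1e9


bf16_gb = size_gb(merged_params, BYTES_BF16)
q4_gb = size_gb(merged_params, BITS_Q4_K_M / 8)
print(f"bf16: {bf16_gb:.1f} GB, Q4_K_M: {q4_gb:.1f} GB")
```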

---

## Training Details

### Hardware

This model was trained entirely on an **NVIDIA Jetson Orin 64GB**, an edge computing device, with no cloud GPUs used.

```
Device       : NVIDIA Jetson Orin (64GB unified RAM)
CUDA         : 12.6 (ARM64)
OS           : Ubuntu 22.04
PyTorch      : 2.5.0a0 nv24.8
Transformers : 4.44.0
PEFT         : 0.18.1
TRL          : 0.8.6
```

**Important hardware note:** bitsandbytes is **not compatible** with CUDA 12.6 on Jetson Orin ARM64. Training used pure bfloat16 with the `adamw_torch` optimizer. If you train or run this model on similar ARM64 Jetson hardware, do not use bitsandbytes or NEFTune.

### Training Configuration

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./vlsi_slm_v1",      # placeholder path; the original is not given here
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # Effective batch = 16
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,
    fp16=False,
    gradient_checkpointing=True,
    optim="adamw_torch",
    max_grad_norm=1.0,
    save_steps=500,
    eval_steps=500,
    save_total_limit=4,
    group_by_length=True,
)
```
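
These settings determine the step counts that appear elsewhere in the card. The arithmetic below is a sketch that assumes a single device and floor division for optimizer steps per epoch; that rounding choice is an assumption, made because it reproduces the `checkpoint-5691` directory name.

```python
import math

train_examples = 30_354
per_device_batch = 1
grad_accum = 16
epochs = 3
warmup_ratio = 0.03

effective_batch = per_device_batch * grad_accum
batches_per_epoch = train_examples // per_device_batch  # single device assumed
steps_per_epoch = batches_per_epoch // grad_accum       # optimizer steps per epoch
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)

# The best validation loss was logged at step 4000.
best_val_epoch = 4000 / steps_per_epoch

print(f"effective batch : {effective_batch}")
print(f"steps per epoch : {steps_per_epoch}")
print(f"total steps     : {total_steps}")   # 5691
print(f"warmup steps    : {warmup_steps}")
print(f"best val epoch  : {best_val_epoch:.2f}")
```

On these numbers the best validation checkpoint falls early in the third epoch, while the final training loss (0.0122) is far below the best validation loss (0.3892).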
### Thermal Management Innovation

A custom thermal batching system was implemented:

- Every 250 training steps: save checkpoint → 5-minute cooldown → resume
- Table fan added for additional airflow
- Result: GPU temperature maintained at 44–61°C throughout the 84-hour run
- 6 power outages during training, all recovered via atomic heartbeat checkpointing

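
The save, cool down, resume cycle and the atomic heartbeat can be sketched with the standard library alone. This is an illustration of the pattern, not the training script from this repository: `write_heartbeat`, `read_heartbeat`, `run`, and the `{"step": N}` heartbeat layout are all hypothetical, and the real `heartbeat.json` may hold different fields. The key idea is to write to a temporary file and rename it over the old one, so a power cut leaves either the previous heartbeat or the new one, never a partial file.

```python
import json
import os
import tempfile
import time

COOLDOWN_EVERY = 250       # steps between pauses (from the list above)
COOLDOWN_SECONDS = 5 * 60  # 5-minute cooldown (from the list above)


def write_heartbeat(path, state):
    """Write state as JSON via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_heartbeat(path):
    """Return the last saved state, or None on a fresh start."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(total_steps, heartbeat_path, train_step, save_checkpoint, sleep=time.sleep):
    """Resume after the last heartbeat and pause every COOLDOWN_EVERY steps."""
    state = read_heartbeat(heartbeat_path) or {"step": 0}
    for step in range(state["step"] + 1, total_steps + 1):
        train_step(step)
        if step % COOLDOWN_EVERY == 0 or step == total_steps:
            save_checkpoint(step)
            write_heartbeat(heartbeat_path, {"step": step})
            if step != total_steps:
                sleep(COOLDOWN_SECONDS)
```

After an outage, calling `run` again with the same heartbeat path restarts from the step after the last saved checkpoint, so at most 250 steps of work are repeated.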
### Dataset

```
Source          : Curated VLSI examples (code + concept + QA)
Format          : Alpaca instruction tuning
Train           : 30,354 examples
Validation      : 1,681 examples
Test            : 1,681 examples
Categories      : 75.8% code_generation, 23.0% concept, 1.2% QA
Max seq length  : 2048 tokens
Decontamination : ✅ Zero benchmark leaks verified
```

---

## Comparison: Base vs Fine-tuned

| Metric | Base CodeLlama-7B | VLSI-SLM V1 |
|--------|------------------|-------------|
| Verilog syntax knowledge | General | VLSI-specialized |
| VLSI concept depth | Surface-level | Detailed and accurate |
| Hallucination rate | ~10% | **0.0%** |
| Code syntax pass (iverilog) | ~0% | **60%** |
| Runs offline | ✅ | ✅ |
| Deployable on laptop | ✅ (4GB Q4) | ✅ (4GB Q4) |
| Cost | Free | Free |

---

## Roadmap: What V2 Will Fix

**VLSI-SLM V2** is currently in development with the following improvements:

| Issue | V1 Status | V2 Fix |
|-------|-----------|--------|
| Truncated endmodule | Present in complex modules | Strict validation gate in data pipeline |
| Concept accuracy 65% | Below target | Layout-aware PDF chunking (paragraph-level) |
| Submodule hallucination | Occasional | Anti-submodule prompt in data generation |
| Dataset quality | Quantity-focused (30K) | Quality-focused (12K clean) |
| JSON data corruption | Silent patching | Strict drop-on-failure |
| EOS alignment | Not enforced | EOS token after endmodule |
| Concept/code ratio | 23%/75% | 50%/50% balanced |

**Target V2 metrics:**

- Code Syntax Pass: 65–75%
- Concept Accuracy: 85–90%
- Hallucination Rate: <2%

---

## How to Contribute / Develop Further

### 1. Improve the Dataset

The biggest gains come from data quality, not model size.

```python
# The most impactful contribution: add validated Verilog examples
# Requirements:
#   - Must compile with iverilog
#   - Must end with endmodule/endinterface/endpackage
#   - Must be self-contained (no undefined submodules)
#   - Alpaca format: {"instruction": ..., "input": "", "output": ...}

# Validate before contributing:
import subprocess

# -tnull runs the parser and elaborator without generating output.
result = subprocess.run(["iverilog", "-tnull", "your_file.v"],
                        capture_output=True, text=True)
assert result.returncode == 0, f"Syntax error: {result.stderr}"

with open("your_file.v") as f:
    assert "endmodule" in f.read()
```
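
The structural requirements can also be checked on the dataset record itself before running iverilog. The function below is a hypothetical sketch for code-generation records only (concept and QA records have no closing keyword to check); `check_record` is not part of this repository, and stripping a trailing Markdown fence is an assumption about how outputs are stored.

```python
REQUIRED_KEYS = {"instruction", "input", "output"}
CLOSERS = ("endmodule", "endinterface", "endpackage")


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems
    if not record["instruction"].strip():
        problems.append("empty instruction")
    code = record["output"].strip()
    # Drop a trailing Markdown fence, if present, before checking the last keyword.
    if code.endswith("```"):
        code = code[:-3].rstrip()
    if not code.endswith(CLOSERS):
        problems.append("output does not end with endmodule/endinterface/endpackage")
    return problems


example = {
    "instruction": "Write a D flip-flop.",
    "input": "",
    "output": (
        "```verilog\n"
        "module dff(input clk, input d, output reg q);\n"
        "  always @(posedge clk) q <= d;\n"
        "endmodule\n"
        "```"
    ),
}
print(check_record(example))  # []
```

A record that passes this check still needs the iverilog compile step above, since the check says nothing about syntax or undefined submodules.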
### 2. Fine-tune Further on Your Domain

Use LoRA to specialize for your specific VLSI area:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load V1 as the base for further fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    "Rajasrl/VLSI-SLM-V1-CodeLlama-Full",
    subfolder="final_model",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Add new LoRA adapters for your domain
# (FPGA-specific, ASIC timing, formal verification, etc.)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="CAUSAL_LM",
    # ... plus target_modules, dropout, etc. chosen for your domain
)
model = get_peft_model(model, lora_config)
```

### 3. Extend to SystemVerilog / UVM

The model has basic SV knowledge but was primarily trained on Verilog-2001.
Adding UVM testbench examples and SystemVerilog assertions (SVA) would
significantly improve verification use cases.

### 4. Add Image Recognition

A compelling future direction is a multi-modal VLSI assistant that can:

- Read handwritten schematic photos → generate Verilog
- Analyze timing diagrams → identify violations
- Recognize circuit board components → explain connections

### 5. Build a Retrieval-Augmented Generation (RAG) Layer

Connect the model to a vector database of VLSI standards (IEEE 1800,
AMBA AXI spec, IEEE 1149.1 JTAG) for factually grounded answers.

### 6. Evaluation Contributions

Add more benchmark questions to the `evaluation/` folder, especially:

- Formal verification questions (SVA, PSL)
- Physical design (placement, routing, DRC)
- Analog/mixed-signal interfaces
- RISC-V specific RTL patterns

---

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{vlsi-slm-v1-2026,
  title        = {VLSI-SLM V1: An Edge-Trained Small Language Model for VLSI Design},
  author       = {Rajasrl},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Rajasrl/VLSI-SLM-V1-CodeLlama-Full}},
  note         = {Fine-tuned CodeLlama-7B on NVIDIA Jetson Orin edge hardware.
                  30,354 curated VLSI examples. Zero cloud compute.}
}
```

---

## The Story

This model was trained by a final-year engineering student on borrowed edge
hardware, with no cloud budget, no research lab, and no team. The training
ran through 6 power outages, lightning storms, and thermal shutdowns, all
recovered automatically.

The goal was simple: build a VLSI assistant that works offline, costs
nothing to run, and belongs to the community, not behind an API paywall.

**"I built an AI to teach me VLSI."**

---

## License

MIT License: free to use, modify, and distribute. See LICENSE for details.

---

*Model trained: March 29 – April 3, 2026*

*Uploaded to Hugging Face: May 2026*

*Hardware: NVIDIA Jetson Orin 64GB (edge device, no cloud)*