<div align="center">
```
 ██████╗██╗      ██████╗ ██╗  ██╗ █████╗ ██╗
██╔════╝██║     ██╔═══██╗██║ ██╔╝██╔══██╗██║
██║     ██║     ██║   ██║█████╔╝ ███████║██║
██║     ██║     ██║   ██║██╔═██╗ ██╔══██║██║
╚██████╗███████╗╚██████╔╝██║  ██╗██║  ██║██║
 ╚═════╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝
```
# CLOKAI: The Spiking-KAN PCB Synthesis Engine
**Circuit Logic Oriented Knowledge AI**
[![Status](https://img.shields.io/badge/Status-Pre--Release%20Alpha-red?style=for-the-badge&logo=rocket)](https://github.com)
[![Architecture](https://img.shields.io/badge/Architecture-ClokArch%20System-blueviolet?style=for-the-badge&logo=buffer)](https://github.com)
[![Parameters](https://img.shields.io/badge/Parameters-~1.5B--1.8B-blue?style=for-the-badge&logo=brain)](https://github.com)
[![Training](https://img.shields.io/badge/Training-2×%20NVIDIA%20T4%20DDP-76b900?style=for-the-badge&logo=nvidia)](https://github.com)
[![Precision](https://img.shields.io/badge/Precision-FP16-orange?style=for-the-badge)](https://github.com)
[![License](https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge)](https://github.com)
> *"Not just a language model. A logic engine that thinks in circuits."*
</div>
---
## ⚡ Overview
**CLOKAI** is an experimental language model (~1.5B–1.8B parameters) built for **Electronic Design Automation (EDA)** and **PCB logic synthesis**. Where conventional LLMs predict tokens, CLOKAI is designed to extract logic, combining the expressivity of neuromorphic computing with the precision of non-linear function approximation.
This is not a fine-tuned chatbot. This is a **ClokArch**: a domain-native model built at the intersection of three neural paradigms, designed to make PCB design as intuitive as a conversation.
| | |
|---|---|
| **Datasets** | `Open-Orca/SlimOrca` · `Abhishekcr448/Hinglish-Everyday-Conversations-1M` |
| **Languages** | English · Hindi (Hinglish) |
| **Task** | Text Generation → Netlist Synthesis · Hardware Debugging · EDA Reasoning |
| **Model Type** | `clokarch` (Custom Architecture) |
---
## 🧠 Model Architecture: *ClokArch*
CLOKAI is a **ClokArch**: a three-architecture fusion that transcends the limitations of standard transformer-based LLMs.
```
┌─────────────────────────────────────────────────────────┐
│                 CLOKAI ClokArch ENGINE                  │
│                                                         │
│    ┌───────────────────────────────────────────────┐    │
│    │  [1] KAN-Integrated Backbone                  │    │
│    │      Kolmogorov-Arnold Networks               │    │
│    │      Learnable Spline Activations             │    │
│    └───────────────────────────────────────────────┘    │
│                            ↓                            │
│    ┌───────────────────────────────────────────────┐    │
│    │  [2] Temporal Spiking Attention (TASA)        │    │
│    │      SNN Layers + Async Firing Emulation      │    │
│    │      Clock-Domain Temporal Processing         │    │
│    └───────────────────────────────────────────────┘    │
│                            ↓                            │
│    ┌───────────────────────────────────────────────┐    │
│    │  [3] Neuro-Symbolic Logic Verifier            │    │
│    │      KCL / KVL / Ohm's Law Validation         │    │
│    │      Latent-Space Constraint Enforcement      │    │
│    └───────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
```
### 1. KAN-Integrated Backbone *(Kolmogorov-Arnold Networks)*
Standard multi-layer perceptrons (MLPs) have been replaced with KAN layers: networks whose activation functions are learnable and parameterized by B-splines. Instead of applying a fixed activation curve at each neuron, a KAN layer learns a separate univariate function on every connection (edge) during training.
> **Expert Insight:** This is intended to let CLOKAI **mathematically resolve** hardware logic and parametric circuit constraints rather than merely predict text patterns associated with them. The goal is a model that derives component values instead of guessing them.
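To make "learnable activation functions defined by B-splines" concrete, here is a minimal pure-Python sketch of a single KAN-style edge activation. It is not CLOKAI's implementation: the grid range, grid size, spline degree, and all function names are assumptions chosen for illustration.

```python
def bspline_basis(x, knots, i, k):
    """Cox-de Boor recursion: value of the i-th degree-k B-spline basis at x."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1)
    right = 0.0
    if right_den != 0:
        right = (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1)
    return left + right


def make_uniform_knots(lo, hi, grid, k):
    """Uniform knots on [lo, hi], extended by k intervals on each side."""
    h = (hi - lo) / grid
    return [lo + (j - k) * h for j in range(grid + 2 * k + 1)]


def edge_activation(x, coeffs, knots, k):
    """phi(x) = sum_i c_i * B_i(x); the c_i are what training would adjust."""
    return sum(c * bspline_basis(x, knots, i, k) for i, c in enumerate(coeffs))


K, GRID = 3, 5                        # cubic splines on a 5-interval grid (assumed)
KNOTS = make_uniform_knots(-1.0, 1.0, GRID, K)
N_BASIS = GRID + K                    # number of learnable coefficients per edge
```

Each edge carries `N_BASIS` coefficients instead of a single weight, which is why KAN layers hold more parameters than an MLP of the same width. With every coefficient set to 1, `edge_activation` returns 1 anywhere inside the grid interval, since B-spline bases form a partition of unity there.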
### 2. Temporal Spiking Attention: *TASA*
Integrated Spiking Neural Network (SNN) layers emulate the brain's asynchronous firing mechanism at the attention level. The **Time-Aware Spiking Attention (TASA)** mechanism processes information in discrete temporal pulses rather than continuous dense activations.
> **Expert Insight:** TASA is designed to let CLOKAI reason about **high-frequency signal integrity** and **clock-domain logic** with explicit temporal structure, which matters in designs where timing is a hard constraint rather than a suggestion.
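The idea of "discrete temporal pulses" can be illustrated with a leaky integrate-and-fire (LIF) neuron, a standard building block of spiking networks. This is a generic sketch, not the TASA layer itself; the decay factor `beta`, the `threshold`, and the hard reset are illustrative assumptions.

```python
def lif_spike_train(inputs, beta=0.9, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a sequence of input currents."""
    v = 0.0                      # membrane potential
    spikes = []
    for x in inputs:
        v = beta * v + x         # leak the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)     # emit a binary spike
            v = 0.0              # hard reset after firing
        else:
            spikes.append(0)
    return spikes


print(lif_spike_train([0.6, 0.6, 0.0, 0.0]))   # [0, 1, 0, 0]
```

The output is a sparse binary sequence rather than a dense activation: the neuron stays silent until enough input has accumulated over time, so the timing of the inputs, not only their magnitude, determines the result.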
### 3. Neuro-Symbolic Logic Verifier
Embedded within CLOKAI's latent space is a **Symbolic Verifier**, a rule-enforcement layer that intercepts generated outputs and validates them against the immutable laws of electronics: Ohm's Law, Kirchhoff's Current Law (KCL), and Kirchhoff's Voltage Law (KVL).
> **Expert Insight:** This creates a **self-correcting synthesis loop**. CLOKAI doesn't just generate netlists; it generates netlists that *pass physical law verification* before they ever leave the model.
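As a rough illustration of what a KCL check involves, the sketch below sums signed branch currents at every node and reports the nodes where they do not cancel. The branch-list data format and the function name are hypothetical; the README does not describe how the verifier represents circuits internally.

```python
def kcl_violations(branch_currents, tol=1e-9):
    """Return the nodes whose signed branch currents do not sum to zero.

    branch_currents: list of (from_node, to_node, amps), current flowing from -> to.
    """
    net = {}
    for src, dst, amps in branch_currents:
        net[src] = net.get(src, 0.0) - amps   # current leaving src
        net[dst] = net.get(dst, 0.0) + amps   # current entering dst
    return sorted(node for node, total in net.items() if abs(total) > tol)


# A 20 mA loop: source -> resistor -> LED -> back to the source
consistent = [("GND", "VCC", 0.02), ("VCC", "N1", 0.02), ("N1", "GND", 0.02)]
print(kcl_violations(consistent))    # []

# Same loop, but the LED branch claims only 15 mA
inconsistent = [("GND", "VCC", 0.02), ("VCC", "N1", 0.02), ("N1", "GND", 0.015)]
print(kcl_violations(inconsistent))  # ['GND', 'N1']
```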
---
## 🛠️ Key Capabilities
| Capability | Description |
|---|---|
| 🔌 **Autonomous Netlist Synthesis** | Translate natural language requirements into Altium/KiCad-compatible JSON netlists, with no manual schematic entry |
| 🎯 **Component Optimization** | Infer optimal resistor, capacitor, and inductor values from hidden design constraints and circuit context |
| 🌐 **Hinglish Technical Reasoning** | Native-level comprehension and explanation of complex electronics engineering in English and Hinglish |
| 🔍 **Hardware Debugging** | Detect design-rule violations, potential short circuits, and logic conflicts through **logical inference** alone, with no simulation required |
---
## 📊 Technical Specifications
| Parameter | Specification |
|---|---|
| **Parameter Count** | ~1.5–1.8 billion |
| **Architecture** | ClokArch (Custom SNN-KAN Hybrid) |
| **Hidden Dimension** | 1024 |
| **Depth** | 16 Layers |
| **Training Precision** | FP16 with Gradient Checkpointing |
| **Tokenization** | Domain-Specific BPE (VCC, GND, GPIO, PWM, I²C, SPI optimized) |
| **Training Hardware** | 2× NVIDIA T4 GPUs (Distributed Data Parallel) |
| **Languages** | English, Hindi (Hinglish) |
| **License** | Apache 2.0 |
---
## 🚀 Training & Optimization: *The Founder's Secret*
CLOKAI was trained under a bespoke optimization regime on **2× NVIDIA T4 GPUs** in **Distributed Data Parallel (DDP)** mode. Every training decision was made to maximize logic extraction over pattern memorization.
### Entropy Maximization
The data loader employs **high-entropy shuffling** and deliberate **hardware-netlist variability injection**. The training distribution was engineered to be maximally non-repetitive, forcing the model to generalize circuit logic rather than overfit to specific design signatures.
### Warm Restart Schedule
A **Cosine Annealing with Warm Restarts** (SGDR) learning rate schedule was used to break loss plateaus. Each restart returns the learning rate to its peak so the optimizer can escape poor local minima; within each cycle, the cosine decay then progressively narrows the exploration radius.
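For reference, the SGDR schedule sets the learning rate within cycle $i$ to $\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos(\pi\, T_{cur} / T_i)\right)$, where $T_{cur}$ counts steps since the last restart and $T_i$ is the current cycle length. The sketch below implements that formula in plain Python; the specific values of `eta_max`, `eta_min`, `t_0`, and `t_mult` are placeholders, since the README does not state the ones used for CLOKAI.

```python
import math


def sgdr_lr(step, eta_max=1e-3, eta_min=1e-5, t_0=100, t_mult=2):
    """Learning rate at `step` under cosine annealing with warm restarts."""
    t_i, t_cur = t_0, step
    while t_cur >= t_i:          # walk forward to the cycle that contains `step`
        t_cur -= t_i
        t_i *= t_mult            # each cycle is t_mult times longer than the last
    cosine = 1 + math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * cosine


for s in (0, 50, 99, 100, 299, 300):
    print(s, f"{sgdr_lr(s):.6f}")
```

With these placeholder settings the rate starts at `eta_max`, decays toward `eta_min` by step 99, and jumps back to `eta_max` at steps 100 and 300, where the second and third cycles begin. In PyTorch the same schedule is available as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.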
### Memory Architecture
Training a ~1.7B parameter ClokArch on constrained VRAM required surgical memory management:
```
Memory Optimization Stack:
┌────────────────────────────────────────────┐
│  FP16 Mixed Precision      (Forward Pass)  │
│  Activation Checkpointing  (Backward)      │
│  Bucketed Gradient Sync    (DDP Layer)     │
│  Dynamic Loss Scaling      (Stability)     │
└────────────────────────────────────────────┘
        ↓  Result: ~1.7B params on 2× T4
```
- **Activation Checkpointing**: recompute forward activations during backprop instead of storing them
- **Bucketed Gradient Views**: DDP gradient communication bucketed for optimal bandwidth utilization
- **FP16 Mixed Precision**: half-precision forward passes with FP32 master weights for numerical stability
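The "Dynamic Loss Scaling" entry in the stack above refers to a general FP16 training technique: the loss is multiplied by a scale factor so small gradients do not underflow, and the factor is adjusted as training proceeds. The sketch below shows only the update rule, in plain Python. The class name is hypothetical and the defaults are assumptions modelled on common framework settings, not values confirmed for CLOKAI.

```python
import math


class DynamicLossScaler:
    """Grow the loss scale after a run of finite steps; shrink it on overflow."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, grads):
        """Return True if the optimizer step should be applied for these grads."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor   # overflow: skip step, lower scale
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            self.scale *= self.growth_factor    # stable for a while: raise scale
            self._good_steps = 0
        return True
```

In a real training loop the gradients would be divided by `scale` before the optimizer step; PyTorch packages this logic as `torch.cuda.amp.GradScaler`.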
---
## 🚀 Quick Start
```python
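from transformers import AutoTokenizer, AutoModelForCausalLM

# `clokarch` is a custom architecture, so loading is assumed to need the
# modeling code shipped in the model repo (trust_remote_code=True).
tokenizer = AutoTokenizer.from_pretrained("Ghosthets/CLOKAI", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Ghosthets/CLOKAI",
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # requires the `accelerate` package
    trust_remote_code=True,
)

prompt = "Circuit design for LED with current limiting resistor at 5V:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 256 new tokens with a mild repetition penalty
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))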
```
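As a sanity check on whatever the model returns for the prompt above, the expected resistor value follows from Ohm's law: $R = (V_s - V_f) / I_f$. The sketch below assumes an LED with a 2.0 V forward drop driven at 20 mA; those figures and the function name are illustrative and do not come from the model or its documentation.

```python
def led_resistor_ohms(v_supply, v_forward, i_forward):
    """Series resistance that limits LED current to i_forward (amps)."""
    if i_forward <= 0:
        raise ValueError("forward current must be positive")
    if v_supply <= v_forward:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    return (v_supply - v_forward) / i_forward


r = led_resistor_ohms(5.0, 2.0, 0.020)   # about 150 ohms
p = 0.020 ** 2 * r                       # about 0.06 W dissipated in the resistor
print(f"R = {r:.0f} ohm, P = {p * 1000:.0f} mW")
```

A generated answer far from this range for the same assumptions is a sign the output needs review before it goes anywhere near hardware.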
---
## 📦 Training Data
| Dataset | Purpose |
|---|---|
| [`Open-Orca/SlimOrca`](https://huggingface.co/datasets/Open-Orca/SlimOrca) | General instruction-following and reasoning alignment |
| [`Abhishekcr448/Hinglish-Everyday-Conversations-1M`](https://huggingface.co/datasets/Abhishekcr448/Hinglish-Everyday-Conversations-1M) | Hinglish language comprehension and bilingual dialogue |
> Domain-specific EDA corpora (netlist datasets, schematic descriptions, hardware design documents) were additionally used during training.
---
## 🛡️ Pre-Release Status
```
╔══════════════════════════════════════════════════╗
║             ⚠  PRE-RELEASE ALPHA  ⚠              ║
║                                                  ║
║    CLOKAI is currently in active development.    ║
║    Outputs should be verified before production  ║
║    hardware deployment.                          ║
╚══════════════════════════════════════════════════╝
```
CLOKAI is in **Pre-Release Alpha**. The architecture is stable; the mission is not yet complete. Current development priorities include expanding the training corpus, refining the Neuro-Symbolic Verifier's constraint ruleset, and optimizing inference latency for real-time PCB design workflows.
The ultimate objective: **redefine AI's role in the EDA industry** by making PCB design as natural and accessible as talking to a colleague.
---
## 🔭 Roadmap
- [ ] Expand domain-specific tokenizer vocabulary (VHDL, Verilog, SPICE)
- [ ] Release quantized GGUF/AWQ variants for edge deployment
- [ ] Public benchmark suite against baseline EDA-LLMs
- [ ] REST API + KiCad plugin integration
- [ ] Multilingual expansion (Tamil-English, Bangla-English)
- [ ] Full public release with model weights
---
## ⚠️ Limitations & Intended Use
**Intended Use:** CLOKAI is designed for electronics engineers, PCB designers, and EDA researchers working on hardware synthesis, component selection, and circuit debugging tasks.
**Current Limitations:**
- Pre-release alpha: outputs must be verified by a qualified engineer before physical hardware deployment
- Complex multi-layer board designs may require iterative prompting
- Symbolic Verifier covers fundamental laws; advanced RF/high-speed signal integrity rules are under active development
---
## 📄 License
This model is released under the **Apache 2.0 License**. See [LICENSE](LICENSE) for full terms.
Training data licenses apply per their respective sources:
- `Open-Orca/SlimOrca`: MIT License
- `Abhishekcr448/Hinglish-Everyday-Conversations-1M`: see dataset card
---
## 📬 Citation
If you use CLOKAI in your research or projects, please cite:
```bibtex
@misc{clokai2025,
  title        = {CLOKAI: The Spiking-KAN PCB Synthesis Engine},
  author       = {Ghosthets},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ghosthets/CLOKAI}},
  note         = {Pre-Release Alpha, ClokArch Architecture}
}
```
---
<div align="center">
```
Made with @Ghosthets. Powered by ClokAI.
```
*CLOKAI: Where Neuromorphic Circuits Meet the Language of Design.*
</div>