
<div align="center">

```
 ██████╗██╗      ██████╗ ██╗  ██╗ █████╗ ██╗
██╔════╝██║     ██╔═══██╗██║ ██╔╝██╔══██╗██║
██║     ██║     ██║   ██║█████╔╝ ███████║██║
██║     ██║     ██║   ██║██╔═██╗ ██╔══██║██║
╚██████╗███████╗╚██████╔╝██║  ██╗██║  ██║██║
 ╚═════╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚═╝  ╚═╝╚═╝
```

# CLOKAI — The Spiking-KAN PCB Synthesis Engine
**Circuit Logic Oriented Knowledge AI**

[![Status](https://img.shields.io/badge/Status-Pre--Release%20Alpha-red?style=for-the-badge&logo=rocket)](https://github.com)
[![Architecture](https://img.shields.io/badge/Architecture-ClokArch%20System-blueviolet?style=for-the-badge&logo=buffer)](https://github.com)
[![Parameters](https://img.shields.io/badge/Parameters-~1.5B--1.8B-blue?style=for-the-badge&logo=brain)](https://github.com)
[![Training](https://img.shields.io/badge/Training-2×%20NVIDIA%20T4%20DDP-76b900?style=for-the-badge&logo=nvidia)](https://github.com)
[![Precision](https://img.shields.io/badge/Precision-FP16-orange?style=for-the-badge)](https://github.com)
[![License](https://img.shields.io/badge/License-Apache%202.0-green?style=for-the-badge)](https://github.com)

> *"Not just a language model. A logic engine that thinks in circuits."*

</div>

---

## ⚡ Overview

**CLOKAI** is an experimental heavyweight language model (~1.5B–1.8B parameters), purpose-engineered for the frontier of **Electronic Design Automation (EDA)** and **PCB Logic Synthesis**. Where conventional LLMs predict tokens, CLOKAI extracts logic — combining the raw expressivity of Neuromorphic Computing with the mathematical precision of Non-linear Function Approximation.

This is not a fine-tuned chatbot. This is a **ClokArch** — a domain-native intelligence forged at the intersection of three revolutionary neural paradigms, designed to make PCB design as intuitive as a conversation.

| | |
|---|---|
| **Datasets** | `Open-Orca/SlimOrca` · `Abhishekcr448/Hinglish-Everyday-Conversations-1M` |
| **Languages** | English · Hindi (Hinglish) |
| **Task** | Text Generation → Netlist Synthesis · Hardware Debugging · EDA Reasoning |
| **Model Type** | `clokarch` (Custom Architecture) |

---

## 🧠 Model Architecture — *ClokArch*

CLOKAI is a **ClokArch**: a three-architecture fusion that transcends the limitations of standard transformer-based LLMs.

```
┌────────────────────────────────────────────────────┐
│               CLOKAI ClokArch ENGINE               │
│                                                    │
│   ┌───────────────────────────────────────────┐    │
│   │  [1] KAN-Integrated Backbone              │    │
│   │      Kolmogorov-Arnold Networks           │    │
│   │      Learnable Spline Activations         │    │
│   └───────────────────────────────────────────┘    │
│                         ↓                          │
│   ┌───────────────────────────────────────────┐    │
│   │  [2] Temporal Spiking Attention (TASA)    │    │
│   │      SNN Layers + Async Firing Emulation  │    │
│   │      Clock-Domain Temporal Processing     │    │
│   └───────────────────────────────────────────┘    │
│                         ↓                          │
│   ┌───────────────────────────────────────────┐    │
│   │  [3] Neuro-Symbolic Logic Verifier        │    │
│   │      KCL / KVL / Ohm's Law Validation     │    │
│   │      Latent-Space Constraint Enforcement  │    │
│   └───────────────────────────────────────────┘    │
└────────────────────────────────────────────────────┘
```

### 1. KAN-Integrated Backbone *(Kolmogorov-Arnold Networks)*

Standard Multi-Layer Perceptrons have been **surgically replaced** with KAN layers: networks whose activation functions are learnable and parameterized as B-splines. Instead of applying a fixed activation curve at each neuron, every connection in CLOKAI's backbone learns its own univariate function during training.

> **Expert Insight:** This grants CLOKAI the ability to **mathematically resolve** hardware logic and parametric circuit constraints — not merely predict text patterns associated with them. The model doesn't guess component values; it derives them.
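
To make the idea of a learnable spline activation concrete, here is a minimal pure-Python sketch of a single KAN-style edge and a tiny layer built from such edges. It is not CLOKAI's implementation: the names `SplineEdge` and `kan_layer`, the grid range of [-1, 1], the cubic degree, and the coefficient values are all assumptions chosen for illustration, and the residual base function used in the original KAN formulation is omitted.

```python
from typing import List, Sequence


def bspline_basis(x: float, knots: List[float], i: int, k: int) -> float:
    """Cox-de Boor recursion: value at x of the i-th B-spline basis of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + k] - knots[i]
    right_den = knots[i + k + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (x - knots[i]) / left_den * bspline_basis(x, knots, i, k - 1)
    right = 0.0
    if right_den != 0:
        right = (knots[i + k + 1] - x) / right_den * bspline_basis(x, knots, i + 1, k - 1)
    return left + right


class SplineEdge:
    """One learnable activation: phi(x) = sum_i coeffs[i] * B_i(x).

    In training, `coeffs` would be the learnable parameters (hypothetical setup).
    """

    def __init__(self, coeffs: Sequence[float], x_min: float = -1.0,
                 x_max: float = 1.0, degree: int = 3) -> None:
        self.degree = degree
        self.coeffs = list(coeffs)
        n_intervals = len(self.coeffs) - degree  # uniform grid cells in [x_min, x_max]
        step = (x_max - x_min) / n_intervals
        # Extend the uniform grid by `degree` knots on each side.
        self.knots = [x_min + (j - degree) * step
                      for j in range(n_intervals + 2 * degree + 1)]

    def __call__(self, x: float) -> float:
        return sum(c * bspline_basis(x, self.knots, i, self.degree)
                   for i, c in enumerate(self.coeffs))


def kan_layer(x: Sequence[float], edges: List[List[SplineEdge]]) -> List[float]:
    """Each output j sums its own edge functions: y_j = sum_i phi_ji(x_i)."""
    return [sum(edge(xi) for edge, xi in zip(row, x)) for row in edges]
```

With all coefficients set to 1 the basis functions sum to 1 inside the grid, so each edge returns 1 there; training would instead shape each edge's coefficients independently.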

### 2. Temporal Spiking Attention — *TASA*

Integrated Spiking Neural Network (SNN) layers emulate the brain's asynchronous firing mechanism at the attention level. The **Time-Aware Spiking Attention (TASA)** mechanism processes information in discrete temporal pulses rather than continuous dense activations.

> **Expert Insight:** TASA enables CLOKAI to process **high-frequency signal integrity** and **clock-domain logic** with genuine temporal accuracy — critical for designs where timing is not a suggestion but a constraint.
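
The README does not specify how TASA is computed, so the following is only a sketch of the leaky integrate-and-fire (LIF) dynamic that spiking layers are commonly built on. The function name `lif_spikes` and the `decay` and `threshold` values are assumptions for illustration, not CLOKAI parameters.

```python
from typing import List, Sequence


def lif_spikes(inputs: Sequence[float], decay: float = 0.5,
               threshold: float = 1.0) -> List[int]:
    """Leaky integrate-and-fire neuron over discrete time steps.

    The membrane potential leaks by `decay` each step, accumulates the input,
    and emits a spike (1) with a hard reset when it crosses `threshold`.
    """
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = decay * potential + x
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


# A constant drive produces a regular pulse train; a weak drive never fires.
print(lif_spikes([0.6] * 6))
print(lif_spikes([0.1] * 5))
```

The output is a sparse binary sequence rather than a dense activation, which is the property the section above refers to as discrete temporal pulses.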

### 3. Neuro-Symbolic Logic Verifier

Embedded within CLOKAI's latent space is a **Symbolic Verifier** — a rule-enforcement layer that intercepts generated outputs and validates them against the immutable laws of electronics: Ohm's Law, Kirchhoff's Current Law (KCL), and Kirchhoff's Voltage Law (KVL).

> **Expert Insight:** This creates a **self-correcting synthesis loop**. CLOKAI doesn't just generate netlists — it generates netlists that *pass physical law verification* before they ever leave the model.
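
As an illustration of what validating against these laws means numerically, the sketch below checks Ohm's Law, KCL, and KVL on plain numbers. It is an external, rule-based check written for this README, not the verifier inside CLOKAI; the function names and tolerances are assumptions.

```python
import math
from typing import Sequence


def check_ohm(voltage: float, current: float, resistance: float,
              rel_tol: float = 1e-6) -> bool:
    """Ohm's Law: V = I * R, within a relative tolerance."""
    return math.isclose(voltage, current * resistance, rel_tol=rel_tol)


def check_kcl(currents_into_node: Sequence[float], abs_tol: float = 1e-9) -> bool:
    """KCL: signed currents entering a node sum to zero."""
    return abs(sum(currents_into_node)) <= abs_tol


def check_kvl(voltage_changes_around_loop: Sequence[float],
              abs_tol: float = 1e-9) -> bool:
    """KVL: signed voltage rises and drops around a closed loop sum to zero."""
    return abs(sum(voltage_changes_around_loop)) <= abs_tol


# Example: 5 V source, LED dropping 2 V, resistor dropping 3 V at 20 mA.
print(check_kvl([5.0, -2.0, -3.0]))      # loop closes
print(check_ohm(3.0, 0.020, 150.0))      # 3 V across 150 ohm at 20 mA
print(check_kcl([0.020, -0.015]))        # 5 mA unaccounted for at the node
```

A netlist that fails any of these checks would be rejected or regenerated in a self-correcting loop of the kind described above.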

---

## 🛠️ Key Capabilities

| Capability | Description |
|---|---|
| 🔌 **Autonomous Netlist Synthesis** | Translate natural language requirements into Altium/KiCad-compatible JSON netlists — zero manual schematic entry |
| 🎯 **Component Optimization** | Infer optimal resistor, capacitor, and inductor values from hidden design constraints and circuit context |
| 🌐 **Hinglish Technical Reasoning** | Native-level comprehension and explanation of complex electronics engineering in English and Hinglish |
| 🔍 **Hardware Debugging** | Detect design-rule violations, potential short circuits, and logic conflicts through pure **Logical Inference** — no simulation required |
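
The README does not publish the JSON netlist schema, so the sketch below uses a hypothetical one (`components`, each with a `ref` and a `pins` map from pin name to net name) to show how generated output could be cross-checked with simple structural rules. The function `lint_netlist` and both rules are illustrative assumptions, not part of CLOKAI.

```python
import json
from collections import Counter
from typing import Dict, List


def lint_netlist(netlist: Dict) -> List[str]:
    """Flag two structural problems in a netlist (hypothetical schema).

    1. A component whose pins all sit on the same net is shorted out.
    2. A net with a single pin attached is dangling.
    """
    issues = []
    pins_per_net = Counter()
    for comp in netlist["components"]:
        nets = list(comp["pins"].values())
        pins_per_net.update(nets)
        if len(nets) > 1 and len(set(nets)) == 1:
            issues.append(f"{comp['ref']}: all pins on net {nets[0]} (shorted)")
    for net, count in sorted(pins_per_net.items()):
        if count < 2:
            issues.append(f"{net}: only one pin connected (dangling)")
    return issues


example = json.loads("""
{"components": [
  {"ref": "V1", "pins": {"+": "VCC", "-": "GND"}},
  {"ref": "R1", "pins": {"1": "VCC", "2": "VCC"}},
  {"ref": "D1", "pins": {"A": "LED_A", "K": "GND"}}
]}
""")
for issue in lint_netlist(example):
    print(issue)
```

Here R1 is reported as shorted and LED_A as dangling, because the resistor was wired to VCC on both pins instead of bridging VCC and LED_A.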

---

## 📊 Technical Specifications

| Parameter | Specification |
|---|---|
| **Parameter Count** | ~1.5 Billion – 1.8 Billion |
| **Architecture** | ClokArch (Custom SNN-KAN Hybrid) |
| **Hidden Dimension** | 1024 |
| **Depth** | 16 Layers |
| **Training Precision** | FP16 mixed precision, with gradient checkpointing |
| **Tokenization** | Domain-specific BPE, optimized for EDA terms such as VCC, GND, GPIO, PWM, I²C, SPI |
| **Training Hardware** | 2× NVIDIA T4 GPUs (Distributed Data Parallel) |
| **Languages** | English, Hindi (Hinglish) |
| **License** | Apache 2.0 |

---

## 🚀 Training & Optimization — *The Founder's Secret*

CLOKAI was trained under a bespoke optimization regime on **2× NVIDIA T4 GPUs** in **Distributed Data Parallel (DDP)** mode. Every training decision was made to maximize logic extraction over pattern memorization.

### Entropy Maximization
The data loader employs **high-entropy shuffling** and deliberate **hardware-netlist variability injection**. The training distribution was engineered to be maximally non-repetitive, forcing the model to generalize circuit logic rather than overfit to specific design signatures.

### Warm Restart Schedule
A **Cosine Annealing with Warm Restarts** (SGDR) learning rate schedule was used to aggressively break loss plateaus. Each restart resets the learning rate to its maximum so the optimizer can escape a poor local minimum, and the cosine decay that follows narrows the search again before the next restart.
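
The schedule can be sketched in a few lines of plain Python. The hyperparameters below (`lr_max`, `lr_min`, first cycle length `t_0`, cycle multiplier `t_mult`) are placeholder assumptions, since the README does not state the values used for CLOKAI.

```python
import math


def sgdr_lr(step: int, lr_max: float = 1e-3, lr_min: float = 1e-5,
            t_0: int = 10, t_mult: int = 2) -> float:
    """Cosine annealing with warm restarts (SGDR).

    Within a cycle of length t_i the rate follows
        lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t_cur / t_i)),
    then jumps back to lr_max; each new cycle is t_mult times longer.
    """
    t_i, t_cur = t_0, step
    while t_cur >= t_i:  # walk forward to the cycle that contains `step`
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Restarts happen at steps 0, 10, and 30 with these placeholder settings.
for step in (0, 5, 9, 10, 20, 29, 30):
    print(step, f"{sgdr_lr(step):.6f}")
```

In PyTorch the same behaviour is available as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, whose `T_0`, `T_mult`, and `eta_min` arguments correspond to `t_0`, `t_mult`, and `lr_min` here.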

### Memory Architecture
Training a ~1.7B parameter ClokArch on constrained VRAM required surgical memory management:

```
Memory Optimization Stack:
┌───────────────────────────────────────┐
│  FP16 Mixed Precision (Forward Pass)  │
│  Activation Checkpointing (Backward)  │
│  Bucketed Gradient Sync (DDP Layer)   │
│  Dynamic Loss Scaling (Stability)     │
└───────────────────────────────────────┘
         ↓ Result: ~1.7B params on 2× T4
```

- **Activation Checkpointing** — recompute forward activations during backprop instead of storing them
- **Bucketed Gradient Views** — DDP gradient communication bucketed for optimal bandwidth utilization
- **FP16 Mixed Precision** — half-precision forward passes with FP32 master weights for numerical stability
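
Dynamic loss scaling, listed in the stack above, is what keeps small FP16 gradients from underflowing. The sketch below shows the usual update rule in plain Python; the class name `DynamicLossScaler` is hypothetical, and the defaults mirror those of PyTorch's `GradScaler` rather than any setting confirmed for CLOKAI.

```python
import math
from typing import Iterable


class DynamicLossScaler:
    """Minimal dynamic loss scaling logic (illustrative, framework-free).

    The loss is multiplied by `scale` before backprop. If any gradient
    overflowed (inf or nan), the step is skipped and the scale is reduced;
    after `growth_interval` clean steps in a row, the scale is increased.
    """

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def update(self, grads: Iterable[float]) -> bool:
        """Return True if the optimizer step should be applied."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff_factor
            self._clean_steps = 0
            return False  # skip this step, gradients are unusable
        self._clean_steps += 1
        if self._clean_steps == self.growth_interval:
            self.scale *= self.growth_factor
            self._clean_steps = 0
        return True


scaler = DynamicLossScaler(growth_interval=2)
print(scaler.update([float("inf"), 0.1]), scaler.scale)  # overflow: skip, halve
print(scaler.update([0.1, 0.2]), scaler.scale)           # clean step
print(scaler.update([0.3, 0.1]), scaler.scale)           # second clean step: double
```

In a real training loop the gradients would also be divided by `scale` before the optimizer step; that unscaling is left out here to keep the control flow visible.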

---

## 🚀 Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ghosthets/CLOKAI")
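model = AutoModelForCausalLM.from_pretrained(
    "Ghosthets/CLOKAI",
    torch_dtype="auto",
    device_map="auto",  # requires the `accelerate` package
    # `clokarch` is a custom architecture, so transformers has to load the
    # modeling code from the model repo (assumes the repo ships that code).
    trust_remote_code=True,
)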

prompt = "Circuit design for LED with current limiting resistor at 5V:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.1
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
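
Because outputs should be verified before use, it helps to have a hand calculation for the prompt above to compare against. The sketch below computes the series resistor for an LED from R = (V_supply - V_forward) / I. The 2.0 V forward voltage and 20 mA current are assumed typical values for a red LED, not figures from this README, and `led_series_resistor` is a helper written for this example.

```python
from typing import Tuple


def led_series_resistor(v_supply: float, v_forward: float,
                        i_led: float) -> Tuple[float, float]:
    """Return (resistance in ohms, resistor power in watts) for an LED circuit."""
    if v_supply <= v_forward:
        raise ValueError("supply voltage must exceed the LED forward voltage")
    if i_led <= 0:
        raise ValueError("LED current must be positive")
    v_resistor = v_supply - v_forward      # voltage the resistor must drop
    resistance = v_resistor / i_led        # Ohm's Law: R = V / I
    power = v_resistor * i_led             # P = V * I dissipated in the resistor
    return resistance, power


# Assumed values: 5 V supply, 2.0 V forward voltage, 20 mA LED current.
r, p = led_series_resistor(5.0, 2.0, 0.020)
print(f"R = {r:.0f} ohm, P = {p * 1000:.0f} mW")
```

With these assumptions the result is about 150 ohm at 60 mW, so a model answer far from that range for the same LED would deserve a second look.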

---

## 📦 Training Data

| Dataset | Purpose |
|---|---|
| [`Open-Orca/SlimOrca`](https://huggingface.co/datasets/Open-Orca/SlimOrca) | General instruction-following and reasoning alignment |
| [`Abhishekcr448/Hinglish-Everyday-Conversations-1M`](https://huggingface.co/datasets/Abhishekcr448/Hinglish-Everyday-Conversations-1M) | Hinglish language comprehension and bilingual dialogue |

> Domain-specific EDA corpora (netlist datasets, schematic descriptions, hardware design documents) were additionally used during training.

---

## 🛡️ Pre-Release Status

```
╔══════════════════════════════════════════════╗
║           ⚠  PRE-RELEASE ALPHA  ⚠            ║
║                                              ║
║  CLOKAI is currently in active development.  ║
║  Outputs should be verified before production║
║  hardware deployment.                        ║
╚══════════════════════════════════════════════╝
```

CLOKAI is in **Pre-Release Alpha**. The architecture is stable; the mission is not yet complete. Current development priorities include expanding the training corpus, refining the Neuro-Symbolic Verifier's constraint ruleset, and optimizing inference latency for real-time PCB design workflows.

The ultimate objective: **redefine AI's role in the EDA industry** — making PCB design as natural and accessible as talking to a colleague.

---

## 🔭 Roadmap

- [ ] Expand domain-specific tokenizer vocabulary (VHDL, Verilog, SPICE)
- [ ] Release quantized GGUF/AWQ variants for edge deployment
- [ ] Public benchmark suite against baseline EDA-LLMs
- [ ] REST API + KiCad plugin integration
- [ ] Multilingual expansion (Tamil-English, Bangla-English)
- [ ] Full public release with model weights

---

## ⚠️ Limitations & Intended Use

**Intended Use:** CLOKAI is designed for electronics engineers, PCB designers, and EDA researchers working on hardware synthesis, component selection, and circuit debugging tasks.

**Current Limitations:**
- Pre-release alpha — outputs must be verified by a qualified engineer before physical hardware deployment
- Complex multi-layer board designs may require iterative prompting
- Symbolic Verifier covers fundamental laws; advanced RF/high-speed signal integrity rules are under active development

---

## 📄 License

This model is released under the **Apache 2.0 License**. See [LICENSE](LICENSE) for full terms.

Training data licenses apply per their respective sources:
- `Open-Orca/SlimOrca` — MIT License
- `Abhishekcr448/Hinglish-Everyday-Conversations-1M` — See dataset card

---

## 📬 Citation

If you use CLOKAI in your research or projects, please cite:

```bibtex
@misc{clokai2025,
  title        = {CLOKAI: The Spiking-KAN PCB Synthesis Engine},
  author       = {Ghosthets},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Ghosthets/CLOKAI}},
  note         = {Pre-Release Alpha — ClokArch Architecture}
}
```

---

<div align="center">

```
Made by @Ghosthets. Powered by CLOKAI.
```

*CLOKAI — Where Neuromorphic Circuits Meet the Language of Design.*

</div>