| --- |
| license: apache-2.0 |
| language: |
| - zh |
| library_name: transformers |
| tags: |
| - snn |
| - spiking-neural-network |
| - text-generation |
| - neuromorphic |
| pipeline_tag: text-generation |
| --- |
| # NeuronSpark-0.9B |
|
|
| ## Introduction |
|
|
| **NeuronSpark-0.9B** is a **0.87-billion-parameter language model built entirely on Spiking Neural Networks (SNNs)**. Unlike conventional Transformer-based LLMs, which rely on attention mechanisms, NeuronSpark replaces the entire computation backbone with biologically inspired spiking neurons and performs language modeling through membrane potential dynamics, surrogate gradient training, and adaptive computation (PonderNet). |
|
|
| This is the **pretrained base model**, trained for ~85,000 steps on a small subset of the Seq-Monkey corpus. |
|
|
| > **Note on training data**: Due to limited compute resources (a single DGX Spark), this model was trained for only **~85K steps on a small fraction (~1.4B tokens) of the ~10B-token Seq-Monkey corpus**. Despite this small data budget, the model demonstrates emergent language capabilities, which supports the architectural viability of pure SNN language models. We plan to continue scaling with more data and compute in future work. |
|
|
| For the instruction-tuned chat version, see [NeuronSpark-0.9B-Chat](https://huggingface.co/Brain2nd/NeuronSpark-0.9B-Chat). |
|
|
| ## Model Details |
|
|
| | Attribute | Value | |
| |-----------|-------| |
| | Parameters | 874M | |
| | Architecture | SNN Hidden State Space Model | |
| | Hidden Dimension (D) | 896 | |
| | Layers | 20 | |
| | SNN Timesteps (K) | 16 (PonderNet adaptive) | |
| | State Expansion (N) | 8 | |
| | FFN Dimension | 2688 | |
| | Vocabulary | 6144 (custom BPE) | |
| | Context Length | 512 tokens | |
| | Training Data | Seq-Monkey (small subset, Chinese) | |
| | Training Tokens | ~1.4B (of ~10B available) | |
| | Precision | bfloat16 | |
| | License | Apache 2.0 | |
|
|
| ## Architecture Highlights |
|
|
| - **Pure SNN**: No attention and no standard MLP; all computation runs through PLIF (Parametric Leaky Integrate-and-Fire) neurons |
| - **Membrane Potential Leakage Activation**: Each PLIFNode outputs the leak current `(1-β)·V_post`, which naturally emphasizes fast-responding neurons over slow-memory neurons |
| - **Selective State Space**: Hidden neurons use input-dependent dynamic parameters β(t), α(t), and V_th(t), analogous to selective state space models such as Mamba |
| - **PonderNet Adaptive K**: Each token dynamically decides how many SNN timesteps to use (1 to K), with geometric-distribution weighting over the timesteps |
| - **Triton Fused Kernels**: Custom PLIF forward/backward kernels, with a single-pass sequential scan replacing a 3-phase approach |
| - **Pre-LN Residual Stream**: Continuous residual flow with RMSNorm, matching the Qwen3/LLaMA architectural pattern |
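The leak-current activation in the list above can be illustrated with a single-neuron sketch. This is not the NeuronSpark kernel: the `plif_step` name, the sigmoid parameterization of β, the `(1-β)` input scaling, and the soft reset are assumptions chosen for illustration. Only the `(1-β)·V_post` output comes from the description above.

```python
import math


def plif_step(v_prev, x, w, v_th=1.0):
    """One update of a single PLIF neuron (illustrative assumptions throughout)."""
    beta = 1.0 / (1.0 + math.exp(-w))          # assumed: decay beta = sigmoid(w), in (0, 1)
    v_pre = beta * v_prev + (1.0 - beta) * x   # assumed: leaky integration of the input
    spike = 1.0 if v_pre >= v_th else 0.0      # fire when the threshold is reached
    v_post = v_pre - spike * v_th              # assumed: soft reset (subtract threshold)
    out = (1.0 - beta) * v_post                # leak-current output, (1-beta) * V_post
    return v_post, spike, out


# Same input and starting potential, different decay rates.
fast = plif_step(v_prev=0.5, x=0.5, w=-2.0)   # small beta: fast-responding neuron
slow = plif_step(v_prev=0.5, x=0.5, w=2.0)    # large beta: slow-memory neuron
print(fast[2], slow[2])
```

With equal membrane potential, the neuron with the smaller β produces the larger output, which is the emphasis on fast-responding neurons mentioned above.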
| |
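The adaptive-K weighting can be sketched in the style of PonderNet: each timestep k has a halting probability λ_k, and step k receives weight λ_k·∏_{j<k}(1-λ_j). The function names and the choice to fold the leftover probability mass into the final step (so the weights sum to 1) are assumptions for illustration, not the model's actual code.

```python
def halting_weights(lambdas):
    """Turn per-step halting probabilities into weights that sum to 1."""
    weights = []
    still_running = 1.0                      # probability of not having halted yet
    for lam in lambdas[:-1]:
        weights.append(still_running * lam)  # halt exactly at this step
        still_running *= 1.0 - lam
    weights.append(still_running)            # assumed: remainder goes to the final step
    return weights


def ponder_output(step_outputs, lambdas):
    """Expected output over SNN timesteps under the halting distribution."""
    weights = halting_weights(lambdas)
    return sum(w * y for w, y in zip(weights, step_outputs))


# A constant halting probability gives truncated geometric weights.
weights = halting_weights([0.5, 0.5, 0.5, 0.5])
print(weights)  # [0.5, 0.25, 0.125, 0.125]
```

A token with large early λ values puts most of its weight on the first few timesteps, which is how a per-token "number of timesteps" can emerge from a fixed maximum K.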
| ## Quickstart |
| |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # trust_remote_code=True loads the custom model code from the model repository |
| model = AutoModelForCausalLM.from_pretrained( |
|     "Brain2nd/NeuronSpark-0.9B", |
|     trust_remote_code=True, |
| ) |
| tokenizer = AutoTokenizer.from_pretrained("Brain2nd/NeuronSpark-0.9B") |
| |
| # Text completion: prepend the BOS token to the prompt |
| text = f"{tokenizer.bos_token}人工智能的发展" |
| input_ids = tokenizer(text, return_tensors="pt")["input_ids"] |
| |
| # Generate up to 128 new tokens, stopping early at the EOS token |
| output_ids = model.generate( |
|     input_ids, |
|     max_new_tokens=128, |
|     temperature=0.8, |
|     top_k=50, |
|     eos_token_id=tokenizer.eos_token_id, |
| ) |
| print(tokenizer.decode(output_ids[0], skip_special_tokens=True)) |
| ``` |
| |
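| **Example Output** (in English, roughly: "The development of artificial intelligence has provided new opportunities for humanity's future development. In the future, artificial intelligence will be an important direction for the future development of artificial intelligence."): |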
| ``` |
| 人工智能的发展,为人类的未来发展提供了新的机遇。在未来,人工智能将是未来人工智能发展的重要方向。 |
| ``` |
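Because the context length listed under Model Details is 512 tokens, it can help to cap `max_new_tokens` by the prompt length before calling `generate`. The helper below is a hypothetical convenience function, not part of the model's API, and it assumes the 512-token limit is shared by the prompt and the generated tokens.

```python
CONTEXT_LENGTH = 512  # from the Model Details table


def generation_budget(prompt_len, requested_new_tokens, context_length=CONTEXT_LENGTH):
    """Return how many new tokens fit after a prompt of `prompt_len` tokens."""
    if prompt_len >= context_length:
        raise ValueError("prompt already fills the context window")
    return min(requested_new_tokens, context_length - prompt_len)


print(generation_budget(prompt_len=8, requested_new_tokens=128))    # 128
print(generation_budget(prompt_len=450, requested_new_tokens=128))  # 62
```

In the Quickstart, this could be used as `max_new_tokens=generation_budget(input_ids.shape[1], 128)`.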
|
|
| ## Requirements |
|
|
| ```bash |
| pip install torch transformers spikingjelly safetensors |
| # For Triton kernels (GPU): pip install triton |
| ``` |
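Before loading the model, a quick check of which packages are importable can save a failed run. The sketch below uses only the standard library. `missing_packages` is a hypothetical helper, and treating `triton` as optional follows the comment in the install command above.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["torch", "transformers", "spikingjelly", "safetensors"]
optional = ["triton"]  # only needed for the Triton GPU kernels

print("missing required:", missing_packages(required))
print("missing optional:", missing_packages(optional))
```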
|
|
| ## Training |
|
|
| Trained on a single NVIDIA DGX Spark (GB10, 128GB unified memory) with 4-GPU DDP. |
| Due to compute constraints, training used only a small subset of the full corpus (~85K steps, ~1.4B tokens of ~10B available). Even with this limited data budget, the model acquires basic language generation ability, demonstrating the architectural viability of pure SNN language modeling. |
|
|
| ```bash |
| torchrun --nproc_per_node=4 train_ddp.py \ |
| --D 896 --D_ff 2688 --K 16 --num_layers 20 \ |
| --batch_size 8 --accumulation_steps 8 \ |
| --learning_rate 2e-4 --warmup_iters 1000 |
| ``` |
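As a rough consistency check, the ~1.4B-token figure can be reproduced from the command above under two assumptions that the card does not state: each of the ~85K steps is one micro-batch per GPU (not one optimizer update), and every sequence is packed to the full 512-token context.

```python
# Values from the training command and the Model Details table.
num_gpus = 4             # --nproc_per_node
batch_size = 8           # --batch_size (sequences per GPU per step)
accumulation_steps = 8   # --accumulation_steps
context_length = 512     # tokens per sequence
steps = 85_000           # approximate; assumed to count micro-batch steps

tokens_per_step = num_gpus * batch_size * context_length
total_tokens = tokens_per_step * steps
sequences_per_update = num_gpus * batch_size * accumulation_steps

print(tokens_per_step)       # 16384
print(total_tokens)          # 1392640000, i.e. about 1.4B
print(sequences_per_update)  # 256 sequences per optimizer update
```

If the steps were optimizer updates instead, the same arithmetic would give eight times as many tokens, so the micro-batch reading is the one that matches the reported total.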
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{neuronspark2025, |
| title={NeuronSpark: A Spiking Neural Network Language Model with Selective State Space Dynamics}, |
| author={Zhengzheng Tang}, |
| year={2025}, |
| url={https://github.com/Brain2nd/NeuronSpark} |
| } |
| ``` |
|
|
| ## Contact |
|
|
| - **Author**: Zhengzheng Tang |
| - **Email**: zztangbu@bu.edu |
| - **GitHub**: [Brain2nd/NeuronSpark](https://github.com/Brain2nd/NeuronSpark) |
|
|
|
|