Voxyne

A tiny (128.8M-parameter) byte-level conversational language model with the RahuKetu sigma_K control channel. It runs offline on CPU, and when quantized to int4 (the intended default) it stays coherent in about 80 MB.
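
"Byte-level" means the model reads and writes raw bytes rather than subword tokens. The following is a minimal sketch of what that implies for the caller, assuming token ids are exactly the 256 UTF-8 byte values (Voxyne may reserve extra ids for special tokens, which this ignores). `encode` and `stream_decode` are hypothetical helpers, not part of the `voxyne` package. The one subtlety is streaming output: a non-ASCII character spans several bytes, so generated bytes must be buffered until the character is complete, which Python's incremental UTF-8 decoder handles.

```python
import codecs

# Assumption: token ids are exactly the 256 UTF-8 byte values. Voxyne may
# reserve additional ids for special tokens; those are ignored here.


def encode(text: str) -> list[int]:
    """Text -> byte-level token ids in the range 0-255."""
    return list(text.encode("utf-8"))


def stream_decode(token_ids):
    """Yield text as byte ids arrive one at a time.

    A multi-byte character (e.g. 'é' is two UTF-8 bytes) is emitted only
    once all of its bytes are in; invalid or truncated sequences become
    U+FFFD instead of raising.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for tid in token_ids:
        piece = decoder.decode(bytes([tid]))
        if piece:
            yield piece
    tail = decoder.decode(b"", final=True)  # flush any incomplete sequence
    if tail:
        yield tail
```

For example, `encode("hé")` returns `[104, 195, 169]`, and `list(stream_decode([104, 195, 169]))` yields `["h", "é"]`: the `é` only appears once its second byte has arrived.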

Scope (please read): Voxyne is built to converse, not to know. It has an identity and conversational ability, but it is not a knowledge base and will make up facts if asked. Knowledge is meant to come from external tools or retrieval (a separate system). Judge it on coherence, not factual recall.

Files — int4 is the default

| file | size | use |
|---|---|---|
| `voxyne-int4.onnx` | 81 MB | DEFAULT: CPU/edge deployment (ONNX Runtime) |
| `voxyne-int8.onnx` | 129 MB | int8 ONNX |
| `voxyne-fp32.onnx` | 511 MB | full-precision ONNX |
| `voxyne-v0.1.pt` | 258 MB | bf16 PyTorch weights (for the `voxyne` package / fine-tuning) |
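
The file sizes above follow roughly from parameter count times storage width. A back-of-envelope check, assuming every one of the ~128.8M parameters is stored at the listed width (real files also carry quantization scales, tensors kept at higher precision, and graph metadata, none of which this counts); `weight_megabytes` is a hypothetical helper:

```python
# Back-of-envelope raw weight storage for a ~128.8M-parameter model.
# Assumption: every parameter is stored at the listed bit width; actual
# files add quantization scales, higher-precision tensors and metadata.

PARAMS = 128_800_000  # "128.8M" from the model card (already rounded)

BITS_PER_PARAM = {"fp32": 32, "bf16": 16, "int8": 8, "int4": 4}


def weight_megabytes(params: int, bits: int) -> float:
    """Raw weight size in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bits / 8 / 1_000_000


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: {weight_megabytes(PARAMS, bits):.1f} MB")
```

This prints 515.2, 257.6, 128.8 and 64.4 MB. The fp32, bf16 and int8 files land within about 1% of these figures; the int4 file's extra ~17 MB is plausibly quantization scales plus tensors left unquantized, though the card does not say how the int4 graph was built.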

Each ONNX graph is a single-token decode step with an explicit KV cache: it takes inputs `h`, `pk`, `pv` and returns `logits`, `npk`, `npv` (`npk`/`npv` being the updated key/value caches). The byte embedding and the sigma_K encoder run outside the graph. See the `voxyne` package for the full decode protocol.
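
To make the decode-step interface concrete, here is a minimal sketch of a caller-side loop. It is not the `voxyne` package's decode protocol: `greedy_decode`, `make_ort_step` and `toy_step` are hypothetical names, greedy (argmax) decoding is chosen for simplicity, and the float32 dtype, `[1, 1, d_model]` input shape, batch size 1 and cache layout are all assumptions. Only the input and output names (`h`, `pk`, `pv`, `logits`, `npk`, `npv`) come from this card, and sigma_K conditioning is omitted.

```python
import numpy as np

# Sketch of driving a single-token decode graph with the interface
# (h, pk, pv) -> (logits, npk, npv).
# Assumptions, not taken from the model card: h is one embedded byte of
# shape [1, 1, d_model] in float32; the vocabulary is the last axis of
# logits; batch size is 1; the caller supplies empty initial caches in
# whatever layout the graph expects. sigma_K conditioning is left out.


def greedy_decode(step, embedding, prompt_ids, pk, pv, max_new_tokens=32, stop_id=None):
    """Prefill the cache one prompt byte at a time, then decode greedily.

    step:      callable (h, pk, pv) -> (logits, npk, npv)
    embedding: [vocab, d_model] lookup table applied outside the graph
    Returns the generated token ids (stop_id itself is not included).
    """
    if not prompt_ids:
        raise ValueError("need at least one prompt byte to prime the cache")

    def embed(tid):
        return embedding[tid][None, None, :].astype(np.float32)

    logits = None
    for tid in prompt_ids:
        # Each step returns updated caches (npk, npv); they replace the
        # old ones for the next call.
        logits, pk, pv = step(embed(tid), pk, pv)

    generated = []
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(logits.reshape(-1)))
        if next_id == stop_id:
            break
        generated.append(next_id)
        logits, pk, pv = step(embed(next_id), pk, pv)
    return generated


def make_ort_step(model_path):
    """Wrap an ONNX Runtime session as a `step` callable (needs onnxruntime)."""
    import onnxruntime as ort  # imported lazily so the toy example runs without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    def step(h, pk, pv):
        logits, npk, npv = session.run(["logits", "npk", "npv"], {"h": h, "pk": pk, "pv": pv})
        return logits, npk, npv

    return step


def toy_step(h, pk, pv):
    """Model-free stand-in: predicts 'current byte + 1' and appends h to the cache."""
    tid = int(np.argmax(h.reshape(-1)))
    vocab = h.shape[-1]
    logits = np.zeros((1, 1, vocab), np.float32)
    logits[0, 0, (tid + 1) % vocab] = 1.0
    return logits, np.concatenate([pk, h], axis=1), np.concatenate([pv, h], axis=1)
```

With the toy stand-in, `greedy_decode(toy_step, np.eye(256, dtype=np.float32), [97], np.zeros((1, 0, 256), np.float32), np.zeros((1, 0, 256), np.float32), max_new_tokens=3)` returns `[98, 99, 100]`. For a real file one would pass `make_ort_step("voxyne-int4.onnx")` instead, together with the byte embedding table and cache shapes defined by the `voxyne` package, which are not reproduced here.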

Quick start (PyTorch)

```shell
pip install voxyne   # then download voxyne-v0.1.pt from this repo
```

```python
from voxyne import VoxyneConfig, build, load_weights, generate

model, enc = build(VoxyneConfig())
load_weights(model, "voxyne-v0.1.pt")
print(generate(model, enc, "who are you?", device="cpu"))
# -> "I'm Voxyne, an AI assistant created by Ramakrishnan."
```

Training-data provenance (why the license)

Voxyne's training mix includes non-commercially licensed sources, so the weights are released under CC BY-NC 4.0 (non-commercial). They are free for research, education, and personal use.

| stage | sources (examples) | license note |
|---|---|---|
| pretrain | FineWeb-Edu, Cosmopedia, TinyStories | permissive / synthetic |
| grammar | WordNet, FrameNet, GoEmotions | permissive (attribution) |
| commonsense | ConceptNet, ATOMIC | ConceptNet = CC BY-SA |
| dialogue | SODA, UltraChat, OASST2, daily_dialog, empathetic_dialogues, WildChat | daily_dialog / empathetic = CC BY-NC; WildChat = AI2 ImpACT |
| identity | author-written | original |

The code (the `voxyne` package) is Apache-2.0; only the weights are NC. A future retrain on cleanly licensed data (kalki) will carry Apache-2.0 weights.

AI-assistance disclosure

Built by Ramakrishnan (ORCID 0009-0006-0905-7275). AI tools assisted with the training/quantization automation and tooling; the model design, direction, and all decisions are the author's.

License

Weights: CC BY-NC 4.0 (non-commercial). Code: Apache-2.0.
