Buckets: Redvodk / Xoron-Dev-MultiMoe-bucket

14.4 GB

32 files

Updated 28 days ago


Name	Size	Uploaded	Xet hash
assets		28 days ago	2 items
.gitattributes	1 kB	28 days ago	0d0e1528
README.md	7.6 kB	28 days ago	b97186d8
added_tokens.json	14.5 kB	28 days ago	35f1fbc4
audio_decoder.safetensors	1.46 GB	28 days ago	b7f8cec1
audio_encoder.safetensors	466 MB	28 days ago	5e974f03
audio_projector.safetensors	2.1 MB	28 days ago	35289334
chat_template.jinja	1.12 kB	28 days ago	b42afa13
components.json	302 Bytes	28 days ago	40ccebd9
config.json	3.71 kB	28 days ago	0f2e9690
configuration_xoron.py	14.5 kB	28 days ago	a2e4d199
cross_attention.safetensors	174 MB	28 days ago	8ddefd22
generator.safetensors	629 MB	28 days ago	a0208d6d
llm.safetensors	3.38 GB	28 days ago	44cf3761
merges.txt	1.67 MB	28 days ago	87912eed
modality_markers.safetensors	12.8 kB	28 days ago	221d5f77
model.safetensors.index.json	419 kB	28 days ago	00fc6aeb
modeling_xoron.py	434 kB	28 days ago	a87b92c2
projector.safetensors	52.9 MB	28 days ago	b3da8ca4
special_tokens.json	18.2 kB	28 days ago	749933b3
special_tokens_map.json	79.2 kB	28 days ago	9afebca0
streaming_state.json	6.21 kB	28 days ago	1f04c8d3
tokenizer.json	11.5 MB	28 days ago	5642e82b
tokenizer_config.json	111 kB	28 days ago	200414b5
trainer_state.json	702 Bytes	28 days ago	10b9f6b7
training_state.pt	5.23 GB	28 days ago	2c3a248a
video_encoder.safetensors	1.92 GB	28 days ago	c8fc6d1c
video_generator.safetensors	61.6 MB	28 days ago	8fec007e
vision_encoder.safetensors	1 GB	28 days ago	2f0037c4
vocab.json	2.78 MB	28 days ago	9208e1be
waveform_decoder.safetensors	34.7 MB	28 days ago	5facd14f

README.md

🚀 Xoron-Dev: State-of-the-Art Multimodal MoE

Status: Training stage


✨ Xoron-Dev: The Elite SOTA Omni-Modal Intelligence

Xoron-Dev is an open-source architecture for omni-modal artificial intelligence. Unlike earlier models that bolt vision and audio on as plugins, Xoron-Dev is designed for native, high-fidelity perception of images, video, and audio.

🌟 Why Xoron-Dev?

Xoron-Dev combines a sparse Mixture-of-Experts (MoE) architecture with a refined sensory stack to push multimodal reasoning forward.

1. 👁️ SOTA Vision (SigLIP-2 & TiTok)

Xoron-Dev uses SigLIP-2 as its only vision encoder, chosen for its strong zero-shot performance and image-text semantic alignment.

TiTok 1D VAE: Images are compressed into 256 ultra-dense tokens, allowing Xoron to "see" high-resolution scenes with unprecedented efficiency.
2D-RoPE: Rotary positional embeddings that encode each patch's row and column, preserving spatial relationships regardless of aspect ratio.
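2D-RoPE is commonly implemented as axial rotary embeddings: half of each feature vector is rotated by the patch's row index and the other half by its column index. The sketch below assumes that axial scheme and a base of 10000; the README does not say which 2D-RoPE variant Xoron-Dev uses, and `rope_1d`/`rope_2d` are illustrative names, not functions from `modeling_xoron.py`.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs (x0, x1), (x2, x3), ... by angle pos * base^(-2i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of features"
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out


def rope_2d(vec, row, col, base=10000.0):
    """Axial 2D RoPE (assumed variant): first half encodes the row, second half the column."""
    half = len(vec) // 2
    assert half % 2 == 0, "each half must have an even number of features"
    return rope_1d(vec[:half], row, base) + rope_1d(vec[half:], col, base)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5, 1.0, 0.1, -0.4, 0.9]
    k = [0.7, 0.2, -0.6, 1.1, 0.3, -0.8, 0.5, 0.2]
    # Both pairs are offset by (-2 rows, -2 cols), so their scores match.
    s1 = dot(rope_2d(q, 2, 3), rope_2d(k, 0, 1))
    s2 = dot(rope_2d(q, 7, 9), rope_2d(k, 5, 7))
    print(s1, s2)
```

Because each half is a pure rotation, the query-key score depends only on the row and column offsets between two patches, not on their absolute positions. This is what lets the same embedding work across aspect ratios.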

2. 🎬 Native Video Intelligence (VidTok)

Our custom VidTok encoder applies 3D (spatiotemporal) volumetric compression to ingest up to 32 frames of high-definition video natively. Because it compresses across time as well as space, Xoron models motion, causality, and temporal context instead of treating video as a sequence of independent images.
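To see why compressing across time matters, compare token counts. The sketch below assumes a simple tubelet scheme, a stand-in for a learned 3D encoder: non-overlapping blocks of `t` frames by `p`×`p` pixels each become one token. The 256×256 resolution, `t=4`, and `p=16` are illustrative values, not VidTok's actual configuration.

```python
def tubelet_token_count(frames: int, height: int, width: int, t: int, p: int) -> int:
    """Tokens produced by splitting a clip into non-overlapping t x p x p tubelets."""
    if frames % t or height % p or width % p:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return (frames // t) * (height // p) * (width // p)


def per_frame_token_count(frames: int, height: int, width: int, p: int) -> int:
    """Tokens produced by patching every frame independently (equivalent to t = 1)."""
    return tubelet_token_count(frames, height, width, 1, p)


if __name__ == "__main__":
    # Hypothetical settings: 32 frames at 256x256, 16x16 spatial patches, 4-frame tubelets.
    per_frame = per_frame_token_count(32, 256, 256, p=16)
    volumetric = tubelet_token_count(32, 256, 256, t=4, p=16)
    print(per_frame, volumetric, per_frame // volumetric)  # 8192 2048 4
```

Under these assumptions, a 32-frame clip costs 8,192 tokens when frames are patched independently but 2,048 with 4-frame tubelets. The saving equals the temporal compression factor.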

3. 🎙️ Raw PCM Audio (Conformer + BigVGAN)

Xoron-Dev processes raw 16 kHz PCM audio directly: no mel spectrograms or other hand-crafted, lossy spectral front ends.

Low-Latency S2S: True speech-to-speech interaction (under 200 ms) for natural, fluid conversation.
Zero-Shot Voice Cloning: Instantly clone any voice from a 5-second sample for high-fidelity personalized output.
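Working on raw PCM means the model's input is simply the sample stream. Below is a minimal sketch, assuming 16-bit little-endian mono PCM, that decodes bytes into floats and slices them into overlapping frames. The 25 ms window and 10 ms hop (400 and 160 samples at 16 kHz) are common speech-model choices assumed here, not settings taken from Xoron-Dev's Conformer encoder.

```python
import math
import struct

SAMPLE_RATE = 16_000  # 16 kHz, as stated above


def decode_pcm16(data: bytes) -> list[float]:
    """Decode little-endian signed 16-bit mono PCM into floats in [-1.0, 1.0)."""
    n = len(data) // 2
    samples = struct.unpack(f"<{n}h", data[: 2 * n])
    return [s / 32768.0 for s in samples]


def frame_signal(samples: list[float], win: int = 400, hop: int = 160) -> list[list[float]]:
    """Slice a waveform into overlapping windows; a trailing partial window is dropped."""
    if len(samples) < win:
        return []
    return [samples[start : start + win] for start in range(0, len(samples) - win + 1, hop)]


if __name__ == "__main__":
    # One second of a 440 Hz tone, encoded as raw PCM bytes.
    tone = [int(0.5 * 32767 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)) for i in range(SAMPLE_RATE)]
    raw = struct.pack(f"<{len(tone)}h", *tone)
    frames = frame_signal(decode_pcm16(raw))
    print(len(frames), len(frames[0]))  # 98 400
```

For scale, at 16 kHz the 5-second voice-cloning reference above is 5 × 16,000 = 80,000 samples.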

🧠 The Brain: Aux-Loss-Free MoE & 128K Ring Attention

A Mixture-of-Experts (MoE) backbone whose router dynamically sends each token to a small set of specialized, hardware-aware expert sub-networks.
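The README does not describe the router in detail. Below is a minimal sketch of top-k routing with auxiliary-loss-free load balancing, assuming the bias-adjustment scheme popularized by DeepSeek-V3. Each expert carries a bias that is added to its score only when selecting experts. Instead of adding a balancing loss, the bias is nudged down for overloaded experts and up for underloaded ones. `route`, `update_bias`, `k=2`, and the step size are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, bias, k=2):
    """Pick top-k experts by score + bias; gate weights use unbiased scores renormalized over the chosen."""
    scores = softmax(logits)
    chosen = sorted(range(len(scores)), key=lambda e: scores[e] + bias[e], reverse=True)[:k]
    norm = sum(scores[e] for e in chosen)
    return [(e, scores[e] / norm) for e in chosen]


def update_bias(bias, loads, step=0.01):
    """Lower the bias of overloaded experts and raise it for underloaded ones (no gradient, no loss term)."""
    mean = sum(loads) / len(loads)
    return [b - step if load > mean else b + step if load < mean else b
            for b, load in zip(bias, loads)]


if __name__ == "__main__":
    # Four experts; every token prefers expert 0, so expert 0 starts overloaded.
    tokens = [[1.0, 0.2, 0.1, 0.0], [0.9, 0.0, 0.3, 0.1], [1.2, 0.1, 0.0, 0.2]]
    bias = [0.0] * 4
    for _ in range(50):
        loads = [0] * 4
        for logits in tokens:
            for expert, _weight in route(logits, bias):
                loads[expert] += 1
        bias = update_bias(bias, loads)
    print("final loads:", loads, "bias:", [round(b, 2) for b in bias])
```

Because the bias only affects which experts are selected, not the gate weights, balancing pressure does not distort the mixture each token receives.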

🏗️ Deep Expert Hierarchy

Unlike standard MoE models with uniform experts, Xoron-Dev implements a specialized Deep Expert system.

Expert Pool: 16 Experts Total (8 Standard + 8 Deep).
Variable Logical Depth: Deep Experts have internal depths ranging from 2 to 9 layers.
Expert Penalty Routing: A soft utilization penalty ($\text{Cost} \propto \text{Depth}$) encourages the model to invoke deeper computation only for tokens that need maximum logical precision, keeping inference throughput high for simpler tokens.
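One way to read the depth penalty is to subtract a cost proportional to each expert's depth from its router logit before top-k selection. The sketch below makes three assumptions, none of which come from the Xoron-Dev code: standard experts count as depth 1, the eight Deep Experts have depths 2 through 9 (one per depth), and `lam` is a hypothetical penalty weight.

```python
STANDARD_DEPTHS = [1] * 8          # assumption: standard experts count as depth 1
DEEP_DEPTHS = list(range(2, 10))   # assumption: one Deep Expert per depth 2..9
DEPTHS = STANDARD_DEPTHS + DEEP_DEPTHS  # 16 experts total (8 standard + 8 deep)


def penalized_topk(logits, depths, lam, k=2):
    """Rank experts by logit - lam * depth and return the indices of the top k."""
    adjusted = [x - lam * d for x, d in zip(logits, depths)]
    return sorted(range(len(adjusted)), key=lambda e: adjusted[e], reverse=True)[:k]


def mean_depth(chosen, depths):
    """Average depth of the selected experts (a proxy for compute spent on the token)."""
    return sum(depths[e] for e in chosen) / len(chosen)


if __name__ == "__main__":
    # A token whose raw router logits mildly favour the deepest experts.
    logits = [0.5] * 8 + [0.5 + 0.1 * i for i in range(1, 9)]
    for lam in (0.0, 0.2):
        chosen = penalized_topk(logits, DEPTHS, lam)
        print(f"lam={lam}: experts {chosen}, mean depth {mean_depth(chosen, DEPTHS)}")
```

With no penalty, this token goes to the two deepest experts (mean depth 8.5). With `lam=0.2`, the same logits route it to standard experts (mean depth 1.0). A token only pays for depth when its router preference outweighs the cost.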

⚡ Reasoning Acceleration: Fast Ponder

Xoron-Dev features a dedicated FastPonderBlock for near-instant latent deliberation.

Attention-Free Reasoning: Thought loops bypass the $O(N^2)$ self-attention stack, so the Depth-3 reasoning block runs at 120+ thoughts/sec.
Dynamic Halting: A learned halt_head monitors latent entropy. Once the model reaches a decision (entropy below a threshold of 0.2), it exits the ponder loop and returns to token decoding, cutting unnecessary FLOPs by up to 90%.
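The halting rule can be sketched as a loop that refines a latent state and stops once the entropy of a halt distribution drops below the 0.2 threshold. This is a toy built on several assumptions. `ponder_step` is a hypothetical fixed update standing in for the Depth-3 block, and `halt_head` is a hypothetical mapping from the state to a probability distribution. Entropy is measured in nats, and the `max_steps` cap is also assumed.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ponder_step(state):
    """Hypothetical stand-in for one FastPonderBlock pass: boosts the strongest component, shrinks the rest."""
    top = max(range(len(state)), key=lambda i: state[i])
    return [x + 1.0 if i == top else x * 0.9 for i, x in enumerate(state)]


def halt_head(state):
    """Hypothetical halt head: a distribution over candidate decisions."""
    return softmax(state)


def ponder(state, threshold=0.2, max_steps=16):
    """Refine the latent until halt entropy < threshold, or give up after max_steps."""
    for step in range(1, max_steps + 1):
        state = ponder_step(state)
        if entropy(halt_head(state)) < threshold:
            return state, step
    return state, max_steps


if __name__ == "__main__":
    final_state, steps = ponder([0.3, 0.2, 0.1, 0.0])
    print(f"halted after {steps} steps, entropy {entropy(halt_head(final_state)):.3f}")
```

The FLOP saving comes from the early `return`: a confident latent exits after a few iterations instead of always running `max_steps`.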

🔘 Long Context

Using Ring Attention, Xoron-Dev can analyze books, hour-long videos, or massive codebases with native 128K context window support.
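Ring Attention splits the sequence into blocks held by different devices. Each device keeps its query block and passes key/value blocks around a ring, merging partial results with a numerically stable online softmax. The single-process sketch below simulates that merge for one query vector in plain Python, with the device ring reduced to a `for` loop, and checks the result against ordinary full softmax attention. Block sizes and data are arbitrary, and it illustrates the general technique rather than Xoron-Dev's implementation.

```python
import math
import random


def full_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d))-weighted sum of values over the whole sequence."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def ring_attention(q, kv_blocks):
    """Visit KV blocks one at a time (as if passed around a ring), merging with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(kv_blocks[0][1][0])
    running_max, running_sum, acc = -math.inf, 0.0, [0.0] * dim
    for keys, values in kv_blocks:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        new_max = max(running_max, max(scores))
        correction = math.exp(running_max - new_max)  # rescale earlier partial sums to the new max
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    random.seed(0)
    d, n, block = 4, 12, 3
    q = [random.gauss(0, 1) for _ in range(d)]
    keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    blocks = [(keys[i:i + block], values[i:i + block]) for i in range(0, n, block)]
    diff = max(abs(a - b) for a, b in zip(full_attention(q, keys, values), ring_attention(q, blocks)))
    print(diff)  # only floating-point rounding separates the two results
```

No device ever needs the full key/value sequence at once, which is why context length can scale with the number of devices in the ring.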

🚀 Get Started with Xorfice

The easiest way to run Xoron-Dev is with the xorfice engine, a SOTA orchestrator for multimodal deployment.

Installation

```bash
pip install xorfice
```

High-Fidelity Interaction

```python
from xorfice import XoronEngine

# The engine automatically handles weights and optimizations
# Correct model slug: Backup-bdg/Xoron-Dev-MultiMoe
engine = XoronEngine(model_path="Backup-bdg/Xoron-Dev-MultiMoe")

# Start an omni-modal conversation
response = engine.generate(
    prompt="Who is this person and what are they doing?",
    images="https://example.com/interview.jpg",
    videos="https://example.com/interview.mp4"
)
print(response["text"])
```

📈 Key Features

Feature	Xoron-Dev
Vision Backbone	SigLIP-2
Video Compression	VidTok 3D
Audio Ingestion	Raw PCM
Inference Efficiency	Sparse MoE (5B)
Context Window	128K (Ring)

🎨 Creative Generation

Xoron-Dev is fully integrated with MobileDiffusion, so it can generate content as well as understand it:

Text-to-Video (T2V)
Image-to-Video (I2V)
Text-to-Image (T2I)
Image-to-Image (I2I)
Video-to-Video (V2V)

Join the Revolution

Xoron-Dev is more than a model—it's a vision for the future of AI. Build your own multimodal agent today.

Powered by Xoron-Dev Team

Last updated: Jun 6