Backup-bdg committed 325016a (verified) · 1 parent: 593f123

Update README.md

Files changed (1): README.md (+135, −3), replacing the previous `license: apache-2.0` stub with a full model card.
---
language:
- en
license: mit
library_name: transformers
tags:
- multimodal
- moe
- text-to-image
- image-editing
- image-to-video
- text-to-video
- video-editing
- text-to-speech
- speech-to-text
- image-to-text
- video-to-text
- agentic
- tool-use
pipeline_tag: any-to-any
inference: false
datasets:
# === Code & Programming ===
- m-a-p/Code-Feedback
- iamtarun/python_code_instructions_18k_alpaca
- codeparrot/codeparrot-clean
- bigcode/humanevalpack
- loubnabnl/github-jupyter-code-to-text
- saurabh5/rlvr-code-data-Swift
- finbarr/rlvr-code-data-swift-code-edit
- ExAi/Code-Golang-QA-2k
- smcleod/golang-coder
# === Conversation & Agentic ===
- databricks/databricks-dolly-15k
- OpenAssistant/oasst1
- HuggingFaceH4/no_robots
- Open-Orca/OpenOrca
- abhi227070/converstion-to-summarization-dataset
- allenai/WildChat-1M
- THUDM/AgentInstruct
- glaiveai/glaive-code-assistant-v2
- stingning/ultrachat
- RyokoAI/ShareGPT52K
- AlicanKiraz0/Agentic-Chain-of-Thought-Coding-SFT-Dataset
# === Tool Use ===
- Locutusque/function-calling-chatml
- driaforall/pythonic-function-calling
- argilla/Synth-APIGen-v0.1
- interstellarninja/tool-calls-singleturn
- interstellarninja/tool-calls-multiturn
# === Vision (Image & Video) ===
- Naveengo/flickr8k
- ybelkada/football-dataset
- jmhessel/newyorker_caption_contest
- derek-thomas/ScienceQA
- HuggingFaceM4/WebSight
- lmms-lab/Video-MME
- MBZUAI/VideoInstruct-100K
# === Generation (Prompts & Media) ===
- Gustavosta/Stable-Diffusion-Prompts
- FredZhang7/stable-diffusion-prompts-2.47M
- succinctly/midjourney-prompts
- osunlp/MagicBrush
- timbrooks/instructpix2pix-clip-filtered
- Rapidata/sora-video-generation-physics-likert-scoring
- Rapidata/sora-video-generation-style-likert-scoring
- Rapidata/sora-video-generation-alignment-likert-scoring
- Rapidata/text-2-video-human-preferences
- Rapidata/text-2-video-human-preferences-sora-2
- TempoFunk/webvid-10M
- multimodalart/panda-70m
- nkp37/OpenVid-1M
- WenhaoWang/VidProM
- WenhaoWang/TIP-I2V
- jovianzm/img2vid-pexels-350k
- TencentARC/MiraData
- APRIL-AIGC/UltraVideo
- Mutonix/Vript
- Rapidata/image-to-video-human-preference-seedance-1-pro
# === Audio ===
- openslr/librispeech_asr
- blabble-io/libritts_r
- parler-tts/mls_eng_10k
- MikhailT/hifi-tts
# === File Ops ===
- renjiepi/medium_20000-file_operations_n100k1
---

# 🚀 Xoron-Dev: State-of-the-Art Multimodal MoE

<div align="center">

![Xoron-Dev Logo](https://img.shields.io/badge/Xoron--Dev-MultiMoE-blue?style=for-the-badge&logo=pytorch)
![License](https://img.shields.io/badge/License-MIT-green?style=for-the-badge)
![Params](https://img.shields.io/badge/Parameters-1.5B_MoE-yellow?style=for-the-badge)
![Context](https://img.shields.io/badge/Context-128K-red?style=for-the-badge)

</div>

**Xoron-Dev** is a unified multimodal AI model designed to understand and generate text, images, video, and audio within a single architecture. It combines a **Mixture of Experts (MoE)** backbone with DeepSeek-style shared experts, a SOTA vision encoder (SigLIP-2), and a diffusion generator (MobileDiffusion) for comprehensive any-to-any capabilities.
101
+
102
+ ## ๐ŸŒŸ Model Highlights
103
+
104
+ * **Architecture:** Mixture of Experts (8 Experts + 1 Shared) with Sliding Window Attention.
105
+ * **Vision:** Native understanding of images (384px) and video (up to 32 frames) via SigLIP-2.
106
+ * **Generation:** Integrated MobileDiffusion for fast on-device Image & Video generation.
107
+ * **Audio:** Full duplex capabilities with Conformer-based ASR (Speech-to-Text) and Neural TTS.
108
+ * **Agentic:** Trained for tool calling, file operations, and code execution with uncertainty estimation.
109
+ * **Context:** Efficient 128K context window using sliding window attention (4096 local window).
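
The "8 routed + 1 shared" pattern above follows the DeepSeekMoE idea: every token passes through the shared expert, while a router gates a top-k subset of the routed experts. The plain-Python sketch below illustrates that routing logic only; the hidden size, top-k value, and `linear` helper are illustrative assumptions, not the model's actual configuration:

```python
import math
import random

random.seed(0)

DIM, N_EXPERTS, TOP_K = 8, 8, 2  # assumed sizes; the card only states "8 experts + 1 shared"

def linear(weights, x):
    """Apply a square weight matrix (list of rows) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def rand_matrix():
    return [[random.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

routed = [rand_matrix() for _ in range(N_EXPERTS)]  # gated experts
shared = rand_matrix()                              # always-on shared expert
router = rand_matrix()                              # token -> expert scores (DIM == N_EXPERTS here)

def moe_forward(x):
    # Score all experts, keep the top-k, and renormalise their gates.
    scores = softmax(linear(router, x))
    top = sorted(range(N_EXPERTS), key=lambda i: -scores[i])[:TOP_K]
    norm = sum(scores[i] for i in top)
    # The shared expert contributes unconditionally; routed experts are gated.
    out = linear(shared, x)
    for i in top:
        contrib = linear(routed[i], x)
        out = [o + (scores[i] / norm) * c for o, c in zip(out, contrib)]
    return out

y = moe_forward([1.0] * DIM)
```

Only `TOP_K` of the 8 routed experts run per token, which is what keeps active parameters well below the 1.5B total.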

---

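As a companion to the context-window highlight above: sliding window attention lets each token attend only to the most recent tokens inside a fixed causal window, so attention cost grows linearly rather than quadratically with sequence length. A minimal mask construction (window shrunk from 4,096 to 3 to keep the example readable) might look like:

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Causal: k <= q. Local: k lies within the last `window` positions.
    """
    return [
        [q - window < k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]

# Tiny example: 6 tokens, window of 3.
mask = sliding_window_mask(6, 3)
# Each query sees at most `window` keys once the window fills up.
visible = [sum(row) for row in mask]  # [1, 2, 3, 3, 3, 3]
```

Stacking many such local layers lets information propagate across the full 128K context even though no single layer attends globally.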
## 📚 Training Data

Xoron-Dev is trained on a massive, curated mix of open-source Hugging Face datasets and specialized synthetic data generated to strengthen agentic capabilities and reduce hallucinations.

### 🌍 Open Source Datasets
We use over 50 high-quality datasets from Hugging Face, categorized by modality:

* **Text & Code:** Includes `Code-Feedback`, `HumanEvalPack`, `OpenOrca`, and `AgentInstruct` for robust coding and reasoning capabilities.
* **Tool Use:** Datasets such as `Function-Calling-ChatML` and `Synth-APIGen` enable precise tool invocation.
* **Vision (Image/Video):** Visual understanding is grounded in `ScienceQA`, `Video-MME`, and `VideoInstruct-100K`.
* **Generation:** Text-to-image/video capabilities are fine-tuned on `Stable-Diffusion-Prompts`, Rapidata's Sora Likert-scoring datasets, and `WebVid-10M`.
* **Audio:** Speech tasks are powered by `LibriSpeech`, `LibriTTS-R`, and `HiFi-TTS`.
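The tool-use data above trains the model to emit structured function calls that a runtime can parse and check. As an illustration, here is how a harness might validate one call; the JSON schema and the `get_weather` tool are generic sketches, not Xoron-Dev's actual chat-template format:

```python
import json

# Hypothetical tool definition and model output for illustration only.
tool_schema = {
    "name": "get_weather",
    "parameters": {"required": ["city"], "properties": {"city": {"type": "string"}}},
}

model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

def validate_call(raw: str, schema: dict) -> dict:
    """Parse a model-emitted tool call and check it against the tool schema."""
    call = json.loads(raw)
    if call["name"] != schema["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    missing = [k for k in schema["parameters"]["required"] if k not in call["arguments"]]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return call

call = validate_call(model_output, tool_schema)
```

Validation like this is also how synthetic tool-use traces are typically filtered before training: calls that fail to parse or miss required arguments are discarded.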
### 🧪 Synthetic Data Pipeline
To bridge the gap between general knowledge and actionable agentic behavior, we generate extensive synthetic datasets locally using our custom `synth` engine. These datasets focus on complex behaviors often missing from public corpora:

| Category | Description |
|----------|-------------|
| **Anti-Hallucination** | Trains the model to say "I don't know" (`Synth-IDK`), verify facts (`Synth-FactCheck`), and provide citations (`Synth-Citation`) rather than fabricating information. |
| **System Administration** | Simulated environments for `Docker` setup, `SSH` configuration, database management, and package installation (`Synth-AptInstall`). |
| **Code Execution** | Code-execution traces including `Shell` errors, timeouts, and multi-step debugging workflows that teach the model to recover from errors. |
| **Git Operations** | Simulated version-control tasks including committing, handling diffs, and resolving merge conflicts. |
| **Chain-of-Thought** | Explicit `Synth-CoT` data to encourage internal reasoning before generating final answers. |
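
As a concrete illustration of the anti-hallucination category, a `Synth-IDK`-style record pairs a question with an explicit refusal whenever no verified answer exists. The record schema and refusal wording below are hypothetical; the actual output format of the `synth` engine is not published:

```python
import json
from typing import Optional

def make_idk_record(question: str, answer: Optional[str]) -> dict:
    """Build a hypothetical Synth-IDK style chat record.

    When no verified answer exists, the assistant target is a refusal
    instead of a fabricated answer.
    """
    target = answer if answer is not None else "I don't know."
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ],
        "verified": answer is not None,
    }

record = make_idk_record("What is the 10^18th digit of pi?", None)
line = json.dumps(record)  # one JSONL line of synthetic training data
```

Mixing such refusal targets in with answerable questions is a standard way to teach a model to abstain rather than guess.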