Space: build-small-hackathon/OpenMythos

Update README.md

by himanshu17HF - opened 12 days ago

base: refs/heads/main ← from: refs/pr/4

README.md +102 -1

@@ -12,4 +12,105 @@ short_description: An Open Source Cyber Security Agent
 license: apache-2.0
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# openMythos 🌌
+**Paste your codebase. Our AI security agent audits the repository** — a multi-level vulnerability analysis, a visual dependency risk path, a declared threat level — then generates an instant, verifiable hotfix patch before threat actors can exploit it.
+Built during the **Hugging Face Small Gradio Hackathon**, openMythos aims to make security auditing widely accessible. It pairs a retro terminal interface with the agentic reasoning and long-context capabilities of a fine-tuned dense model.
+> ⚠️ **Proactive Defense.** This platform is built for defensive security. It aims to surface flaws, memory leaks, security misconfigurations, and input-handling bugs quickly, so that engineering teams can deploy hotfixes before a vulnerability is weaponized.
+---
+## ▶️ See it in action
+- **Demo video:** TODO — Watch the Social Media Demo Video & Technical Explainer Post
+- **Social post:** TODO — Paste your launch post link here
+---
+## Why it's worth a look
+- 🧠 **Deep Agentic Reasoning, Not a Basic Regex Scanner.** Built on a fine-tuned Qwen3.6-27B model, openMythos traces variable flows and dependency structures across an entire repository in a single security sweep, using the model's native long-context window.
+- 🎨 **Immersive Retro UI.** No default Gradio look: a distraction-free retro terminal layout designed for fast code-auditing loops.
+- 🔌 **100% Local & Privacy-First.** Designed as a fully open-source alternative to proprietary security models (like Claude's Mythos model). It can run entirely locally, with no internet connection or external API required.
+---
+## How it works
+A multi-stage engineering pipeline built around aggregated, industry-standard security sources:
+| Stage | Role | Source Data / Methodology |
+|:-----:|------|---------------------------|
+| **1** | **Data Prep & Aggregation** | Incident reports, GitHub Advisory Database, VulnHub, and research papers, curated into the BigVul-Filtered and Arvix-Filtered training sets. |
+| **2** | **Initial Fine-Tuning (SFT)** | Supervised fine-tuning on cybersecurity tasks. Base model: Qwen3.6-27B (262k+ token context window). |
+| **3** | **Reinforcement Learning (RLVR)** | Verifiable rewards derived from vulnerable vs. fixed repository branches, with fixes checked by a separate evaluation model. |
+| **4** | **Rigorous Evaluation** | Benchmarked against CyberGym and SWE-bench Verified, covering historical vulnerabilities and code generation. |
+The pipeline relies on specialized fine-tuned weights, aimed at a high vulnerability discovery rate, rather than on large external APIs: a chain of targeted steps (**prepare → SFT → RLVR → verify**) delivers the whole security suite.
+```
+Raw Codebase Input
+   └─▶ Stage 1: Data Prep  ─ BigVul & arXiv research paper data curation
+        └─▶ Stage 2: SFT Train  ─ Supervised fine-tuning on targeted cybersecurity tasks
+             └─▶ Stage 3: RLVR Refinement  ─ Reinforcement Learning via Verifiable Rewards (Vulnerable vs Fixed Code)
+                  + CyberGym & SWE-bench Verified evaluation
+                  + Retro Terminal UI output
+                  → Instantly remediated source-code patch
+```
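
The RLVR stage described above rewards a patch only when it can be verified against the vulnerable and fixed branches. The sketch below illustrates that idea under loud assumptions: a "branch" is modelled as a plain Python function, the bug is a toy path-traversal flaw, and the separate evaluation model mentioned in the table is swapped for a deterministic output comparison. All names (`verifiable_reward`, `vulnerable_branch`, `fixed_branch`, `overblocking_patch`) are hypothetical and not taken from the project.

```python
from typing import Callable, Iterable

# Hypothetical stand-in for "a checked-out branch": input path -> resolved path.
Program = Callable[[str], str]


def vulnerable_branch(path: str) -> str:
    # Toy path-traversal bug: user input is joined without validation.
    return "/srv/files/" + path


def fixed_branch(path: str) -> str:
    # Reference fix: refuse any path that tries to climb out of the root.
    if ".." in path.split("/"):
        return "REJECTED"
    return "/srv/files/" + path


def verifiable_reward(candidate: Program, reference: Program,
                      exploits: Iterable[str], benign: Iterable[str]) -> float:
    """Binary reward: 1.0 only if the candidate agrees with the fixed branch
    on every exploit probe and every benign probe, otherwise 0.0."""
    probes = list(exploits) + list(benign)
    return 1.0 if all(candidate(p) == reference(p) for p in probes) else 0.0


def overblocking_patch(path: str) -> str:
    # A patch that "fixes" the bug by rejecting everything, breaking normal use.
    return "REJECTED"


EXPLOITS = ["../etc/passwd", "a/../../secret"]
BENIGN = ["report.txt", "docs/readme.md"]

if __name__ == "__main__":
    for name, patch in [("vulnerable", vulnerable_branch),
                        ("fixed", fixed_branch),
                        ("overblocking", overblocking_patch)]:
        print(name, verifiable_reward(patch, fixed_branch, EXPLOITS, BENIGN))
```

Including benign probes alongside exploit probes is a deliberate choice in this sketch: a patch that blocks the exploit by breaking legitimate behaviour earns no reward.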
+---
+## Tech
+- **Frontend:** A Gradio 6 Space with a custom retro terminal theme.
+- **Alternative base models:** The project uses Qwen3.6-27B, but the training framework also supports Devstral-Small-2-24B, Magistral-Small, gemma-4-12B-it, and gpt-oss-20b.
+- **Data integrations:** Built to ingest the BigVul-Filtered and ArvixImport-Filtered-Final vulnerability datasets.
+---
+## Datasets
+* **[BigVul-Filtered](https://huggingface.co/datasets/himanshu17HF/BigVul-Filtered/)** – A curated version of the Big Vulnerability Dataset covering common vulnerabilities, further filtered for quality.
+* **[Arvix-Filtered](https://huggingface.co/datasets/himanshu17HF/ArvixImport-Filtered-Final)** – A collection of filtered academic research papers focused specifically on programming language vulnerabilities.
+---
+## Run it locally
+```bash
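+# Clone the Space, install dependencies (assumes the repo ships a requirements.txt), then launch the interface
+git clone https://huggingface.co/spaces/build-small-hackathon/OpenMythos
+cd OpenMythos && pip install -r requirements.txt
+python app.py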
+```
+---
+## 🤝 Project Contributors & Ecosystem Credits
+Developed with ❤️ during the **Hugging Face Small Gradio Hackathon** by:
+- **KingNish** – [HuggingFace Profile](https://huggingface.co/KingNish)
+- **Himanshu** – [HuggingFace Profile](https://huggingface.co/Himanshu)
+---
+## 📜 Citations & Academic Attributions
+```bibtex
+@misc{openmythos2026,
+    title  = {openMythos: Defensive Security Code-Auditing Agent Interface via Qwen3.6 Context Preservation},
+    author = {KingNish and Himanshu},
+    year   = {2026},
+    howpublished = {Hugging Face Small Gradio Hackathon Project Suite}
+}
+@misc{qwen3.6-27b,
+    title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
+    author = {{Qwen Team}},
+    month  = {April},
+    year   = {2026},
+    url    = {https://qwen.ai/blog?id=qwen3.6-27b}
+}