colorFrom: pink
colorTo: red
sdk: static
pinned: false
license: mit
short_description: 'ISAAC OS — Neural v1 (Agentic-Lite deterministic evaluation '
---
# ISAAC OS — Neural v1 (Deterministic Evaluation, Agentic-Lite)

**Organization:** [Isaac-AI-OS](https://huggingface.co/Isaac-AI-OS)
**Model ID:** [`isaac-20b`](https://huggingface.co/Isaac-AI-OS/isaac-20b)
**Policy Version:** `agentic-lite-v1`
**Artifacts Dataset:** [`isaac-20b-eval-artifacts`](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts)
**Docker Digest:** `isaac-hf@sha256:6fc9f0d85dfe56daba8fc92496718226f056014b3e84ee7a823df1d9271a57c0`

---
ISAAC is a **self-verifying neural operating system** designed for reproducible, auditable AI.
The *Agentic-Lite* evaluation mode enforces deterministic sampling (`temperature=0`, `top_p=0`, `seed=7`) and code-only normalization, producing byte-identical artifacts across runs.
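As a rough illustration, the deterministic settings above map onto a request to an OpenAI-compatible `/v1/completions` endpoint like this. Only `temperature`, `top_p`, and `seed` come from the text; the payload shape and the `max_tokens` cap are assumptions, not the published client:

```python
# Hypothetical sketch of a deterministic Agentic-Lite request payload for
# an OpenAI-compatible /v1/completions endpoint. The sampling values are
# the ones stated above; max_tokens is an illustrative assumption.
def agentic_lite_payload(prompt: str) -> dict:
    return {
        "model": "isaac-20b",
        "prompt": prompt,
        "temperature": 0,   # greedy decoding
        "top_p": 0,         # as stated above; restricts sampling to the top token
        "seed": 7,          # fixed seed so reruns are byte-identical
        "max_tokens": 512,  # illustrative cap (assumption)
    }
```

Byte-identical artifacts require that every rerun sends an identical payload, which is why the sampling fields are pinned rather than left to server defaults.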
### 🔍 Current Subset Results

| Benchmark | Split | Metric | Score |
|------------|--------|---------|------:|
| **HumanEval** | N=5 | pass@1 | **0.60** |
| **MBPP** | N=5 | pass@1 | **0.80** |
| **SWE-Bench Lite** | N=1 | resolved (via `fallback_dataset_patch`) | **1 / 1** |

---
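For reference: with one deterministic sample per task, pass@1 is simply the fraction of tasks whose single completion passes its tests (e.g. 3 of 5 gives 0.60). The general multi-sample case uses the standard unbiased pass@k estimator from the HumanEval paper; a minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n is the number of samples drawn per problem and
    c is the number of those samples that pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# For k=1 this reduces to c / n: pass_at_k(5, 3, 1) == 0.6
```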
### 🧩 Reproducibility

- Deterministic “Agentic-Lite” mode: single plan, no concurrency, fixed seeds.
- Evaluation artifacts (LM, Code, SWE) are all published for cross-verification.
- Manifest pinned to the Docker digest above for a full audit trail.
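Cross-verification of pinned artifacts usually amounts to hashing each downloaded file and comparing against the manifest. A minimal sketch, assuming a flat `{path: sha256}` manifest shape for illustration (this is not the published `manifest.json` schema):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of the raw artifact bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifacts(expected: dict[str, str], blobs: dict[str, bytes]) -> list[str]:
    """Return the paths whose bytes do not match their pinned digest.

    expected: {artifact path: pinned sha256 hex digest} (assumed shape)
    blobs:    {artifact path: downloaded raw bytes}
    """
    return [path for path, digest in expected.items()
            if sha256_hex(blobs.get(path, b"")) != digest]
```

An empty return value means every artifact matches its pin; any listed path indicates drift from the audited build.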
### 📂 Artifacts

- [LM normalized results](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/lm_results.norm.json)
- [Code benchmarks](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/code/summary.json)
- [SWE-Bench Lite results](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/swe/results.json)
- [Manifest](https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/manifest.json)
### 💡 Roadmap

- Add `logprobs` to `/v1/completions` → full MC reasoning (MMLU/ARC/HellaSwag).
- Enable kernel-level determinism for multi-node HA.
- Publish “Agentic Swarm” uplift appendix once replay bundles are live.
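Once `logprobs` are exposed, multiple-choice scoring for MMLU/ARC/HellaSwag typically ranks each answer option by the length-normalized sum of its continuation's token log-probabilities. A sketch of that selection step, where the input shape is an assumption for illustration:

```python
def pick_choice(choice_logprobs: dict[str, list[float]]) -> str:
    """Pick the option whose continuation scores highest under the model.

    choice_logprobs: {option label: per-token logprobs of that option's
    continuation} (assumed shape). Length normalization avoids penalizing
    longer answer strings.
    """
    scores = {c: sum(lps) / len(lps) for c, lps in choice_logprobs.items()}
    return max(scores, key=scores.get)
```

Whether to length-normalize (vs. raw summed logprob) is a per-benchmark convention; harnesses commonly report both variants.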