Committed by Your Name
Commit a23ae7b · 1 Parent(s): 85d1dfe
ISAAC OS — Neural v1 (Agentic-Lite deterministic eval)
README.md ADDED
@@ -0,0 +1,28 @@
---
license: apache-2.0
tags: [evaluation, reproducibility, agentic-lite, code, swe-bench]
pipeline_tag: text-generation
---

# ISAAC OS — Neural v1 (Deterministic Eval, Agentic-Lite)

- **Model ID:** `isaac-20b`
- **Policy Version:** `agentic-lite-v1`
- **Docker Digest:** `isaac-hf@sha256:6fc9f0d85dfe56daba8fc92496718226f056014b3e84ee7a823df1d9271a57c0`

## Results (subset scale)
| Benchmark | Split | Metric | Score |
|---|---|---|---:|
| HumanEval | N=5 | pass@1 | **0.60** |
| MBPP | N=5 | pass@1 | **0.80** |
| SWE-Bench Lite | N=1 | model pass@1 | not reported |
| SWE-Bench Lite | N=1 | resolved via **fallback_dataset_patch** | **1 / 1** |

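As a rough check on the table above: with a single sample per problem (n=1, as stated under Reproducibility), pass@1 reduces to the fraction of problems solved. The sketch below assumes that definition. The per-problem outcomes are hypothetical, chosen only to match the reported scores (3 of 5 and 4 of 5), since the README does not say which problems passed.

```python
def pass_at_1(passed_flags):
    """Fraction of problems whose single sample passed all tests (n=1)."""
    if not passed_flags:
        raise ValueError("need at least one problem")
    return sum(bool(flag) for flag in passed_flags) / len(passed_flags)


# Hypothetical outcomes: only the counts (3/5 and 4/5) follow from the table.
humaneval_flags = [True, True, True, False, False]
mbpp_flags = [True, True, True, True, False]

print(f"HumanEval pass@1 = {pass_at_1(humaneval_flags):.2f}")  # 0.60
print(f"MBPP pass@1 = {pass_at_1(mbpp_flags):.2f}")            # 0.80
```

With N=5, each problem shifts the score by 0.20.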
## Reproducibility
The Agentic-Lite policy clamps decoding to temperature=0, top_p=0, top_k=1, n=1, seed=7. Tools run deterministically (no concurrency, max_steps=6). Outputs are normalized before scoring: first line only for QA, code only for code tasks. The evaluation runs on a single node.

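The clamps described above could be enforced along these lines. This is a minimal sketch, not the actual ISAAC harness: the class `AgenticLitePolicy`, the field `parallel_tools`, and the helpers `clamp` and `first_line` are hypothetical names, and only the numeric values come from this README. The first-line helper reflects one plausible reading of "first-line QA" normalization.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AgenticLitePolicy:
    # Values taken from the README; field names are assumptions.
    temperature: float = 0.0
    top_p: float = 0.0
    top_k: int = 1
    n: int = 1
    seed: int = 7
    max_steps: int = 6
    parallel_tools: bool = False  # "no concurrency"


def clamp(request: dict, policy: AgenticLitePolicy = AgenticLitePolicy()) -> dict:
    """Return a copy of the request with policy values overriding the caller's."""
    clamped = dict(request)
    clamped.update(asdict(policy))
    return clamped


def first_line(text: str) -> str:
    """Keep only the first non-empty line of a QA answer (assumed normalization)."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


request = clamp({"prompt": "2+2=?", "temperature": 0.9, "n": 4})
print(request["temperature"], request["n"], request["seed"])
print(first_line("\n 4 \nbecause two plus two is four"))
```

Overriding caller-supplied values, rather than rejecting them, keeps a run deterministic even when a client sends sampling parameters by mistake.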
## Artifacts & Manifest
- LM: https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/lm_results.norm.json
- Code summary: https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/code/summary.json
- SWE-Lite: https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/swe/results.json
- Manifest: https://huggingface.co/datasets/Isaac-AI-OS/isaac-20b-eval-artifacts/resolve/main/eval/artifacts/manifest.json
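
The artifacts listed above share one base path, so they can be fetched and fingerprinted with the standard library alone. The sketch below is an illustration under assumptions: the short keys (`lm`, `code`, `swe`, `manifest`) and the helper names are hypothetical, and the manifest's schema is not documented here, so the code only parses the JSON and hashes the raw bytes without interpreting any fields.

```python
import hashlib
import json
from urllib.request import urlopen

BASE = (
    "https://huggingface.co/datasets/Isaac-AI-OS/"
    "isaac-20b-eval-artifacts/resolve/main/eval/artifacts"
)

# Relative paths copied from the artifact list; the keys are hypothetical.
ARTIFACTS = {
    "lm": "lm_results.norm.json",
    "code": "code/summary.json",
    "swe": "swe/results.json",
    "manifest": "manifest.json",
}


def artifact_url(name: str) -> str:
    """Build the full download URL for a named artifact."""
    return f"{BASE}/{ARTIFACTS[name]}"


def fetch_json(name: str, timeout: float = 30.0):
    """Download one artifact; return (parsed JSON, SHA-256 hex of the raw bytes)."""
    with urlopen(artifact_url(name), timeout=timeout) as response:
        raw = response.read()
    return json.loads(raw), hashlib.sha256(raw).hexdigest()


for key in ARTIFACTS:
    print(key, artifact_url(key))
```

Calling `fetch_json("manifest")` requires network access and returns the parsed manifest together with a digest of the downloaded bytes, which can be recorded alongside a run to detect later changes to the artifacts.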