Rename OpenMarkA.md to OpenMark.md
- OpenMark.md +43 -0
- OpenMarkA.md +0 -13

OpenMark.md
ADDED
@@ -0,0 +1,43 @@
+---
+title: OpenMark
+emoji: 🎯
+colorFrom: blue
+colorTo: purple
+sdk: static
+pinned: true
+short_description: "AI model benchmarking platform – compare 100+ models on your own tasks"
+tags:
+- benchmarking
+- llm
+- ai
+- model-evaluation
+---
|
| 15 |
+
|
| 16 |
+
# OpenMark β AI Model Benchmarking Platform
|
| 17 |
+
|
| 18 |
+
**Stop trusting leaderboards. Benchmark your own work.**
|
| 19 |
+
|
| 20 |
+
[OpenMark](https://openmark.ai) lets you benchmark 100+ AI models on your own tasks with deterministic scoring, stability metrics, and real API cost tracking.
|
| 21 |
+
|
| 22 |
+
## What Makes OpenMark Different
|
| 23 |
+
|
| 24 |
+
- **Your tasks, not generic tests** β Write any evaluation task (code review, classification, creative writing, vision analysis) and test models against it
|
| 25 |
+
- **Deterministic scoring** β Same prompt, same score, every time. No vibes-based evaluation
|
| 26 |
+
- **Stability metrics** β See which models change their answer across runs (hint: many do)
|
| 27 |
+
- **Real API costs** β Know exactly what each model costs per task, not just per million tokens
|
| 28 |
+
- **100+ models** β OpenAI, Anthropic, Google, Meta, Mistral, xAI, and more. Side-by-side comparison
|
| 29 |
+
|
+## Why It Matters
+
+Generic benchmarks (MMLU, HumanEval, MATH) test models on tasks you'll never use. The only benchmark that matters is yours: does this model, with this prompt, for this task, give you the result you expect – reliably and affordably?
+
+## Try It
+
+**[openmark.ai](https://openmark.ai)** – Free to start. No credit card required.
+
+## Links
+
+- [Website](https://openmark.ai)
+- [Why Generic Benchmarks Are Useless](https://dev.to/openmarkai/i-benchmarked-10-ai-models-on-reading-human-emotions-3m0b)
+- 🐦 [Twitter/X](https://x.com/OpenMarkAI)
+- 💼 [LinkedIn](https://www.linkedin.com/company/openmark-ai)
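The per-task cost and stability ideas in the README above are easy to make concrete. The sketch below is purely illustrative: the function names are invented here, the prices are made-up per-million-token rates, and none of this reflects OpenMark's actual scoring code.

```python
from collections import Counter

# Hypothetical helpers illustrating the README's bullets; not OpenMark's API.

def per_task_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Convert per-million-token pricing into the dollar cost of one task."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

def stability(answers):
    """Fraction of runs agreeing with the most common answer (1.0 = fully stable)."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

# Made-up example: 1,200 input / 300 output tokens at $3 / $15 per million tokens.
print(per_task_cost(1200, 300, 3.0, 15.0))   # 0.0081 dollars for this one task
print(stability(["B", "B", "A", "B", "B"]))  # 0.8 – the model flipped once in 5 runs
```

The point of the per-task view is visible in the numbers: a "cheap" per-million rate can still add up quickly if a task needs long prompts or many retries.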
OpenMarkA.md
DELETED
@@ -1,13 +0,0 @@
----
-title: README
-emoji: π
-colorFrom: purple
-colorTo: green
-sdk: static
-pinned: true
-thumbnail: >-
-  https://cdn-uploads.huggingface.co/production/uploads/6997b2c868950cfdb9f34310/yoX33UYjvhN52TZOM2OCW.png
-short_description: AI model benchmarking platform – compare 100+ models on your
----
-
-Edit this `README.md` markdown file to author your organization card.