Commit 60f8b77 (parent 47076af): readme · README.md changed
---
title: Dowser
emoji: ⏱️
colorFrom: red
colorTo: yellow
sdk: docker
pinned: false
---
## Problem

AI teams are data constrained, not model constrained, and waste millions retraining models on data that has little or even negative impact.

They spend most of their budget collecting, processing, and labeling data without knowing what actually improves performance.

This leads to repeated failed retraining cycles, wasted GPU runs, and slow iteration, because teams lack insight into which datasets improve the model and which degrade it.
|
| 17 |
+
|
| 18 |
+
## Solution
|
| 19 |
+
|
| 20 |
+
Influence guided training has been shown to halve the convergence time. [*Dowser by Durinn](http://durinn.ai/)* tells AI teams which training data improves model performance and which data hurts it, democratizing what big model providers are doing.
## Product

[*Dowser*](https://durinn-concept-explorer.azurewebsites.net/) doesn't just recommend data or provide infrastructure: it directly benchmarks models to produce confident influence scores, with sub-**2-minute** cached results and **10–30 minute** fresh evaluations across **100 open-source datasets** on an 8 GB RAM, 2 vCPU host.
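As a rough illustration of the idea behind influence scores, here is a minimal sketch (with invented loss numbers; not Dowser's actual method) that scores each candidate dataset by how much it lowers validation loss relative to a baseline:

```python
# Toy influence scoring: a dataset's score is the drop in validation
# loss after fine-tuning on it (positive = helpful, negative = harmful).
# All numbers below are made up for illustration.

def influence_score(baseline_loss: float, finetuned_loss: float) -> float:
    """Positive score means the dataset improved the model."""
    return baseline_loss - finetuned_loss

def rank_datasets(baseline_loss: float, finetuned_losses: dict) -> list:
    """Rank candidate datasets by influence score, most helpful first."""
    scores = {name: influence_score(baseline_loss, loss)
              for name, loss in finetuned_losses.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_datasets(
    baseline_loss=2.10,
    finetuned_losses={"code_qa": 1.85, "web_crawl": 2.25, "math_word": 1.95},
)
# "code_qa" ranks first; "web_crawl" gets a negative score (it hurts).
```

In practice the losses would come from benchmarking cheap proxy models, which is what keeps evaluations fast enough to run in minutes rather than full training runs.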
## How it works

Teams define a target capability or task → *Dowser* identifies high-impact datasets from [Hugging Face](https://huggingface.co/) and suggests optimized training directions.
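The flow above can be sketched as a small pipeline. Everything here is illustrative: the function names, the stubbed dataset catalog, and the loss numbers are assumptions, not Dowser's actual API.

```python
# Illustrative pipeline: target task -> candidate datasets -> keep only
# those whose (stubbed) fine-tuned loss beats the baseline.

def find_candidate_datasets(task: str) -> list:
    # A real system would search the Hugging Face Hub here; stubbed catalog.
    catalog = {
        "math": ["gsm8k-style", "arith-drills"],
        "code": ["code-qa", "bug-fixes"],
    }
    return catalog.get(task, [])

def evaluate(dataset: str) -> float:
    # Stub: pretend we fine-tuned a proxy model on `dataset` and
    # measured validation loss on the target task (numbers invented).
    fake_losses = {"gsm8k-style": 1.9, "arith-drills": 2.3,
                   "code-qa": 1.7, "bug-fixes": 2.0}
    return fake_losses[dataset]

def suggest_datasets(task: str, baseline_loss: float) -> list:
    """Return candidate datasets that beat the baseline, best first."""
    helpful = [d for d in find_candidate_datasets(task)
               if evaluate(d) < baseline_loss]
    return sorted(helpful, key=evaluate)

suggestion = suggest_datasets("math", baseline_loss=2.1)
# Only "gsm8k-style" beats the 2.1 baseline in this toy catalog.
```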
## Why now?

- Training costs are exploding while performance gains are flattening
- Synthetic data is increasingly contaminating training pipelines
- Teams need precision, not more data
- Influence methods are now viable via proxy models and distillation
## Market

- Every company training or fine-tuning LLMs
- 59% of AI budgets go to training data
- 40% of firms spend over 70% of their AI budget on data
- Initial wedge: small and mid-sized model teams