vstrandmoe committed on
Commit 60f8b77 · 1 Parent(s): 47076af
Files changed (1)
  1. README.md +34 -2
README.md CHANGED
@@ -1,10 +1,42 @@
  ---
  title: Dowser
- emoji: 📉
  colorFrom: red
  colorTo: yellow
  sdk: docker
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
  title: Dowser
+ emoji: ⏱️
  colorFrom: red
  colorTo: yellow
  sdk: docker
  pinned: false
  ---
+ ## Problem
 
+
+ AI teams are data-constrained, not model-constrained, and waste millions retraining models on data that has little or even negative impact.
+
+ They spend most of their budget collecting, processing, and labeling data without knowing what actually improves performance.
+
+ This leads to repeated failed retraining cycles, wasted GPU runs, and slow iteration, because teams lack insight into which datasets improve the model and which degrade it.
+
+ ## Solution
+
+ Influence-guided training has been shown to halve convergence time. [*Dowser* by Durinn](http://durinn.ai/) tells AI teams which training data improves model performance and which data hurts it, democratizing what the big model providers already do.
+
+ ## Product
+
+ [*Dowser*](https://durinn-concept-explorer.azurewebsites.net/) doesn’t just recommend data or provide infrastructure: it directly benchmarks models to produce confident influence scores, with sub-**2-minute** cached results and **10–30 minute** fresh evaluations across **100 open source datasets** on an 8 GB RAM, 2 vCPU host.
+
+ ## How it works
+
+ Teams define a target capability or task → *Dowser* identifies high-impact datasets from [Hugging Face](https://huggingface.co/) and suggests optimized training directions.
+
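The selection step above can be sketched as a toy influence ranking. Everything here is illustrative: the dataset names and scores are hypothetical, not Dowser's real output; the only assumption taken from the text is that each dataset gets a score from benchmarking, with negative meaning it improved the model.

```python
# Toy sketch of influence-guided dataset selection (hypothetical numbers,
# not Dowser's actual API or scores). Each value is the change in
# validation loss after training on that dataset: negative = helpful.
influence = {
    "dataset_a": -0.12,  # loss decreased -> improves the model
    "dataset_b": +0.05,  # loss increased -> degrades the model
    "dataset_c": -0.03,
}

# Keep only the helpful datasets, most helpful (largest loss drop) first.
helpful = sorted(
    (name for name, delta in influence.items() if delta < 0),
    key=influence.get,
)
print(helpful)  # ['dataset_a', 'dataset_c']
```

In this sketch, harmful data is simply excluded; a real pipeline would weigh scores against dataset size and cost before choosing a training direction.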
+ ## Why now?
+
+ - Training costs are exploding while performance gains are flattening
+ - Synthetic data is increasingly contaminating training pipelines
+ - Teams need precision, not more data
+ - Influence methods are now viable via proxy models and distillation
+
+ ## Market
+
+ - Every company training or fine-tuning LLMs
+ - 59% of AI budgets go to training data
+ - 40% of firms spend over 70% of their AI budget on data
+ - Initial wedge: small and mid-sized model teams