---
title: Dowser
emoji: ⏱️
colorFrom: red
colorTo: yellow
sdk: docker
pinned: true
tags:
  - llm
  - language-models
  - training-data
  - dataset-analysis
  - data-selection
  - data-efficiency
  - model-evaluation
  - benchmarking
  - fine-tuning
  - machine-learning
  - deep-learning
  - nlp
  - transformers
  - research
  - mlops
  - model-training
  - evaluation
---

## Problem

AI teams are data-constrained, not model-constrained, and waste millions retraining models on data with little or even negative impact.

They spend most of their budget collecting, processing, and labeling data without knowing what actually improves performance.

The result is repeated failed retraining cycles, wasted GPU runs, and slow iteration, because teams lack insight into which datasets improve the model and which degrade it.

## Solution

Influence-guided training has been shown to halve convergence time. [*Dowser* by Durinn](http://durinn.ai/) tells AI teams which training data improves model performance and which data hurts it, democratizing what the big model providers are already doing.

## Product

[*Dowser*](https://durinn-concept-explorer.azurewebsites.net/) doesn't just recommend data or provide infrastructure: it directly benchmarks models to produce confident influence scores, with sub-**2-minute** cached results and **10–30 minute** fresh evaluations across **100 open-source datasets** on an 8 GB RAM, 2 vCPU host.

## How it works

Teams define a target capability or task → *Dowser* identifies high-impact datasets from [Hugging Face](https://huggingface.co/) and suggests optimized training directions.
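Dowser's internals are not public, but the general influence-scoring idea behind data selection can be sketched in a few lines: score each candidate dataset by how much adding it to the training pool changes validation loss on the target task. The toy "model" (a mean predictor), the dataset names, and the numbers below are all hypothetical, purely to illustrate the mechanic.

```python
# Illustrative sketch of influence-guided data selection (NOT Dowser's
# actual method). A dataset's influence score is the drop in validation
# loss when that dataset is added to the training pool.

def fit_mean(train):
    """Toy 'model': predict the mean of the training targets."""
    return sum(train) / len(train)

def val_loss(model, val):
    """Mean squared error of the constant predictor on validation targets."""
    return sum((y - model) ** 2 for y in val) / len(val)

def influence_scores(base_pool, candidate_datasets, val):
    """Score each candidate by how much it reduces validation loss."""
    base = val_loss(fit_mean(base_pool), val)
    scores = {}
    for name, data in candidate_datasets.items():
        loss = val_loss(fit_mean(base_pool + data), val)
        scores[name] = base - loss  # positive = helpful, negative = harmful
    return scores

# Hypothetical example: the target task's validation labels center near 1.0
base_pool = [0.0, 2.0]
val = [1.0, 1.1, 0.9]
candidates = {
    "clean_set": [1.0, 1.0, 1.0],    # matches the target distribution
    "noisy_set": [10.0, -8.0, 9.0],  # off-distribution, pulls the model away
}
scores = influence_scores(base_pool, candidates, val)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # clean_set ranks above noisy_set
```

In practice the proxy model would be a small LM and the loss a benchmark metric, but the ranking step is the same: keep datasets with positive influence, drop those that degrade the target capability.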

## Why now?

- Training costs are exploding while performance gains are flattening
- Synthetic data is increasingly contaminating training pipelines
- Teams need precision, not more data
- Influence methods are now viable via proxy models and distillation

## Market

- Every company training or fine-tuning LLMs
- 59% of AI budgets go to training data
- 40% of firms spend over 70% of their AI budget on data
- The initial wedge is small and mid-sized model teams