Update README.md
README.md CHANGED
@@ -1,73 +1,78 @@
 ---
 tags:
--
 - testing
 - privacy
--
-- enterprise
-- anonymization
-- data-augmentation
-- simulation
 - regulated-industries
-- insurance
 pretty_name: DataFramer
 license: other
 ---

 # DataFramer

-**

-DataFramer helps AI teams

-

 ## Why teams use DataFramer

-AI

--
-
-- too messy to recreate by hand
-- too unrealistic when manually mocked

-

-

-
-  Build eval datasets with stronger coverage across common cases, rare cases, and edge cases.

-
-  Work with realistic data for testing and iteration without exposing sensitive production records.
-
-- **Complex workflow validation**
-  Test systems that depend on long documents, multi-file inputs, nested structures, and business-specific constraints.

-
-

-## Built for enterprise data

-DataFramer

 - long-form documents and PDFs
-- structured and semi-structured
-- nested and hierarchical
-- multi-file
-- high-variability

-##

-

--
-
-- healthcare
-- enterprise AI teams working with restricted or hard-to-access data

-##

-

-**https://www.dataframer.ai**
 ---
 tags:
+- synthetic-data
+- data-generation
+- data-anonymization
+- simulation
+- llm-evaluation
+- fine-tuning
 - testing
 - privacy
+- enterprise-ai
 - regulated-industries
 pretty_name: DataFramer
 license: other
 ---

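The tag list in the front matter above is plain YAML, so a quick check before committing can catch duplicates or inconsistent casing. The following is a minimal sketch using only the standard library. The helper names `read_tags` and `check_tags` and the lowercase-hyphenated convention are assumptions made for illustration, not rules stated in this commit, and the parser handles only the simple `tags:` list shape shown here rather than general YAML.

```python
import re

# Front matter as it appears on the new side of this diff.
FRONT_MATTER = """\
---
tags:
- synthetic-data
- data-generation
- data-anonymization
- simulation
- llm-evaluation
- fine-tuning
- testing
- privacy
- enterprise-ai
- regulated-industries
pretty_name: DataFramer
license: other
---
"""

def read_tags(text):
    """Collect the '- item' entries that directly follow a 'tags:' key."""
    tags, in_tags = [], False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "tags:":
            in_tags = True
        elif in_tags and stripped.startswith("- "):
            tags.append(stripped[2:].strip())
        elif in_tags:
            break  # the first non-list line ends the tags block
    return tags

def check_tags(tags):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    seen = set()
    for tag in tags:
        if tag in seen:
            problems.append(f"duplicate tag: {tag}")
        seen.add(tag)
        # Assumed convention: lowercase words or digits joined by single hyphens.
        if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", tag):
            problems.append(f"not lowercase-hyphenated: {tag}")
    return problems

tags = read_tags(FRONT_MATTER)
problems = check_tags(tags)
```

Run against the new tag list, `problems` comes back empty, which matches a visual read of the diff: ten tags, no repeats.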
 # DataFramer

+**Generate, anonymize, and simulate diverse datasets from your own data for testing, evals, and fine-tuning.**

+DataFramer helps AI teams take their own data further, creating realistic, privacy-safe datasets for **testing, evaluation, and post-training** without exposing sensitive production records.

+**DataFramer works from your data**, adding diversity while preserving the **structure, distributions, and constraints** your models depend on.

 ## Why teams use DataFramer

+AI teams often get blocked because:

+- **their seed data isn’t enough**
+  Generate diverse, scaled datasets without starting from scratch.

+- **their real data is off-limits**
+  Anonymize sensitive records while keeping structure intact.

+- **their data doesn’t cover what models will face in production**
+  Simulate edge cases, rare scenarios, and real-world variation missing from existing samples.

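One way to read "keeping structure intact" in the list above is format-preserving masking: each character is replaced by another of the same class, so downstream parsers still see the same shape. The sketch below illustrates that idea only. `mask_preserving_format` and the sample record are hypothetical, this is not presented as DataFramer's method, and random substitution of this kind is not a privacy guarantee on its own.

```python
import random
import string

def mask_preserving_format(value, rng):
    """Swap each ASCII digit or letter for a random one of the same class."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isascii() and ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators, spaces, and non-ASCII characters pass through
    return "".join(out)

rng = random.Random(0)  # fixed seed so the example is repeatable
record = {"policy_id": "PX-2041-77", "phone": "(555) 010-2299"}  # toy values
masked = {key: mask_preserving_format(val, rng) for key, val in record.items()}
```

The masked values keep their length, separators, and letter or digit positions, so a regex or schema written against the original format still matches.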
+## How it works

+DataFramer supports a seed-based workflow for enterprise AI data readiness:

+1. **Seed input** from manual samples or production data
+2. **Anonymize** sensitive records when needed
+3. **Analyze** schema, structure, distributions, and patterns
+4. **Configure** variation, volume, edge cases, and format mix
+5. **Generate** realistic datasets across complex formats
+6. **Use** the outputs for model evaluation, testing, and fine-tuning

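The six steps above read naturally as a small pipeline, and walking through them on toy records makes the data flow concrete. Every name in the sketch below (`anonymize`, `analyze`, `GenConfig`, `generate`, the `rare_boost` knob, and the sample fields) is hypothetical and chosen for illustration. It does not describe DataFramer's API or its generation method, which this commit does not specify.

```python
import random
from collections import Counter
from dataclasses import dataclass

# 1. Seed input: a few hand-written sample records (toy data).
seed = [
    {"name": "A. Example", "claim_type": "auto", "status": "open"},
    {"name": "B. Example", "claim_type": "auto", "status": "closed"},
    {"name": "C. Example", "claim_type": "home", "status": "open"},
    {"name": "D. Example", "claim_type": "auto", "status": "open"},
]

def anonymize(records, sensitive=("name",)):
    """2. Anonymize: drop fields treated as sensitive (the simplest possible policy)."""
    return [{k: v for k, v in r.items() if k not in sensitive} for r in records]

def analyze(records):
    """3. Analyze: count how often each value appears in each field."""
    fields = records[0].keys()
    return {f: Counter(r[f] for r in records) for f in fields}

@dataclass
class GenConfig:
    """4. Configure: output volume and how strongly to up-weight rare values."""
    volume: int = 100
    rare_boost: float = 1.0  # 0 keeps seed frequencies; 1 makes values equally likely

def generate(profile, config, rng):
    """5. Generate: sample each field independently from the adjusted distribution."""
    rows = []
    for _ in range(config.volume):
        row = {}
        for field, counts in profile.items():
            values = list(counts)
            # count ** (1 - boost) flattens the distribution as boost approaches 1
            weights = [counts[v] ** (1.0 - config.rare_boost) for v in values]
            row[field] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows

safe_seed = anonymize(seed)
profile = analyze(safe_seed)
dataset = generate(profile, GenConfig(volume=200, rare_boost=0.5), random.Random(7))

# 6. Use: for example, confirm the rarer "home" claim type is present for evals.
home_rows = sum(1 for row in dataset if row["claim_type"] == "home")
```

Sampling each field independently is the main simplification here: it preserves per-field frequencies but not relationships between fields, so anything meant to honor the constraints mentioned in the README would need to model those as well.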
+## Built for real enterprise data

+DataFramer works with **any textual dataset: any format, any domain, any complexity**, including:

 - long-form documents and PDFs
+- structured and semi-structured records
+- nested and hierarchical data
+- multi-file workflows
+- high-variability business inputs

+## Best-fit use cases
+
+- **LLM and AI evaluations**
+  Build stronger eval datasets with better coverage across common, rare, and edge-case scenarios.

+- **Privacy-safe testing**
+  Use realistic datasets for testing and iteration without exposing sensitive production data.
+
+- **Anonymization for AI workflows**
+  Transform restricted real-world data into safe seed inputs for downstream generation and evaluation.

+- **Fine-tuning and dataset expansion**
+  Extend sparse datasets with more realistic variation while preserving fidelity to source patterns.

+## Enterprise-ready

+Built for teams in regulated and data-sensitive environments.
+**Your data never has to leave.**

+Learn more at **https://www.dataframer.ai**