Commit · 849fb14
Parent(s): dcff4ca
Deploy version 0.2.0
- README.md +10 -16
- assets/styles.css +62 -10
- main.py +32 -5
- pyproject.toml +2 -2
- src/ui.py +12 -4
- uv.lock +2 -2
README.md
CHANGED
@@ -1,17 +1,8 @@
----
-title: DataGen
-emoji: 🧬
-colorFrom: indigo
-colorTo: pink
-sdk: docker
-short_description: AI-powered synthetic data generator
----
-
 # 🧬 DataGen: AI-Powered Synthetic Data Generator
 
 Generate realistic synthetic datasets by simply describing what you need.
 
-[**Try the Live Demo**](https://
+[**Try the Live Demo**](https://datagen.lisekarimi.com)
 
 <img src="https://github.com/lisekarimi/datagen/blob/main/assets/screenshot.png?raw=true" alt="DataGen interface" width="450">
 
@@ -26,10 +17,14 @@ DataGen transforms simple descriptions into structured datasets using AI. Perfec
 - **AI-powered:** Uses GPT and Claude models
 - **Instant download** with clean, ready-to-use datasets
 
+To understand the full workflow from user input to file output, see the [architecture section](https://datagen.lisekarimi.com/docs/#/archi).
+
+
 ## Quick Start
 
 ### Prerequisites
 - Python 3.11+
+- [Docker Desktop](https://www.docker.com/products/docker-desktop/)
 - [uv package manager](https://docs.astral.sh/uv/getting-started/installation/)
 
 ### Installation
@@ -54,21 +49,21 @@ make run
 make ui
 ```
 
-*For complete setup instructions, commands, and development guidelines, see [
+*For complete setup instructions, commands, and development guidelines, see [the Docs Page](https://datagen.lisekarimi.com/docs).*
 
 ## 🧑‍💻 How to Use
 
 1. **Describe your data:** "Customer purchase history with demographics"
-2. **Choose format:** CSV, JSON, Parquet, or Markdown
+2. **Choose format:** CSV, JSON, Parquet, or Markdown
 3. **Select AI model:** GPT or Claude
 4. **Set sample size:** Number of records to generate
-5. **Generate & download** your dataset
+5. **Generate & download** your dataset
 
 ## 🛡️ Quality & Security
 
 DataGen maintains high standards with comprehensive test coverage, automated security scanning, and code quality enforcement.
 
-*For CI/CD setup and technical details, see [docs
+*For CI/CD setup and technical details, see [the docs Page](https://datagen.lisekarimi.com/docs/#/cicd).*
 
 ## Notes
 - Generated files are automatically cleaned up after 5 minutes
@@ -76,7 +71,6 @@ DataGen maintains high standards with comprehensive test coverage, automated sec
 - JSON output includes proper indentation for readability
 - Cross-platform compatibility (Windows, macOS, Linux)
 
-
 ## License
 
-MIT
+MIT
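The README notes that generated files are cleaned up after 5 minutes. This commit does not show how, so the following is only a minimal sketch of one way such an age-based cleanup could work, using the standard library. The names `cleanup_old_files`, `max_age_seconds`, and `demo_cleanup` are hypothetical and are not taken from the DataGen source.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_old_files(directory, max_age_seconds=300, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Hypothetical helper: DataGen's real cleanup code is not part of this diff.
    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        # Only plain files are considered; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


def demo_cleanup():
    """Create one stale and one fresh file, then run the cleanup once."""
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "old.csv"
        fresh = Path(tmp) / "new.csv"
        stale.write_text("a,b\n1,2\n")
        fresh.write_text("a,b\n3,4\n")
        # Backdate the stale file by 10 minutes (assumed 5-minute limit).
        ten_minutes_ago = time.time() - 600
        os.utime(stale, (ten_minutes_ago, ten_minutes_ago))
        removed = cleanup_old_files(tmp)
        remaining = sorted(p.name for p in Path(tmp).iterdir())
        return removed, remaining
```

Calling `demo_cleanup()` removes only the backdated file. In a running app, a helper like this would presumably be called on a timer or after each download, but that scheduling is an assumption as well.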
assets/styles.css
CHANGED
@@ -11,7 +11,7 @@ html, body, #app, body > div, .gradio-container {
 }
 
 #app-container {
-    background-color: #1d3451 !important;
+    background-color: #1d3451 !important;
     padding: 40px;
     border-radius: 12px;
     box-shadow: 0 4px 25px rgba(0, 0, 0, 0.4);
@@ -27,7 +27,7 @@ html, body, #app, body > div, .gradio-container {
 #app-container strong {
     font-size: 16px;
     line-height: 1.6;
-    color: white !important;
+    color: white !important;
 }
 
 #app-title {
@@ -54,7 +54,7 @@ html, body, #app, body > div, .gradio-container {
 
 #intro-text {
     font-size: 16px;
-    color: white !important;
+    color: white !important;
     margin-top: 20px;
     line-height: 1.6;
 }
@@ -67,7 +67,7 @@ html, body, #app, body > div, .gradio-container {
 
 .button-link {
     background: linear-gradient(to left, #ff416c, #ff4b2b);
-    color: white !important;
+    color: white !important;
     padding: 10px 20px;
     text-decoration: none;
     font-weight: bold;
@@ -100,7 +100,7 @@ html, body, #app, body > div, .gradio-container {
 
 .label-box {
     background-color: #1f2937;
-    color: white;
+    color: white;
     padding: 4px 10px;
     border-radius: 8px;
     display: inline-block;
@@ -115,11 +115,11 @@ html, body, #app, body > div, .gradio-container {
     margin: 0 !important;
 }
 .row-spacer {
-    margin-top: 24px !important;
+    margin-top: 24px !important;
 }
 
 .column-gap {
-    gap: 16px;
+    gap: 16px;
 }
 
 textarea, input[type="text"] {
@@ -163,7 +163,7 @@ ul[role="listbox"] li[aria-selected="true"] {
 
 input[type="number"] {
     background-color: #374151;
-    color: white !important;
+    color: white !important;
 }
 
 #business-problem-box {
@@ -181,7 +181,7 @@ input[type="number"] {
 
 #run-btn {
     background: linear-gradient(to left, #ff416c, #ff4b2b);
-    color: white !important;
+    color: white !important;
     font-weight: bold;
     border: none;
     padding: 10px 20px;
@@ -234,4 +234,56 @@ input[type="number"] {
 .version-banner {
     text-align: center;
     font-size: 0.9em;
-}
+}
+
+/* ==== Floating chat button for Gradio ==== */
+.floating-chat-btn {
+    position: fixed !important;
+    bottom: 30px !important;
+    right: 30px !important;
+    display: flex !important;
+    align-items: center !important;
+    gap: 0.5rem !important;
+    background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
+    color: white !important;
+    padding: 1rem 1.75rem !important;
+    border-radius: 50px !important;
+    text-decoration: none !important;
+    font-weight: 600 !important;
+    font-size: 1rem !important;
+    box-shadow: 0 4px 20px rgba(102, 126, 234, 0.5) !important;
+    z-index: 9999 !important;
+    transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
+    white-space: nowrap !important;
+}
+
+.floating-chat-btn:hover {
+    transform: translateY(-3px) scale(1.02) !important;
+    box-shadow: 0 8px 24px rgba(255, 65, 108, 0.7) !important;
+    text-decoration: none !important;
+    color: white !important;
+}
+
+.floating-chat-btn:active {
+    transform: translateY(-1px) scale(0.98) !important;
+}
+
+/* Tablet and smaller desktop */
+@media (max-width: 1024px) {
+    .floating-chat-btn {
+        bottom: 25px !important;
+        right: 25px !important;
+        font-size: 0.95rem !important;
+        padding: 0.875rem 1.5rem !important;
+    }
+}
+
+/* Mobile */
+@media (max-width: 768px) {
+    .floating-chat-btn {
+        bottom: 20px !important;
+        right: 20px !important;
+        font-size: 0.9rem !important;
+        padding: 0.875rem 1.25rem !important;
+    }
+}
main.py
CHANGED
@@ -1,14 +1,41 @@
 """Entry point for the application."""
 
 import os
+from pathlib import Path
+from fastapi import FastAPI
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import RedirectResponse
+import gradio as gr
 from src.ui import build_ui
 
+# Create FastAPI app with custom docs URLs
+app = FastAPI(
+    docs_url="/api-docs", redoc_url="/api-redoc", openapi_url="/api-openapi.json"
+)
+
+# Get docs path
+docs_path = Path(__file__).parent / "docs"
+
+
+# Add redirect from /docs to /docs/ (must come BEFORE mounting)
+@app.get("/docs")
+async def redirect_to_docs():
+    """Redirect to the /docs/ homepage."""
+    return RedirectResponse(url="/docs/")
+
+
+# Mount your documentation
+if docs_path.exists():
+    app.mount("/docs", StaticFiles(directory=str(docs_path), html=True), name="docs")
+
+# Build Gradio UI
 demo = build_ui()
 
+# Mount Gradio to the root path (this should come LAST)
+app = gr.mount_gradio_app(app, demo, path="")
+
 # Main application entry point
 if __name__ == "__main__":
-
-
-
-        server_port=int(os.environ.get("PORT", 7860)),
-    )
+    import uvicorn
+
+    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 7860)))
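The comments in the new `main.py` insist on an order: the `/docs` redirect first, the static docs mount next, and the Gradio root mount last. The sketch below is a toy model of first-match routing that illustrates why that order matters, assuming the router tries entries in registration order. It is not Starlette's or FastAPI's actual matching code, and `resolve`, `GOOD_ORDER`, and `BAD_ORDER` are hypothetical names introduced only for this illustration.

```python
def resolve(routes, request_path):
    """Return the target of the first entry that matches `request_path`.

    Simplified model (an assumption, not the real router):
    - "exact" entries match only the identical path;
    - "mount" entries match any path below their prefix, so an empty
      prefix matches every path that starts with "/".
    """
    for kind, path, target in routes:
        if kind == "exact" and request_path == path:
            return target
        if kind == "mount" and request_path.startswith(path + "/"):
            return target
    return None


# Order used in main.py: redirect, then docs mount, then the root mount.
GOOD_ORDER = [
    ("exact", "/docs", "redirect to /docs/"),
    ("mount", "/docs", "static docs"),
    ("mount", "", "gradio"),
]

# Hypothetical wrong order: the catch-all root mount is registered first.
BAD_ORDER = [
    ("mount", "", "gradio"),
    ("exact", "/docs", "redirect to /docs/"),
    ("mount", "/docs", "static docs"),
]
```

With `GOOD_ORDER`, `/docs` reaches the redirect and `/docs/index.html` reaches the static files, while everything else falls through to Gradio. With `BAD_ORDER`, the root mount answers every request in this model, so the documentation routes are never reached.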
pyproject.toml
CHANGED
@@ -1,6 +1,6 @@
 [project]
 name = "datagen"
-version = "0.
+version = "0.2.0"
 description = "AI-powered platform for generating synthetic datasets"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -23,7 +23,7 @@ filterwarnings = [
 [tool.ruff.lint]
 select = [
     "E", # pycodestyle errors
-    "W", # pycodestyle warnings
+    "W", # pycodestyle warnings
     "F", # Pyflakes
     "D", # pydocstyle (docstrings)
     "UP", # pyupgrade
src/ui.py
CHANGED
@@ -59,11 +59,10 @@ def build_ui(css_path="assets/styles.css"):
 """
 gr.HTML(intro_html)
 
-
-learn_more_html = f"""
+learn_more_html = """
 <div id="learn-more-button">
-<a href="
-class="button-link"
+<a href="/docs/"
+class="button-link">Documentation</a>
 </div>
 """
 gr.HTML(learn_more_html)
@@ -173,6 +172,15 @@ def build_ui(css_path="assets/styles.css"):
 """
 )
 
+# Floating chat button
+gr.HTML(
+"""
+<a href="/docs/" class="floating-chat-btn" target="_blank">
+💬 Chat with AI Assistant
+</a>
+"""
+)
+
 return ui
 
 except Exception as e:
uv.lock
CHANGED
@@ -1,5 +1,5 @@
 version = 1
-revision =
+revision = 3
 requires-python = ">=3.11"
 resolution-markers = [
     "python_full_version >= '3.13'",
@@ -177,7 +177,7 @@ wheels = [
 
 [[package]]
 name = "datagen"
-version = "0.
+version = "0.2.0"
 source = { virtual = "." }
 dependencies = [
     { name = "anthropic" },