Spaces:
Paused
Paused
| title: DataHubHub | |
| emoji: ⚡ | |
| colorFrom: red | |
| colorTo: indigo | |
| sdk: streamlit | |
| sdk_version: 1.42.2 | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| language: en | |
| # ML Dataset & Code Generation Manager | |
| A comprehensive platform for ML dataset management and code generation with Hugging Face integration. | |
| ## Features | |
| - **Dataset Management**: Upload, explore, and manage machine learning datasets | |
| - **Data Visualization**: Visualize dataset statistics and distributions | |
| - **Code Generation**: Fine-tune models for code generation tasks | |
| - **Code Quality Tools**: Improve code quality with integrated formatters, linters, and type checkers | |
| ## Technology Stack | |
| - **Frontend**: Streamlit | |
| - **Backend**: Python | |
| - **Database**: SQLite (via SQLAlchemy) | |
| - **ML Integration**: Hugging Face Transformers, Datasets | |
| - **Visualization**: Plotly, Matplotlib | |
| ## Project Structure | |
| ``` | |
| . | |
| ├── app.py # Main application entry point | |
| ├── components/ # UI components | |
| │ ├── code_quality.py # Code quality tools | |
| │ ├── dataset_preview.py # Dataset preview component | |
| │ ├── dataset_statistics.py # Dataset statistics component | |
| │ ├── dataset_uploader.py # Dataset upload component | |
| │ ├── dataset_validation.py # Dataset validation component | |
| │ ├── dataset_visualization.py # Dataset visualization component | |
| │ └── fine_tuning/ # Fine-tuning components | |
| │ ├── finetune_ui.py # Fine-tuning UI | |
| │ └── model_interface.py # Model interface | |
| ├── database/ # Database configuration | |
| │ ├── models.py # Database models | |
| │ └── operations.py # Database operations | |
| ├── utils/ # Utility functions | |
| │ ├── dataset_utils.py # Dataset utilities | |
| │ ├── huggingface_integration.py # Hugging Face integration | |
| │ └── smolagents_integration.py # SmolaAgents integration | |
| └── assets/ # Static assets | |
| ``` | |
| ## Deployment | |
| This application is designed to be deployed as a Hugging Face Space. | |
| ### Hugging Face Space Deployment | |
| 1. Fork this repository | |
| 2. Create a new Hugging Face Space | |
| 3. Connect the forked repository to your Space | |
| 4. The application will be deployed automatically | |
| ### Local Development | |
| 1. Clone the repository | |
| 2. Install dependencies: | |
| ``` | |
| pip install streamlit pandas numpy plotly matplotlib scikit-learn SQLAlchemy huggingface-hub datasets transformers torch | |
| ``` | |
| 3. Run the application: | |
| ``` | |
| streamlit run app.py | |
| ``` | |
| ## Configuration | |
| - `.streamlit/config.toml`: Streamlit configuration | |
| - `.streamlit/secrets.toml`: Secrets and API keys | |
| - `huggingface-spacefile`: Hugging Face Space configuration | |
| ## API Keys | |
| To use the Hugging Face integration features, add your Hugging Face API token to `.streamlit/secrets.toml`: | |
| ```toml | |
| [huggingface] | |
| hf_token = "HF_TOKEN" | |
| ``` | |
| ## License | |
| This project is licensed under the Apache-2.0 License - see the LICENSE file for details. |