---
title: MGT Detection
emoji: π
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: MGT-Detection
---
# MGT-Detection

## Overview

MGT-Detection (Machine-Generated Text Detection) is a project designed to classify and detect whether a given text is human-written or machine-generated. The project leverages state-of-the-art natural language processing (NLP) models and pipelines to achieve accurate classification results. It includes tools for training, evaluating, and deploying models for text classification tasks.
## Features
- Text Classification: Detects whether a text is human-written or machine-generated.
- Model Training Pipeline: Includes hyperparameter optimization, dataset preparation, and model training.
- Evaluation: Provides metrics such as accuracy, precision, recall, and F1 score.
- Dataset Management: Tools for preparing and tokenizing datasets.
- Model Deployment: Save and load fine-tuned models for deployment.
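To make the evaluation metrics concrete, the sketch below shows how accuracy, precision, recall, and F1 are derived from binary predictions (1 = machine-generated, 0 = human-written). This is illustrative only; the project computes these with Scikit-learn, and the function name here is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Illustrative stand-in for the Scikit-learn metrics used by the pipeline.
    """
    # Count true positives, false positives, and false negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels `[1, 0, 1, 1]` and predictions `[1, 0, 0, 1]`, accuracy is 0.75 and precision is 1.0 (no false positives), while one missed machine-generated sample lowers recall.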
## Project Structure

```
MGT-Detection/
├── app.py                 # Main application for text classification
├── pipeline/
│   ├── dataset.py         # Dataset preparation and management
│   ├── model_pipeline.py  # Model training and evaluation pipeline
│   └── main.py            # Entry point for running the training pipeline
└── samples.json           # Sample dataset for testing
```
## Usage

### Running the Application

To launch the text classification application:

```bash
python app.py
```
### Training a Model

To train a model using the pipeline:

```bash
python pipeline/main.py \
    --file_path <path_to_dataset> \
    --out_path <output_directory> \
    --model_name <model_name> \
    --num_labels 2 \
    --sample_frac 1.0 \
    --num_trials 5 \
    --num_epochs 5
```
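The flags above map naturally onto a standard-library `argparse` parser. The sketch below is a hypothetical reconstruction of how `pipeline/main.py` might declare them, with defaults matching the example command; the actual script may differ.

```python
import argparse

def build_parser():
    """Hypothetical argument parser mirroring the training flags shown above."""
    parser = argparse.ArgumentParser(description="MGT-Detection training pipeline")
    parser.add_argument("--file_path", required=True, help="Path to the JSON dataset")
    parser.add_argument("--out_path", required=True, help="Directory for the fine-tuned model")
    parser.add_argument("--model_name", required=True, help="Hugging Face model checkpoint")
    parser.add_argument("--num_labels", type=int, default=2, help="Number of output classes")
    parser.add_argument("--sample_frac", type=float, default=1.0, help="Fraction of the dataset to use")
    parser.add_argument("--num_trials", type=int, default=5, help="Number of hyperparameter search trials")
    parser.add_argument("--num_epochs", type=int, default=5, help="Training epochs")
    return parser
```

Numeric flags are typed (`int`/`float`) so that values arrive ready to use, and only the three path/model flags are required.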
### Dataset Preparation

Ensure your dataset is in JSON format with the following structure:

```json
[
  {
    "text": "<text_sample>",
    "label": "<label>"
  },
  ...
]
```
## Key Components

### `app.py`
- Provides a user interface for classifying text as human-written or machine-generated.

### `pipeline/model_pipeline.py`
- Contains functions for model training, hyperparameter optimization, and evaluation.

### `pipeline/dataset.py`
- Handles dataset preparation, tokenization, and saving/loading of datasets.

### `samples.json`
- A sample dataset for testing the application.
## Requirements
- Python 3.8+
- Transformers
- Datasets
- Optuna
- Gradio
- Scikit-learn
## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.
## License

This project is licensed under the Apache License 2.0, as declared in the Space metadata above. See the LICENSE file for details.
## Acknowledgments

- Hugging Face Transformers
- Optuna for hyperparameter optimization
- Gradio for building the user interface