title: MGT Detection
emoji: 🐠
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: MGT-Detection

MGT-Detection

Overview

MGT-Detection (Machine-Generated Text Detection) classifies a given text as human-written or machine-generated. It fine-tunes Hugging Face Transformers models for text classification, tunes hyperparameters with Optuna, and serves predictions through a Gradio interface. The repository includes tools for training, evaluating, and deploying these models.

Features

Text Classification: Detects whether a text is human-written or machine-generated.
Model Training Pipeline: Includes hyperparameter optimization, dataset preparation, and model training.
Evaluation: Provides metrics such as accuracy, precision, recall, and F1 score.
Dataset Management: Tools for preparing and tokenizing datasets.
Model Deployment: Saves fine-tuned models and reloads them for inference.
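
The evaluation metrics named in the feature list can be illustrated for the binary case. The following is a minimal pure-Python sketch, assuming label `1` means machine-generated and label `0` means human-written (the README does not state the label encoding). The function name `binary_metrics` is hypothetical; since Scikit-learn is listed as a requirement, the project presumably relies on `sklearn.metrics` rather than hand-written code like this.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` is the label treated as the positive class
    (assumed here to be 1 = machine-generated).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For example, `binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])` counts one true positive, no false positives, and one false negative, giving a precision of 1.0 and a recall of 0.5.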

Project Structure

MGT-Detection/
├── app.py                # Main application for text classification
├── pipeline/
│   ├── dataset.py        # Dataset preparation and management
│   ├── model_pipeline.py # Model training and evaluation pipeline
│   └── main.py           # Entry point for running the training pipeline
└── samples.json          # Sample dataset for testing

Usage

Running the Application

To launch the text classification application:

python app.py
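
For orientation, here is a rough sketch of how a Gradio interface can wrap a text classifier. It is not the contents of `app.py`, which this README does not show. The names `placeholder_scorer`, `classify`, and `build_demo`, the label strings, and the constant score are all assumptions made for illustration; a real application would load a fine-tuned model in place of the placeholder.

```python
def placeholder_scorer(text: str) -> float:
    """Hypothetical stand-in for a fine-tuned model.

    Returns the probability that `text` is machine-generated.
    This placeholder always returns 0.5.
    """
    return 0.5


def classify(text: str, scorer=placeholder_scorer) -> dict:
    """Map a text to {label: confidence}, the dict shape gr.Label can display."""
    text = text.strip()
    if not text:
        # Reject empty input rather than scoring it.
        raise ValueError("Please enter some text to classify.")
    p_machine = float(scorer(text))
    return {"Human-written": 1.0 - p_machine, "Machine-generated": p_machine}


def build_demo():
    """Build the Gradio interface (requires the `gradio` package)."""
    import gradio as gr  # imported lazily so classify() works without Gradio

    return gr.Interface(
        fn=classify,
        inputs=gr.Textbox(lines=8, label="Text"),
        outputs=gr.Label(num_top_classes=2),
        title="MGT Detection",
    )
```

Under these assumptions, calling `build_demo().launch()` would start a local web server for the interface.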

Training a Model

To train a model using the pipeline:

python pipeline/main.py \
  --file_path <path_to_dataset> \
  --out_path <output_directory> \
  --model_name <model_name> \
  --num_labels 2 \
  --sample_frac 1.0 \
  --num_trials 5 \
  --num_epochs 5
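
The flags in the command above could be declared with `argparse` roughly as follows. This is a sketch, not the actual `pipeline/main.py`: the flag names come from the command, but the types, defaults, and help texts are assumptions (the defaults simply mirror the example values, and the interpretation of `--sample_frac` and `--num_trials` is inferred from their names).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the training flags shown in the README command (types assumed)."""
    parser = argparse.ArgumentParser(
        description="Train a machine-generated text detector."
    )
    parser.add_argument("--file_path", required=True,
                        help="Path to the JSON dataset")
    parser.add_argument("--out_path", required=True,
                        help="Directory for the trained model and outputs")
    parser.add_argument("--model_name", required=True,
                        help="Name of the pretrained model to fine-tune")
    parser.add_argument("--num_labels", type=int, default=2,
                        help="Number of target classes")
    parser.add_argument("--sample_frac", type=float, default=1.0,
                        help="Fraction of the dataset to use (assumed meaning)")
    parser.add_argument("--num_trials", type=int, default=5,
                        help="Hyperparameter search trials (assumed meaning)")
    parser.add_argument("--num_epochs", type=int, default=5,
                        help="Training epochs")
    return parser
```

Passing a list to `parse_args`, as in `build_parser().parse_args(["--file_path", "data.json", "--out_path", "out", "--model_name", "some-model"])`, is a convenient way to check the parser without touching the command line.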

Dataset Preparation

Ensure your dataset is in JSON format with the following structure:

[
  {
    "text": "<text_sample>",
    "label": "<label>"
  },
  ...
]
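
Before training, it can help to check that a file matches this structure. The sketch below uses only the standard library; the function name `parse_records` and the specific checks (a top-level array, both keys present, non-empty string text) are assumptions about what a reasonable validator would enforce, not a description of `pipeline/dataset.py`.

```python
import json


def parse_records(raw: str) -> list:
    """Parse a JSON array of {"text", "label"} objects and validate each record."""
    records = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(records, list):
        raise ValueError("top-level JSON value must be an array")

    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not an object")
        missing = {"text", "label"} - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        if not isinstance(rec["text"], str) or not rec["text"].strip():
            raise ValueError(f"record {i} has empty or non-string text")
    return records
```

A file on disk can be checked with `parse_records(open(path, encoding="utf-8").read())`, where `path` is the value you would pass to `--file_path`.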

Key Components

`app.py`

Provides a user interface for classifying text as human-written or machine-generated.

`pipeline/model_pipeline.py`

Contains functions for model training, hyperparameter optimization, and evaluation.
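
As a rough illustration of how Optuna-based hyperparameter optimization is typically wired up, consider the sketch below. The search space (learning rate and batch size), the function names, and `toy_score` are all assumptions; `toy_score` is a placeholder for a real fine-tune-and-evaluate run and does not measure anything. The actual contents of `model_pipeline.py` are not shown in this README.

```python
import math


def toy_score(learning_rate: float, batch_size: int) -> float:
    """Placeholder for 'fine-tune the model, return a validation metric'.

    NOT a real metric: it peaks at learning_rate = 3e-5 purely for
    demonstration and ignores batch_size.
    """
    distance = abs(math.log10(learning_rate) - math.log10(3e-5))
    return 1.0 / (1.0 + distance)


def objective(trial) -> float:
    """Optuna objective: sample hyperparameters, return the score to maximize."""
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    return toy_score(learning_rate, batch_size)


def run_search(num_trials: int = 5):
    """Run the search (requires the `optuna` package)."""
    import optuna  # imported lazily so objective() can be inspected without Optuna

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=num_trials)
    return study.best_params, study.best_value
```

In this sketch, `num_trials` plays the role that the `--num_trials` flag presumably plays in the training command: it bounds how many hyperparameter combinations are tried.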

`pipeline/dataset.py`

Handles dataset preparation, tokenization, and saving/loading datasets.

`samples.json`

A sample dataset for testing the application.

Requirements

Python 3.8+
Transformers
Datasets
Optuna
Gradio
Scikit-learn

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.

License

This project is licensed under the Apache License 2.0, matching the `license: apache-2.0` field in the Space metadata. See the LICENSE file for details.

Acknowledgments

Hugging Face Transformers
Optuna for hyperparameter optimization
Gradio for building the user interface