---
title: MGT Detection
emoji: 🐠
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: MGT-Detection
---

# MGT-Detection

## Overview
MGT-Detection (Machine-Generated Text Detection) is a project designed to classify and detect whether a given text is human-written or machine-generated. The project leverages state-of-the-art natural language processing (NLP) models and pipelines to achieve accurate classification results. It includes tools for training, evaluating, and deploying models for text classification tasks.

## Features
- **Text Classification**: Detects whether a text is human-written or machine-generated.
- **Model Training Pipeline**: Includes hyperparameter optimization, dataset preparation, and model training.
- **Evaluation**: Provides metrics such as accuracy, precision, recall, and F1 score.
- **Dataset Management**: Tools for preparing and tokenizing datasets.
- **Model Deployment**: Save and load fine-tuned models for deployment.
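
The evaluation metrics listed above can be sketched in plain Python for the binary case (assuming label `1` means machine-generated; the real pipeline computes these with Scikit-learn):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = machine-generated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# One missed machine-generated sample: perfect precision, recall of 0.5
print(binary_metrics(y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 0]))
```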

## Project Structure
```
MGT-Detection/
β”œβ”€β”€ app.py                # Main application for text classification
β”œβ”€β”€ pipeline/
β”‚   β”œβ”€β”€ dataset.py        # Dataset preparation and management
β”‚   β”œβ”€β”€ model_pipeline.py # Model training and evaluation pipeline
β”‚   β”œβ”€β”€ main.py           # Entry point for running the training pipeline
β”œβ”€β”€ samples.json          # Sample dataset for testing
```
## Usage
### Running the Application
To launch the text classification application:
```bash
python app.py
```

### Training a Model
To train a model using the pipeline:
```bash
python pipeline/main.py \
  --file_path <path_to_dataset> \
  --out_path <output_directory> \
  --model_name <model_name> \
  --num_labels 2 \
  --sample_frac 1.0 \
  --num_trials 5 \
  --num_epochs 5
```
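
The `--num_trials` flag controls how many hyperparameter configurations are tried (the pipeline uses Optuna for this). A minimal stdlib sketch of the idea, with a hypothetical `objective` standing in for a full train-plus-validate run:

```python
import random

def objective(lr, batch_size):
    """Hypothetical stand-in for fine-tuning + validation; returns a score to maximize."""
    # The real pipeline would train the model and return a validation metric (e.g. F1).
    return 1.0 - abs(lr - 3e-5) * 1e4 - abs(batch_size - 16) / 64

def random_search(num_trials, seed=0):
    """Try num_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(num_trials):
        params = {
            "lr": rng.choice([1e-5, 2e-5, 3e-5, 5e-5]),
            "batch_size": rng.choice([8, 16, 32]),
        }
        score = objective(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

best_params, best_score = random_search(num_trials=5)
print(best_params, best_score)
```

Optuna improves on this plain random search by pruning bad trials early and sampling promising regions of the search space more often.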

### Dataset Preparation
Ensure your dataset is in JSON format with the following structure:
```json
[
  {
    "text": "<text_sample>",
    "label": "<label>"
  },
  ...
]
```
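
Before training, it is worth sanity-checking the file against this structure. A stdlib-only sketch (the label encoding is an assumption, commonly `0` for human-written and `1` for machine-generated):

```python
import json

def load_dataset(path):
    """Load a JSON dataset and verify every record has "text" and "label" keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        if "text" not in rec or "label" not in rec:
            raise ValueError(f"record {i} is missing 'text' or 'label'")
    return records

# Example:
# samples = load_dataset("samples.json")
# print(len(samples), samples[0]["label"])
```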

## Key Components
### `app.py`
- Provides a user interface for classifying text as human-written or machine-generated.

### `pipeline/model_pipeline.py`
- Contains functions for model training, hyperparameter optimization, and evaluation.

### `pipeline/dataset.py`
- Handles dataset preparation, tokenization, and saving/loading datasets.

### `samples.json`
- A sample dataset for testing the application.

## Requirements
- Python 3.8+
- Transformers
- Datasets
- Optuna
- Gradio
- Scikit-learn

## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your changes.

## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.

## Acknowledgments
- Hugging Face Transformers
- Optuna for hyperparameter optimization
- Gradio for building the user interface