---
title: MGT Detection
emoji: π
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: MGT-Detection
---
# MGT-Detection
## Overview
MGT-Detection (Machine-Generated Text Detection) is a project that classifies whether a given text is human-written or machine-generated. It leverages state-of-the-art natural language processing (NLP) models and pipelines to achieve accurate classification results, and includes tools for training, evaluating, and deploying text classification models.
## Features
- **Text Classification**: Detects whether a text is human-written or machine-generated.
- **Model Training Pipeline**: Includes hyperparameter optimization, dataset preparation, and model training.
- **Evaluation**: Provides metrics such as accuracy, precision, recall, and F1 score.
- **Dataset Management**: Tools for preparing and tokenizing datasets.
- **Model Deployment**: Save and load fine-tuned models for deployment.
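For reference, the evaluation metrics listed above can be computed directly from raw predictions. The pipeline itself uses scikit-learn for this, so the function below is only an illustrative, dependency-free sketch of the formulas:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (0/1).

    Illustrative only -- the training pipeline uses scikit-learn's
    implementations of these metrics.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```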
## Project Structure
```
MGT-Detection/
├── app.py                  # Main application for text classification
├── pipeline/
│   ├── dataset.py          # Dataset preparation and management
│   ├── model_pipeline.py   # Model training and evaluation pipeline
│   └── main.py             # Entry point for running the training pipeline
└── samples.json            # Sample dataset for testing
```
## Usage
### Running the Application
To launch the text classification application:
```bash
python app.py
```
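A minimal sketch of what the application might look like, assuming a Gradio `Interface` wrapping a Hugging Face text-classification pipeline. The model path and label names here are placeholders, not the project's actual choices:

```python
def classify(clf, text):
    """Format the top prediction from a Hugging Face text-classification
    pipeline as a human-readable string."""
    result = clf(text)[0]  # e.g. {"label": "machine-generated", "score": 0.97}
    return f"{result['label']} ({result['score']:.2%})"

if __name__ == "__main__":
    # Heavy imports deferred so the helper above stays dependency-free.
    import gradio as gr
    from transformers import pipeline

    # Placeholder checkpoint path -- substitute your fine-tuned model.
    clf = pipeline("text-classification", model="path/to/fine-tuned-model")
    demo = gr.Interface(
        fn=lambda text: classify(clf, text),
        inputs=gr.Textbox(lines=5, label="Text"),
        outputs=gr.Textbox(label="Prediction"),
        title="MGT Detection",
    )
    demo.launch()
```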
### Training a Model
To train a model using the pipeline:
```bash
python pipeline/main.py \
--file_path <path_to_dataset> \
--out_path <output_directory> \
--model_name <model_name> \
--num_labels 2 \
--sample_frac 1.0 \
--num_trials 5 \
--num_epochs 5
```
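The flags above suggest that `pipeline/main.py` parses its arguments roughly as follows. This is a sketch inferred from the command line, not the actual source; defaults and help strings are assumptions:

```python
import argparse

def build_parser():
    # Argument names mirror the CLI flags shown above; defaults are guesses.
    parser = argparse.ArgumentParser(description="Train an MGT-detection model.")
    parser.add_argument("--file_path", required=True,
                        help="Path to the JSON dataset")
    parser.add_argument("--out_path", required=True,
                        help="Directory for the fine-tuned model")
    parser.add_argument("--model_name", required=True,
                        help="Base model checkpoint to fine-tune")
    parser.add_argument("--num_labels", type=int, default=2,
                        help="Number of target classes")
    parser.add_argument("--sample_frac", type=float, default=1.0,
                        help="Fraction of the dataset to use")
    parser.add_argument("--num_trials", type=int, default=5,
                        help="Number of Optuna hyperparameter trials")
    parser.add_argument("--num_epochs", type=int, default=5,
                        help="Training epochs per trial")
    return parser
```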
### Dataset Preparation
Ensure your dataset is in JSON format with the following structure:
```json
[
  {
    "text": "<text_sample>",
    "label": "<label>"
  },
  ...
]
```
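A quick way to sanity-check that a dataset file matches this schema is a stdlib-only validator like the one below. This is a convenience sketch; the actual `pipeline/dataset.py` may validate differently:

```python
import json

def validate_dataset(path):
    """Check that a JSON file is a list of {"text": ..., "label": ...} records.

    Returns the parsed records; raises ValueError on schema violations.
    """
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("Dataset must be a JSON array of records")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or "text" not in rec or "label" not in rec:
            raise ValueError(f"Record {i} is missing 'text' or 'label'")
    return records
```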
## Key Components
### `app.py`
- Provides a user interface for classifying text as human-written or machine-generated.
### `pipeline/model_pipeline.py`
- Contains functions for model training, hyperparameter optimization, and evaluation.
### `pipeline/dataset.py`
- Handles dataset preparation, tokenization, and saving/loading datasets.
### `samples.json`
- A sample dataset for testing the application.
## Requirements
- Python 3.8+
- Transformers
- Datasets
- Optuna
- Gradio
- Scikit-learn
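These dependencies can be installed with pip. The project does not pin exact versions except for the Gradio SDK version declared in the front matter, which is shown here as an example:

```shell
pip install "gradio==5.29.0" transformers datasets optuna scikit-learn
```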
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
## License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
## Acknowledgments
- Hugging Face Transformers
- Optuna for hyperparameter optimization
- Gradio for building the user interface |