---
title: MGT Detection
emoji: 🐠
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: MGT-Detection
---

# MGT-Detection

## Overview

MGT-Detection (Machine-Generated Text Detection) is a project that classifies a given text as either human-written or machine-generated. It builds on pretrained natural language processing (NLP) models and provides a pipeline for fine-tuning them on this two-class task. The project includes tools for training, evaluating, and deploying text classification models.

## Features

- **Text Classification**: Detects whether a text is human-written or machine-generated.
- **Model Training Pipeline**: Includes hyperparameter optimization, dataset preparation, and model training.
- **Evaluation**: Reports accuracy, precision, recall, and F1 score.
- **Dataset Management**: Tools for preparing and tokenizing datasets.
- **Model Deployment**: Save and load fine-tuned models for deployment.

## Project Structure

```
MGT-Detection/
├── app.py                  # Main application for text classification
├── pipeline/
│   ├── dataset.py          # Dataset preparation and management
│   ├── model_pipeline.py   # Model training and evaluation pipeline
│   └── main.py             # Entry point for running the training pipeline
└── samples.json            # Sample dataset for testing
```

## Usage

### Running the Application

To launch the text classification application (the Gradio app defined in `app.py`):

```bash
python app.py
```

### Training a Model

To train a model using the pipeline:

```bash
python pipeline/main.py \
  --file_path \
  --out_path \
  --model_name \
  --num_labels 2 \
  --sample_frac 1.0 \
  --num_trials 5 \
  --num_epochs 5
```

### Dataset Preparation

Make sure your dataset is a JSON file with the following structure:

```json
[
  {
    "text": "",
    "label": "