Auto-upgrade: composite 0.8576 | GLEU 0.7506 | BERTScore 0.9733 | 1-WER 0.8488 | r=16, 10 epochs, combined loss
- README.md +112 -502
- adapter_config.json +48 -0
- adapter_model.safetensors +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +114 -0
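The commit title reports a composite score next to three component metrics but does not say how the composite is computed. The figures are consistent with an unweighted mean of GLEU, BERTScore, and 1-WER; the short check below assumes that formula, which is an inference from the numbers and not something the commit states.

```python
# Scores copied from the commit title.
gleu = 0.7506
bertscore = 0.9733
one_minus_wer = 0.8488

# Assumption: composite = unweighted mean of the three component scores.
composite = (gleu + bertscore + one_minus_wer) / 3
print(round(composite, 4))
```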
README.md
CHANGED
|
@@ -1,596 +1,206 @@
|
|
| 1 |
---
|
| 2 |
-
|
| 3 |
-
|
| 4 |
tags:
|
| 5 |
-
-
|
| 6 |
-
- dyslexia
|
| 7 |
-
- grammar-correction
|
| 8 |
-
- style-preservation
|
| 9 |
- lora
|
| 10 |
-
-
|
| 11 |
-
license: mit
|
| 12 |
-
base_model: google/flan-t5-small
|
| 13 |
-
datasets:
|
| 14 |
-
- cambridge/fce
|
| 15 |
-
- wi_locness
|
| 16 |
-
- jfleg
|
| 17 |
-
pipeline_tag: translation
|
| 18 |
---
|
| 19 |
|
| 20 |
-
#
|
| 21 |
|
| 22 |
-
|
| 23 |
|
| 24 |
-
## Overview
|
| 25 |
|
| 26 |
-
This system takes text written by dyslexic students and corrects grammar, spelling, and fluency errors while:
|
| 27 |
|
| 28 |
-
|
| 29 |
-
2. **Elevating vocabulary to academic register** using Coxhead's Academic Word List (AWL) and BERT-based lexical substitution
|
| 30 |
-
3. **Resisting AI detection** through a frozen Human Pattern Classifier that penalises AI-typical writing during training
|
| 31 |
-
4. **Maintaining semantic meaning** with cosine-similarity-based semantic preservation loss
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
---
|
| 36 |
|
| 37 |
-
## Features
|
| 38 |
-
|
| 39 |
-
| Feature | Description |
|
| 40 |
-
|---------|-------------|
|
| 41 |
-
| **Two-pass spell correction** | Dyslexia-aware phonetic pattern handling via LanguageTool |
|
| 42 |
-
| **Style fingerprinting** | 41 raw features → MLP → 512-dim L2-normalised style vector |
|
| 43 |
-
| **LoRA fine-tuning** | 1.63% trainable params (1.28M / 78.2M total), rank=8 |
|
| 44 |
-
| **Academic vocabulary elevation** | BERT fill-mask → AWL candidate filtering → semantic similarity gate |
|
| 45 |
-
| **Human pattern anti-AI loss** | Pre-trained frozen MLP classifier (17-dim features including GPT-2 perplexity) |
|
| 46 |
-
| **Combined training loss** | `L_CE + λ₁·L_style + λ₂·L_semantic + λ₃·L_human_pattern` |
|
| 47 |
-
| **Sentence-chunked inference** | Long texts split into 128-token chunks matching training window |
|
| 48 |
-
| **FastAPI server** | RESTful `/correct` endpoint with CORS and rate limiting |
|
| 49 |
-
| **Multi-stage training** | Orchestrated via `train.sh` with checkpoint system (Skip/Redo/Continue) |
|
| 50 |
-
| **Synthetic data augmentation** | `DyslexiaSimulator` generates realistic errors from clean text |
|
| 51 |
|
| 52 |
-
---
|
| 53 |
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
│ ├── inference_config.yaml # Inference & generation settings
|
| 62 |
-
│ ├── model_config.yaml # Model architecture registry
|
| 63 |
-
│ └── awl_config.yaml # Academic Word List settings
|
| 64 |
-
├── scripts/
|
| 65 |
-
│ ├── train.py # Main training script (Click CLI)
|
| 66 |
-
│ ├── evaluate.py # Test set evaluation (GLEU, ERRANT, BERTScore)
|
| 67 |
-
│ ├── run_inference.py # Interactive CLI inference
|
| 68 |
-
│ ├── preprocess_data.py # Raw datasets → unified JSONL
|
| 69 |
-
│ ├── pretrain_human_pattern_classifier.py # Stage 3: anti-AI classifier
|
| 70 |
-
│ ├── download_datasets.sh # BEA-2019 dataset downloader
|
| 71 |
-
│ └── download_kaggle_datasets.sh # Kaggle human/AI data downloader
|
| 72 |
-
├── src/
|
| 73 |
-
│ ├── model/
|
| 74 |
-
│ │ ├── base_model.py # Model loader (T5/BART/Llama + LoRA + quantization)
|
| 75 |
-
│ │ ├── style_conditioner.py # Prefix tuning: style → virtual tokens
|
| 76 |
-
│ │ ├── generation_utils.py # Beam search, sampling, batch generation
|
| 77 |
-
│ │ └── lora_adapter.py # LoRA configuration helpers
|
| 78 |
-
│ ├── preprocessing/
|
| 79 |
-
│ │ ├── pipeline.py # Full preprocessing orchestrator
|
| 80 |
-
│ │ ├── spell_corrector.py # LanguageTool + dyslexia-aware correction
|
| 81 |
-
│ │ ├── dyslexia_simulator.py # Synthetic error generation (Rello et al.)
|
| 82 |
-
│ │ ├── dependency_parser.py # spaCy dependency tree analysis
|
| 83 |
-
│ │ ├── ner_tagger.py # Named entity protection
|
| 84 |
-
│ │ └── sentence_segmenter.py # Sentence boundary detection
|
| 85 |
-
│ ├── style/
|
| 86 |
-
│ │ ├── fingerprinter.py # 41 features → 512-dim style vector
|
| 87 |
-
│ │ ├── style_vector.py # Style vector dataclass
|
| 88 |
-
│ │ ├── formality_classifier.py # Rule-based formality scoring
|
| 89 |
-
│ │ └── emotion_classifier.py # Emotion detection
|
| 90 |
-
│ ├── training/
|
| 91 |
-
│ │ ├── dataset.py # Pre-tokenized cached dataset with style vectors
|
| 92 |
-
│ │ ├── trainer.py # CorrectionTrainer (HF Trainer + PEFT fixes)
|
| 93 |
-
│ │ ├── loss_functions.py # V1 and V2 combined losses
|
| 94 |
-
│ │ ├── human_pattern_extractor.py # 17-dim feature extraction + classifier
|
| 95 |
-
│ │ └── callbacks.py # Evaluation logging callbacks
|
| 96 |
-
│ ├── vocabulary/
|
| 97 |
-
│ │ ├── lexical_substitution.py # BERT fill-mask → AWL substitution pipeline
|
| 98 |
-
│ │ ├── awl_loader.py # Coxhead Academic Word List loader
|
| 99 |
-
│ │ └── register_filter.py # Contraction expansion + colloquial replacement
|
| 100 |
-
│ ├── inference/
|
| 101 |
-
│ │ ├── corrector.py # End-to-end inference pipeline orchestrator
|
| 102 |
-
│ │ └── postprocessor.py # Cleanup, entity restore, formatting
|
| 103 |
-
│ ├── evaluation/
|
| 104 |
-
│ │ ├── gleu_scorer.py # GLEU + BERTScore computation
|
| 105 |
-
│ │ ├── errant_evaluator.py # ERRANT P/R/F0.5 evaluation
|
| 106 |
-
│ │ ├── style_metrics.py # Style similarity + AWL coverage
|
| 107 |
-
│ │ └── authorship_verifier.py # AI detection resistance testing
|
| 108 |
-
│ └── api/
|
| 109 |
-
│ ├── main.py # FastAPI application
|
| 110 |
-
│ ├── schemas.py # Pydantic request/response models
|
| 111 |
-
│ └── middleware.py # Rate limiting + CORS
|
| 112 |
-
├── data/
|
| 113 |
-
│ ├── raw/ # Original datasets (FCE, W&I+LOCNESS, JFLEG, Kaggle)
|
| 114 |
-
│ ├── processed/ # Unified JSONL (train/val/test splits)
|
| 115 |
-
│ ├── cache/ # Pre-tokenized dataset caches (.pt files)
|
| 116 |
-
│ └── awl/ # Coxhead Academic Word List
|
| 117 |
-
├── train.sh # Multi-stage training orchestrator
|
| 118 |
-
├── start.sh # Inference launcher (CLI or API mode)
|
| 119 |
-
├── Dockerfile # Production container
|
| 120 |
-
├── docker-compose.yml # Docker deployment
|
| 121 |
-
├── requirements.txt # Python dependencies
|
| 122 |
-
└── pyproject.toml # Project metadata
|
| 123 |
-
```
|
| 124 |
-
|
| 125 |
-
## Model Architecture
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
### PNG:
|
| 129 |
-

|
| 130 |
-
|
| 131 |
-
### Mermaid Diagram:
|
| 132 |
-
```mermaid
|
| 133 |
-
graph TB
|
| 134 |
-
%% ── Inference Pipeline (left-to-right flow) ──────────────────────
|
| 135 |
-
subgraph INFERENCE["🔮 Inference Pipeline"]
|
| 136 |
-
direction TB
|
| 137 |
-
INPUT["📝 Raw Dyslexic Text"]
|
| 138 |
-
|
| 139 |
-
subgraph PREPROCESS["Pre-Processing"]
|
| 140 |
-
SPELL["Spell Corrector<br/><i>dyslexia-aware phonetic</i>"]
|
| 141 |
-
SENT_SEG["Sentence Segmenter"]
|
| 142 |
-
DEP_PARSE["Dependency Parser"]
|
| 143 |
-
NER["NER Tagger"]
|
| 144 |
-
end
|
| 145 |
-
|
| 146 |
-
subgraph STYLE["Style Analysis"]
|
| 147 |
-
FINGER["Style Fingerprinter<br/><i>512-dim vector</i>"]
|
| 148 |
-
EMOTION["Emotion Classifier"]
|
| 149 |
-
FORMALITY["Formality Classifier"]
|
| 150 |
-
STYLE_VEC["Style Vector Composer"]
|
| 151 |
-
end
|
| 152 |
-
|
| 153 |
-
subgraph GENERATION["Core Generation"]
|
| 154 |
-
STYLE_COND["Style Conditioner<br/><i>prefix tuning</i>"]
|
| 155 |
-
BASE_MODEL["Base LM<br/><i>Flan-T5 / BART / Llama-3</i>"]
|
| 156 |
-
LORA["LoRA Adapter"]
|
| 157 |
-
GEN_UTILS["Generation Utils<br/><i>beam search, sampling</i>"]
|
| 158 |
-
end
|
| 159 |
-
|
| 160 |
-
subgraph POSTPROCESS["Post-Processing"]
|
| 161 |
-
POSTPROC["Post-Processor<br/><i>formatting, cleanup</i>"]
|
| 162 |
-
VOCAB_SUB["Lexical Substitution<br/><i>BERT-based</i>"]
|
| 163 |
-
AWL["AWL Loader<br/><i>Coxhead Academic Word List</i>"]
|
| 164 |
-
REG_FILTER["Register Filter<br/><i>academic tone gate</i>"]
|
| 165 |
-
end
|
| 166 |
-
|
| 167 |
-
OUTPUT["✅ Corrected Academic Text"]
|
| 168 |
-
|
| 169 |
-
INPUT --> SPELL --> SENT_SEG --> DEP_PARSE --> NER
|
| 170 |
-
INPUT --> FINGER --> EMOTION --> FORMALITY --> STYLE_VEC
|
| 171 |
-
NER --> STYLE_COND
|
| 172 |
-
STYLE_VEC --> STYLE_COND
|
| 173 |
-
STYLE_COND --> BASE_MODEL
|
| 174 |
-
LORA -.->|"merged weights"| BASE_MODEL
|
| 175 |
-
BASE_MODEL --> GEN_UTILS --> POSTPROC
|
| 176 |
-
POSTPROC --> VOCAB_SUB
|
| 177 |
-
AWL --> VOCAB_SUB
|
| 178 |
-
VOCAB_SUB --> REG_FILTER --> OUTPUT
|
| 179 |
-
end
|
| 180 |
-
|
| 181 |
-
%% ── Training Pipeline ────────────────────────────────────────────
|
| 182 |
-
subgraph TRAINING["🏋️ Training Pipeline"]
|
| 183 |
-
direction TB
|
| 184 |
-
|
| 185 |
-
subgraph DATA["Data Pipeline"]
|
| 186 |
-
RAW_DATA["Raw Datasets<br/><i>JFLEG, WI+LOCNESS, C4_200M,<br/>FCE, Lang-8, NUCLE</i>"]
|
| 187 |
-
KAGGLE["Kaggle Datasets<br/><i>Shanegerami, Starblasters8</i>"]
|
| 188 |
-
PREPROC_SCRIPT["preprocess_data.py"]
|
| 189 |
-
TRAIN_JSONL["train.jsonl / val.jsonl / test.jsonl"]
|
| 190 |
-
end
|
| 191 |
-
|
| 192 |
-
subgraph HP_PRETRAIN["Human Pattern Pre-Training"]
|
| 193 |
-
FEAT_EXTRACT["Feature Extractor<br/><i>17-dim: perplexity, burstiness,<br/>n-gram novelty, AI markers...</i>"]
|
| 194 |
-
GPT2["GPT-2<br/><i>perplexity scorer</i>"]
|
| 195 |
-
HP_CLASSIFIER["Human Pattern Classifier<br/><i>MLP: 17→128→64→1</i>"]
|
| 196 |
-
HP_WEIGHTS["human_pattern_classifier.pt"]
|
| 197 |
-
end
|
| 198 |
-
|
| 199 |
-
subgraph MAIN_TRAIN["Main Model Training"]
|
| 200 |
-
DATASET["WritingCorrectionDataset"]
|
| 201 |
-
COMBINED_LOSS["Combined Loss Function"]
|
| 202 |
-
L_CE["L_CE<br/><i>cross-entropy</i>"]
|
| 203 |
-
L_STYLE["λ₁ · L_style<br/><i>style consistency</i>"]
|
| 204 |
-
L_SEM["λ₂ · L_semantic<br/><i>meaning preservation</i>"]
|
| 205 |
-
L_HUMAN["λ₃ · L_human_pattern<br/><i>anti-AI penalty</i>"]
|
| 206 |
-
TRAINER["CorrectionTrainer"]
|
| 207 |
-
CALLBACKS["Callbacks<br/><i>StyleMetrics,<br/>EarlyStoppingOnStyleDrift</i>"]
|
| 208 |
-
end
|
| 209 |
-
|
| 210 |
-
subgraph EVAL["Evaluation"]
|
| 211 |
-
ERRANT["ERRANT Evaluator<br/><i>P / R / F₀.₅</i>"]
|
| 212 |
-
GLEU["GLEU Scorer"]
|
| 213 |
-
STYLE_MET["Style Metrics<br/><i>cosine similarity</i>"]
|
| 214 |
-
AUTH_VER["Authorship Verifier<br/><i>AI detection resistance</i>"]
|
| 215 |
-
end
|
| 216 |
-
|
| 217 |
-
RAW_DATA --> PREPROC_SCRIPT --> TRAIN_JSONL
|
| 218 |
-
KAGGLE --> FEAT_EXTRACT
|
| 219 |
-
GPT2 --> FEAT_EXTRACT --> HP_CLASSIFIER --> HP_WEIGHTS
|
| 220 |
-
TRAIN_JSONL --> DATASET --> TRAINER
|
| 221 |
-
L_CE --> COMBINED_LOSS
|
| 222 |
-
L_STYLE --> COMBINED_LOSS
|
| 223 |
-
L_SEM --> COMBINED_LOSS
|
| 224 |
-
HP_WEIGHTS -.->|"frozen"| L_HUMAN --> COMBINED_LOSS
|
| 225 |
-
COMBINED_LOSS --> TRAINER
|
| 226 |
-
CALLBACKS --> TRAINER
|
| 227 |
-
TRAINER --> EVAL
|
| 228 |
-
end
|
| 229 |
-
|
| 230 |
-
%% ── API Layer ────────────────────────────────────────────────────
|
| 231 |
-
subgraph API["🌐 FastAPI Server"]
|
| 232 |
-
ENDPOINT["/correct endpoint"]
|
| 233 |
-
SCHEMAS["Request / Response Schemas"]
|
| 234 |
-
MIDDLEWARE["Rate Limiting & CORS"]
|
| 235 |
-
CORRECTOR["Corrector<br/><i>orchestrates full pipeline</i>"]
|
| 236 |
-
end
|
| 237 |
-
|
| 238 |
-
ENDPOINT --> CORRECTOR --> INFERENCE
|
| 239 |
-
TRAINER -->|"best_model/"| BASE_MODEL
|
| 240 |
-
|
| 241 |
-
%% ── Styling ──────────────────────────────────────────────────────
|
| 242 |
-
classDef pipeline fill:#1a1a2e,stroke:#16213e,color:#e94560,stroke-width:2px
|
| 243 |
-
classDef module fill:#0f3460,stroke:#533483,color:#e2e2e2,stroke-width:1px
|
| 244 |
-
classDef data fill:#1a1a2e,stroke:#e94560,color:#eee,stroke-width:1px
|
| 245 |
-
classDef output fill:#533483,stroke:#e94560,color:#fff,stroke-width:2px
|
| 246 |
-
|
| 247 |
-
class INPUT,RAW_DATA,KAGGLE,TRAIN_JSONL data
|
| 248 |
-
class OUTPUT,HP_WEIGHTS output
|
| 249 |
-
```
|
| 250 |
|
| 251 |
-
|
| 252 |
|
| 253 |
-
|
| 254 |
|
| 255 |
-
|
|
|
|
|
|
|
| 256 |
|
| 257 |
-
|
| 258 |
-
|---------------|----------|
|
| 259 |
-
| **Hardware constraint** | RTX 3050 Laptop GPU (4GB VRAM) — rules out models > 500M params |
|
| 260 |
-
| **Architecture** | Encoder-decoder (seq2seq) is ideal for text-to-text correction tasks |
|
| 261 |
-
| **Instruction tuning** | Flan-T5 is pre-trained on 1,800+ instruction tasks — follows correction prompts naturally |
|
| 262 |
-
| **LoRA efficiency** | Only 1.28M trainable params (1.63%) — fits in 4GB with batch_size=4 + bf16 |
|
| 263 |
|
| 264 |
-
|
| 265 |
|
| 266 |
-
|
| 267 |
-
- **Speed**: LoRA converges in 5 epochs (~1,515 steps) on a single RTX 3050
|
| 268 |
-
- **Merging**: LoRA weights merge into base model at inference time — zero latency overhead
|
| 269 |
-
- **Configuration**: `r=8, alpha=16, dropout=0.05`, targeting all attention + FFN projections (`q, k, v, o, wi_0, wi_1, wo`)
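The 1.28M trainable-parameter figure can be reproduced by hand from the LoRA configuration above. The sketch below is a back-of-the-envelope count, assuming the public `google/flan-t5-small` dimensions (`d_model=512`, 6 heads of size 64, `d_ff=1024`, 8 encoder and 8 decoder layers) and that every listed projection in every layer receives an adapter; the variable names are illustrative only.

```python
# Flan-T5-Small dimensions (assumed from the public google/flan-t5-small config).
D_MODEL = 512
D_ATTN = 6 * 64   # num_heads * d_kv
D_FF = 1024
ENC_LAYERS = 8
DEC_LAYERS = 8
RANK = 8          # LoRA rank from the configuration bullet above


def lora_params(d_in, d_out, r=RANK):
    # LoRA adds two matrices per adapted linear layer: A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)


attn_module = lora_params(D_MODEL, D_ATTN)  # same count for q, k, v and o
ffn_module = lora_params(D_MODEL, D_FF)     # same count for wi_0, wi_1 and wo

# Encoder layers have one attention block; decoder layers have self- and cross-attention.
n_attn = 4 * (ENC_LAYERS + 2 * DEC_LAYERS)
n_ffn = 3 * (ENC_LAYERS + DEC_LAYERS)

trainable = n_attn * attn_module + n_ffn * ffn_module
print(trainable)                               # trainable LoRA parameters
print(round(100 * trainable / 78.2e6, 2))      # share of the 78.2M total, in percent
```

Under these assumptions the count comes to 1,277,952 parameters, which agrees with the 1.28M / 1.63% figures quoted in this README.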
|
| 270 |
|
| 271 |
-
|
| 272 |
|
| 273 |
-
|
| 274 |
|
| 275 |
-
|
| 276 |
-
|------|---------|--------|
|
| 277 |
-
| `L_CE` | Standard cross-entropy token prediction | 1.0 |
|
| 278 |
-
| `L_style` | `1 - cos_sim(output_style, input_style)` — preserves writing fingerprint | 0.3 |
|
| 279 |
-
| `L_semantic` | `1 - cos_sim(input_embedding, output_embedding)` — preserves meaning | 0.5 |
|
| 280 |
-
| `L_human` | `1 - HumanPatternClassifier(output)` — penalises AI-like text patterns | 0.4 |
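To make the weighting in the table concrete, here is a minimal pure-Python sketch of how the four terms could be combined. It is not the project's `loss_functions.py`: the function names and the use of plain lists in place of tensors are assumptions made so that the example runs without dependencies.

```python
import math


def cosine_similarity(a, b):
    # Plain-Python cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combined_loss(l_ce, input_style, output_style, input_emb, output_emb,
                  human_prob, lambda_style=0.3, lambda_semantic=0.5,
                  lambda_human=0.4):
    # Each auxiliary term follows the formula in the table; weights default
    # to the values listed there.
    l_style = 1.0 - cosine_similarity(output_style, input_style)
    l_semantic = 1.0 - cosine_similarity(input_emb, output_emb)
    l_human = 1.0 - human_prob  # human_prob: classifier score in [0, 1]
    return (l_ce + lambda_style * l_style
            + lambda_semantic * l_semantic
            + lambda_human * l_human)


# Identical style and meaning, fully "human" output: only cross-entropy remains.
print(combined_loss(2.0, [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], 1.0))
```

In a real training loop the same arithmetic would presumably be applied to differentiable tensors so that gradients flow through the auxiliary terms.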
|
| 281 |
|
| 282 |
-
|
| 283 |
|
| 284 |
-
|
| 285 |
|
| 286 |
-
|
| 287 |
-
- **Lower GPT-2 perplexity** (AI text is more "predictable")
|
| 288 |
-
- **Lower burstiness** (AI has uniform sentence lengths; humans vary)
|
| 289 |
-
- **Higher AI marker density** (overuse of "delve", "leverage", "furthermore")
|
| 290 |
-
- **Lower n-gram novelty** (AI reuses phrases more)
|
| 291 |
|
| 292 |
-
|
| 293 |
|
| 294 |
-
|
| 295 |
|
| 296 |
-
|
| 297 |
|
| 298 |
-
|
| 299 |
-
2. Grouped into chunks that fit the 128-token budget
|
| 300 |
-
3. Each chunk is corrected independently
|
| 301 |
-
4. Results are joined back together
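The chunking steps above can be sketched as follows. This is an illustration and not the code in `corrector.py`: the regex sentence splitter, the whitespace token counter, and the `correct_chunk` callback are hypothetical stand-ins for the real segmenter, tokenizer, and model call.

```python
import re


def chunk_sentences(text, max_tokens=128, count_tokens=lambda s: len(s.split())):
    # Split on sentence-final punctuation, then greedily pack sentences
    # into chunks whose token count stays within the budget.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        if current and used + n > max_tokens:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def correct_long_text(text, correct_chunk, max_tokens=128):
    # Correct each chunk independently and join the results.
    return " ".join(correct_chunk(c) for c in chunk_sentences(text, max_tokens))


print(chunk_sentences("One two three. Four five. Six.", max_tokens=4))
```

In this sketch a single sentence that exceeds the budget is kept as one oversized chunk; how the actual pipeline handles that case is not shown here.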
|
| 302 |
|
| 303 |
-
|
| 304 |
|
| 305 |
-
###
|
| 306 |
|
| 307 |
-
|
| 308 |
|
| 309 |
-
|
| 310 |
-
2. Identify non-AWL content words (nouns, verbs, adjectives, adverbs)
|
| 311 |
-
3. Mask each candidate → run BERT fill-mask → filter to AWL-only predictions
|
| 312 |
-
4. Accept substitution only if `semantic_similarity > 0.82` (measured with `all-mpnet-base-v2`)
|
| 313 |
-
5. Track used substitutions to prevent duplicate replacements
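A rough sketch of the substitution loop described above, with the model calls abstracted away. The `propose` and `similarity` callables are hypothetical placeholders for the BERT fill-mask step and the `all-mpnet-base-v2` similarity check, and the part-of-speech filter for content words is omitted for brevity.

```python
def elevate_vocabulary(tokens, awl, propose, similarity, threshold=0.82):
    # tokens: list of words; awl: set of Academic Word List entries.
    # propose(tokens, i): candidate replacements for position i (fill-mask stand-in).
    # similarity(a, b): semantic similarity of two sentences, in [0, 1].
    original = " ".join(tokens)
    output = list(tokens)
    used = set()  # substitutions already made, to avoid duplicate replacements
    for i, word in enumerate(tokens):
        if word.lower() in awl:
            continue  # already academic vocabulary
        for candidate in propose(output, i):
            if candidate not in awl or candidate in used:
                continue  # keep only unused AWL predictions
            trial = output[:i] + [candidate] + output[i + 1:]
            if similarity(original, " ".join(trial)) > threshold:
                output = trial
                used.add(candidate)
                break
    return output


# Toy stand-ins so the sketch runs without any models.
toy_awl = {"utilise", "data"}
toy_propose = lambda toks, i: {"use": ["employ", "utilise"]}.get(toks[i], [])
print(elevate_vocabulary(["we", "use", "data"], toy_awl, toy_propose, lambda a, b: 0.9))
```

Comparing each trial sentence against the original input, as done here, is one possible design; the threshold default mirrors the 0.82 value stated above.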
|
| 314 |
|
| 315 |
-
|
| 316 |
|
| 317 |
-
|
| 318 |
|
| 319 |
-
|
| 320 |
|
| 321 |
-
|
| 322 |
-
- NVIDIA GPU with ≥ 4GB VRAM (or CPU, slower)
|
| 323 |
-
- ~10GB disk space for models and datasets
|
| 324 |
|
| 325 |
-
###
|
| 326 |
|
| 327 |
-
|
| 328 |
-
# Clone and setup
|
| 329 |
-
git clone https://huggingface.co/morpheuslord/rewriter && cd rewriter
|
| 330 |
-
pip install -r requirements.txt
|
| 331 |
|
| 332 |
-
|
| 333 |
-
export WANDB_API_KEY="your-key-here"
|
| 334 |
|
| 335 |
-
#
|
| 336 |
-
bash train.sh
|
| 337 |
-
```
|
| 338 |
|
| 339 |
-
|
| 340 |
|
| 341 |
-
|
| 342 |
|
| 343 |
-
|
| 344 |
|
| 345 |
-
```bash
|
| 346 |
-
# 1. Install dependencies
|
| 347 |
-
pip install -r requirements.txt
|
| 348 |
-
python -m spacy download en_core_web_sm
|
| 349 |
|
| 350 |
-
#
|
| 351 |
-
python scripts/preprocess_data.py
|
| 352 |
|
| 353 |
-
|
| 354 |
-
python scripts/pretrain_human_pattern_classifier.py
|
| 355 |
|
| 356 |
-
#
|
| 357 |
-
PYTHONPATH=. python scripts/train.py --config configs/training_config.yaml --use-v2-loss
|
| 358 |
|
| 359 |
-
|
| 360 |
-
python -c "
|
| 361 |
-
from peft import PeftModel
|
| 362 |
-
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
|
| 363 |
-
import torch
|
| 364 |
-
model = AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-small', torch_dtype=torch.bfloat16)
|
| 365 |
-
model = PeftModel.from_pretrained(model, 'checkpoints/checkpoint-BEST')
|
| 366 |
-
model = model.merge_and_unload()
|
| 367 |
-
model.save_pretrained('checkpoints/best_model_merged')
|
| 368 |
-
AutoTokenizer.from_pretrained('google/flan-t5-small').save_pretrained('checkpoints/best_model_merged')
|
| 369 |
-
"
|
| 370 |
|
| 371 |
-
|
| 372 |
-
PYTHONPATH=. python scripts/run_inference.py --text "The studnet recieved alot of informtion."
|
| 373 |
|
| 374 |
-
#
|
| 375 |
-
PYTHONPATH=. python -m uvicorn src.api.main:app --host 0.0.0.0 --port 8000
|
| 376 |
-
```
|
| 377 |
|
| 378 |
-
---
|
| 379 |
|
| 380 |
-
##
|
| 381 |
|
| 382 |
-
###
|
| 383 |
-
Installs Python packages and downloads the spaCy model (`en_core_web_sm`) and NLTK tokenizers.
|
| 384 |
|
| 385 |
-
|
| 386 |
-
Converts raw datasets into unified JSONL format:
|
| 387 |
|
| 388 |
-
|
| 389 |
-
|---------|--------|--------|-------|
|
| 390 |
-
| **FCE v2.1** | BEA-2019 Shared Task | Character-level edits | ~28k |
|
| 391 |
-
| **W&I+LOCNESS v2.1** | BEA-2019 Shared Task | Character-level edits | ~34k |
|
| 392 |
-
| **JFLEG** | Johns Hopkins | 4 reference corrections per source | ~5k |
|
| 393 |
|
| 394 |
-
|
| 395 |
|
| 396 |
-
|
| 397 |
|
| 398 |
-
|
| 399 |
-
Trains a binary MLP classifier on ~100k human vs AI text samples; the classifier is then frozen for use in the main training loss. It uses 17 features:
|
| 400 |
|
| 401 |
-
|
| 402 |
-
[perplexity, burstiness, sentence_starter_diversity,
|
| 403 |
-
bigram_novelty, trigram_novelty, 4gram_novelty,
|
| 404 |
-
ai_marker_density, overused_discourse_density,
|
| 405 |
-
em_dash_rate, ellipsis_rate, comma_rate, semicolon_rate,
|
| 406 |
-
word_count, sentence_count, mean_sent_length, std_sent_length, ttr]
|
| 407 |
-
```
|
| 408 |
|
| 409 |
-
|
| 410 |
|
| 411 |
-
|
| 412 |
-
Fine-tunes Flan-T5-Small with LoRA using the V2 combined loss. Key hyperparameters:
|
| 413 |
|
| 414 |
-
|
| 415 |
-
|-----------|-------|
|
| 416 |
-
| Effective batch size | 32 (4 × 8 gradient accumulation) |
|
| 417 |
-
| Learning rate | 3e-4 (cosine schedule, 5% warmup) |
|
| 418 |
-
| Precision | bf16 (Ampere+ GPUs) |
|
| 419 |
-
| Max input tokens | 128 |
|
| 420 |
-
| Max target tokens | 128 |
|
| 421 |
-
| Epochs | 5 |
|
| 422 |
-
| Eval/Save interval | Every 100 steps |
|
| 423 |
|
| 424 |
-
|
| 425 |
-
Evaluates on the test set with these metrics: GLEU, BERTScore F1, ERRANT F0.5, Style Similarity, AWL Coverage.
|
| 426 |
|
| 427 |
-
|
| 428 |
|
| 429 |
-
## Inference Pipeline (7 Steps)
|
| 430 |
-
|
| 431 |
-
```
|
| 432 |
-
Raw Text
|
| 433 |
-
│
|
| 434 |
-
▼
|
| 435 |
-
1. Preprocessing ─────── LanguageTool spell correction + spaCy parsing
|
| 436 |
-
│
|
| 437 |
-
▼
|
| 438 |
-
2. Style Fingerprinting ─ Extract 41 features → MLP → 512-dim vector
|
| 439 |
-
│
|
| 440 |
-
▼
|
| 441 |
-
3. Sentence-Chunked Generation ─ Split into 128-token chunks → Flan-T5 → rejoin
|
| 442 |
-
│
|
| 443 |
-
▼
|
| 444 |
-
4. Post-Processing ───── Remove artifacts, replace em dashes, fix spacing
|
| 445 |
-
│
|
| 446 |
-
▼
|
| 447 |
-
5. Vocabulary Elevation ─ BERT fill-mask → AWL filtering → semantic gate
|
| 448 |
-
│
|
| 449 |
-
▼
|
| 450 |
-
6. Register Filtering ── Expand contractions, replace colloquialisms
|
| 451 |
-
│
|
| 452 |
-
▼
|
| 453 |
-
7. Metrics ──────────── Style similarity, AWL coverage, readability scores
|
| 454 |
-
│
|
| 455 |
-
▼
|
| 456 |
-
Corrected Text
|
| 457 |
-
```
|
| 458 |
|
| 459 |
-
---
|
| 460 |
|
| 461 |
-
##
|
| 462 |
-
|
| 463 |
-
### `configs/training_config.yaml`
|
| 464 |
-
|
| 465 |
-
```yaml
|
| 466 |
-
model:
|
| 467 |
-
key: "flan-t5-small" # flan-t5-xl | flan-t5-large | flan-t5-base | flan-t5-small
|
| 468 |
-
quantize: false # 4-bit NF4 quantization (needs GPU)
|
| 469 |
-
use_lora: true # Parameter-efficient fine-tuning
|
| 470 |
-
|
| 471 |
-
lora:
|
| 472 |
-
r: 8 # LoRA rank (higher = more capacity, more VRAM)
|
| 473 |
-
lora_alpha: 16 # Scaling factor (usually 2×r)
|
| 474 |
-
lora_dropout: 0.05 # Regularisation
|
| 475 |
-
target_modules: [q, v, k, o, wi_0, wi_1, wo] # All attention + FFN layers
|
| 476 |
-
|
| 477 |
-
training:
|
| 478 |
-
per_device_train_batch_size: 4
|
| 479 |
-
gradient_accumulation_steps: 8 # Effective batch = 32
|
| 480 |
-
learning_rate: 3.0e-4
|
| 481 |
-
lr_scheduler_type: cosine
|
| 482 |
-
bf16: true # Use bfloat16 on Ampere+ GPUs
|
| 483 |
-
|
| 484 |
-
loss:
|
| 485 |
-
lambda_style: 0.3 # Style preservation weight
|
| 486 |
-
lambda_semantic: 0.5 # Meaning preservation weight
|
| 487 |
-
lambda_human_pattern: 0.4 # Anti-AI penalty weight
|
| 488 |
-
```
|
| 489 |
-
|
| 490 |
-
### `configs/inference_config.yaml`
|
| 491 |
-
|
| 492 |
-
```yaml
|
| 493 |
-
model:
|
| 494 |
-
key: "flan-t5-small"
|
| 495 |
-
checkpoint_path: "checkpoints/best_model_merged"
|
| 496 |
-
use_lora: false # Merged model — no adapter needed
|
| 497 |
-
|
| 498 |
-
generation:
|
| 499 |
-
num_beams: 5 # Beam search width
|
| 500 |
-
length_penalty: 1.2 # Exponent on sequence length in beam scoring; values > 0 favour longer outputs
|
| 501 |
-
no_repeat_ngram_size: 3 # Prevents repetition
|
| 502 |
-
max_new_tokens: 128 # Must match training max_target_length
|
| 503 |
-
|
| 504 |
-
vocabulary:
|
| 505 |
-
semantic_threshold: 0.82 # Minimum cosine similarity for AWL substitution
|
| 506 |
-
```
|
| 507 |
|
| 508 |
-
---
|
| 509 |
|
| 510 |
-
|
| 511 |
|
| 512 |
-
|
| 513 |
-
# Start the server
|
| 514 |
-
PYTHONPATH=. python -m uvicorn src.api.main:app --host 0.0.0.0 --port 8000
|
| 515 |
|
| 516 |
-
|
| 517 |
-
curl -X POST http://localhost:8000/correct \
|
| 518 |
-
-H "Content-Type: application/json" \
|
| 519 |
-
-d '{"text": "The studnet recieved alot of informtion.", "style_alpha": 0.6}'
|
| 520 |
|
| 521 |
-
#
|
| 522 |
-
curl http://localhost:8000/health
|
| 523 |
-
```
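The same request can be made from Python using only the standard library. This is a sketch: the `text` and `style_alpha` fields mirror the curl example above, while the helper names and the assumption that the server answers with a JSON body are mine; the response schema is not documented in this section.

```python
import json
import urllib.request


def build_correct_request(text, style_alpha=0.6, base_url="http://localhost:8000"):
    # Build a POST request for the /correct endpoint with a JSON body.
    payload = json.dumps({"text": text, "style_alpha": style_alpha}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/correct",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def correct(text, style_alpha=0.6, base_url="http://localhost:8000", timeout=60):
    # Send the request and decode the reply (assumed to be JSON).
    request = build_correct_request(text, style_alpha, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `correct("The studnet recieved alot of informtion.")` would return whatever JSON the endpoint produces. A generous timeout seems prudent, since the first call may be slow while the pipeline warms up.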
|
| 524 |
|
| 525 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 526 |
|
| 527 |
-
|
| 528 |
|
| 529 |
-
##
|
| 530 |
|
| 531 |
-
|
| 532 |
-
|------|-----|-------|---------------|
|
| 533 |
-
| **Tested** | RTX 3050 4GB | Flan-T5-Small + LoRA | ~45 min (5 epochs) |
|
| 534 |
-
| Recommended | RTX 3090 24GB | Flan-T5-Base + LoRA | ~2h |
|
| 535 |
-
| Maximum | A100 80GB | Flan-T5-XL + LoRA | ~12h |
|
| 536 |
|
| 537 |
-
|
| 538 |
|
| 539 |
-
|
| 540 |
|
| 541 |
-
##
|
| 542 |
|
| 543 |
-
|
| 544 |
-
|---------|------|------|--------|
|
| 545 |
-
| FCE v2.1 | Learner errors + corrections | ~28k pairs | Cambridge English |
|
| 546 |
-
| W&I+LOCNESS v2.1 | Learner errors + corrections | ~34k pairs | BEA-2019 Shared Task |
|
| 547 |
-
| JFLEG | Fluency corrections (4 refs) | ~5k pairs | Johns Hopkins |
|
| 548 |
-
| Shanegerami AI_Human.csv | Human vs AI classification | ~50k samples | Kaggle |
|
| 549 |
-
| Starblasters8 data.parquet | Human vs AI classification | ~50k samples | Kaggle |
|
| 550 |
-
| Coxhead AWL | Academic Word List | 570 families / 549 headwords | Victoria University |
|
| 551 |
|
| 552 |
-
|
| 553 |
|
| 554 |
-
|
| 555 |
|
| 556 |
-
|
| 557 |
|
| 558 |
-
|
| 559 |
-
|-----------|-----------|---------|
|
| 560 |
-
| Phonetic substitution | 35% | "because" → "becaus" |
|
| 561 |
-
| Letter transposition | 18% | "the" → "teh" |
|
| 562 |
-
| Letter omission | 16% | "important" → "importnt" |
|
| 563 |
-
| Letter doubling | 12% | "letter" → "lettter" |
|
| 564 |
-
| Letter reversal (b/d, p/q) | 10% | "bad" → "dad" |
|
| 565 |
-
| Word boundary errors | 9% | "a lot" → "alot" |
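As an illustration of how a simulator could use the table above, the sketch below samples an error type in proportion to the listed percentages and applies one of them (letter transposition). It is not the project's `DyslexiaSimulator`; the dictionary keys, function names, and the reading of the percentages as sampling weights are assumptions.

```python
import random

# Percentages from the table above, treated here as sampling weights (assumption).
ERROR_WEIGHTS = {
    "phonetic_substitution": 0.35,
    "letter_transposition": 0.18,
    "letter_omission": 0.16,
    "letter_doubling": 0.12,
    "letter_reversal": 0.10,
    "word_boundary": 0.09,
}


def sample_error_type(rng):
    # Draw one error type with probability proportional to its weight.
    names = list(ERROR_WEIGHTS)
    weights = [ERROR_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def transpose_letters(word, rng):
    # Swap two adjacent letters, e.g. "the" can become "teh".
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


rng = random.Random(0)  # seeded for reproducibility
print(sample_error_type(rng), transpose_letters("the", rng))
```

Passing an explicit `random.Random` instance keeps the generated errors reproducible across runs, which is convenient when caching augmented datasets.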
|
| 566 |
|
| 567 |
-
|
| 568 |
|
| 569 |
-
|
| 570 |
|
| 571 |
-
|
| 572 |
|
| 573 |
-
|
| 574 |
-
|-------|----------|-------|
|
| 575 |
-
| Sentence stats | mean, std, skew of sentence lengths | 3 |
|
| 576 |
-
| Word stats | mean, std of word lengths | 2 |
|
| 577 |
-
| Lexical | type-token ratio, lexical density | 2 |
|
| 578 |
-
| Syntactic | passive/active voice ratio, subordinate clause ratio, avg dependency tree depth | 4 |
|
| 579 |
-
| Discourse | 20 academic discourse markers (per 100 words) | 20 |
|
| 580 |
-
| Register | hedging frequency, formality score, nominalization ratio | 3 |
|
| 581 |
-
| Readability | Flesch reading ease, avg syllables per word | 2 |
|
| 582 |
-
| Pronouns | first-person ratio, third-person ratio | 2 |
|
| 583 |
-
| Other | question ratio, exclamation ratio, AWL coverage | 3 |
|
| 584 |
|
| 585 |
-
|
| 586 |
|
| 587 |
-
---
|
| 588 |
|
| 589 |
-
|
|
|
|
| 590 |
|
| 591 |
-
|
| 592 |
-
2. **Training window**: 128-token max input means very long sentences may be split mid-clause
|
| 593 |
-
3. **Vocabulary elevation**: BERT fill-mask can suggest semantically inappropriate AWL words; the similarity threshold (0.82) is a trade-off between coverage and accuracy
|
| 594 |
-
4. **Already-correct text**: The model is trained on error→correction pairs; feeding it clean text produces unpredictable output
|
| 595 |
-
5. **LanguageTool latency**: Spell correction takes ~15-20s due to JVM startup on first call
|
| 596 |
-
6. **Semantic drift in correction**: Qualitative evaluation reveals the pipeline can introduce meaning-level errors rather than purely correcting surface errors — e.g. dyslexic phonetic patterns misread by LanguageTool produce plausible-but-wrong word substitutions that corrupt the intended meaning. The Style Similarity metric (0.96) does not capture this failure mode, as it measures surface token overlap rather than semantic faithfulness. Future work should add **BERTScore F1** and **Word Error Rate (WER)** against ground-truth corrections as primary evaluation signals, and a dedicated post-correction **semantic faithfulness check** (cosine similarity between input and output sentence embeddings) to flag and reject meaning-drift before returning output.
|
|
|
|
| 1 |
---
|
| 2 |
+
base_model: google/flan-t5-small
|
| 3 |
+
library_name: peft
|
| 4 |
tags:
|
| 5 |
+
- base_model:adapter:google/flan-t5-small
|
|
|
|
|
|
|
|
|
|
| 6 |
- lora
|
| 7 |
+
- transformers
|
| 8 |
---
|
| 9 |
|
| 10 |
+
# Model Card for Model ID
|
| 11 |
|
| 12 |
+
<!-- Provide a quick summary of what the model is/does. -->
|
| 13 |
|
|
|
|
| 14 |
|
|
|
|
| 15 |
|
| 16 |
+
## Model Details
|
|
|
|
|
|
|
|
|
|
| 17 |
|
| 18 |
+
### Model Description
|
| 19 |
|
| 20 |
+
<!-- Provide a longer summary of what this model is. -->
|
| 21 |
|
| 22 |
|
|
|
|
| 23 |
|
| 24 |
+
- **Developed by:** [More Information Needed]
|
| 25 |
+
- **Funded by [optional]:** [More Information Needed]
|
| 26 |
+
- **Shared by [optional]:** [More Information Needed]
|
| 27 |
+
- **Model type:** [More Information Needed]
|
| 28 |
+
- **Language(s) (NLP):** [More Information Needed]
|
| 29 |
+
- **License:** [More Information Needed]
|
| 30 |
+
- **Finetuned from model [optional]:** [More Information Needed]
|
| 31 |
|
| 32 |
+
### Model Sources [optional]
|
| 33 |
|
| 34 |
+
<!-- Provide the basic links for the model. -->
|
| 35 |
|
| 36 |
+
- **Repository:** [More Information Needed]
|
| 37 |
+
- **Paper [optional]:** [More Information Needed]
|
| 38 |
+
- **Demo [optional]:** [More Information Needed]
|
| 39 |
|
| 40 |
+
## Uses
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
|
| 42 |
+
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
| 43 |
|
| 44 |
+
### Direct Use
|
|
|
|
|
|
|
|
|
|
| 45 |
|
| 46 |
+
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
| 47 |
|
| 48 |
+
[More Information Needed]
|
| 49 |
|
| 50 |
+
### Downstream Use [optional]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 51 |
|
| 52 |
+
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
| 53 |
|
| 54 |
+
[More Information Needed]
|
| 55 |
|
| 56 |
+
### Out-of-Scope Use
|
|
|
|
|
|
|
|
|
|
|
|
|
| 57 |
|
| 58 |
+
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
|
| 59 |
|
| 60 |
+
[More Information Needed]
|
| 61 |
|
| 62 |
+
## Bias, Risks, and Limitations
|
| 63 |
|
| 64 |
+
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
|
|
|
|
|
|
|
|
|
| 65 |
|
| 66 |
+
[More Information Needed]
|
| 67 |
|
| 68 |
+
### Recommendations
|
| 69 |
|
| 70 |
+
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
| 71 |
|
| 72 |
+
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 73 |
|
| 74 |
+
## How to Get Started with the Model
|
| 75 |
|
| 76 |
+
Use the code below to get started with the model.
|
| 77 |
|
| 78 |
+
[More Information Needed]
|
| 79 |
|
| 80 |
+
## Training Details
|
|
|
|
|
|
|
| 81 |
|
| 82 |
+
### Training Data
|
| 83 |
|
| 84 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
|
|
|
|
|
|
|
|
|
| 85 |
|
| 86 |
+
[More Information Needed]
|
|
|
|
| 87 |
|
| 88 |
+
### Training Procedure
|
|
|
|
|
|
|
| 89 |
|
| 90 |
+
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
| 91 |
|
| 92 |
+
#### Preprocessing [optional]
|
| 93 |
|
| 94 |
+
[More Information Needed]
|
| 95 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 96 |
|
| 97 |
+
#### Training Hyperparameters
|
|
|
|
| 98 |
|
| 99 |
+
- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
|
|
|
|
| 100 |
|
| 101 |
+
#### Speeds, Sizes, Times [optional]
|
|
|
|
| 102 |
|
| 103 |
+
<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
|
| 104 |
|
| 105 |
+
[More Information Needed]
|
|
|
|
| 106 |
|
| 107 |
+
## Evaluation
|
|
|
|
|
|
|
| 108 |
|
| 109 |
+
<!-- This section describes the evaluation protocols and provides the results. -->
|
| 110 |
|
| 111 |
+
### Testing Data, Factors & Metrics
|
| 112 |
|
| 113 |
+
#### Testing Data
|
|
|
|
| 114 |
|
| 115 |
+
<!-- This should link to a Dataset Card if possible. -->
|
|
|
|
| 116 |
|
| 117 |
+
[More Information Needed]
|
|
|
|
|
|
|
|
|
|
|
|
|
| 118 |
|
| 119 |
+
#### Factors
|
| 120 |
|
| 121 |
+
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 122 |
|
| 123 |
+
[More Information Needed]
|
|
|
|
| 124 |
|
| 125 |
+
#### Metrics
|
| 126 |
|
| 127 |
+
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
| 128 |
|
| 129 |
+
[More Information Needed]
|
|
|
|
| 130 |
|
| 131 |
+
### Results
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 132 |
|
| 133 |
+
[More Information Needed]
|
|
|
|
| 134 |
|
| 135 |
+
#### Summary
|
| 136 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 137 |
|
|
|
|
| 138 |
|
| 139 |
+## Model Examination [optional]

+<!-- Relevant interpretability work for the model goes here -->

+[More Information Needed]

+## Environmental Impact

+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]

+## Technical Specifications [optional]

+### Model Architecture and Objective

+[More Information Needed]

+### Compute Infrastructure

+[More Information Needed]

+#### Hardware

+[More Information Needed]

+#### Software

+[More Information Needed]

+## Citation [optional]

+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+**BibTeX:**

+[More Information Needed]

+**APA:**

+[More Information Needed]

+## Glossary [optional]

+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact

+[More Information Needed]
+### Framework versions

+- PEFT 0.19.1
adapter_config.json
ADDED
@@ -0,0 +1,48 @@
+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "google/flan-t5-small",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "wo",
+    "o",
+    "q",
+    "wi_0",
+    "v",
+    "k",
+    "wi_1"
+  ],
+  "target_parameters": null,
+  "task_type": "SEQ_2_SEQ_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
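As a rough cross-check of the adapter settings in `adapter_config.json`, the sketch below estimates how many trainable LoRA parameters they imply and what scaling factor is applied to the low-rank update. It is an illustration, not the author's tooling. The layer dimensions (`D_MODEL`, `INNER_DIM`, `D_FF`, and the layer counts) are assumptions taken from the public `google/flan-t5-small` configuration and do not appear in this commit; only `r`, `lora_alpha`, and `target_modules` are copied from the file.

```python
# Assumed dimensions of google/flan-t5-small (not part of this commit).
D_MODEL = 512
INNER_DIM = 6 * 64  # num_heads * d_kv
D_FF = 1024
ENCODER_LAYERS = 8
DECODER_LAYERS = 8

# Values copied from adapter_config.json in this commit.
adapter_config = {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["wo", "o", "q", "wi_0", "v", "k", "wi_1"],
}

# (in_features, out_features) of each targeted linear layer in a T5 block.
LINEAR_SHAPES = {
    "q": (D_MODEL, INNER_DIM),
    "k": (D_MODEL, INNER_DIM),
    "v": (D_MODEL, INNER_DIM),
    "o": (INNER_DIM, D_MODEL),
    "wi_0": (D_MODEL, D_FF),
    "wi_1": (D_MODEL, D_FF),
    "wo": (D_FF, D_MODEL),
}
ATTENTION_MODULES = {"q", "k", "v", "o"}


def lora_param_count(config: dict) -> int:
    """Count LoRA parameters: each adapted layer adds A (r x in) and B (out x r)."""
    rank = config["r"]
    total = 0
    for name in config["target_modules"]:
        fan_in, fan_out = LINEAR_SHAPES[name]
        per_module = rank * (fan_in + fan_out)
        if name in ATTENTION_MODULES:
            # Encoder self-attention, decoder self-attention, decoder cross-attention.
            copies = ENCODER_LAYERS + 2 * DECODER_LAYERS
        else:
            # One feed-forward block per encoder layer and per decoder layer.
            copies = ENCODER_LAYERS + DECODER_LAYERS
        total += per_module * copies
    return total


trainable = lora_param_count(adapter_config)
# With use_rslora set to false, the update is scaled by lora_alpha / r.
scaling = adapter_config["lora_alpha"] / adapter_config["r"]

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
print(f"size if stored as float32: {trainable * 4:,} bytes")
```

Under these assumptions the count comes to 2,555,904 parameters with a scaling factor of 2.0. At four bytes per parameter that is 10,223,616 bytes, slightly below the 10,264,128 bytes recorded for `adapter_model.safetensors` in this commit, which would be consistent with float32 weights plus a file header. It is also twice the 1.28M figure the earlier README quoted for rank 8, as expected when the rank doubles.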
adapter_model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:739806c54db7ce3ca21af4278e4160f3ed7feff9f6e09ad03beae7b26aa457c4
+size 10264128
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,114 @@
+{
+  "backend": "tokenizers",
+  "eos_token": "</s>",
+  "extra_ids": 100,
+  "extra_special_tokens": [
+    "<extra_id_0>",
+    "<extra_id_1>",
+    "<extra_id_2>",
+    "<extra_id_3>",
+    "<extra_id_4>",
+    "<extra_id_5>",
+    "<extra_id_6>",
+    "<extra_id_7>",
+    "<extra_id_8>",
+    "<extra_id_9>",
+    "<extra_id_10>",
+    "<extra_id_11>",
+    "<extra_id_12>",
+    "<extra_id_13>",
+    "<extra_id_14>",
+    "<extra_id_15>",
+    "<extra_id_16>",
+    "<extra_id_17>",
+    "<extra_id_18>",
+    "<extra_id_19>",
+    "<extra_id_20>",
+    "<extra_id_21>",
+    "<extra_id_22>",
+    "<extra_id_23>",
+    "<extra_id_24>",
+    "<extra_id_25>",
+    "<extra_id_26>",
+    "<extra_id_27>",
+    "<extra_id_28>",
+    "<extra_id_29>",
+    "<extra_id_30>",
+    "<extra_id_31>",
+    "<extra_id_32>",
+    "<extra_id_33>",
+    "<extra_id_34>",
+    "<extra_id_35>",
+    "<extra_id_36>",
+    "<extra_id_37>",
+    "<extra_id_38>",
+    "<extra_id_39>",
+    "<extra_id_40>",
+    "<extra_id_41>",
+    "<extra_id_42>",
+    "<extra_id_43>",
+    "<extra_id_44>",
+    "<extra_id_45>",
+    "<extra_id_46>",
+    "<extra_id_47>",
+    "<extra_id_48>",
+    "<extra_id_49>",
+    "<extra_id_50>",
+    "<extra_id_51>",
+    "<extra_id_52>",
+    "<extra_id_53>",
+    "<extra_id_54>",
+    "<extra_id_55>",
+    "<extra_id_56>",
+    "<extra_id_57>",
+    "<extra_id_58>",
+    "<extra_id_59>",
+    "<extra_id_60>",
+    "<extra_id_61>",
+    "<extra_id_62>",
+    "<extra_id_63>",
+    "<extra_id_64>",
+    "<extra_id_65>",
+    "<extra_id_66>",
+    "<extra_id_67>",
+    "<extra_id_68>",
+    "<extra_id_69>",
+    "<extra_id_70>",
+    "<extra_id_71>",
+    "<extra_id_72>",
+    "<extra_id_73>",
+    "<extra_id_74>",
+    "<extra_id_75>",
+    "<extra_id_76>",
+    "<extra_id_77>",
+    "<extra_id_78>",
+    "<extra_id_79>",
+    "<extra_id_80>",
+    "<extra_id_81>",
+    "<extra_id_82>",
+    "<extra_id_83>",
+    "<extra_id_84>",
+    "<extra_id_85>",
+    "<extra_id_86>",
+    "<extra_id_87>",
+    "<extra_id_88>",
+    "<extra_id_89>",
+    "<extra_id_90>",
+    "<extra_id_91>",
+    "<extra_id_92>",
+    "<extra_id_93>",
+    "<extra_id_94>",
+    "<extra_id_95>",
+    "<extra_id_96>",
+    "<extra_id_97>",
+    "<extra_id_98>",
+    "<extra_id_99>"
+  ],
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "T5Tokenizer",
+  "unk_token": "<unk>"
+}
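The `extra_special_tokens` list in `tokenizer_config.json` follows a regular pattern: one `<extra_id_N>` sentinel per index below `extra_ids`. The sketch below, offered as an illustrative consistency check rather than anything shipped with this repository, regenerates that list and compares it with a loaded config. The helper names `sentinel_tokens` and `check_sentinels` are hypothetical, and the inline dictionary is a stand-in for the parsed file.

```python
def sentinel_tokens(count: int) -> list[str]:
    """Build the T5-style sentinel tokens <extra_id_0> .. <extra_id_{count-1}>."""
    return [f"<extra_id_{index}>" for index in range(count)]


def check_sentinels(config: dict) -> bool:
    """Return True when the listed special tokens match the declared extra_ids."""
    expected = sentinel_tokens(config["extra_ids"])
    return config["extra_special_tokens"] == expected


# Stand-in for json.load() on tokenizer_config.json; only the relevant keys are kept.
tokenizer_config = {
    "extra_ids": 100,
    "extra_special_tokens": sentinel_tokens(100),
    "model_max_length": 512,
}

tokens = sentinel_tokens(tokenizer_config["extra_ids"])
print(len(tokens), tokens[0], tokens[-1])
print("sentinels consistent:", check_sentinels(tokenizer_config))
```

A check of this kind can catch a hand-edited config where `extra_ids` and the token list have drifted apart.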