HRM-Grammar-Light

Lightweight multilingual grammar-correction model built on a decoder-only, HRM-like architecture with Adaptive Computation Time (ACT). It trains from prompt+target pairs, supports community contributions via safe contrib branches, and runs efficiently on T4 and A100 GPUs.

Overview

  • Task: causal grammar correction conditioned on the prompt "corregir {idioma}: " (Spanish for "correct {language}: ").
  • Architecture: RMSNorm, SwiGLU FFN, multi-head attention with RoPE, separate H/L stacks, pooled ACT halting (Graves-style; sketched after this list), optional deep supervision, tied embeddings.
  • Trainer: nnm.py implements data prep, training, validation, and Hub uploads.
  • Hub model: dreuxx26/HRM-Grammar-Light (weights/checkpoints uploaded by the trainer).
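
For intuition, here is a minimal sketch of Graves-style pooled ACT halting in PyTorch. Every name here (PooledACT, l_step, the mean-pooling choice) is an illustrative assumption, not the actual code in nnm.py; the defaults mirror HRM_ACT_STEPS and HRM_ACT_EPS.

    import torch
    import torch.nn as nn

    class PooledACT(nn.Module):
        """Sketch of Graves-style halting with one pooled halting
        scalar per sequence. Assumption-laden; not nnm.py's code."""

        def __init__(self, d_model: int, max_steps: int = 8, eps: float = 0.01):
            super().__init__()
            self.halt = nn.Linear(d_model, 1)  # pooled halting logit
            self.max_steps = max_steps         # cf. HRM_ACT_STEPS
            self.eps = eps                     # cf. HRM_ACT_EPS

        def forward(self, h, l_step):
            # h: (batch, seq, d_model); l_step: one pass through the L stack.
            bsz = h.size(0)
            cum_p = h.new_zeros(bsz)             # accumulated halting prob
            out = torch.zeros_like(h)            # halting-weighted state
            active = torch.ones(bsz, dtype=torch.bool, device=h.device)
            for step in range(self.max_steps):
                h = l_step(h)
                p = torch.sigmoid(self.halt(h.mean(dim=1))).squeeze(-1)
                last = step == self.max_steps - 1
                # Halt once cumulative probability exceeds 1 - eps,
                # or on the final allowed step.
                halting = active & ((cum_p + p > 1 - self.eps) | last)
                # Halting sequences contribute the remainder 1 - cum_p;
                # continuing ones contribute p; finished ones contribute 0.
                w = torch.where(halting, 1 - cum_p, p) * active.float()
                out = out + w.view(bsz, 1, 1) * h
                cum_p = cum_p + p
                active = active & ~halting
                if not active.any():
                    break
            return out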

Data

Your dataset should provide at least these columns:

  • input_text: Original text to correct, optionally prefixed with the prompt (e.g., corregir español: ...); if the prefix is absent, the script infers/cleans it.
  • target_text: Corrected text target.
  • Optional language: If present, it’s normalized; rare classes are collapsed to other for stable splits.
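
For concreteness, a record might look like this (values invented for illustration):

    # One illustrative row; the prompt prefix on input_text is optional.
    row = {
        "input_text": "corregir español: Yo no sabe nadar.",
        "target_text": "Yo no sé nadar.",
        "language": "es",  # optional; rare values are collapsed to "other"
    }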

Two ways to load the data:

  1. From the Hugging Face Datasets Hub (recommended for contributors; example below)
  • Set env HRM_DATASET_HUB_ID (e.g., your_user/hrm_grammar_light). Optional: HRM_DATASET_CONFIG, HRM_DATASET_SPLIT, HRM_DATASET_REVISION.
  2. From Google Drive (legacy/local)
  • Set LOCAL_DATASET_PATH to a Drive folder containing .arrow files (used as a fallback when no Hub dataset is configured).
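
A minimal sketch of how the Hub path (option 1) can be resolved from these variables; the actual loading logic lives in nnm.py and may differ:

    import os
    from datasets import load_dataset

    ds = load_dataset(
        os.environ["HRM_DATASET_HUB_ID"],                 # e.g. your_user/hrm_grammar_light
        name=os.environ.get("HRM_DATASET_CONFIG"),        # optional config name
        split=os.environ.get("HRM_DATASET_SPLIT", "train"),
        revision=os.environ.get("HRM_DATASET_REVISION"),  # pin a dataset revision
    )
    assert {"input_text", "target_text"} <= set(ds.column_names)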

Training (Colab-friendly)

  • Secrets: Save HF_TOKEN in Colab Secrets to enable Hub uploads.
  • Run nnm.py in Colab (the script auto-mounts Drive and installs deps when in Colab).
  • The trainer builds each sequence as the prompt-prefixed input ("corregir {idioma}: ...") + </s> + target + </s>, and masks the prompt segment (labels = -100) so the loss is computed on the target only.
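
A hedged sketch of that packing with a Hugging Face tokenizer. It assumes a tokenizer with an EOS token is published alongside the model; nnm.py's exact packing may differ.

    from transformers import AutoTokenizer

    # Assumption: the repo ships a tokenizer with an EOS token.
    tok = AutoTokenizer.from_pretrained("dreuxx26/HRM-Grammar-Light")

    prompt = "corregir español: Yo no sabe nadar."   # prompt-prefixed source
    target = "Yo no sé nadar."

    prompt_ids = tok(prompt, add_special_tokens=False).input_ids
    target_ids = tok(target, add_special_tokens=False).input_ids

    input_ids = prompt_ids + [tok.eos_token_id] + target_ids + [tok.eos_token_id]
    # Mask the prompt and its separator so the loss falls on the target only.
    labels = [-100] * (len(prompt_ids) + 1) + target_ids + [tok.eos_token_id]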

Key features during training include pooled ACT halting, optional deep supervision, gradient checkpointing, and automatic mixed precision; all are controlled by the environment variables listed below.

Community Mode (safe contributions)

Community Mode lets contributors train for 1–2 epochs and upload results without touching main:

  • Set HRM_COMMUNITY_MODE=1 and HRM_CONTRIB=<handle>.
  • The script writes a manifest.json and uploads to contrib/<handle>-<timestamp>.
  • Upload gating: by default, the script uploads only when validation loss improves. To allow uploads even without improvement, set HRM_ALLOW_REGRESSION=1 (see the sketch after this list).
  • The commit message includes BEST when the run achieves the best validation loss so far.
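
The gate and branch naming, sketched with hypothetical placeholder values; the real tracking and upload logic is in nnm.py:

    import os
    import time

    # Hypothetical values standing in for the trainer's validation tracking.
    val_loss, best_val_loss = 1.84, 1.91

    allow_regression = os.environ.get("HRM_ALLOW_REGRESSION", "0") == "1"
    improved = val_loss < best_val_loss
    if improved or allow_regression:
        handle = os.environ.get("HRM_CONTRIB", "anon")
        branch = f"contrib/{handle}-{time.strftime('%Y%m%d-%H%M%S')}"
        message = "community run" + (" BEST" if improved else "")
        print(branch, message)  # the trainer would upload the checkpoint here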

Important environment variables

  • Project
    • HF_REPO_ID: Target model repo for uploads (default: dreuxx26/HRM-Grammar-Light).
  • Data
    • HRM_DATASET_HUB_ID, HRM_DATASET_CONFIG, HRM_DATASET_SPLIT, HRM_DATASET_REVISION.
    • LOCAL_DATASET_PATH: Drive folder with .arrow files (fallback when no Hub dataset).
  • Training
    • HRM_EPOCHS (default 10), HRM_BLOCK_SIZE (default 256), HRM_BATCH_SIZE (default 8), HRM_LR (default 3e-4), HRM_WARMUP_RATIO (default 0.1).
    • HRM_EFF_BATCH: target effective batch size; gradient accumulation is auto-adjusted to reach it (see the sketch after this list).
    • HRM_CHECKPOINTING (1/0), HRM_ACT_STEPS (default 8), HRM_ACT_EPS (default 0.01).
    • HRM_DEEP_SUPERVISION (0/1), HRM_DS_WEIGHT (0.2 by default).
  • Community/Uploads
    • HRM_COMMUNITY_MODE (0/1), HRM_CONTRIB, HRM_UPLOAD_BRANCH (optional explicit branch), HRM_ALLOW_REGRESSION (0/1).
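
How HRM_EFF_BATCH can translate into gradient accumulation, as a sketch (nnm.py's exact rounding may differ):

    import os

    eff_batch = int(os.environ.get("HRM_EFF_BATCH", "64"))
    per_device = int(os.environ.get("HRM_BATCH_SIZE", "8"))
    grad_accum = max(1, eff_batch // per_device)  # e.g. 64 // 8 = 8 accumulation steps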

T4 tips

  • Keep HRM_BLOCK_SIZE at 192–256.
  • Use HRM_EFF_BATCH to reach your desired effective batch while avoiding OOM.
  • Leave checkpointing on (HRM_CHECKPOINTING=1).
  • Mixed precision is automatic (fp16 on T4).
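
Putting the tips together, an example Colab cell before running nnm.py (the values are suggestions within the ranges above, not requirements):

    import os

    os.environ["HRM_BLOCK_SIZE"] = "224"     # stay within the 192-256 range
    os.environ["HRM_EFF_BATCH"] = "64"       # grad accumulation auto-adjusts
    os.environ["HRM_CHECKPOINTING"] = "1"    # keep gradient checkpointing on
    # fp16 is selected automatically on T4; then run nnm.py in this runtime.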

How uploads work

  • If HF_TOKEN is present, the trainer saves weights (pytorch_model.bin), a checkpoint state (local_training_state.pt), and manifest.json locally.
  • If in Community Mode, it uploads to contrib/<handle>-<timestamp>; otherwise to main.
  • Uploads are gated by validation improvement unless HRM_ALLOW_REGRESSION=1.
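
A sketch of the upload step with huggingface_hub (branch creation plus a folder upload; the actual calls in nnm.py may differ, and the local folder name is an assumption):

    import os
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN"])
    repo_id = os.environ.get("HF_REPO_ID", "dreuxx26/HRM-Grammar-Light")
    branch = os.environ.get("HRM_UPLOAD_BRANCH", "main")  # contrib/<handle>-<ts> in Community Mode
    if branch != "main":
        api.create_branch(repo_id, branch=branch, exist_ok=True)
    api.upload_folder(
        folder_path="outputs",  # assumed dir with pytorch_model.bin, manifest.json, etc.
        repo_id=repo_id,
        revision=branch,
    )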

Troubleshooting

  • OOM on T4: lower HRM_BLOCK_SIZE, increase gradient accumulation via HRM_EFF_BATCH (which auto-sets HRM_GRAD_ACCUM), and keep checkpointing on.
  • HF auth errors: set HF_TOKEN in Colab Secrets or as an environment variable.
  • Dataset errors: verify HRM_DATASET_HUB_ID exists and is public (or authenticated), or ensure Arrow files are in the Drive path.

License & contributions

  • This project is licensed under Apache-2.0. You're free to use, modify, and distribute it, provided you retain attribution and notices.
  • Contributions are welcome via Community Mode PRs/branches on the Hub.