# BERT-PRETRAINED-EDU

## Overview

This is a BERT-style masked language model pretrained from scratch using streaming data from FineWeb-Edu.
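If the checkpoint was saved in the standard Transformers layout alongside its tokenizer, it can be exercised directly through a fill-mask pipeline. A minimal sketch; the repo id is inferred from this page and the `[MASK]` token is assumed to follow the BERT convention:

```python
from transformers import pipeline

# Repo id inferred from this page; assumes a standard Transformers
# checkpoint (config + weights + tokenizer) and a BERT-style [MASK] token.
fill_mask = pipeline("fill-mask", model="ajeet9843/bert-pretrained-edu")

for pred in fill_mask("The teacher wrote the answer on the [MASK]."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```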

## Architecture

- Hidden size: 384
- Layers: 6
- Heads: 6
- Sequence length: 128
- Objective: Masked Language Modeling (MLM)
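In Transformers terms, the hyperparameters above correspond roughly to the `BertConfig` sketched below. `vocab_size` and `intermediate_size` are not stated on this card, so the sketch assumes the common BERT defaults, and it assumes the 128-token sequence length doubles as the positional-embedding limit.

```python
from transformers import BertConfig

config = BertConfig(
    hidden_size=384,
    num_hidden_layers=6,
    num_attention_heads=6,        # per-head width: 384 / 6 = 64
    max_position_embeddings=128,  # assumption: trained length = position limit
    intermediate_size=4 * 384,    # assumption: standard 4x FFN expansion
    vocab_size=30522,             # assumption: default BERT vocabulary
)
```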

## Training

- Dataset: HuggingFaceFW/fineweb-edu (streaming)
- Steps: ~20,000
- Hardware: two GPUs with DistributedDataParallel (DDP)
- Precision: mixed precision
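A streaming MLM data pipeline of this kind typically looks like the sketch below. The tokenizer choice, the 15% masking probability, and the preprocessing are illustrative assumptions, not documented details of this run:

```python
from datasets import load_dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Stream the corpus instead of downloading it; records arrive lazily.
ds = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed tokenizer

def tokenize(batch):
    # Truncate to the model's 128-token sequence length.
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = ds.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking for the MLM objective; 0.15 is the conventional rate.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
```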

## Intended Use

Fine-tuning for:

- Sentiment classification
- Document classification
- Retrieval / RAG encoder
- NLP research
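Each of these starts the same way: load the pretrained encoder with a fresh task head and fine-tune on labeled data. A minimal classification sketch, with the repo id inferred from this page and `num_labels` depending on the task:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "ajeet9843/bert-pretrained-edu"  # inferred repo id
model = AutoModelForSequenceClassification.from_pretrained(repo, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(repo)

inputs = tokenizer("A short example document.",
                   return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits  # head is randomly initialized until fine-tuned
```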

## Limitations

- Not instruction-tuned
- Not chat-optimized