Model Card

This model was trained to analyse model utility when training on various Derived Text Formats (DTFs).
These are versions of the same text, adjusted to reduce the chance that the original text can ever be extracted from the model, with applications in privacy and copyright-infringement protection. In this case, the model was trained on only the dataset's nouns.
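The noun-only format can be illustrated with a minimal sketch. The card does not say which POS tagger produced the training data, so the Penn Treebank tags and the `tagged` example below are assumptions for illustration only:

```python
def nouns_only(tagged_tokens):
    """Keep only tokens whose POS tag marks a noun (Penn Treebank NN* tags)."""
    return [tok for tok, tag in tagged_tokens if tag.startswith("NN")]

# Hypothetical tagger output for "The model learns representations quickly"
tagged = [("The", "DT"), ("model", "NN"), ("learns", "VBZ"),
          ("representations", "NNS"), ("quickly", "RB")]

print(" ".join(nouns_only(tagged)))  # model representations
```

Dropping all non-noun tokens removes most of the syntactic structure needed to reconstruct the original sentence, which is the point of this DTF.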

The dataset used for these experiments is codelion/fineweb-edu-1B; the obfuscated formats are available in DanielGallagherIRE/fineweb-edu-1B-obfuscated.

Training Configuration

The model was trained using the following key hyperparameters:

Model Architecture

  • Base Architecture: BERT (base, cased)
  • Hidden Size: 768
  • Number of Layers: 12
  • Attention Heads: 12
  • Intermediate Size: 3072
  • Max Sequence Length: 512
  • Activation Function: GELU
  • Normalization: Layer normalization (pre-norm)
  • Position Embeddings: Learned
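The values above can be assembled into a Hugging Face `transformers` configuration. This is a sketch, not the author's training script; fields the card does not list fall back to `BertConfig` defaults, and note that the card's pre-norm placement is not expressible through a stock `BertConfig` field:

```python
from transformers import BertConfig

# Values taken from the architecture list above; everything else
# uses BertConfig defaults.
config = BertConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=512,
    hidden_act="gelu",
    position_embedding_type="absolute",  # learned position embeddings
)
```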

Training Hyperparameters

  • Objective: Masked Language Modeling (MLM)
  • Optimizer: AdamW (8-bit quantized)
  • Learning Rate: 1e-4
  • Weight Decay: 1e-5
  • Warmup Steps: 10,000
  • Warmup Decay: 0.1
  • Max Steps: 150,000
  • Precision: bfloat16 mixed precision
  • Batch Size: 16 (train and validation)
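The MLM objective selects a subset of input tokens, replaces them with a mask token, and trains the model to predict the originals. A minimal sketch of the selection step, assuming BERT's conventional 15% masking rate (the card does not state the rate, and real implementations also apply the 80/10/10 mask/random/keep split, omitted here):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly select positions for the MLM objective.

    Returns (masked_tokens, labels): labels hold the original token
    at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

masked, labels = mask_tokens(["model", "utility", "privacy", "copyright", "nouns"])
```

The loss is computed only at positions where `labels` is not `None`, so the model is never penalised for unmasked tokens.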

Dataset

  • Training Data: DanielGallagherIRE/fineweb-edu-1B-obfuscated
  • Tokenizer: bert-base-cased
  • Model Size: ~0.1B parameters (F32, safetensors)
