DanielGallagherIRE/fineweb-edu-1B-obfuscated
This model was trained to analyse model utility when training on various Derived Text Formats.
These formats are versions of the same text that have been altered to reduce the chance that the original text can be extracted from the model, with applications in privacy and in protection against copyright infringement.
In this case, the model was trained on the dataset after a hierarchical scrambling process that reverses the heads in a dependency tree.
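As a rough illustration of what scrambling over a dependency tree can look like, the sketch below moves every head to the opposite side of its dependents and applies the same rule recursively to each subtree. This is only one possible reading of "reversing the heads" and is not taken from the actual preprocessing code: the function name `flip_heads`, the head-index encoding (`-1` for the root), and the hand-written parse are all assumptions made for this example.

```python
from collections import defaultdict


def flip_heads(tokens, heads):
    """Relinearise a dependency tree with every head moved to the other side.

    tokens: list of words in original order.
    heads:  heads[i] is the index of token i's head, or -1 for the root.
    NOTE: hypothetical helper for illustration; not the card author's code.
    """
    children = defaultdict(list)
    root = None
    for i, h in enumerate(heads):
        if h == -1:
            root = i
        else:
            children[h].append(i)  # children stay in ascending index order
    if root is None:
        raise ValueError("no root token (head index -1) found")

    def linearize(node):
        left = [c for c in children[node] if c < node]
        right = [c for c in children[node] if c > node]
        out = []
        # Dependents that followed the head are now emitted before it...
        for c in right:
            out.extend(linearize(c))
        out.append(tokens[node])
        # ...and dependents that preceded the head are emitted after it.
        for c in left:
            out.extend(linearize(c))
        return out

    return linearize(root)


# Hand-written toy parse (an assumption, not parser output).
tokens = ["the", "quick", "fox", "jumped", "over", "the", "fence"]
heads = [2, 2, 3, -1, 3, 6, 4]
scrambled = flip_heads(tokens, heads)
print(" ".join(scrambled))  # fence the over jumped fox the quick
```

The output keeps exactly the same words, so bag-of-words information survives, while the surface order differs both from the original sentence and from a plain word-level reversal.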
The dataset used for these experiments is codelion/fineweb-edu-1B; all obfuscated formats are available in DanielGallagherIRE/fineweb-edu-1B-obfuscated.
The model was trained using the following key hyperparameters:
Base model: google-bert/bert-base-cased