metadata
license: apache-2.0
base_model:
- nomic-ai/nomic-bert-2048
This is the 26 categorical finetune of nomic-bert-2048's encoder.
130,000,000 - 4-30 masked token samples with 80% mask rate
253,952,000 - 77 token samples with 20% mask rate
The model has learned to categorize certain masked patterns with their categories and special tokens.
<subject>
<subject1>
<subject2>
<pose>
<emotion>
<surface>
<lighting>
<material>
<accessory>
<footwear>
<upper_body_clothing>
<hair_style>
<hair_length>
<headwear>
<texture>
<pattern>
<grid>
<zone>
<offset>
<object_left>
<object_right>
<relation>
<intent>
<style>
<fabric>
<jewelry>
With the categorical shunts;
[SHUNT_1000000]
[SHUNT_1000001]
[SHUNT_1000002]
[SHUNT_1000003]
[SHUNT_1000004]
[SHUNT_1000005]
[SHUNT_1000006]
[SHUNT_1000007]
[SHUNT_1000008]
[SHUNT_1000009]
[SHUNT_1000010]
[SHUNT_1000011]
[SHUNT_1000012]
[SHUNT_1000013]
[SHUNT_1000014]
[SHUNT_1000015]
[SHUNT_1000016]
[SHUNT_1000017]
[SHUNT_1000018]
[SHUNT_1000019]
[SHUNT_1000020]
[SHUNT_1000021]
[SHUNT_1000022]
[SHUNT_1000023]
[SHUNT_1000024]
[SHUNT_1000025]
Each shunt meant to activate cross-categorical conceptualization within their 77 token window.