# bert-beatrix-2048
---
license: apache-2.0
base_model:
  - nomic-ai/nomic-bert-2048
---

This is a 26-category finetune of the nomic-bert-2048 encoder.

Training data:

- 130,000,000 samples of 4–30 tokens with an 80% mask rate
- 253,952,000 samples of 77 tokens with a 20% mask rate

The model has learned to associate masked patterns with their categories through special tokens:

```
<subject>
<subject1>
<subject2>
<pose>
<emotion>
<surface>
<lighting>
<material>
<accessory>
<footwear>
<upper_body_clothing>
<hair_style>
<hair_length>
<headwear>
<texture>
<pattern>
<grid>
<zone>
<offset>
<object_left>
<object_right>
<relation>
<intent>
<style>
<fabric>
<jewelry>
```

together with the 26 categorical shunt tokens:

```
[SHUNT_1000000]
[SHUNT_1000001]
[SHUNT_1000002]
[SHUNT_1000003]
[SHUNT_1000004]
[SHUNT_1000005]
[SHUNT_1000006]
[SHUNT_1000007]
[SHUNT_1000008]
[SHUNT_1000009]
[SHUNT_1000010]
[SHUNT_1000011]
[SHUNT_1000012]
[SHUNT_1000013]
[SHUNT_1000014]
[SHUNT_1000015]
[SHUNT_1000016]
[SHUNT_1000017]
[SHUNT_1000018]
[SHUNT_1000019]
[SHUNT_1000020]
[SHUNT_1000021]
[SHUNT_1000022]
[SHUNT_1000023]
[SHUNT_1000024]
[SHUNT_1000025]
```

Each shunt is meant to activate cross-categorical conceptualization within its 77-token window.
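As a minimal sketch of how the special tokens above might be composed into a masked prompt, the snippet below pairs a shunt token with category tags and uses `[MASK]` for fields the model should fill in. The `build_prompt` helper and the exact prompt layout are assumptions for illustration; the training-time format is not documented here.

```python
# Hypothetical helper: compose a masked prompt from a shunt token and
# category tags. The layout (shunt first, then "<tag> value" pairs) is
# an assumption, not the documented training format.

MASK = "[MASK]"

def build_prompt(shunt: str, fields: dict) -> str:
    """Pair each category tag with its value, or a mask token when the
    value is None, inside a single 77-token window."""
    parts = [shunt]
    for tag, value in fields.items():
        parts.append(f"<{tag}> {value if value is not None else MASK}")
    return " ".join(parts)

prompt = build_prompt("[SHUNT_1000000]", {
    "subject": "a knight",
    "pose": None,          # masked: let the model predict the pose
    "lighting": "rim light",
})
print(prompt)
# [SHUNT_1000000] <subject> a knight <pose> [MASK] <lighting> rim light
```

At inference time, a prompt like this would be tokenized and passed to the masked-LM head, which predicts the masked category values in context.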