AbstractPhil/penta-vit-experiments

AbstractPhil committed on Sep 16, 2025

Commit dccfee9 · verified · 1 Parent(s): 76e69ea

Update README.md

Files changed (1)

README.md +24 -0

@@ -4,6 +4,30 @@ datasets:
 - AbstractPhil/geometric-vocab
 pipeline_tag: zero-shot-classification
 ---
+# After a big notebook refactor
+I have pushed the updated model code and included the loader. I will not include the losses or the training methodology until the full process is prepared and the paper is published. After that, you will see exactly what I've developed and why each piece exists. Until then, there are only breadcrumbs and inference code.
+I released a new version of eval with the new version of the model code.
+* Model load/save code has been streamlined, so each checkpoint should now correctly include its variant information.
+* Fixed multiple formula quirks that were contributing to invalidity, incorrect truths, and negation.
+* Corrected cascading errors from zero, caused by silent, unseen internal model deviance, through careful entropy usage.
+* Corrected faulty contributions from multiple highly responsible losses that are required to sustain complexity while introducing variance.
+* Reintegrated CutMix, which had been omitted due to instability with the earlier variant.
+Tech debt smashed.
+Okay, next up: the last system's variant appeared to be capped at around 55% no matter the size. Even with the correct formulas, this still may not be sufficient. More than likely the entire feature will need to be reimagined, the patch size changed to 16, and the full ImageNet 256 variant trained.
+First though, the small one has to be cohesive enough.
+## Note
+This is custom code to load/save the models. Be sure to always review custom code from any source before running it in a project.
 # Likely reintroduce the theta head tomorrow
 The theta trains were actually not that bad. The head added some overhead but not really that much and the outcome improved, so it's worth exploring more.
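
The README change above mentions that CutMix was integrated again. The commit does not show the author's implementation, so the following is only a generic sketch of the standard CutMix idea: paste a random rectangle from one image into another and weight the labels by the area each image keeps. The function names (`cutmix_box`, `cutmix`, `mix_labels`), the nested-list image layout, and the `alpha` default are all assumptions made for illustration, not part of this repository.

```python
import math
import random


def cutmix_box(height, width, lam, rng):
    """Pick a rectangle covering roughly (1 - lam) of the image area."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    # The box centre is uniform over the image; the box is clipped at the borders.
    cy = rng.randrange(height)
    cx = rng.randrange(width)
    top = max(cy - cut_h // 2, 0)
    bottom = min(cy + cut_h // 2, height)
    left = max(cx - cut_w // 2, 0)
    right = min(cx + cut_w // 2, width)
    return top, bottom, left, right


def cutmix(image_a, image_b, alpha=1.0, rng=None):
    """Paste a random rectangle of image_b into a copy of image_a.

    Images are nested lists indexed [row][col] (hypothetical layout).
    Returns the mixed image and the weight of image_a's label,
    recomputed from the clipped box area.
    """
    rng = rng or random.Random()
    height, width = len(image_a), len(image_a[0])
    lam = rng.betavariate(alpha, alpha)
    top, bottom, left, right = cutmix_box(height, width, lam, rng)
    mixed = [row[:] for row in image_a]  # copy rows so image_a is untouched
    for y in range(top, bottom):
        mixed[y][left:right] = image_b[y][left:right]
    # Clipping can shrink the box, so derive the label weight from the real area.
    lam = 1.0 - ((bottom - top) * (right - left)) / (height * width)
    return mixed, lam


def mix_labels(label_a, label_b, lam):
    """Blend two one-hot (or soft) label vectors with weight lam on label_a."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
```

In a training loop, the mixed image would be fed to the model and the loss computed against the output of `mix_labels`, so the target reflects how much of each source image is visible. Whether the model in this repository applies CutMix per sample, per batch, or with a different box distribution is not stated in the commit.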