AbstractPhil committed on
Commit dccfee9 · verified · 1 Parent(s): 76e69ea

Update README.md

Files changed (1)
  1. README.md +24 -0
README.md CHANGED
@@ -4,6 +4,30 @@ datasets:
  - AbstractPhil/geometric-vocab
  pipeline_tag: zero-shot-classification
  ---
+ # After a big notebook refactor
+
+ I have pushed the updated model code and included the loader. I will not include the losses or the training methodology until the full process is prepared and the paper is published. After that, you will see exactly what I've developed and why each piece exists. Until then, there are only breadcrumbs and inference code.
+
+ I also released a new version of eval alongside the new version of the model code.
+
+
+ * Model load/save code has been streamlined, so each checkpoint should now correctly include its variant information.
+ * Fixed multiple formula quirks that were contributing to invalidity and incorrect truths, which in turn contributed to negation.
+ * Corrected cascading errors from zero, caused by silent, unseen internal model deviance, through careful entropy usage.
+ * Fixed faulty contributions from multiple highly responsible losses that are required to sustain complexity while introducing variance.
+ * Reintegrated CutMix, which had been omitted due to instability with the earlier variant.
+
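The commit does not show the author's CutMix code, so the following is only a minimal sketch of the standard technique the last bullet names, written in plain Python. The helper name `cutmix_pair`, its parameters, and the use of nested lists for images are assumptions made for illustration and are not taken from the repository.

```python
import math
import random


def cutmix_pair(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Mix two equally sized 2D images (lists of rows) and their one-hot labels.

    Hypothetical helper for illustration; not taken from the repository.
    """
    rng = rng or random.Random()
    height, width = len(img_a), len(img_a[0])

    # Sample the target mixing ratio from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)

    # Box sides scale with sqrt(1 - lam), so the box covers about (1 - lam) of the image.
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)

    # Pick a random box center, then clip the box to the image bounds.
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)

    # Copy image A, then paste the box region from image B over it.
    mixed = [row[:] for row in img_a]
    for y in range(top, bottom):
        mixed[y][left:right] = img_b[y][left:right]

    # Recompute lam from the clipped box so the labels match the actual pixel share.
    lam = 1.0 - ((bottom - top) * (right - left)) / (height * width)
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed, mixed_label, lam
```

Recomputing `lam` after clipping keeps the label weights consistent with the pixels that were actually replaced. Whether the model code here does the same is not stated in the README.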
+ Tech debt smashed.
+
+ Okay, next up: the last system's variant appeared to be capped at around 55% no matter the size. Even with the correct formulas, this may still not be sufficient. More than likely the entire feature will need to be reimagined, the patch size changed to 16, and the full ImageNet 256 variant trained.
+
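For a sense of scale, the patch arithmetic can be sketched as below. This assumes that "ImageNet 256" means square 256×256 inputs and that patches are square and non-overlapping. Neither is confirmed by the README, and `patch_grid` is a hypothetical helper.

```python
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches_per_side, total_patches) for a square image and square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Assumed 256x256 input with patch size 16: a 16x16 grid, 256 patches in total.
print(patch_grid(256, 16))
```

Under those assumptions, a patch size of 16 gives 256 patches per image.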
+ First though, the small one has to be cohesive enough.
+
+ ## Note
+
+ This is custom code to load/save the models. Be sure to always review custom code from any source before running it in a project.
+
+
  # Likely reintroduce the theta head tomorrow
 
  The theta trains were actually not that bad. The head added some overhead but not really that much and the outcome improved, so it's worth exploring more.