# Model Card for QLNet

Based on a class of partial differential equations called **quasi-linear hyperbolic systems** [[Liu et al., 2023](https://github.com/liuyao12/ConvNets-PDE-perspective)], the QLNet breaks into uncharted waters of the ConvNet model space, marked by the use of (element-wise) multiplication in lieu of ReLU as the primary nonlinearity. It achieves performance comparable to ResNet50 on ImageNet-1k (acc=**78.61**), demonstrating that it has the same level of capacity/expressivity, and it deserves further analysis and study (hyper-parameter tuning, choice of optimizer, etc.) by the academic community.


One notable feature is that the architecture (trained or not) admits a *continuous* symmetry in its weight/parameter space.

FAQ (as the author imagines):

- Q: Who needs another ConvNet, when the SOTA for ImageNet-1k is now in the low 80s with models of comparable size?
- A: Aside from a lack of resources to perform extensive experiments, the real answer is that the new symmetry has the potential to be exploited (e.g., symmetry-aware optimization). The non-activation nonlinearity does have more "naturalness" (coordinate independence) of the kind innate to many equations in mathematics and physics. Activation is but a legacy from the early days of models inspired by *biological* neural networks.
- Q: Multiplication is so simple; surely someone must have tried it?
- A: Perhaps. My bet is that whoever tried it soon found the model failing to train with standard ReLU; without belief in the underlying PDE perspective, maybe it wasn't pushed to its limit.
- Q: Is it not similar to attention in the Transformer?
- A: It is, indeed. It's natural to wonder if the activation functions in the Transformer could be removed (or reduced) while still achieving comparable performance.
- Q: If the weight/parameter space has a symmetry (other than permutations), perhaps there's redundancy in the weights.
- A: The transformation in our demo can indeed be used to reduce the weights from the get-go (see the sketch after this list). However, there are variants of the model that admit a much larger symmetry. It is also related to the phenomenon of "flat minima" found empirically in some conventional neural networks.
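
To make the weight-space symmetry concrete, here is a minimal sketch built on the hypothetical `MultiplicativeBlock` above (an illustration, not the authors' demo code): rescaling one branch by a constant `c` and the other by `1/c` changes the weights but leaves the function computed by the block untouched.

```python
import torch

# Assumes the MultiplicativeBlock sketch defined earlier in this card.

def rescale_branches(block: MultiplicativeBlock, c: float) -> None:
    """Apply the continuous symmetry: conv_a -> c * conv_a and
    conv_b -> (1/c) * conv_b. The element-wise product
    conv_a(x) * conv_b(x) is unchanged for every input x,
    so the block computes the same function for any c != 0."""
    with torch.no_grad():
        block.conv_a.weight *= c
        block.conv_a.bias *= c
        block.conv_b.weight /= c
        block.conv_b.bias /= c

block = MultiplicativeBlock(64).eval()  # eval() for deterministic BatchNorm
x = torch.randn(1, 64, 32, 32)
y_before = block(x)
rescale_branches(block, 3.7)            # any nonzero constant works
y_after = block(x)
print(torch.allclose(y_before, y_after, atol=1e-5))  # True: same function, different weights
```

In other words, each multiplicative pair contributes a one-parameter family of weight settings that all realize the same network, which is the redundancy the question alludes to.
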
*This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).*