YModel1.1

Structure

  • uses SnifferCaptain's LoE (lack of expert) layer as the feed-forward network.
  • uses SnifferCaptain's PEGA (Position Embedding Gate Attention) as the Transformer attention layer.
  • adds an additional identity link across the FFN's intermediate part.
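The bullets above can be illustrated with a minimal sketch of the feed-forward block. This is a hypothetical reading, not the actual LoE implementation: it assumes the "identity link between the FFN's intermediate part" means adding the pre-activation intermediate tensor back after the activation, and it uses ReLU and plain dense projections as stand-ins.

```python
import numpy as np

def ffn_with_intermediate_identity(x, w_up, w_down):
    """Feed-forward block with an extra identity (skip) link on the
    intermediate tensor. Hypothetical sketch; the real LoE layer,
    activation, and link placement may differ."""
    h = x @ w_up               # up-projection to the intermediate size
    a = np.maximum(h, 0.0)     # activation (ReLU chosen as an assumption)
    h = a + h                  # identity link skipping over the activation
    return h @ w_down          # down-projection back to the model size

# toy usage with made-up sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))        # (batch, d_model)
w_up = rng.standard_normal((4, 16))    # d_model -> d_ff
w_down = rng.standard_normal((16, 4))  # d_ff -> d_model
y = ffn_with_intermediate_identity(x, w_up, w_down)
```

The output keeps the model dimension, so the block drops into a standard Transformer layer wherever the usual FFN would sit.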