YModel1.1

Structure

  • uses SnifferCaptain's LoE (lack of expert) layer as the feed-forward network.
  • uses SnifferCaptain's PEGA (Position Embedding Gate Attention) as the Transformer attention layer.
  • adds an additional identity link across the FFN's intermediate part.
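The bullets above can be illustrated with a minimal sketch of the feed-forward block. This is a hypothetical reading, not the actual LoE implementation: it assumes the "identity link between the FFN's intermediate part" means adding the pre-activation intermediate tensor back after the activation, and it uses ReLU and plain dense projections as stand-ins.

```python
import numpy as np

def ffn_with_intermediate_identity(x, w_up, w_down):
    """Feed-forward block with an extra identity (skip) link on the
    intermediate tensor. Hypothetical sketch; the real LoE layer,
    activation, and link placement may differ."""
    h = x @ w_up               # up-projection to the intermediate size
    a = np.maximum(h, 0.0)     # activation (ReLU chosen as an assumption)
    h = a + h                  # identity link skipping over the activation
    return h @ w_down          # down-projection back to the model size

# toy usage with made-up sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))        # (batch, d_model)
w_up = rng.standard_normal((4, 16))    # d_model -> d_ff
w_down = rng.standard_normal((16, 4))  # d_ff -> d_model
y = ffn_with_intermediate_identity(x, w_up, w_down)
```

The output keeps the model dimension, so the block drops into a standard Transformer layer wherever the usual FFN would sit.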