Turns out LayerNorm also has its own weight and bias, so for hypernets those need to be pre-multiplied and trained too, same as the other layers' parameters.
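For anyone wondering where those parameters come in, here is a rough plain-Python sketch of what a LayerNorm with an elementwise affine computes. It is not the hypernetwork code itself; the function name `layer_norm` and the sample values are made up for illustration, and the default `eps` is just an assumed typical value.

```python
import math

def layer_norm(x, weight, bias, eps=1e-5):
    """Normalize x to zero mean and unit variance, then apply a per-element affine."""
    mean = sum(x) / len(x)
    # Biased variance (divide by N), which is what LayerNorm normally uses.
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    # weight and bias are the learnable parts: one scale and one shift per element.
    return [(v - mean) * inv_std * w + b for v, w, b in zip(x, weight, bias)]

x = [1.0, 2.0, 3.0, 4.0]

# weight = 1 and bias = 0 gives plain normalization.
identity = layer_norm(x, weight=[1.0] * 4, bias=[0.0] * 4)

# Any other weight/bias changes the output, so these values have to be
# saved and trained along with the rest of the network's parameters.
scaled = layer_norm(x, weight=[2.0] * 4, bias=[0.5] * 4)
```

If the weight and bias are left out of whatever gets optimized, the layer presumably just stays at its initial scale and shift.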