turns out LayerNorm also has a weight and a bias, which need to be pre-multiplied and trained for hypernets
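For context, LayerNorm with an elementwise affine (the default for PyTorch's `nn.LayerNorm`) normalizes each feature vector and then applies a learned per-feature scale (weight, gamma) and shift (bias, beta). A hypernetwork that only generates the Linear weights therefore misses these parameters. The sketch below is a framework-free illustration under that assumption. `layer_norm` and `target_param_count` are hypothetical helpers, not code from this repository, and the "pre-multiplied" step from the commit is not modeled here.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], weight: Sequence[float],
               bias: Sequence[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm over one feature vector with a per-feature affine.

    Hypothetical helper: in a hypernet setup, `weight` and `bias` would be
    produced (or trained) alongside the other target-network parameters.
    """
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance, as LayerNorm uses.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Normalize, then scale by weight (gamma) and shift by bias (beta).
    return [w * (v - mean) * inv_std + b for v, w, b in zip(x, weight, bias)]


def target_param_count(d_in: int, d_out: int, linear_bias: bool = True) -> int:
    """Hypothetical count of values a hypernet must supply for a
    Linear(d_in, d_out) followed by LayerNorm(d_out)."""
    linear = d_in * d_out + (d_out if linear_bias else 0)
    ln = 2 * d_out  # LayerNorm weight and bias, easy to forget
    return linear + ln


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0], [2.0] * 3, [1.0] * 3))
    print(target_param_count(4, 3))
```

With identity affine (weight 1, bias 0) the output is just the normalized vector; any other weight/bias changes the result, which is why those values have to be part of what the hypernet handles.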