There is a LayerNorm (post_layernorm) after the last layer of the ViT, and it is immediately followed by an RMSNorm (ln_q in VLPatchMerger).
Is there any special reason for cascading two norms back to back?
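To make the question concrete, here is a minimal pure-Python sketch of what an RMSNorm does to the output of a LayerNorm. It is not the model's code: `layer_norm`, `rms_norm`, and the `gamma`/`beta` values are hypothetical stand-ins chosen for illustration, and the RMSNorm's own weight is left at 1. Under those assumptions, the second norm is close to an identity when the LayerNorm has no learned scale and shift, but it does rescale the vector once they are present.

```python
import math
import random


def layer_norm(x, gamma=None, beta=None, eps=1e-6):
    """LayerNorm over one vector: centre, divide by std, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x, eps=1e-6):
    """RMSNorm over one vector with unit weight: divide by the root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


random.seed(0)
dim = 64
x = [random.gauss(3.0, 5.0) for _ in range(dim)]  # made-up ViT feature vector

# Case 1: LayerNorm without scale/shift. Output has zero mean and unit
# variance, so its RMS is about 1 and RMSNorm changes almost nothing.
plain = layer_norm(x)
diff_plain = max_abs_diff(rms_norm(plain), plain)

# Case 2: LayerNorm with a (made-up) learned scale and shift. The output's
# RMS is no longer 1, so the following RMSNorm rescales it.
gamma = [2.0] * dim
beta = [0.5] * dim
affine = layer_norm(x, gamma, beta)
diff_affine = max_abs_diff(rms_norm(affine), affine)

print(f"max change, LayerNorm without affine: {diff_plain:.2e}")
print(f"max change, LayerNorm with affine:    {diff_affine:.2e}")
```

So whether the second norm is redundant seems to depend on the learned parameters of post_layernorm, which this sketch does not inspect.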