Fix _init_weights and RotaryEmbedding for transformers v5.x compatibility

#10
by apsys - opened

Fix _init_weights and RotaryEmbedding initialization (for transformers 5.x)

_init_weights was calling .data.normal_() directly on tensors, which bypasses the _is_hf_initialized guard in transformers v5.x. Since v5.x loads the model on the meta device first and only calls initialize_weights() after the checkpoint is loaded, this silently re-randomized every Linear and Embedding after from_pretrained: the model loads fine but outputs garbage. Switched to torch.nn.init.normal_() / torch.nn.init.zeros_() so the guard works.
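A minimal sketch of the corrected pattern, written as a standalone function for illustration (in the model it is the _init_weights method on the PreTrainedModel subclass, and std would come from config.initializer_range; 0.02 here is an assumed placeholder):

```python
import torch
import torch.nn as nn

def init_weights(module: nn.Module, std: float = 0.02) -> None:
    if isinstance(module, nn.Linear):
        # was: module.weight.data.normal_(mean=0.0, std=std)
        # Mutating .data sidesteps the bookkeeping that transformers v5.x
        # uses to skip already-loaded parameters, so (per the description
        # above) checkpoint weights get re-randomized after loading.
        torch.nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        torch.nn.init.normal_(module.weight, mean=0.0, std=std)
```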

Also, RotaryEmbedding.__init__ raises a KeyError on the "default" rope type - ROPE_INIT_FUNCTIONS simply doesn't have that key, and Ring-mini-2.0 ships with rope_scaling=None, so it always hits this path. Handled the default case inline. While at it, forced float32 for the inv_freq computation, because with rope_theta=600k the theta powers lose most of their precision in bf16 (bf16 keeps only ~8 mantissa bits).
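An illustrative sketch of the default-path inv_freq computation (function and parameter names here are my own, not the model's actual code; the point is the inline "default" handling and the explicit float32):

```python
import torch

def default_inv_freq(rope_theta: float, head_dim: int) -> torch.Tensor:
    # rope_scaling is None -> "default" rope type, computed inline instead
    # of looking it up in ROPE_INIT_FUNCTIONS (which lacks that key).
    # Force float32: with rope_theta around 6e5, evaluating
    # theta ** (2i / dim) in bf16 discards most of the mantissa.
    exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
    inv_freq = 1.0 / (
        torch.tensor(rope_theta, dtype=torch.float32) ** exponents
    )
    return inv_freq
```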

apsys changed pull request status to open
