
Size mismatch when loading 60M no tri checkpoint #13

@WesleyHsieh0806

Hi,

I tried to load the 60M no tri model and got the errors below.
The checkpoint does not seem to match the model config:

RuntimeError: Error(s) in loading state_dict for Proteina:
        size mismatch for nn.init_repr_factory.linear_out.weight: copying a param with shape torch.Size([512, 200]) from checkpoint, the shape in current model is torch.Size([512, 132]).
        size mismatch for nn.cond_factory.feat_creators.1.embedding_C.weight: copying a param with shape torch.Size([6, 196]) from checkpoint, the shape in current model is torch.Size([6, 256]).
        size mismatch for nn.cond_factory.feat_creators.1.embedding_A.weight: copying a param with shape torch.Size([44, 196]) from checkpoint, the shape in current model is torch.Size([44, 256]).
        size mismatch for nn.cond_factory.feat_creators.1.embedding_T.weight: copying a param with shape torch.Size([1473, 196]) from checkpoint, the shape in current model is torch.Size([1473, 256]).
        size mismatch for nn.cond_factory.linear_out.weight: copying a param with shape torch.Size([128, 784]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for nn.transition_c_1.swish_linear.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
        size mismatch for nn.transition_c_1.linear_out.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for nn.transition_c_2.swish_linear.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([2048, 512]).
        size mismatch for nn.transition_c_2.linear_out.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
        size mismatch for nn.pair_repr_builder.init_repr_factory.ln_out.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.pair_repr_builder.init_repr_factory.ln_out.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.pair_repr_builder.init_repr_factory.linear_out.weight: copying a param with shape torch.Size([196, 319]) from checkpoint, the shape in current model is torch.Size([256, 319]).
        size mismatch for nn.pair_repr_builder.cond_factory.ln_out.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.pair_repr_builder.cond_factory.ln_out.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.pair_repr_builder.cond_factory.linear_out.weight: copying a param with shape torch.Size([128, 196]) from checkpoint, the shape in current model is torch.Size([512, 256]).
        size mismatch for nn.pair_repr_builder.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.pair_repr_builder.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.pair_repr_builder.adaln.to_gamma.0.weight: copying a param with shape torch.Size([196, 128]) from checkpoint, the shape in current model is torch.Size([256, 512]).
        size mismatch for nn.pair_repr_builder.adaln.to_gamma.0.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.pair_repr_builder.adaln.to_beta.weight: copying a param with shape torch.Size([196, 128]) from checkpoint, the shape in current model is torch.Size([256, 512]).
        size mismatch for nn.transformer_layers.0.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.0.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.0.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.0.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.0.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.0.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.1.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.1.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.1.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.1.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.1.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.2.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.2.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.2.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.2.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.2.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.3.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.3.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.3.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.3.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.3.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.4.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.4.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.4.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.4.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.4.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.5.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.5.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.5.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.5.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.5.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.6.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.6.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.6.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.6.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.6.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.7.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.7.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.7.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.7.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.7.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.8.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.8.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.8.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.8.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.8.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.mhba.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.mhba.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_qkv.weight: copying a param with shape torch.Size([1512, 512]) from checkpoint, the shape in current model is torch.Size([1536, 512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_qkv.bias: copying a param with shape torch.Size([1512]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_g.weight: copying a param with shape torch.Size([504, 512]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_g.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_out_node.weight: copying a param with shape torch.Size([512, 504]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.q_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.q_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.k_layer_norm.weight: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.k_layer_norm.bias: copying a param with shape torch.Size([504]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.mhba.mha.to_bias.weight: copying a param with shape torch.Size([12, 196]) from checkpoint, the shape in current model is torch.Size([8, 256]).
        size mismatch for nn.transformer_layers.9.mhba.mha.pair_norm.weight: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.9.mhba.mha.pair_norm.bias: copying a param with shape torch.Size([196]) from checkpoint, the shape in current model is torch.Size([256]).
        size mismatch for nn.transformer_layers.9.mhba.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.transition.adaln.norm_cond.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.transition.adaln.norm_cond.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
        size mismatch for nn.transformer_layers.9.transition.adaln.to_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.transition.adaln.to_beta.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
        size mismatch for nn.transformer_layers.9.transition.scale_output.to_adaln_zero_gamma.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([512, 512]).
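
The shapes in the log follow a consistent pattern. For the attention layers, the checkpoint has 12 `to_bias` rows against the config's 8, a 504-wide Q/K/V projection (1512 / 3) against 512, a 196-wide pair representation against 256, and a 128-wide conditioning vector against 512. That suggests the checkpoint was trained with different widths than the config used to build the model. The sketch below is plain Python, so it runs without PyTorch. It compares two name-to-shape dicts and reads those widths off one transformer layer. It assumes, without confirmation from the repo, that `to_bias` emits one bias per attention head and that `to_qkv` packs Q, K and V into a single output of size 3 * heads * head_dim. The helper names `find_shape_mismatches` and `infer_attention_dims` are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]

# Shapes copied from the layer-9 lines of the error log above.
CKPT_LAYER9: Dict[str, Shape] = {
    "nn.transformer_layers.9.mhba.mha.to_bias.weight": (12, 196),
    "nn.transformer_layers.9.mhba.mha.to_qkv.weight": (1512, 512),
    "nn.transformer_layers.9.mhba.adaln.norm_cond.weight": (128,),
}
MODEL_LAYER9: Dict[str, Shape] = {
    "nn.transformer_layers.9.mhba.mha.to_bias.weight": (8, 256),
    "nn.transformer_layers.9.mhba.mha.to_qkv.weight": (1536, 512),
    "nn.transformer_layers.9.mhba.adaln.norm_cond.weight": (512,),
}


def find_shape_mismatches(
    ckpt_shapes: Dict[str, Shape], model_shapes: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    mismatches = []
    for name in sorted(ckpt_shapes.keys() & model_shapes.keys()):
        if ckpt_shapes[name] != model_shapes[name]:
            mismatches.append((name, ckpt_shapes[name], model_shapes[name]))
    return mismatches


def infer_attention_dims(shapes: Dict[str, Shape], layer_prefix: str) -> Dict[str, int]:
    """Read head count, head dim, pair dim and cond dim off one transformer layer.

    Assumption: to_bias maps the pair representation to one bias per head, and
    to_qkv packs Q, K and V into one output of size 3 * n_heads * head_dim.
    """
    n_heads, pair_dim = shapes[f"{layer_prefix}.mhba.mha.to_bias.weight"]
    qkv_out, _ = shapes[f"{layer_prefix}.mhba.mha.to_qkv.weight"]
    inner_dim = qkv_out // 3  # width of each of Q, K, V
    (cond_dim,) = shapes[f"{layer_prefix}.mhba.adaln.norm_cond.weight"]
    return {
        "n_heads": n_heads,
        "head_dim": inner_dim // n_heads,
        "pair_dim": pair_dim,
        "cond_dim": cond_dim,
    }


if __name__ == "__main__":
    for name, ckpt, model in find_shape_mismatches(CKPT_LAYER9, MODEL_LAYER9):
        print(f"{name}: checkpoint {ckpt} vs model {model}")
    prefix = "nn.transformer_layers.9"
    print("checkpoint:", infer_attention_dims(CKPT_LAYER9, prefix))
    print("model:     ", infer_attention_dims(MODEL_LAYER9, prefix))
```

On these shapes, the checkpoint reads as 12 heads of width 42 with pair dim 196 and cond dim 128. The current model reads as 8 heads of width 64 with pair dim 256 and cond dim 512. With real weights, the two dicts could be built as `{k: tuple(v.shape) for k, v in sd.items()}`. Here `sd` would be `model.state_dict()` for the model and the result of `torch.load(path, map_location="cpu")` for the checkpoint. For a PyTorch Lightning checkpoint, use its `"state_dict"` entry.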
