Correction: modeling_nemotron_h.py
#6, opened by JennBing
line 625: B = B.repeat(1, 1, self.num_heads // self.n_groups, 1)
line 626: C = C.repeat(1, 1, self.num_heads // self.n_groups, 1)
Should be:
line 625: B = torch.repeat_interleave(B, self.num_heads // self.n_groups, dim=2)
line 626: C = torch.repeat_interleave(C, self.num_heads // self.n_groups, dim=2)
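To illustrate why the two calls differ, here is a minimal sketch in plain Python, with lists standing in for the group axis (dim=2) of B and C. The values of n_groups and num_heads are hypothetical, chosen only for the example. Tensor.repeat tiles the whole axis, while torch.repeat_interleave repeats each entry consecutively, so the head-to-group assignment changes. This assumes heads are meant to map to groups in contiguous blocks, which is what the proposed fix implies.

```python
# Illustrative only: plain lists stand in for the group axis (dim=2) of B and C.
n_groups = 2   # hypothetical value
num_heads = 6  # hypothetical value
k = num_heads // n_groups

groups = list(range(n_groups))  # group index stored at each position on dim=2

# B.repeat(1, 1, k, 1) tiles the whole group axis k times.
tiled = groups * k

# torch.repeat_interleave(B, k, dim=2) repeats each group k times consecutively.
interleaved = [g for g in groups for _ in range(k)]

print(tiled)        # [0, 1, 0, 1, 0, 1]
print(interleaved)  # [0, 0, 0, 1, 1, 1]
```

With tiling, head h reads group h % n_groups; with interleaving, head h reads group h // k. The two orderings coincide when n_groups is 1, so the difference only shows up for n_groups > 1.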