The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training
Paper • 2603.10444 • Published • 8
• device=["cuda:0", "cuda:1"] or device=["cpu"]*4 on the model.predict or model.rank calls.
• dataset_id, e.g. dataset_id="lightonai/NanoBEIR-de" for the German benchmark.
• output_scores=True to get similarity scores returned. This can be useful for some distillation losses!