How does the MoE work?
#5
by PacmanIncarnate - opened
Can you explain how the gate picks between the two included models? Was there any additional merge system involved as well, or is this a simple MoE?
Also, did you fine-tune after the merge?