Do you need to fine-tune after merging?

by tanganke - opened Jan 12, 2024


tanganke

Jan 12, 2024

Great model. I'd like to know how you got the weights for the MoE routers.

cloudyu

Owner Jan 12, 2024

No need to fine-tune, because there are only two experts.
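One way to read this answer: assuming the merged model routes each token to its top two experts, as Mixtral does, a two-expert MoE sends every token through both experts. The router then never decides *which* expert runs, only how the two outputs are weighted. The plain-Python sketch below illustrates Mixtral-style top-k routing under that assumption; the toy gate matrix and the lambda "experts" are hypothetical stand-ins, not the model's real weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden: Vector, gate: Sequence[Vector], top_k: int) -> List[Tuple[int, float]]:
    """Mixtral-style routing for one token: one logit per expert (a gate row
    dotted with the hidden state), keep the top_k experts, then softmax over
    only the kept logits."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(hidden: Vector, gate: Sequence[Vector], experts: Sequence[Expert], top_k: int = 2) -> Vector:
    """Weighted sum of the selected experts' outputs."""
    out = [0.0] * len(hidden)
    for idx, weight in route(hidden, gate, top_k):
        expert_out = experts[idx](hidden)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


# Hypothetical toy setup: 2 experts, 2-dimensional hidden states.
experts = [lambda h: [2 * x for x in h], lambda h: [-x for x in h]]
gate = [[1.0, 0.0], [0.0, 1.0]]

for hidden in ([3.0, -1.0], [0.0, 5.0], [0.2, 0.2]):
    picked = sorted(i for i, _ in route(hidden, gate, top_k=2))
    print(hidden, "->", picked)  # always [0, 1]: both experts run for every token

# Even an all-zero (untrained) gate gives a 50/50 blend of the two experts.
print(moe_forward([1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]], experts))  # [0.5, 0.5]
```

With two experts and top-2 routing, a poorly tuned router can only skew the blend, not drop an expert entirely.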

tanganke

Jan 12, 2024

Thanks for your time!

I am also trying to construct a MoE model like this using mergekit.
The configuration needs to specify a base model and positive prompts. How did you set these?

base_model: ???
gate_mode: hidden
dtype: float32

experts:
  - source_model: NurtureAI/neural-chat-7b-v3-16k # https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
    positive_prompts:
      - "???"
    #   (optional)
    # negative_prompts:
    #   - "This is a prompt expert_model_1 should not be used for"
  - source_model: mncai/mistral-7b-dpo-v6 # https://huggingface.co/mncai/mistral-7b-dpo-v6
    positive_prompts:
      - "???"
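The config above sets `gate_mode: hidden`. mergekit's MoE documentation describes this mode as using the hidden-state representations of the positive (and optional negative) prompts, computed with the base model, to set the gate parameters. The plain-Python sketch below illustrates that idea under simplifying assumptions that are not taken from mergekit's source. Hidden states are given directly as small toy vectors (a real merge would run each prompt through the base model, layer by layer). Each expert's gate row is the mean of its positive-prompt states minus the mean of its negative-prompt states. Rows are unit-normalized. The helper names and prompt vectors are hypothetical.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def normalize(v: Vector) -> Vector:
    """Scale to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def gate_rows_from_prompts(
    positive: Sequence[Sequence[Vector]],
    negative: Optional[Sequence[Sequence[Vector]]] = None,
) -> List[Vector]:
    """One gate row per expert: mean hidden state of that expert's positive
    prompts, minus the mean of its negative prompts (if any), unit-normalized.
    positive[i] holds the hidden-state vectors for expert i's positive prompts."""
    rows = []
    for i, pos in enumerate(positive):
        row = mean(pos)
        if negative is not None and negative[i]:
            neg = mean(negative[i])
            row = [p - n for p, n in zip(row, neg)]
        rows.append(normalize(row))
    return rows


def router_logits(hidden: Vector, rows: Sequence[Vector]) -> List[float]:
    """Score a token's hidden state against each expert's gate row."""
    return [sum(r * h for r, h in zip(row, hidden)) for row in rows]


# Hypothetical 2-D hidden states for each expert's positive prompts.
chat_prompts = [[1.0, 0.1], [0.9, 0.0]]  # expert 0
math_prompts = [[0.0, 1.0], [0.1, 0.9]]  # expert 1
rows = gate_rows_from_prompts([chat_prompts, math_prompts])

print(router_logits([1.0, 0.0], rows))  # expert 0 gets the higher logit
```

Under this scheme, a token whose hidden state resembles an expert's prompts gets a higher logit for that expert, which is why the choice of positive prompts matters.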

cloudyu

Owner Jan 12, 2024

You have to try every candidate and then test the model's performance locally with https://github.com/EleutherAI/lm-evaluation-harness.
I use only the HellaSwag metric, plus some manual testing.
You will find the best setting sooner or later.
Good luck!
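The trial-and-error search described here can be organized as a small loop. It enumerates candidate prompt choices, renders a config for each in the same layout as the one in this thread, scores each config, and keeps the best. In this sketch `score_fn` is a hypothetical hook. In practice it would merge the config (for example with mergekit's `mergekit-moe` script) and run HellaSwag through lm-evaluation-harness, but neither tool is called here. Using one positive prompt per expert, the prompt strings, and the `BASE_MODEL` placeholder are all illustrative assumptions.

```python
import json
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

ScoreFn = Callable[[str], float]


def build_config(base_model: str, experts: Sequence[Tuple[str, str]]) -> str:
    """Render a mergekit MoE config in the layout used in this thread,
    with one positive prompt per expert."""
    lines = [f"base_model: {base_model}", "gate_mode: hidden", "dtype: float32", "", "experts:"]
    for source_model, prompt in experts:
        lines.append(f"  - source_model: {source_model}")
        lines.append("    positive_prompts:")
        # json.dumps yields a double-quoted, escaped string, which YAML also accepts.
        lines.append(f"      - {json.dumps(prompt)}")
    return "\n".join(lines) + "\n"


def search(
    base_model: str,
    models: Sequence[str],
    prompt_options: Sequence[Sequence[str]],
    score_fn: ScoreFn,
) -> Optional[Tuple[float, Tuple[str, ...], str]]:
    """Try every combination of one prompt per expert and return the best
    (score, prompts, config). Ties keep the earlier candidate."""
    best = None
    for prompts in product(*prompt_options):
        config = build_config(base_model, list(zip(models, prompts)))
        score = score_fn(config)
        if best is None or score > best[0]:
            best = (score, prompts, config)
    return best


models = ["NurtureAI/neural-chat-7b-v3-16k", "mncai/mistral-7b-dpo-v6"]
prompt_options = [["chat", "reasoning"], ["math", "code"]]  # hypothetical candidates


def fake_score(config: str) -> float:
    # Hypothetical stand-in for "merge, then run HellaSwag"; the score is made up.
    return float(config.count('"reasoning"') + 2 * config.count('"code"'))


score, prompts, config = search("BASE_MODEL", models, prompt_options, fake_score)
print(score, prompts)  # 3.0 ('reasoning', 'code')
print(config)
```

Every combination is merged and scored, so the cost grows multiplicatively with the number of candidates per expert. A single quick metric keeps each trial cheap.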

tanganke changed discussion status to closed Apr 26, 2024
