---
license: mit
host_model:
  - toksuite/meta-llama-Llama-3.2-1B
  - toksuite/Qwen-Qwen3-8B
tags:
  - merge
  - parameter-averaging
  - flexitok
---

# Merged Model: qwen_onto_llama_lambda-1-nse-random

This model is the result of parameter averaging (model soup) across two models.

## Merged Models

The following models were included in the merge:

- toksuite/meta-llama-Llama-3.2-1B
- toksuite/Qwen-Qwen3-8B

## Merging Configuration

- **Method:** weighted parameter averaging
- **Weights:** simple average with merging lambda = 1.0
- **Excluded layers:** the embeddings and LM head were kept from the host model (toksuite/meta-llama-Llama-3.2-1B)
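The configuration above can be sketched as follows. This is a minimal illustration of weighted parameter averaging over state dicts, not the actual toksuite/flexitok merging code: the function name `soup`, the parameter `lam`, and the `keep_from_host` name filter are illustrative assumptions, and it presumes the two checkpoints share parameter names and shapes. The interpretation that `lam` weights the donor model (so `lam = 1.0` fully replaces the host's merged layers) is also an assumption.

```python
def soup(host_sd, donor_sd, lam=1.0, keep_from_host=("embed", "lm_head")):
    """Merge two state dicts: host * (1 - lam) + donor * lam.

    Parameters whose names match `keep_from_host` (embeddings, LM head)
    are taken from the host model unchanged, mirroring the exclusion
    described in the configuration above. Values may be torch tensors
    or plain numbers; only `+` and `*` are used.
    """
    merged = {}
    for name, host_param in host_sd.items():
        if any(key in name for key in keep_from_host) or name not in donor_sd:
            # Excluded layers (and host-only parameters) come from the host.
            merged[name] = host_param
        else:
            # Weighted average; with lam = 1.0 this is just the donor value.
            merged[name] = (1 - lam) * host_param + lam * donor_sd[name]
    return merged
```

With `lam = 1.0`, every non-excluded parameter equals the donor's, while embeddings and the LM head stay tied to the host tokenizer, which is why the host model's vocabulary remains usable after the merge.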

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("flexitok/qwen_onto_llama_lambda-1-nse-random")
tokenizer = AutoTokenizer.from_pretrained("flexitok/qwen_onto_llama_lambda-1-nse-random")
```