---
tags:
- merge
- task_wise
- llm-adamerge
base_model: deepseek-ai/deepseek-coder-7b-base-v1.5
---

# Merged Model using LLM-AdaMerge (task_wise)

This model was created by merging multiple fine-tuned models using the LLM-AdaMerge approach with task_wise merging.

## Merge Details

- **Merge Type**: task_wise
- **Base Model**: deepseek-ai/deepseek-coder-7b-base-v1.5
- **Number of Models Merged**: 2
- **Models Merged**: math, code
- **Final Training Loss**: N/A
- **Training Epochs**: 0

## Lambda Coefficients

The task-wise lambda coefficients learned during training are stored in the `learned_lambdas.json` file in this repository.

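The merged weights are already baked into this checkpoint, so the coefficients are informational. As a rough illustration of how task-wise merging with scalar lambdas is typically applied, the sketch below adds each lambda-scaled task vector (fine-tuned weights minus base weights) to the base model. The flat `{task: lambda}` JSON layout and the expert checkpoint paths are assumptions made for illustration, not the documented format of `learned_lambdas.json`.

```python
# Illustrative sketch only -- assumes learned_lambdas.json maps each task
# name to a single scalar, e.g. {"math": 0.4, "code": 0.6}; the real file
# format and the expert checkpoint paths may differ.
import json
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-7b-base-v1.5")
experts = {
    "math": AutoModelForCausalLM.from_pretrained("path/to/math-expert"),  # hypothetical path
    "code": AutoModelForCausalLM.from_pretrained("path/to/code-expert"),  # hypothetical path
}

with open("learned_lambdas.json") as f:
    lambdas = json.load(f)

# theta_merged = theta_base + sum_t lambda_t * (theta_t - theta_base)
merged_state = {name: param.clone() for name, param in base.state_dict().items()}
for task, expert in experts.items():
    expert_state = expert.state_dict()
    for name, base_param in base.state_dict().items():
        if base_param.dtype.is_floating_point:  # skip integer buffers, if any
            merged_state[name] += lambdas[task] * (expert_state[name] - base_param)

base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```
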
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("your-username/model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Use the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

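For a 7B model you will usually want reduced precision and explicit device placement. One way to do that with standard `transformers` arguments (requires `accelerate` for `device_map="auto"`; the repository name is still a placeholder) is:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bfloat16 and let accelerate spread layers over available devices.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",       # placeholder repository id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

inputs = tokenizer("Write a Python function that checks if a number is prime.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
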
## Training Configuration

See the uploaded `training_config.json` file for detailed training configuration.

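To inspect the configuration programmatically, one option is to download the file with `huggingface_hub` (assuming it sits at the root of this repository; the repository id below is a placeholder):

```python
import json
from huggingface_hub import hf_hub_download

# "your-username/model-name" is a placeholder for this repository's id.
config_path = hf_hub_download(repo_id="your-username/model-name",
                              filename="training_config.json")
with open(config_path) as f:
    training_config = json.load(f)
print(json.dumps(training_config, indent=2))
```
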
## Citation

If you use this model, please cite the LLM-AdaMerge paper:

```bibtex
@article{llmadamerge2024,
  title={LLM-AdaMerge: Adaptive Model Merging for Large Language Models},
  author={...},
  year={2024}
}
```