---
tags:
- merge
- task_wise
- llm-adamerge
base_model: deepseek-ai/deepseek-coder-7b-base-v1.5
---

# Merged Model using LLM-AdaMerge (task_wise)

This model was created by merging multiple fine-tuned models using the LLM-AdaMerge approach with task_wise merging.

## Merge Details

- **Merge Type**: task_wise
- **Base Model**: deepseek-ai/deepseek-coder-7b-base-v1.5
- **Number of Models Merged**: 2
- **Models Merged**: math, code
- **Final Training Loss**: N/A
- **Training Epochs**: 0

## Lambda Coefficients

The task-wise lambda coefficients learned during training are stored in the `learned_lambdas.json` file in this repository.

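The merged weights are already baked into this checkpoint, so the coefficients are informational. As a rough illustration of how task-wise merging with scalar lambdas is typically applied, the sketch below adds each lambda-scaled task vector (fine-tuned weights minus base weights) to the base model. The flat `{task: lambda}` JSON layout and the expert checkpoint paths are assumptions made for illustration, not the documented format of `learned_lambdas.json`.

```python
# Illustrative sketch only -- assumes learned_lambdas.json maps each task
# name to a single scalar, e.g. {"math": 0.4, "code": 0.6}; the real file
# format and the expert checkpoint paths may differ.
import json
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-7b-base-v1.5")
experts = {
    "math": AutoModelForCausalLM.from_pretrained("path/to/math-expert"),  # hypothetical path
    "code": AutoModelForCausalLM.from_pretrained("path/to/code-expert"),  # hypothetical path
}

with open("learned_lambdas.json") as f:
    lambdas = json.load(f)

# theta_merged = theta_base + sum_t lambda_t * (theta_t - theta_base)
merged_state = {name: param.clone() for name, param in base.state_dict().items()}
for task, expert in experts.items():
    expert_state = expert.state_dict()
    for name, base_param in base.state_dict().items():
        if base_param.dtype.is_floating_point:  # skip integer buffers, if any
            merged_state[name] += lambdas[task] * (expert_state[name] - base_param)

base.load_state_dict(merged_state)
base.save_pretrained("merged-model")
```
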
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("your-username/model-name")
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

# Use the model
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

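For a 7B model you will usually want reduced precision and explicit device placement. One way to do that with standard `transformers` arguments (requires `accelerate` for `device_map="auto"`; the repository name is still a placeholder) is:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load in bfloat16 and let accelerate spread layers over available devices.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/model-name",       # placeholder repository id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("your-username/model-name")

inputs = tokenizer("Write a Python function that checks if a number is prime.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
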
## Training Configuration

See the uploaded `training_config.json` file for detailed training configuration.

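To inspect the configuration programmatically, one option is to download the file with `huggingface_hub` (assuming it sits at the root of this repository; the repository id below is a placeholder):

```python
import json
from huggingface_hub import hf_hub_download

# "your-username/model-name" is a placeholder for this repository's id.
config_path = hf_hub_download(repo_id="your-username/model-name",
                              filename="training_config.json")
with open(config_path) as f:
    training_config = json.load(f)
print(json.dumps(training_config, indent=2))
```
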
## Citation

If you use this model, please cite the LLM-AdaMerge paper:

```bibtex
@article{llmadamerge2024,
  title={LLM-AdaMerge: Adaptive Model Merging for Large Language Models},
  author={...},
  year={2024}
}
```