Update README.md

README.md

---
license: apache-2.0
base_model:
- Qwen/Qwen1.5-7B-Chat
- deepseek-ai/deepseek-coder-6.7b-instruct
tags:
- merge
- mergekit
- qwen
- deepseek
- coder
- slerp
---

# Qwen15-DeepSeek-Coder-Merge

This is a merge of pre-trained language models created with [MergeKit](https://github.com/arcee-ai/mergekit), combining the foundational chat capabilities of Qwen 1.5 with DeepSeek Coder's programming expertise through a SLERP fusion.

## About Me

I'm David Soeiro-Vuong, a third-year Computer Science student working as an apprentice at TW3 Partners, a company specializing in Generative AI. Passionate about artificial intelligence and language model optimization, I focus on creating efficient model merges that balance performance and capabilities.

🔗 [Connect with me on LinkedIn](https://www.linkedin.com/in/david-soeiro-vuong-a28b582ba/)

## Merge Details

### Merge Method

This model uses SLERP (Spherical Linear Interpolation) to blend the weights of the two base models. The key parameter choices (illustrated in the sketch after this list) are:

- **Weighted Blend**: t=0.6 gives a slightly stronger influence to the DeepSeek Coder model
- **Complete Layer Merging**: the full layer range of both models is merged, so no layers are dropped
- **Format**: bfloat16 precision for efficient memory usage
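
For intuition, SLERP interpolates along the great-circle arc between two weight tensors rather than along the straight line a plain weighted average would take, which better preserves the magnitude structure of the weights. Below is a minimal NumPy sketch of the per-tensor operation; it is an illustration, not MergeKit's actual implementation:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two tensors, measured after projecting onto the unit sphere.
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    omega = np.arccos(np.clip(np.dot(a_unit, b_unit), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel tensors: plain linear interpolation is numerically safer.
        return ((1.0 - t) * a_flat + t * b_flat).reshape(a.shape)
    so = np.sin(omega)
    merged = (np.sin((1.0 - t) * omega) / so) * a_flat + (np.sin(t * omega) / so) * b_flat
    return merged.reshape(a.shape)
```

With t=0.6, each merged tensor lands just past the midpoint of the arc, on the DeepSeek Coder side, matching the weighting described above.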

### Models Merged

* [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat) - Alibaba's Qwen 1.5 chat model, known for strong conversational ability and instruction following
* [deepseek-ai/deepseek-coder-6.7b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) - DeepSeek's specialized coding model, with strong programming-language understanding and code-generation abilities

### Configuration
```yaml
slices:
- sources:
# … (the middle of the config is collapsed in the diff view; see the sketch below)
dtype: bfloat16
```
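
Since the diff collapses the middle of this configuration, here is a rough reconstruction of what a MergeKit SLERP config of this shape typically looks like, based on the prose above (SLERP, t=0.6, full layer-range coverage, bfloat16). The `layer_range` bounds and the `base_model` choice are assumptions, not values taken from the source:

```yaml
slices:
- sources:
  - model: Qwen/Qwen1.5-7B-Chat
    layer_range: [0, 32]  # assumed full layer range; not shown in the diff
  - model: deepseek-ai/deepseek-coder-6.7b-instruct
    layer_range: [0, 32]
merge_method: slerp
base_model: Qwen/Qwen1.5-7B-Chat  # assumed; with this base, t=0.6 leans toward DeepSeek Coder
parameters:
  t: 0.6
dtype: bfloat16
```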

## Model Capabilities

This merge combines:

- Qwen 1.5's strong instruction following and general knowledge
- DeepSeek Coder's specialized programming expertise and code-generation abilities
- Technical understanding and explanation skills drawn from both parents
- A fully open architecture with no usage restrictions

The resulting model aims to handle tasks that require both conversational fluency and programming expertise, such as:

- Code generation across multiple programming languages
- Technical documentation and explanations
- Algorithm implementation and problem-solving
- Software development assistance with natural-language understanding
- Debugging and code-optimization suggestions
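
A minimal usage sketch with the `transformers` library, assuming the merge loads as a standard causal-LM checkpoint; the repo id below is a placeholder, not a published path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: substitute the actual Hugging Face path of this merge.
model_id = "your-username/Qwen15-DeepSeek-Coder-Merge"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's bfloat16 weights
    device_map="auto",           # shard across available devices
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```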

## Limitations

- Inherits the limitations of both base models
- May behave inconsistently on some advanced programming tasks
- No additional alignment or fine-tuning beyond the base models' training
- Created through parameter merging alone, without additional training data
- The slight size mismatch between the parents (7B vs 6.7B) may introduce parameter-interpolation artifacts

## License

This model is released under the Apache 2.0 license; usage of the merged weights may also be subject to the base models' own license terms.