---
license: apache-2.0
---

Base Model: Just Merged ~ No Training Gates After Merge
### Model Overview
I have developed a Mixture of Experts (MoE) architecture with two always-active experts designed to work together for Python instruction tuning. Each expert possesses a distinct skill:
- **Expert 1**: Specializes in generating Mermaid diagrams, primarily from Python code, which requires a deep understanding of code structure and logic.
- **Expert 2**: Focuses on strict context obedience, ensuring that the model generates outputs based only on the provided instructions.
### Why Always-Active MoE is Optimal
In this model, both experts are always active for each token, allowing them to complement each other:
- **Expert 1’s competence in Python structures** enhances the model's ability to generate correct, well-structured Python code.
- **Expert 2’s context obedience** keeps the output aligned with the user’s instructions, preventing unnecessary or irrelevant outputs, such as a Mermaid diagram that was not explicitly requested.
This setup lets me train the model efficiently for Python instruction following: by leveraging both experts simultaneously, I ensure that the model generates syntactically correct Python code while strictly adhering to user prompts.
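The always-active routing described above can be sketched as follows. This is a minimal, illustrative NumPy implementation of a dense (no top-k) two-expert MoE forward pass, not this model's actual code; the function name `dense_moe_forward` and the weight shapes are hypothetical.

```python
import numpy as np

def dense_moe_forward(x, w_expert1, w_expert2, w_gate):
    """Always-active (dense) MoE: every token is processed by BOTH experts,
    and their outputs are blended with softmax gate weights."""
    # The gate produces one logit per expert for each token: shape (tokens, 2)
    logits = x @ w_gate
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Both experts run on every token -- no routing, no dropped tokens
    out1 = np.tanh(x @ w_expert1)  # e.g. the diagram-oriented expert
    out2 = np.tanh(x @ w_expert2)  # e.g. the context-obedience expert

    # Per-token convex combination of the two expert outputs
    return weights[:, :1] * out1 + weights[:, 1:] * out2

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(4, d_model))        # a batch of 4 token embeddings
w1 = rng.normal(size=(d_model, d_model))
w2 = rng.normal(size=(d_model, d_model))
wg = rng.normal(size=(d_model, 2))

y = dense_moe_forward(x, w1, w2, wg)
print(y.shape)  # each token's output mixes both experts
```

Because both experts contribute to every token, gradients flow through both during training, which is what makes joint fine-tuning on Python instruction data straightforward compared with sparse top-k routing.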