---
language:
- code
tags:
- code
- git
- merge-conflict
- codet5
- code-generation
- conflict-resolution
license: apache-2.0
datasets:
- custom
metrics:
- exact_match
- bleu
library_name: transformers
pipeline_tag: text2text-generation
widget:
- text: |
    Resolve the following merge conflict in python.

    BASE VERSION:
    def add(x, y):
        return x + y

    OURS VERSION:
    def add(a, b):
        return a + b

    THEIRS VERSION:
    def add(x, y):
        result = x + y
        return result
  example_title: "Simple Function Merge"
- text: |
    Resolve the following merge conflict in javascript.

    BASE VERSION:
    function multiply(x, y) {
      return x * y;
    }

    OURS VERSION:
    function multiply(a, b) {
      // Calculate product
      return a * b;
    }

    THEIRS VERSION:
    function multiply(x, y) {
      const result = x * y;
      return result;
    }
  example_title: "JavaScript Function with Comments"
---
# AutoMerge AI - CodeT5 Merge Conflict Resolver

![Model](https://img.shields.io/badge/Model-CodeT5--small-blue)
![Task](https://img.shields.io/badge/Task-Merge%20Conflict%20Resolution-green)
![License](https://img.shields.io/badge/License-Apache%202.0-yellow)
|
## Model Description

**AutoMerge AI** is a fine-tuned CodeT5-small model designed to automatically resolve Git merge conflicts. It takes three versions of code (base, ours, theirs) and generates an intelligently merged resolution.

### Key Features

- 🔄 **Three-way merge resolution** - Uses base, ours, and theirs versions for context-aware merging
- 💻 **Multi-language support** - Trained on Python, JavaScript, Java, C++, and more
- 🎯 **High accuracy** - Trained on 21,219 real-world merge conflict scenarios
- ⚡ **Fast inference** - Based on CodeT5-small (60.5M parameters) for quick resolutions
- 🛠️ **Production-ready** - Successfully resolves variable naming, structural, and semantic conflicts
|
## Model Details

- **Base Model**: [Salesforce/codet5-small](https://huggingface.co/Salesforce/codet5-small)
- **Model Size**: 60.5M parameters
- **Training Data**: 21,219 three-way merge conflict samples
- **Task**: Text-to-text generation (conflict resolution)
- **Languages**: Python, JavaScript, and TypeScript
|
## Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage
|
```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("ankit-ml11/automerge-codet5")
tokenizer = RobertaTokenizer.from_pretrained("ankit-ml11/automerge-codet5")

# Prepare input
base = "def add(x, y):\n    return x + y"
ours = "def add(a, b):\n    return a + b"
theirs = "def add(x, y):\n    result = x + y\n    return result"

input_text = f"""Resolve the following merge conflict in python.

BASE VERSION:
{base}

OURS VERSION:
{ours}

THEIRS VERSION:
{theirs}
"""

# Generate resolution
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=5, early_stopping=True)
resolved = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(resolved)
# Output: def add(a, b):\n    return a + b
```
|
## Input Format

The model expects input in this **exact format**:

```
Resolve the following merge conflict in {language}.

BASE VERSION:
{base_code}

OURS VERSION:
{ours_code}

THEIRS VERSION:
{theirs_code}
```

Where:
- `{language}` - Programming language (e.g., python, javascript, java)
- `{base_code}` - Code from the common ancestor commit
- `{ours_code}` - Code from your branch (HEAD)
- `{theirs_code}` - Code from the branch being merged
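The template above can be wrapped in a small helper so it is built identically everywhere (a sketch; `build_prompt` is not part of the released package):

```python
def build_prompt(base_code, ours_code, theirs_code, language="python"):
    """Assemble the exact input format the model expects."""
    return (
        f"Resolve the following merge conflict in {language}.\n\n"
        f"BASE VERSION:\n{base_code}\n\n"
        f"OURS VERSION:\n{ours_code}\n\n"
        f"THEIRS VERSION:\n{theirs_code}\n"
    )
```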
|
## Advanced Usage

### Complete Python Class

```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer
import torch

class AutoMergeResolver:
    def __init__(self, model_name="ankit-ml11/automerge-codet5"):
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model.eval()

    def resolve_conflict(self, base, ours, theirs, language="python"):
        """
        Resolve a three-way merge conflict.

        Args:
            base: Code from common ancestor
            ours: Code from your branch
            theirs: Code from other branch
            language: Programming language

        Returns:
            Resolved code as string
        """
        input_text = f"""Resolve the following merge conflict in {language}.

BASE VERSION:
{base}

OURS VERSION:
{ours}

THEIRS VERSION:
{theirs}
"""

        inputs = self.tokenizer(
            input_text,
            return_tensors="pt",
            max_length=512,
            truncation=True,
            padding=True
        )

        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_length=512,
                num_beams=5,
                early_stopping=True,
                no_repeat_ngram_size=3
            )

        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)

# Usage
resolver = AutoMergeResolver()
resolved = resolver.resolve_conflict(
    base="def calculate(x, y): return x + y",
    ours="def calculate(a, b): return a + b",
    theirs="def calculate(x, y): result = x + y; return result"
)
print(resolved)
```
|
### Parsing Git Conflict Markers

```python
def parse_git_conflict(conflict_text):
    """Parse standard Git conflict markers (diff3 style)."""
    lines = conflict_text.split('\n')
    ours, base, theirs = [], [], []
    section = None

    for line in lines:
        if line.startswith('<<<<<<<'):
            section = 'ours'
        elif line.startswith('|||||||'):
            section = 'base'
        elif line.startswith('======='):
            section = 'theirs'
        elif line.startswith('>>>>>>>'):
            section = None
        elif section == 'ours':
            ours.append(line)
        elif section == 'base':
            base.append(line)
        elif section == 'theirs':
            theirs.append(line)

    return {
        'base': '\n'.join(base) or '\n'.join(ours),  # Fallback to ours if no base
        'ours': '\n'.join(ours),
        'theirs': '\n'.join(theirs)
    }

# Example usage (the ||||||| base section only appears with
# merge.conflictStyle = diff3)
git_conflict = """<<<<<<< HEAD
def multiply(a, b):
    return a * b
||||||| merged common ancestors
def multiply(x, y):
    return x * y
=======
def multiply(x, y):
    product = x * y
    return product
>>>>>>> feature-branch"""

parsed = parse_git_conflict(git_conflict)
resolved = resolver.resolve_conflict(parsed['base'], parsed['ours'], parsed['theirs'])
```
|
### GPU Acceleration

```python
import torch
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Initialize with GPU support
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = T5ForConditionalGeneration.from_pretrained("ankit-ml11/automerge-codet5")
tokenizer = RobertaTokenizer.from_pretrained("ankit-ml11/automerge-codet5")
model.to(device)

# Move inputs to GPU (input_text built as in Basic Usage above)
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=512)
```
|
## Use Cases

### 1. Automated Merge Conflict Resolution

Integrate into CI/CD pipelines to automatically resolve simple conflicts:

```python
# In your CI/CD script
resolver = AutoMergeResolver()

# get_conflict_files() is a placeholder for your own helper that lists
# files with conflicts (e.g. via `git diff --name-only --diff-filter=U`)
for conflict_file in get_conflict_files():
    with open(conflict_file, 'r') as f:
        conflict = f.read()

    parsed = parse_git_conflict(conflict)
    resolved = resolver.resolve_conflict(**parsed)

    with open(conflict_file, 'w') as f:
        f.write(resolved)
```
|
### 2. IDE Integration

Create plugins for VS Code, IntelliJ, or other IDEs:

```python
# VS Code extension example
def resolve_conflict_in_editor(conflict_text):
    resolver = AutoMergeResolver()
    parsed = parse_git_conflict(conflict_text)
    return resolver.resolve_conflict(**parsed)
```
|
### 3. Git Merge Driver

Configure as a custom Git merge driver:

```bash
# .git/config
[merge "automerge"]
    name = AutoMerge AI conflict resolver
    driver = python resolve.py %A %O %B %L
```

Then map file patterns to the driver in `.gitattributes`, e.g. `*.py merge=automerge`. Git passes the current version (`%A`), the common ancestor (`%O`), the other branch's version (`%B`), and the conflict-marker size (`%L`); the driver must leave the merged result in the `%A` file and exit 0 on success.
|
### 4. Code Review Assistant

Suggest resolutions during code review:

```python
# Suggest multiple resolutions (model and tokenizer loaded as in Basic Usage)
def suggest_resolutions(base, ours, theirs, language="python", num_suggestions=3):
    input_text = f"""Resolve the following merge conflict in {language}.

BASE VERSION:
{base}

OURS VERSION:
{ours}

THEIRS VERSION:
{theirs}
"""
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_beams=10,
        num_return_sequences=num_suggestions,
        early_stopping=True
    )

    return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]
```
|
## Model Performance

The model has been trained on diverse merge conflict scenarios:

| Conflict Type | Examples | Model Behavior |
|---------------|----------|----------------|
| Variable Renaming | `x,y` → `a,b` | Preserves semantic meaning |
| Comment Addition | Added docs | Retains documentation |
| Code Restructuring | Inline → Multi-line | Chooses cleaner structure |
| Logic Changes | Different algorithms | Context-aware selection |
|
### Example Resolutions

**Example 1: Variable Naming**

```
BASE:    def add(x, y): return x + y
OURS:    def add(a, b): return a + b
THEIRS:  def add(x, y): return x + y

RESOLVED: def add(a, b): return a + b
```

**Example 2: Documentation**

```
BASE:   def multiply(x, y): return x * y
OURS:   def multiply(a, b):
            # Calculate product
            return a * b
THEIRS: def multiply(x, y):
            result = x * y
            return result

RESOLVED: def multiply(a, b):
              # Calculate product
              return a * b
```
|
## Limitations

1. **Context Length**: Maximum input length is ~512 tokens
2. **Complex Logic**: May struggle with very complex semantic conflicts
3. **Testing Required**: Always review and test generated resolutions
4. **Language Coverage**: Best performance on Python, JavaScript, and Java (most common in the training data)
|
## Training Details

### Training Data

- **Size**: 21,219 three-way merge conflict samples
- **Source**: Real-world Git repositories
- **Preprocessing**:
  - Filtered conflicts with resolution length > 50 characters
  - Removed conflicts where ours == theirs
  - Limited code length to 10,000 characters
  - Balanced across multiple programming languages
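The filters above amount to a per-sample predicate like the following (a hypothetical sketch; the actual preprocessing code is not published, and this reading assumes "filtered" means conflicts with short resolutions were dropped):

```python
MAX_CODE_LEN = 10_000
MIN_RESOLUTION_LEN = 50

def keep_sample(base, ours, theirs, resolution):
    """Return True if a (base, ours, theirs, resolution) sample survives
    the preprocessing filters described above."""
    if len(resolution) <= MIN_RESOLUTION_LEN:  # assumed: drop short resolutions
        return False
    if ours == theirs:                         # not a real conflict
        return False
    if max(len(base), len(ours), len(theirs)) > MAX_CODE_LEN:
        return False
    return True
```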
|
### Training Hyperparameters

```
- Base Model: Salesforce/codet5-small
- Max Input Length: 512 tokens
- Max Output Length: 512 tokens
- Batch Size: 8
- Learning Rate: 5e-5
- Optimizer: AdamW
- Epochs: 3-5
- Beam Search: 5 beams during inference
```
|
### Evaluation Metrics

The model is evaluated on:
- Exact match accuracy
- BLEU score
- Human evaluation of semantic correctness
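Exact match can be computed with a small helper like this (a sketch; the normalization used in the actual evaluation is unspecified, so this version assumes comparison ignores trailing whitespace):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after
    stripping trailing whitespace on each line."""
    def normalize(code):
        return "\n".join(line.rstrip() for line in code.strip().splitlines())

    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(predictions) if predictions else 0.0
```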
|
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{automerge-codet5,
  author = {Adhikari, Ankit and Panta, Aeron and Pudasaini, Bikrant and Chaudhari, Bishwash},
  title = {AutoMerge AI: Automated Git Merge Conflict Resolution with CodeT5},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ankit-ml11/automerge-codet5}
}
```
|
## License

This model is released under the Apache 2.0 License, the same as the base CodeT5 model.

## Acknowledgments

- Built on [Salesforce/codet5-small](https://huggingface.co/Salesforce/codet5-small)
- Inspired by research in automated program repair and code generation
- Thanks to the open-source community for Git conflict datasets
|
## Model Card Authors

Ankit Adhikari (IOE Purwanchal Campus)
|
## Contact

- **Issues**: Please report issues on [GitHub](https://github.com/YOUR_USERNAME/automerge-ai)
- **Email**: ankitadankit@gmail.com
- **HuggingFace**: [ankit-ml11](https://huggingface.co/ankit-ml11)
|
## Additional Resources

- [Model Demo](https://huggingface.co/spaces/ankit-ml11/automerge-demo)
- [GitHub Repository](https://github.com/YOUR_USERNAME/automerge-ai)
|
---

**Note**: This model is a tool to assist with merge conflict resolution. Always review and test the generated code before committing to production. The model may not handle all edge cases perfectly, and human oversight is recommended for critical code changes.
|