# AutoMerge AI - CodeT5 Merge Conflict Resolver

## Model Description

AutoMerge AI is a fine-tuned CodeT5-small model designed to automatically resolve Git merge conflicts. It takes three versions of code (base, ours, theirs) and generates an intelligently merged resolution.

## Key Features
- **Three-way merge resolution** - uses the base, ours, and theirs versions for context-aware merging
- **Multi-language support** - trained on Python, JavaScript, Java, C++, and more
- **Trained at scale** - fine-tuned on 21,219 real-world merge conflict scenarios
- **Fast inference** - based on CodeT5-small (60.5M parameters) for quick resolutions
- **Practical coverage** - resolves variable naming, structural, and semantic conflicts
## Model Details

- Base Model: Salesforce/codet5-small
- Model Size: 60.5M parameters
- Training Data: 21,219 three-way merge conflict samples
- Task: Text-to-text generation (conflict resolution)
- Languages: Python, JavaScript, and TypeScript

## Quick Start

### Installation

```bash
pip install transformers torch
```

### Basic Usage
```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer

# Load model and tokenizer
model = T5ForConditionalGeneration.from_pretrained("ankit-ml11/automerge-codet5")
tokenizer = RobertaTokenizer.from_pretrained("ankit-ml11/automerge-codet5")

# Prepare input
base = "def add(x, y):\n    return x + y"
ours = "def add(a, b):\n    return a + b"
theirs = "def add(x, y):\n    result = x + y\n    return result"

input_text = f"""Resolve the following merge conflict in python.
BASE VERSION:
{base}
OURS VERSION:
{ours}
THEIRS VERSION:
{theirs}
"""

# Generate resolution
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=512, num_beams=5, early_stopping=True)
resolved = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(resolved)
# Output: def add(a, b):\n    return a + b
```
## Input Format

The model expects input in this exact format:

```text
Resolve the following merge conflict in {language}.
BASE VERSION:
{base_code}
OURS VERSION:
{ours_code}
THEIRS VERSION:
{theirs_code}
```

Where:

- `{language}` - programming language (e.g., python, javascript, java)
- `{base_code}` - code from the common ancestor commit
- `{ours_code}` - code from your branch (HEAD)
- `{theirs_code}` - code from the branch being merged
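To avoid formatting mistakes, the template can be wrapped in a small helper. This is a sketch; `build_prompt` is an illustrative name, not part of the released package:

```python
def build_prompt(base_code, ours_code, theirs_code, language="python"):
    """Assemble the model's expected input string from the three versions."""
    return (
        f"Resolve the following merge conflict in {language}.\n"
        f"BASE VERSION:\n{base_code}\n"
        f"OURS VERSION:\n{ours_code}\n"
        f"THEIRS VERSION:\n{theirs_code}\n"
    )


prompt = build_prompt("def f(x): ...", "def f(a): ...", "def f(x): ...")
print(prompt.splitlines()[0])
# Resolve the following merge conflict in python.
```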
## Advanced Usage

### Complete Python Class

```python
from transformers import T5ForConditionalGeneration, RobertaTokenizer
import torch


class AutoMergeResolver:
    def __init__(self, model_name="ankit-ml11/automerge-codet5"):
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model.eval()

    def resolve_conflict(self, base, ours, theirs, language="python"):
        """
        Resolve a three-way merge conflict.

        Args:
            base: Code from the common ancestor
            ours: Code from your branch
            theirs: Code from the other branch
            language: Programming language

        Returns:
            Resolved code as a string
        """
        input_text = f"""Resolve the following merge conflict in {language}.
BASE VERSION:
{base}
OURS VERSION:
{ours}
THEIRS VERSION:
{theirs}
"""
        inputs = self.tokenizer(
            input_text,
            return_tensors="pt",
            max_length=512,
            truncation=True,
            padding=True,
        )
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_length=512,
                num_beams=5,
                early_stopping=True,
                no_repeat_ngram_size=3,
            )
        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)


# Usage
resolver = AutoMergeResolver()
resolved = resolver.resolve_conflict(
    base="def calculate(x, y): return x + y",
    ours="def calculate(a, b): return a + b",
    theirs="def calculate(x, y): result = x + y; return result",
)
print(resolved)
```
### Parsing Git Conflict Markers

```python
def parse_git_conflict(conflict_text):
    """Parse standard Git conflict markers into base/ours/theirs sections."""
    lines = conflict_text.split('\n')
    ours, base, theirs = [], [], []
    section = None
    for line in lines:
        if line.startswith('<<<<<<<'):
            section = 'ours'
        elif line.startswith('|||||||'):
            section = 'base'
        elif line.startswith('======='):
            section = 'theirs'
        elif line.startswith('>>>>>>>'):
            section = None
        elif section == 'ours':
            ours.append(line)
        elif section == 'base':
            base.append(line)
        elif section == 'theirs':
            theirs.append(line)
    return {
        'base': '\n'.join(base) or '\n'.join(ours),  # Fall back to ours if no base section
        'ours': '\n'.join(ours),
        'theirs': '\n'.join(theirs),
    }


# Example usage
git_conflict = """<<<<<<< HEAD
def multiply(a, b):
    return a * b
||||||| merged common ancestors
def multiply(x, y):
    return x * y
=======
def multiply(x, y):
    product = x * y
    return product
>>>>>>> feature-branch"""

parsed = parse_git_conflict(git_conflict)
resolved = resolver.resolve_conflict(parsed['base'], parsed['ours'], parsed['theirs'])
```

Note that the `|||||||` base section only appears when `merge.conflictStyle` is set to `diff3`; with the default style the parser falls back to using `ours` as the base.
### GPU Acceleration

```python
import torch

# Initialize with GPU support
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = T5ForConditionalGeneration.from_pretrained("ankit-ml11/automerge-codet5")
model.to(device)

# Move inputs to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=512)
```
## Use Cases

### 1. Automated Merge Conflict Resolution

Integrate into CI/CD pipelines to automatically resolve simple conflicts:

```python
# In your CI/CD script; get_conflict_files() is a project-specific helper
# that returns the paths of files containing conflict markers
resolver = AutoMergeResolver()
for conflict_file in get_conflict_files():
    with open(conflict_file, 'r') as f:
        conflict = f.read()
    parsed = parse_git_conflict(conflict)
    resolved = resolver.resolve_conflict(**parsed)
    with open(conflict_file, 'w') as f:
        f.write(resolved)
```
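One way to implement the `get_conflict_files` helper used above is to ask Git directly: `git diff --name-only --diff-filter=U` lists the paths that are currently unmerged. A sketch, with the output parsing split out so it can be tested in isolation:

```python
import subprocess


def parse_unmerged(diff_output):
    """Split `git diff --name-only` output into a list of paths."""
    return [line for line in diff_output.splitlines() if line.strip()]


def get_conflict_files(repo_dir="."):
    """Return paths of files that are currently unmerged (in conflict)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=U"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_unmerged(out.stdout)
```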
### 2. IDE Integration

Create plugins for VS Code, IntelliJ, or other IDEs:

```python
# Backend function an editor extension could call with the selected conflict text
def resolve_conflict_in_editor(conflict_text):
    resolver = AutoMergeResolver()
    parsed = parse_git_conflict(conflict_text)
    return resolver.resolve_conflict(**parsed)
```
### 3. Git Merge Driver

Configure as a custom Git merge driver:

```ini
# .git/config
[merge "automerge"]
    name = AutoMerge AI conflict resolver
    driver = python resolve.py %A %O %B %L
```

Then enable it for the relevant paths in `.gitattributes`, e.g. `*.py merge=automerge`. Git substitutes temporary file names for `%A` (ours), `%O` (ancestor), and `%B` (theirs), and the conflict-marker size for `%L`.
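The `resolve.py` script referenced above is something you provide. A minimal sketch of its plumbing, assuming an `AutoMergeResolver` like the class in Advanced Usage (Git expects the merged result written back to the `%A` file and exit status 0 on success):

```python
def run_merge_driver(argv, resolve_fn):
    """Plumbing for a merge driver invoked as: resolve.py %A %O %B.

    argv: [ours_path, base_path, theirs_path] (Git's %A, %O, %B).
    resolve_fn: callable (base, ours, theirs) -> resolved text, e.g. the
    resolve_conflict method of an AutoMergeResolver instance.
    """
    ours_path, base_path, theirs_path = argv[:3]
    with open(base_path) as f:
        base = f.read()
    with open(ours_path) as f:
        ours = f.read()
    with open(theirs_path) as f:
        theirs = f.read()
    resolved = resolve_fn(base, ours, theirs)
    # Git expects the merged result written back to the %A (ours) file
    with open(ours_path, "w") as f:
        f.write(resolved)
    return 0  # exit status 0 tells Git the conflict was resolved


if __name__ == "__main__":
    import sys
    # resolver = AutoMergeResolver()  # from the Advanced Usage section
    # sys.exit(run_merge_driver(sys.argv[1:], resolver.resolve_conflict))
```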
### 4. Code Review Assistant

Suggest resolutions during code review:

```python
def suggest_resolutions(base, ours, theirs, language="python", num_suggestions=3):
    """Return several candidate resolutions, ranked by beam score."""
    input_text = f"""Resolve the following merge conflict in {language}.
BASE VERSION:
{base}
OURS VERSION:
{ours}
THEIRS VERSION:
{theirs}
"""
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(
        **inputs,
        max_length=512,
        num_beams=10,
        num_return_sequences=num_suggestions,
        early_stopping=True,
    )
    return [tokenizer.decode(out, skip_special_tokens=True) for out in outputs]
```
## Model Performance

The model has been trained on diverse merge conflict scenarios:

| Conflict Type | Example | Model Behavior |
|---|---|---|
| Variable renaming | `x, y` → `a, b` | Preserves semantic meaning |
| Comment addition | Added docs | Retains documentation |
| Code restructuring | Inline → multi-line | Chooses the cleaner structure |
| Logic changes | Different algorithms | Context-aware selection |
## Example Resolutions

### Example 1: Variable Naming

```text
BASE:     def add(x, y): return x + y
OURS:     def add(a, b): return a + b
THEIRS:   def add(x, y): return x + y
RESOLVED: def add(a, b): return a + b
```

### Example 2: Documentation

```text
BASE:
def multiply(x, y): return x * y

OURS:
def multiply(a, b):
    # Calculate product
    return a * b

THEIRS:
def multiply(x, y):
    result = x * y
    return result

RESOLVED:
def multiply(a, b):
    # Calculate product
    return a * b
```
## Limitations
- Context Length: Maximum input length is ~512 tokens
- Complex Logic: May struggle with very complex semantic conflicts
- Testing Required: Always review and test generated resolutions
- Language Coverage: Best performance on Python, JavaScript, Java (most common in training data)
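Given the ~512-token limit, it can be worth checking whether a conflict fits in the context window before attempting resolution. A sketch (the token counter is injected so any tokenizer can be plugged in; `fits_in_context` is an illustrative name, not part of the package):

```python
def fits_in_context(base, ours, theirs, count_tokens, max_tokens=512):
    """Return True if the assembled prompt stays within the model's window.

    count_tokens: callable mapping a string to its token count, e.g.
    lambda s: len(tokenizer(s).input_ids) with the model's tokenizer.
    """
    prompt = (
        f"Resolve the following merge conflict in python.\n"
        f"BASE VERSION:\n{base}\n"
        f"OURS VERSION:\n{ours}\n"
        f"THEIRS VERSION:\n{theirs}\n"
    )
    return count_tokens(prompt) <= max_tokens


# Crude whitespace-based count, just for illustration:
print(fits_in_context("a = 1", "a = 2", "a = 3", lambda s: len(s.split())))
# True
```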
## Training Details

### Training Data

- Size: 21,219 three-way merge conflict samples
- Source: Real-world Git repositories
- Preprocessing:
  - Filtered conflicts with resolution length > 50 characters
  - Removed conflicts where ours == theirs
  - Limited code length to 10,000 characters
  - Balanced across multiple programming languages
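The filtering rules above can be sketched as a per-sample predicate. This is our reading of the rules, not the released preprocessing code; in particular, we read the first rule as keeping only samples whose resolution exceeds 50 characters:

```python
def keep_sample(base, ours, theirs, resolution,
                max_len=10_000, min_resolution_len=50):
    """Apply the described filtering rules to one training sample."""
    if len(resolution) <= min_resolution_len:  # assumed reading of rule 1
        return False
    if ours == theirs:  # no real conflict to learn from
        return False
    if any(len(code) > max_len for code in (base, ours, theirs)):
        return False  # code length capped at 10,000 characters
    return True
```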
### Training Hyperparameters
- Base Model: Salesforce/codet5-small
- Max Input Length: 512 tokens
- Max Output Length: 512 tokens
- Batch Size: 8
- Learning Rate: 5e-5
- Optimizer: AdamW
- Epochs: 3-5
- Beam Search: 5 beams during inference
## Evaluation Metrics
The model is evaluated on:
- Exact match accuracy
- BLEU score
- Human evaluation of semantic correctness
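Exact match is straightforward to compute. A sketch that ignores leading and trailing whitespace (a choice of ours, not necessarily the normalization used in the actual evaluation):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to the reference after strip()."""
    assert len(predictions) == len(references)
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


print(exact_match_accuracy(["a = 1", "b = 2"], ["a = 1", "b = 3"]))
# 0.5
```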
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{automerge-codet5,
  author = {Ankit Adhikari and Aeron Panta and Bikrant Pudasaini and Bishwash Chaudhari},
  title = {AutoMerge AI: Automated Git Merge Conflict Resolution with CodeT5},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ankit-ml11/automerge-codet5}
}
```
## License
This model is released under the Apache 2.0 License, same as the base CodeT5 model.
## Acknowledgments
- Built on Salesforce/codet5-small
- Inspired by research in automated program repair and code generation
- Thanks to the open-source community for Git conflict datasets
## Model Card Authors

Ankit Adhikari (IOE Purwanchal Campus)
## Contact
- Issues: Please report issues on GitHub
- Email: ankitadankit@gmail.com
- HuggingFace: ankit-ml11
**Note:** This model is a tool to assist with merge conflict resolution. Always review and test the generated code before committing to production. The model may not handle all edge cases perfectly, and human oversight is recommended for critical code changes.