---
license: apache-2.0
base_model: Qwen/Qwen3-4B
base_model_relation: finetune
library_name: transformers
pipeline_tag: text-generation
language:
  - code
tags:
  - qwen3
  - code
  - code-generation
  - full-file-apply
  - apply-model
  - openai-compatible
  - ide
datasets:
  - aiXcoder/aiXapply_test_data
metrics:
  - accuracy
model-index:
  - name: aiXapply-4B-RL
    results:
      - task:
          type: text-generation
          name: Full-File Apply
        dataset:
          type: aiXcoder/aiXapply_test_data
          name: aiXapply main benchmark
          split: main_test_data
        metrics:
          - type: accuracy
            name: Average equivalence accuracy
            value: 0.938
---

# aiXapply-4B-RL

<p align="center">
  <a href="#overview">Overview</a> |
  <a href="#resources">Resources</a> |
  <a href="#quick-start">Quick Start</a> |
  <a href="#continue-integration">Continue Integration</a> |
  <a href="#dataset">Dataset</a> |
  <a href="#training">Training</a> |
  <a href="#evaluation">Evaluation</a> |
  <a href="#results">Results</a> |
  <a href="#citation">Citation</a>
</p>

<p align="center">
  <a href="https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Apache--2.0-blue.svg" alt="Apache-2.0 license"></a>
  <img src="https://img.shields.io/badge/GitHub-aiXapply--4B-black.svg" alt="GitHub repository">
  <img src="https://img.shields.io/badge/HuggingFace-Test%20Data-yellow.svg" alt="Hugging Face test dataset">
  <img src="https://img.shields.io/badge/Task-Full--File%20Apply-green.svg" alt="Full-file Apply task">
  <img src="https://img.shields.io/badge/Model-4B-orange.svg" alt="4B model">
  <img src="https://img.shields.io/badge/Endpoint-OpenAI--Compatible-lightgrey.svg" alt="OpenAI-compatible endpoint">
</p>

**aiXapply-4B-RL** is the aiXapply model for **Full-File Apply**, post-trained with reinforcement learning (GRPO). Given an original file and a localized update snippet, it generates the complete updated file while preserving everything outside the requested edit.

This RL model is optimized for task-level correctness, locality, and robustness under alternative edit representations. Use it when you want the RL-aligned variant reported in the latency/accuracy frontier and cross-format experiments. For the highest in-distribution full-file Apply accuracy and long-context fidelity, also see [`aiXcoder/aiXapply-4B-SFT`](https://huggingface.co/aiXcoder/aiXapply-4B-SFT).

This model is part of the official artifact release for the paper:

> **AiXapply: Fast and Reliable Full-File Code Integration with Specialized Small Models for IDE Workflows**

## Overview

Modern coding assistants often produce a local edit snippet first. The hard downstream step is applying that snippet to the original file without changing unrelated code. Unified diffs are compact but brittle, and search-and-replace is easy to generate but depends on exact string matching. aiXapply treats this downstream step as a standalone code-integration task.

In an IDE workflow, an upstream coding assistant proposes an update snippet, aiXapply expands it into a complete updated file, and the IDE presents the resulting diff for review. See the [code repository](https://github.com/aixcoder-plugin/aiXapply-4B) for figures, scripts, and full experiment details.

The repository includes:

| Component | Path |
| --- | --- |
| OpenAI-compatible inference scripts | `experiments/aiXapply/` |
| Experiment entrypoints for full-file Apply, unified diff, and search-and-replace | `experiments/` |
| Shared evaluation and six-class error taxonomy | `experiments/evaluation/` |
| Multi-language data construction pipeline | `data_generation/` |
| SFT and RL training scripts | `training/sft/`, `training/rl/` |
| Continue IDE integration adapter | `continue_config/` |

### Highlights

- **High accuracy**: on the 1,637-sample main benchmark, aiXapply-SFT reaches **94.4%** average equivalence accuracy and this RL model reaches **93.8%**, close to Qwen3.5-397B-A17B (94.8%) and above DeepSeek-V3.2 (91.6%).
- **Fast full-file generation**: with n-gram speculative decoding, aiXapply reaches **1.06s** average latency and **2692 tokens/s** on a single A100 40GB GPU.
- **Deployment-ready apply backend**: the model can be served behind an OpenAI-compatible endpoint and used as a dedicated `apply` model in Continue.
- **Reproducible pipeline**: data generation, training, inference, scoring, and error classification scripts are included.

## Resources

This release is split into one GitHub repository and three Hugging Face artifacts:

| Artifact        | Release target                                                                      | Description                                                                                                                                                                                   |
| --------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Code repository | [GitHub](https://github.com/aixcoder-plugin/aiXapply-4B)                            | Open-source project repository containing inference scripts, data construction code, training recipes, evaluation tools, Continue integration, and documentation.                             |
| Test dataset    | [Hugging Face Dataset](https://huggingface.co/datasets/aiXcoder/aiXapply_test_data) | Public evaluation set for Full-File Apply, covering 20 programming languages and file formats. Use this artifact to reproduce benchmark scores without rebuilding the training data pipeline. |
| RL model        | [Hugging Face Model](https://huggingface.co/aiXcoder/aiXapply-4B-RL)                | 4B Apply model post-trained with reinforcement learning / GRPO. It is optimized for task-level correctness, locality, and robustness under alternative edit representations.                  |
| SFT model       | [Hugging Face Model](https://huggingface.co/aiXcoder/aiXapply-4B-SFT)              | 4B Apply model trained with supervised fine-tuning. It provides strong in-distribution accuracy and better long-context structural preservation in our experiments.                           |


## Task Definition

Full-File Apply takes:

```text
<language>{language}</language>
<source_file>{original full file}</source_file>
<update_snippet>{localized update snippet}</update_snippet>
```

and returns:

```text
<update_file>{complete updated file}</update_file>
```

The task has three core requirements:

- **Complete output**: the model must return the full updated file, not a patch or partial fragment.
- **No side effects**: content outside the requested edit region should remain identical to the source file.
- **Placeholder expansion**: markers such as `// ... existing code ...` mean "copy the corresponding original content exactly"; placeholders must not appear in the final output.

If anchors in the update snippet are ambiguous or cannot be located safely, the model should fail conservatively (the system prompt in Quick Start asks for empty `<update_file>` tags in this case) rather than hallucinate an unrelated edit.
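A caller can cheaply check two of these requirements before showing a result to a user. The sketch below is a hypothetical helper, not part of the aiXapply repository: it flags omission markers left in the output (the marker regex is an assumption covering the `... existing code ...` style) and lists which source line ranges differ in the output using Python's standard `difflib`, so a reviewer can confirm that only the intended region changed.

```python
import difflib
import re

# Assumed marker pattern: "... existing code ..." or "... rest of code ...",
# with any comment prefix. Extend it for the marker styles your upstream model emits.
PLACEHOLDER_RE = re.compile(r"\.\.\.\s*(existing|rest of)\s+code\s*\.\.\.", re.IGNORECASE)


def leftover_placeholders(updated_file: str) -> list[int]:
    """Return 1-based line numbers in the output that still contain an omission marker."""
    return [
        i
        for i, line in enumerate(updated_file.splitlines(), start=1)
        if PLACEHOLDER_RE.search(line)
    ]


def changed_source_ranges(source_file: str, updated_file: str) -> list[tuple[int, int]]:
    """Return 1-based inclusive (start, end) source line ranges that were replaced or deleted.

    A pure insertion is reported as the empty range (start, start - 1) at the insertion point.
    """
    src = source_file.splitlines()
    out = updated_file.splitlines()
    matcher = difflib.SequenceMatcher(a=src, b=out, autojunk=False)
    return [
        (i1 + 1, i2)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

For the Quick Start example below, only source line 5 (`print(add(1, 2))`) should show up as changed; any other range points at a side effect worth reviewing.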

## Quick Start

### Install

```bash
git clone --depth 1 --recurse-submodules https://github.com/aixcoder-plugin/aiXapply-4B.git
cd aiXapply-4B

python -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
```

For model serving, install a `vllm` build compatible with your CUDA and PyTorch environment.

### Serve a Model with vLLM

```bash
export WEIGHT_DIR=/path/to/aiXapply-4B-RL  # or /path/to/aiXapply-4B-SFT
export SERVE_MODEL_NAME=aiXapply-4B-RL

CUDA_VISIBLE_DEVICES=0 vllm serve "$WEIGHT_DIR" \
  --host 0.0.0.0 \
  --port 12003 \
  --served-model-name "$SERVE_MODEL_NAME" \
  --tensor-parallel-size 1 \
  --enable-chunked-prefill \
  --kv-cache-dtype auto \
  --max-num-batched-tokens 4096 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.95 \
  --speculative-config '{"method":"ngram","num_speculative_tokens":128,"prompt_lookup_max":7}'
```

Use `--max-model-len 262144` only if your serving setup has enough memory for the full long-context configuration.
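The `--speculative-config` above enables n-gram (prompt-lookup) speculation, which suits Apply because most of the output file is copied verbatim from the source file already in the prompt. The sketch below illustrates only the drafting idea under simplifying assumptions: take the last `n` tokens of the context, find their most recent earlier occurrence, and propose the `k` tokens that followed it, trying the longest `n` first (up to `prompt_lookup_max`). It works on plain token lists, `min_n` is an assumed lower bound, and it is not vLLM's implementation; the target model still verifies every drafted token.

```python
def propose_ngram_draft(tokens, max_n=7, min_n=1, k=128):
    """Propose up to k draft tokens by prompt lookup.

    tokens: the full context so far (prompt tokens followed by generated tokens).
    Tries suffix lengths from max_n down to min_n; for the first suffix that also
    occurs earlier in the context, returns the tokens that followed its most recent
    earlier occurrence. Returns [] when nothing matches.
    """
    for n in range(min(max_n, len(tokens) - 1), min_n - 1, -1):
        suffix = tokens[-n:]
        # Scan start positions right-to-left so the most recent match wins;
        # the first candidate start excludes the suffix matching itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == suffix:
                return tokens[start + n:start + n + k]
    return []


# Toy context: the model has just re-emitted "def add (" from the source.
context = "def add ( a , b ) : return a + b def add (".split()
draft = propose_ngram_draft(context, k=5)
# draft == ["a", ",", "b", ")", ":"]
```

A large `k` pays off here because long drafts copied from the source are usually accepted in full, which is consistent with the latency gains reported in the Speculative Decoding results.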

### Call the Endpoint

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:12003/v1", api_key="local")

system_prompt = """You are a deterministic Code Patching Engine. Your task is to synthesize a "Updated File" by applying a partial "Update Snippet" to the provided "Source File".

### Algorithm
1. **Context Matching**: Analyze the `Update Snippet` to identify the context anchors (the lines of code surrounding the changes). Locate the exact corresponding block in the `Source File`. The match must be unique.
2. **Code Merging**: Replace the matched block in the `Source File` with the logic from the `Update Snippet`.
3. **Expansion**: The `Update Snippet` contains omission markers (e.g., `// ... existing code ...`). You MUST replace these markers with the original, unchanged lines from the `Source File`.
4. **Output Generation**: Output the FULL content of the resulting file.

### Constraints
- **NO Laziness**: Never output comments like `// ... rest of code ...` in the final output. You must write out every single line of the final code.
- **Strict Fidelity**: Preserve the original indentation style (spaces/tabs) and comments of the Source File for all unchanged parts.
- **Safety**: If the context in the snippet is ambiguous or cannot be found, output nothing inside the tags.

### Output Format
<update_file>[Your final code here]</update_file>"""

user_prompt = """<language>{language}</language>

<source_file>{source_file}</source_file>

<update_snippet>{update_snippet}</update_snippet>

Please generate the full updated code strictly following the instructions."""


LANGUAGE = "python"
SOURCE_FILE = """def add(a, b):
    return a + b

def main():
    print(add(1, 2))
"""
UPDATE_SNIPPET = """#  ... existing code ...
def main():
    print(add(7, 8))
"""


response = client.chat.completions.create(
    model="aiXapply-4B-RL",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt.format(language=LANGUAGE, source_file=SOURCE_FILE, update_snippet=UPDATE_SNIPPET)},
    ],
    temperature=0,
)

print(response.choices[0].message.content)
```
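The reply wraps the file in `<update_file>` tags, and an empty tag pair signals that the model declined an ambiguous edit (the Safety rule in the system prompt). A minimal sketch for turning `response.choices[0].message.content` into file text, assuming the tag pair appears once; `extract_update_file` is a hypothetical helper, not one of the repository scripts:

```python
import re
from typing import Optional

# DOTALL lets "." span newlines; the non-greedy group stops at the first closing tag.
_UPDATE_FILE_RE = re.compile(r"<update_file>(.*?)</update_file>", re.DOTALL)


def extract_update_file(content: str) -> Optional[str]:
    """Return the updated file text, or None when there is no usable file.

    None covers both a malformed reply (no tag pair) and an empty tag pair, which
    the system prompt's Safety rule uses to signal an ambiguous edit.
    """
    match = _UPDATE_FILE_RE.search(content)
    if match is None or not match.group(1).strip():
        return None
    return match.group(1)
```

On `None`, leave the original file untouched and report the failure instead of writing anything.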

## Continue Integration

`continue_config/` contains an adapter for using aiXapply as Continue's dedicated Apply backend.

The recommended local workflow is:

```text
Continue -> continue_apply_proxy.py -> OpenAI-compatible aiXapply endpoint
```

Start the proxy:

```bash
cd continue_config
export APPLY_PROXY_UPSTREAM_CHAT_URL="http://127.0.0.1:12003/v1/chat/completions"
export APPLY_PROXY_HOST="127.0.0.1"
export APPLY_PROXY_PORT="14124"
python3 continue_apply_proxy.py
```

Then merge the `apply` model block from `continue_config/continue.config.yaml.example` into your Continue config. The proxy strips `<update_file>...</update_file>` tags before returning the result to Continue and supports streaming responses.

See [`continue_config/README.md`](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/continue_config/README.md) for configuration details and troubleshooting.
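Stripping the tags from a streamed response needs some care because a tag can be split across chunks (for example `<upda` in one chunk and `te_file>` in the next). The following is a minimal sketch of one way to do it, holding back only a possible partial tag at the end of each chunk; it is a hypothetical illustration, not the code in `continue_apply_proxy.py`.

```python
class StreamingTagStripper:
    """Remove <update_file> and </update_file> tags from a stream of text chunks."""

    TAGS = ("<update_file>", "</update_file>")

    def __init__(self) -> None:
        self._pending = ""  # tail of earlier input that may be the start of a tag

    def _partial_tag_length(self, text: str) -> int:
        """Length of the longest suffix of `text` that is a proper prefix of a tag."""
        longest = 0
        for tag in self.TAGS:
            for size in range(1, len(tag)):
                if text.endswith(tag[:size]):
                    longest = max(longest, size)
        return longest

    def feed(self, chunk: str) -> str:
        """Return the text that is safe to forward now; hold back a possible partial tag."""
        text = self._pending + chunk
        # Loop in case removing one tag joins two fragments into another tag.
        while any(tag in text for tag in self.TAGS):
            for tag in self.TAGS:
                text = text.replace(tag, "")
        cut = len(text) - self._partial_tag_length(text)
        self._pending = text[cut:]
        return text[:cut]

    def flush(self) -> str:
        """At end of stream, release held-back text: it was not a tag after all."""
        rest, self._pending = self._pending, ""
        return rest
```

Call `feed()` for each streamed delta and forward its return value, then forward `flush()` once the upstream stream ends.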

## Dataset

The public test dataset is released separately on Hugging Face. It contains the benchmark examples used to evaluate aiXapply and comparable models. Each example follows the Apply format:

```text
<source_file, update_snippet, update_file>
```

The broader training-data construction pipeline is included in the code repository. It synthesizes Apply examples from real-world commits, including CommitPack-style records with `(old_file, new_file, commit_message)`.

![aiXapply dataset construction pipeline](assets/figures/aiXapply-dataset_pipeline.jpg)

*Figure 2: Dataset construction pipeline. Raw CommitPack records are sampled, consistency-verified, solvability-filtered, and split into train/test sets.*

High-level pipeline:

1. **Sampling and filtering**: keep localized same-file edits and balance languages/formats.
2. **Change description generation**: make the intent of each commit explicit.
3. **Snippet synthesis**: produce a localized `update_snippet` and full-file ground truth.
4. **Consistency verification**: ensure every diff is explained by the snippet and no extra change is introduced.
5. **Solvability filtering**: remove ambiguous or non-reproducible samples, then convert to training format.
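Step 1 keeps only localized same-file edits. One simple way to operationalize "localized" is to diff the old and new file and bound both the number of changed regions and the fraction of changed lines. The sketch below does that with `difflib`; the thresholds `max_regions` and `max_changed_fraction` are illustrative assumptions, not the settings used in the `data_generation/` pipeline.

```python
import difflib


def is_localized_edit(old_file: str, new_file: str,
                      max_regions: int = 3, max_changed_fraction: float = 0.2) -> bool:
    """Heuristic filter: accept a commit only if it touches a few small regions of one file."""
    old_lines = old_file.splitlines()
    new_lines = new_file.splitlines()
    opcodes = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False).get_opcodes()
    regions = [op for op in opcodes if op[0] != "equal"]
    if not regions:
        return False  # no change at all: nothing to learn from
    if len(regions) > max_regions:
        return False
    # Count the larger side of each region so deletions and insertions both count.
    changed = sum(max(i2 - i1, j2 - j1) for _tag, i1, i2, j1, j2 in regions)
    return changed / max(len(old_lines), 1) <= max_changed_fraction
```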

Dataset scale:

| Split | Samples | Notes |
| --- | ---: | --- |
| Train | 19,347 | Multi-language Apply training examples |
| Test | 1,637 | Public Hugging Face test dataset |

The test set covers C, C++, Dockerfile, Go, HTML, INI, Java, JavaScript, JSON, Makefile, Markdown, Python, reStructuredText, Rust, Shell, SQL, Text, TypeScript, XML, and YAML.

See [`data_generation/README.md`](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/data_generation/README.md) for scripts, configs, and reconstruction steps.

## Training

aiXapply is trained from a Qwen3-4B backbone with two complementary strategies:

- **SFT**: direct supervised learning from `(source_file, update_snippet)` to `update_file`.
- **RL / GRPO**: task-level optimization with rewards based on equivalence, patch correctness, and side-effect penalties.

The released model artifacts are `aiXapply-4B-SFT` and `aiXapply-4B-RL`. Use the SFT model as the default choice for high full-file Apply accuracy and long-context fidelity; use the RL model when you want the RL-aligned variant used in the latency/accuracy frontier and cross-format experiments.

### SFT

```bash
python -m pip install --extra-index-url https://download.pytorch.org/whl/cu128 -r training/sft/requirements.txt

cd training/sft
WANDB_PROJECT=aiXapply_sft \
WANDB_RUN_NAME=qwen3-4b-sft \
accelerate launch --config_file fsdp_config.yaml run_sft.py \
  --train_dataset_path /path/to/train.parquet \
  --test_dataset_path /path/to/test.parquet \
  --model_name /path/to/Qwen3-4B \
  --output_dir checkpoints/full_finetune
```

Update `training/sft/fsdp_config.yaml` for your machine, especially `num_processes` and context-parallel settings.
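SFT learns the mapping from `(source_file, update_snippet)` to `update_file` in the same prompt/response format used at inference time. A minimal sketch of turning one dataset row into chat messages is shown below. The row keys `language`, `source_file`, `update_snippet`, and `update_file` are assumed column names, and `SYSTEM_PROMPT` is a placeholder for the system prompt from the Quick Start example; check the parquet schema and the preprocessing in `run_sft.py` before relying on it.

```python
# Placeholder: substitute the full system prompt from the Quick Start example.
SYSTEM_PROMPT = "<system prompt from the Quick Start example>"

# Same user template as the Quick Start example.
USER_TEMPLATE = (
    "<language>{language}</language>\n\n"
    "<source_file>{source_file}</source_file>\n\n"
    "<update_snippet>{update_snippet}</update_snippet>\n\n"
    "Please generate the full updated code strictly following the instructions."
)


def to_chat_example(row: dict) -> list[dict]:
    """Build system/user/assistant messages for one (hypothetical) dataset row."""
    user = USER_TEMPLATE.format(
        language=row["language"],
        source_file=row["source_file"],
        update_snippet=row["update_snippet"],
    )
    # The training target is the complete updated file wrapped in the output tags.
    assistant = f"<update_file>{row['update_file']}</update_file>"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]
```

Because `str.format` only expands placeholders in the template, braces inside source code (JSON, Java, and so on) pass through unchanged.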

### RL / GRPO

The RL setup uses veRL. A typical training environment can be started with:

```bash
docker pull verlai/verl:vllm011.latest

export WORKSPACE=/path/to/workspace
docker create -it --runtime=nvidia --gpus all --net=host --ipc=host \
  --cap-add=SYS_ADMIN \
  -v "$WORKSPACE:$WORKSPACE" \
  --entrypoint /bin/bash \
  --name aixapply_verl \
  verlai/verl:vllm011.latest \
  -c "sleep infinity"

docker start aixapply_verl
docker exec -it aixapply_verl bash
```

Inside the container:

```bash
# Run from the aiXapply-4B repository root (for example, cloned under $WORKSPACE).
git submodule update --init --recursive
cd training/rl/verl
pip install -e .
pip install -e ".[sglang]"
cd ../../..

cd training/rl
MODEL_PATH=/path/to/Qwen3-4B \
TRAIN_FILES=/path/to/train.parquet \
TEST_FILES=/path/to/test.parquet \
bash run_qwen3-4b_sgl_megatron_multi_grpo.sh
```

Training is resource-intensive; the paper experiments use multi-GPU A100-class hardware.
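GRPO scores a group of completions sampled for the same prompt and uses each completion's reward relative to its group as the advantage, so no separate value model is needed. The sketch below shows that group normalization together with a toy reward built from the three signals named in the Training overview (equivalence, patch correctness, side-effect penalties). The component definitions and weights in `toy_reward` are assumptions for illustration only, not the reward used to train aiXapply-4B-RL.

```python
from statistics import mean, pstdev


def toy_reward(equivalent: bool, patch_correct: bool, side_effect_lines: int) -> float:
    """Hypothetical reward shaping: full credit for an equivalent file, partial credit
    for a correctly applied patch, minus a penalty per line changed outside the edit."""
    base = 1.0 if equivalent else (0.5 if patch_correct else 0.0)
    return base - 0.1 * side_effect_lines


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own sample group.

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions sampled for the same Apply prompt (toy values).
group = [
    toy_reward(equivalent=True, patch_correct=True, side_effect_lines=0),
    toy_reward(equivalent=False, patch_correct=True, side_effect_lines=0),
    toy_reward(equivalent=False, patch_correct=True, side_effect_lines=3),
    toy_reward(equivalent=False, patch_correct=False, side_effect_lines=0),
]
advantages = group_relative_advantages(group)
```

Completions that beat their group average get positive advantages and are reinforced; when every sample in a group scores the same, all advantages are zero and the group contributes no policy-gradient signal.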

## Evaluation

Run inference:

```bash
python experiments/aiXapply/infer_openai.py \
  --provider local \
  --data-path /path/to/test.parquet
```

The `local` provider in `experiments/aiXapply/infer_openai.py` expects an OpenAI-compatible endpoint at `http://127.0.0.1:12003/v1`. If you serve the model on a different port or with a different served model name, update the local provider config in that script before running evaluation.

Score predictions:

```bash
python experiments/evaluation/run_evaluation.py \
  -i predictions/xxx.jsonl \
  --classify_errors
```

Optional LLM-assisted error classification:

```bash
export OPENAI_BASE_URL="http://your_endpoint/v1"
export OPENAI_MODEL="your_judge_model"

python experiments/evaluation/run_evaluation.py \
  -i predictions/xxx.jsonl \
  --classify_errors \
  --llm
```

The primary metric is **equivalence accuracy**:

- Code files are compared with Pygments token equivalence.
- Structured formats such as JSON, YAML, XML, and INI are parsed before comparison; outputs that fail to parse are classified as invalid.
- Errors can be grouped into `OUTPUT_INVALID`, `PATCH_NOT_APPLIED`, `PATCH_INCOMPLETE`, `PATCH_INCORRECT`, `WRONG_POSITION`, and `OUT_OF_PATCH_SIDE_EFFECT`.
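To make the metric concrete, the sketch below approximates the comparison with the standard library only: Python files are compared token by token with `tokenize` (a stand-in for the Pygments lexers the evaluation uses, restricted to Python), JSON files are parsed and compared as values, and a tokenize or parse failure counts as non-equivalent. Which tokens are ignored here (blank lines and exact indentation width) is an assumption; `experiments/evaluation/` defines the actual rules.

```python
import io
import json
import tokenize


def _python_tokens(src: str) -> list[tuple[int, str]]:
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in (tokenize.NL, tokenize.ENDMARKER):
            continue  # blank lines and the end marker carry no content
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            tokens.append((tok.type, ""))  # keep block structure, ignore exact whitespace
        else:
            tokens.append((tok.type, tok.string))
    return tokens


def equivalent(language: str, expected: str, predicted: str) -> bool:
    """Approximate equivalence check; any tokenize or parse failure counts as a mismatch."""
    try:
        if language == "python":
            return _python_tokens(expected) == _python_tokens(predicted)
        if language == "json":
            return json.loads(expected) == json.loads(predicted)
    except (tokenize.TokenError, SyntaxError, json.JSONDecodeError):
        return False
    # Crude fallback for other languages: compare whitespace-separated words.
    return expected.split() == predicted.split()
```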

See [`experiments/README.md`](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/experiments/README.md) and [`experiments/evaluation/README.md`](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/experiments/evaluation/README.md) for the full experiment layout.

## Results

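In the latency/accuracy frontier experiments, aiXapply-RL keeps full-file Apply accuracy comparable to much larger models (0.938 on the main benchmark, vs. 0.916 for DeepSeek-V3.2 full-file Apply) while reducing average latency to an interactive range (1.44s vs. 108.96s).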

### Main Benchmark

Average equivalence accuracy on the 1,637-example aiXapply test set:

| Model | Avg Accuracy |
| --- | ---: |
| Qwen3-4B baseline | 0.626 |
| Fast-Apply-7B | 0.620 |
| DeepSeek-V3.2 | 0.916 |
| GLM-5 | 0.921 |
| aiXapply-RL | 0.938 |
| aiXapply-SFT | 0.944 |
| Qwen3.5-397B-A17B | 0.948 |

### Editing Paradigms

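With DeepSeek-V3.2 as the generator, full-file Apply gives higher one-shot accuracy than the unified-diff and search-and-replace representations. The last row shows aiXapply-RL performing full-file Apply, for comparison: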

| Representation | Accuracy | Avg Latency |
| --- | ---: | ---: |
| Unified diff | 0.560 | 14.22s |
| Search-and-replace | 0.749 | 28.48s |
| Full-file Apply | 0.916 | 108.96s |
| aiXapply-RL full-file Apply | 0.938 | 1.44s |

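### Speculative Decoding

Latency and throughput of aiXapply on a single A100 40GB GPU under different speculative-decoding settings. In the best n-gram row, `n` and `k` correspond to `prompt_lookup_max` and `num_speculative_tokens` in the vLLM serving command above: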

| Method | Avg Latency | P95 Latency | Throughput |
| --- | ---: | ---: | ---: |
| No speculation | 28.83s | 90.23s | 102.04 tokens/s |
| Suffix default | 5.75s | 20.74s | 509.54 tokens/s |
| N-gram default | 2.17s | 6.94s | 1343.99 tokens/s |
| N-gram best (`n=7`, `k=128`) | 1.06s | 3.38s | 2692.01 tokens/s |

### Generalization

| Setting | DeepSeek-V3.2 | aiXapply-RL | aiXapply-SFT |
| --- | ---: | ---: | ---: |
| Long context | 0.588 | 0.647 | 0.843 |
| Untrained languages avg. | 0.932 | 0.938 | 0.941 |
| Random placeholders avg. | 0.932 | 0.948 | 0.951 |
| Chunk file avg. | 0.850 | 0.881 | 0.900 |

### Industrial Deployment

In the aiXcoder IDE plugin, aiXapply is deployed as a dedicated Apply service after the upstream model generates an update snippet. In production traces, the Apply stage drops from **50s** average latency to **1.89s**, with P95 latency reduced from **89s** to **3.78s**. The setup also offloads full-file generation from the upstream large model, improving serving capacity and reducing cost.

## Repository Notes

- The current release focuses on single-file Apply. Multi-file edits and interactive multi-step editing are future work.
- aiXapply optimizes deterministic integration, not semantic validation. You should still run tests and review generated diffs before accepting edits.
- Do not commit secrets, checkpoints, datasets, or generated prediction artifacts unless they are intentionally part of a release.

## Contributing

Contributions are welcome. Please read [`CONTRIBUTING.md`](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/CONTRIBUTING.md) before opening issues or pull requests.

For useful bug reports, include the script or endpoint you ran, the command/configuration, the observed output or traceback, and enough model/provider context to reproduce the problem.

## License

This model is licensed under the Apache License 2.0. See the [code repository LICENSE](https://github.com/aixcoder-plugin/aiXapply-4B/blob/main/LICENSE) for details.

## Citation

If you find aiXapply useful, please cite:

```bibtex
@misc{jiang2026aixapply,
  title = {AiXapply: Fast and Reliable Full-File Code Integration with Specialized Small Models for IDE Workflows},
  author = {Jiang, Siyuan and Cai, Xiang and Wang, Peixu and Han, Yu and Dong, Yihong and Ning, Wei and Guo, Xuyuan and Wen, Jincheng and Zhao, Wei and Li, Ge},
  year = {2026},
  url = {https://github.com/aixcoder-plugin/aiXapply-4B}
}
```