# 🔍 Obfuscated Variable Renaming with aixcoder

This repository hosts an **aixcoder-based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.

The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.

---

## 🚀 Task Overview

**Task:** Code Deobfuscation / Variable Renaming  
**Base Model:** aixcoder  
**Input:** Source code with obfuscated variable names  
**Output:** Semantically equivalent source code with readable variable names  

### Example

**Input**
```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```

**Output**
```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```

---

## 🧠 Model Description

- **Architecture:** aixcoder (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extendable to others

The model infers meaningful variable names from **usage context** (how a value is computed, passed, and returned) rather than from surface patterns in the obfuscated name.

---

## 🏗 Training Details

### Dataset
- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)
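
To illustrate what a variable mapping for one training pair might look like, here is a minimal sketch. It is not the pipeline used for this model: it swaps the AST-based analysis for a simple positional alignment of identifier tokens, which only works when the two versions differ in names alone. The function name `extract_rename_map` and the regex are hypothetical, and identifiers inside strings or comments would confuse it.

```python
import re

# Simplified JavaScript identifier pattern (ASCII only, an assumption for this sketch).
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def extract_rename_map(obfuscated: str, original: str) -> dict[str, str]:
    """Align identifiers by position and collect the ones whose names differ."""
    obf_ids = IDENTIFIER.findall(obfuscated)
    orig_ids = IDENTIFIER.findall(original)
    if len(obf_ids) != len(orig_ids):
        raise ValueError("identifier counts differ; the pair is not aligned")

    mapping: dict[str, str] = {}
    for obf, orig in zip(obf_ids, orig_ids):
        if obf == orig:
            continue  # keywords and untouched names map to themselves
        # Each obfuscated name must always map to the same readable name.
        if mapping.setdefault(obf, orig) != orig:
            raise ValueError(f"inconsistent mapping for {obf!r}")
    return mapping


obfuscated = """function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}"""
original = """function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}"""

rename_map = extract_rename_map(obfuscated, original)
print(rename_map)
```

A real AST-based extractor would also respect scoping, so that two different variables sharing one obfuscated name in separate scopes are kept apart.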

### Training Objectives
- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

---

## ▶️ Usage

### Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/aixcoder-renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

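# Move the inputs to the model's device; needed when device_map="auto" places the model on a GPU.
inputs = tokenizer(code, return_tensors="pt").to(model.device)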
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False
)

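# Decode only the newly generated tokens, skipping the echoed prompt.
prompt_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))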
```
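
Because the output is generated text, it can be worth checking that the model changed nothing but names before trusting the result. The sketch below is one possible sanity check, not part of this repository: it replaces every non-keyword identifier with a placeholder numbered by first appearance and compares the resulting token sequences. The helper names, the token regex, and the partial keyword list are assumptions, and the check works on tokens rather than a real AST, so string literals and comments are not handled.

```python
import re

# Number-like tokens first, then identifiers, then any other non-space character.
TOKEN = re.compile(r"\d[\w.]*|[A-Za-z_$][\w$]*|\S")
IDENT = re.compile(r"[A-Za-z_$][\w$]*")

# Partial keyword list, enough for small snippets (assumption: extend as needed).
JS_KEYWORDS = {
    "function", "return", "let", "const", "var", "if", "else", "for",
    "while", "new", "this", "true", "false", "null", "undefined",
}


def structural_skeleton(code: str) -> list[str]:
    """Replace each identifier with a placeholder numbered by first appearance."""
    placeholders: dict[str, str] = {}
    skeleton = []
    for token in TOKEN.findall(code):
        if IDENT.fullmatch(token) and token not in JS_KEYWORDS:
            token = placeholders.setdefault(token, f"id{len(placeholders)}")
        skeleton.append(token)
    return skeleton


def is_pure_renaming(before: str, after: str) -> bool:
    """True when the two snippets differ only by a consistent renaming."""
    return structural_skeleton(before) == structural_skeleton(after)


before = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
renamed = "function multiplyAndAdd(a, b) { let product = a * b; return product + 10; }"
altered = "function multiplyAndAdd(a, b) { let product = a * b; return product + 11; }"

print(is_pure_renaming(before, renamed))
print(is_pure_renaming(before, altered))
```

The same comparison also flags name collisions: if two distinct variables are given the same new name, their placeholders no longer line up and the check fails.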

---

## 🧪 Evaluation

- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment
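
As an illustration of the first metric, the sketch below computes identifier exact-match accuracy for a single sample. The exact definition used for this model is not stated here, so this assumes a case-sensitive, per-identifier comparison against the original names; `identifier_exact_match` is a hypothetical helper.

```python
def identifier_exact_match(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of reference identifiers whose predicted name matches exactly."""
    if not reference:
        return 0.0  # convention chosen for this sketch when there is nothing to score
    hits = sum(1 for obf, name in reference.items() if predicted.get(obf) == name)
    return hits / len(reference)


# Maps from obfuscated name to readable name (values here are illustrative).
reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}

score = identifier_exact_match(predicted, reference)
print(f"{score:.2f}")  # 0.50
```

Exact match is strict: a prediction such as `result` for `product` scores zero even though it is a reasonable name, which is one reason to pair it with a manual readability assessment.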

---

## ⚠️ Limitations

- Generated names are **semantic approximations**, not original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Single-file scope only; cross-file references are not handled

---

## 🔐 Ethical Considerations

This model is intended for:
- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing

It should **not** be used to violate software licenses or intellectual property rights.

---

## 🧩 Future Work

- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow–aware modeling
- Integration with decompilers and IR tools

---

## 📜 License

Specify the license here (e.g., Apache-2.0, MIT).

---

## 📖 Citation

```bibtex
@misc{aixcoder_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using aixcoder},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/aixcoder-renaming}
}
```