# 🔍 Obfuscated Variable Renaming with aixcoder

This repository hosts an **aixcoder-based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.

The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.

---

## 🚀 Task Overview

**Task:** Code Deobfuscation / Variable Renaming  
**Base Model:** aixcoder  
**Input:** Source code with obfuscated variable names  
**Output:** Semantically equivalent source code with readable variable names  

### Example

**Input**
```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```

**Output**
```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```

---

## 🧠 Model Description

- **Architecture:** aixcoder (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extensible to other languages

The model infers meaningful variable names from **usage context** (how a value is computed and consumed), not from surface features of the obfuscated identifier.

---

## 🏗 Training Details

### Dataset
- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)
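
The pairing described above can be illustrated with a minimal sketch. It is not the pipeline used to build the dataset: it assumes the obfuscated and original files differ only in identifier names, and it replaces a real JavaScript AST parser with a simple regex tokenizer. The function name `extract_mapping` and both regexes are hypothetical.

```python
import re

# Simplified tokenizer (assumption): identifiers, numbers, or any single
# non-space character. A real pipeline would walk a JavaScript AST instead.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def extract_mapping(obfuscated: str, original: str) -> dict[str, str]:
    """Align two token streams and collect obfuscated -> original names."""
    obf_tokens = TOKEN_RE.findall(obfuscated)
    orig_tokens = TOKEN_RE.findall(original)
    if len(obf_tokens) != len(orig_tokens):
        raise ValueError("token streams differ in length; pair is not aligned")

    mapping: dict[str, str] = {}
    for obf, orig in zip(obf_tokens, orig_tokens):
        if obf == orig:
            continue
        # Only identifier tokens are allowed to differ between the two files.
        if not (IDENT_RE.fullmatch(obf) and IDENT_RE.fullmatch(orig)):
            raise ValueError(f"non-identifier mismatch: {obf!r} vs {orig!r}")
        # Each obfuscated name must map to exactly one original name.
        if mapping.setdefault(obf, orig) != orig:
            raise ValueError(f"inconsistent mapping for {obf!r}")
    return mapping


obfuscated = """
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
"""
original = """
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
"""
print(extract_mapping(obfuscated, original))
```

A regex tokenizer splits string literals and comments into arbitrary pieces, so this sketch is only suitable for small, literal-free snippets.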

### Training Objectives
- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

---

## ▶️ Usage

### Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/aixcoder-renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

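# Move the input tensors to the device the model was loaded onto.
inputs = tokenizer(code, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False  # greedy decoding for deterministic output
)

# A causal LM returns the prompt followed by the completion,
# so decode only the newly generated tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))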
```
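
Since the output is meant to differ from the input only in identifier names, a cheap sanity check can be run on each generation before trusting it. The sketch below is an assumption-laden approximation, not part of the model: it uses a regex tokenizer and a partial keyword list instead of a JavaScript parser, and the names `structure_signature` and `renames_only` are hypothetical.

```python
import re

# Simplified tokenizer (assumption): identifiers, numbers, or any single
# non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Partial JavaScript keyword list (assumption): extend for real inputs.
JS_KEYWORDS = {
    "function", "let", "const", "var", "return", "if", "else", "for",
    "while", "new", "this", "true", "false", "null", "undefined",
}


def structure_signature(code: str) -> list[str]:
    """Replace each identifier with a placeholder numbered by first use."""
    seen: dict[str, str] = {}
    signature = []
    for tok in TOKEN_RE.findall(code):
        if IDENT_RE.fullmatch(tok) and tok not in JS_KEYWORDS:
            tok = seen.setdefault(tok, f"ID{len(seen)}")
        signature.append(tok)
    return signature


def renames_only(before: str, after: str) -> bool:
    """True if the two snippets are identical up to consistent renaming."""
    return structure_signature(before) == structure_signature(after)


before = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
after = "function multiplyAndAdd(a, b) { let product = a * b; return product + 10; }"
print(renames_only(before, after))
```

The check treats property names as ordinary identifiers and does not understand string literals or comments, so a passing result is a hint that the structure was preserved rather than a proof of semantic equivalence.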

---

## 🧪 Evaluation

- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment
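
The README does not define the exact-match metric precisely. One plausible reading, sketched below as an assumption, is the fraction of reference identifiers whose predicted name equals the original name character for character. The function name `identifier_exact_match` and the dictionary format are hypothetical.

```python
def identifier_exact_match(
    predicted: dict[str, str], reference: dict[str, str]
) -> float:
    """Fraction of reference identifiers recovered exactly.

    Both arguments map an obfuscated name to a readable name.
    Identifiers missing from `predicted` count as misses.
    """
    if not reference:
        return 0.0
    hits = sum(
        1 for obf, name in reference.items() if predicted.get(obf) == name
    )
    return hits / len(reference)


reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}
print(identifier_exact_match(predicted, reference))  # 0.5
```

Exact match is strict: `result` is a reasonable name for `product` but still counts as a miss, which is why the list above pairs this metric with a manual readability assessment.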

---

## ⚠️ Limitations

- Generated names are **semantic approximations**, not original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Operates on a single file at a time; identifiers defined in other files are not resolved

---

## 🔐 Ethical Considerations

This model is intended for:
- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing

It should **not** be used to violate software licenses or intellectual property rights.

---

## 🧩 Future Work

- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow–aware modeling
- Integration with decompilers and IR tools

---

## 📜 License

Specify the license here (e.g., Apache-2.0, MIT).

---

## 📖 Citation

```bibtex
@misc{aixcoder_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using aixcoder},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/aixcoder-renaming}
}
```