# 🔍 Obfuscated Variable Renaming with aixcoder
This repository hosts an **aixcoder-based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.
The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.
---
## 🚀 Task Overview
- **Task:** Code Deobfuscation / Variable Renaming
- **Base Model:** aixcoder
- **Input:** Source code with obfuscated variable names
- **Output:** Semantically equivalent source code with readable variable names
### Example
**Input**
```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```
**Output**
```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```
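The input/output pair above should differ only in identifier names. As a hedged sanity check (not part of this repository), the Python sketch below approximates identifier alignment at the token level: it tokenizes both snippets with a simple regular expression and recovers the old-to-new name mapping, returning `None` if anything other than identifiers changed or if the renaming is not one-to-one. The token pattern, the keyword set, and `extract_renaming` are illustrative assumptions; unlike a real AST comparison, the check ignores scoping and treats property names like variables.

```python
import re
from typing import Optional

# Hypothetical, simplified JavaScript tokenizer: identifiers, integers,
# or any single non-whitespace character. Not a real JS parser.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][\w$]*")

# Small, incomplete set of reserved words that must never be renamed.
KEYWORDS = {
    "function", "let", "const", "var", "return", "if", "else", "for",
    "while", "do", "new", "this", "true", "false", "null", "undefined",
    "typeof", "break", "continue", "switch", "case", "default", "class",
}


def is_renamable(tok: str) -> bool:
    return IDENT_RE.fullmatch(tok) is not None and tok not in KEYWORDS


def extract_renaming(before: str, after: str) -> Optional[dict[str, str]]:
    """Return {old_name: new_name} if `after` is a consistent renaming of `before`."""
    a, b = TOKEN_RE.findall(before), TOKEN_RE.findall(after)
    if len(a) != len(b):
        return None
    forward: dict[str, str] = {}
    backward: dict[str, str] = {}
    for x, y in zip(a, b):
        if is_renamable(x) and is_renamable(y):
            # One-to-one: each old name maps to exactly one new name and vice versa.
            if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
                return None
        elif x != y:
            # Keywords, literals, and punctuation must be identical.
            return None
    # Report only the identifiers whose names actually changed.
    return {old: new for old, new in forward.items() if old != new}


obfuscated = """function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}"""
renamed = """function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}"""
print(extract_renaming(obfuscated, renamed))
# {'_0x12af': 'multiplyAndAdd', '_0x9c3e': 'product'}
```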
---
## 🧠 Model Description
- **Architecture:** aixcoder (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extendable to others
The model learns to infer meaningful variable names from **usage context**, that is, how each variable is defined and used, rather than from superficial patterns.
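To make "usage context" concrete, the sketch below lists the tokens surrounding every occurrence of an identifier. It is an illustration of the signal available in the input, not a description of the model's internals; the tokenizer regex, `usage_contexts`, and the default `window` size are assumptions. For `_0x9c3e` in the example, the windows show that it is declared with `let`, assigned from `a * ...`, and returned after adding `10`, which is the kind of evidence that supports a name like `product`.

```python
import re

# Hypothetical simplified tokenizer for JavaScript-like code (not a real parser).
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def usage_contexts(code: str, name: str, window: int = 3) -> list[list[str]]:
    """Return the tokens within `window` positions of each occurrence of `name`."""
    tokens = TOKEN_RE.findall(code)
    return [
        tokens[max(0, i - window): i + window + 1]
        for i, tok in enumerate(tokens)
        if tok == name
    ]


snippet = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
for ctx in usage_contexts(snippet, "_0x9c3e"):
    print(" ".join(ctx))
# ) { let _0x9c3e = a *
# b ; return _0x9c3e + 10 ;
```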
---
## 🏗 Training Details
### Dataset
- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)
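The data pipeline itself is not published. As a hedged illustration of how one paired sample and its variable mapping could be produced, the sketch below renames identifiers in readable JavaScript to `_0x`-style hex names (the style shown in the example above) and returns the obfuscated code together with an obfuscated-to-original mapping. It is a token-level stand-in for AST-based renaming and for real minifiers: `obfuscate` and `RESERVED` are hypothetical names, it skips only a small hand-picked set of keywords and globals plus property names after `.`, and it ignores scoping.

```python
import random
import re

# Hypothetical tokenizer that keeps whitespace so the code can be reassembled exactly.
TOKEN_RE = re.compile(r"[A-Za-z_$][\w$]*|\d+|\s+|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][\w$]*")

# Keywords and common globals that must keep their names (incomplete, illustrative list).
RESERVED = {
    "function", "let", "const", "var", "return", "if", "else", "for", "while",
    "new", "this", "true", "false", "null", "undefined", "typeof",
    "console", "Math", "JSON", "Object", "Array", "String", "Number",
}


def obfuscate(source: str, seed: int = 0) -> tuple[str, dict[str, str]]:
    """Rename identifiers to `_0x`-style names; return (code, {obfuscated: original})."""
    rng = random.Random(seed)
    forward: dict[str, str] = {}
    out: list[str] = []
    prev = ""  # previous non-whitespace token
    for tok in TOKEN_RE.findall(source):
        # Rename plain identifiers, but not reserved names or property accesses (`obj.prop`).
        if IDENT_RE.fullmatch(tok) and tok not in RESERVED and prev != ".":
            if tok not in forward:
                name = f"_0x{rng.randrange(16 ** 4):04x}"
                while name in forward.values():  # keep generated names distinct
                    name = f"_0x{rng.randrange(16 ** 4):04x}"
                forward[tok] = name
            tok = forward[tok]
        out.append(tok)
        if not tok.isspace():
            prev = tok
    return "".join(out), {new: old for old, new in forward.items()}


code, mapping = obfuscate("function add(x, y) { return console.log(x + y); }")
print(code)
print(sorted(mapping.values()))  # ['add', 'x', 'y']
```

The generated names depend on `seed`, so the same source always yields the same pair, which keeps a synthetic dataset reproducible.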
### Training Objectives
- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation
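How the sequence-to-sequence objective is set up on a decoder-only model is not specified here. One common approach, sketched below purely as an assumption, concatenates the obfuscated code (prompt) and the readable code (target) and masks the prompt positions in the labels with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so that only target tokens contribute to the loss. Hugging Face causal LM classes typically shift labels by one position internally, so `labels` stays aligned with `input_ids`. The function name and token IDs are made up for illustration.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_causal_lm_example(
    prompt_ids: list[int], target_ids: list[int], eos_id: int
) -> dict[str, list[int]]:
    """Concatenate prompt and target; only target tokens (plus EOS) are supervised."""
    input_ids = prompt_ids + target_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}


# Made-up token IDs standing in for tokenized obfuscated / readable code.
example = build_causal_lm_example([101, 102, 103], [201, 202], eos_id=2)
print(example["input_ids"])  # [101, 102, 103, 201, 202, 2]
print(example["labels"])     # [-100, -100, -100, 201, 202, 2]
```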
---
## 📦 Installation
```bash
pip install transformers torch accelerate
```
---
## ▶️ Usage
### Inference Example
```python
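from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/aixcoder-renaming"

# trust_remote_code=True runs code from the model repository; only enable it for repos you trust.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread weights over available GPU(s)/CPU (requires accelerate)
    trust_remote_code=True,
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

# Move the encoded prompt to the model's device before generating.
inputs = tokenizer(code, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False,  # greedy decoding for deterministic output
)

# generate() returns the prompt followed by the continuation; decode only the new tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))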
```
---
## 🧪 Evaluation
- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment
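Identifier exact-match accuracy is not defined precisely above. One plausible definition, assumed here, is the fraction of obfuscated identifiers whose predicted name equals the reference name exactly, micro-averaged over all samples. `identifier_exact_match` is a hypothetical helper; each sample is a `(predicted, reference)` pair of mappings from obfuscated names to readable names, for instance as recovered by a token- or AST-level alignment. The metric is strict: a reasonable synonym such as `result` for `product` counts as a miss.

```python
def identifier_exact_match(
    samples: list[tuple[dict[str, str], dict[str, str]]],
) -> float:
    """Micro-averaged exact match over (predicted, reference) mapping pairs."""
    hits = total = 0
    for predicted, reference in samples:
        for obf_name, ref_name in reference.items():
            total += 1
            # A missing prediction counts as a miss.
            hits += predicted.get(obf_name) == ref_name
    if total == 0:
        raise ValueError("no reference identifiers to score")
    return hits / total


reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}
print(identifier_exact_match([(predicted, reference)]))  # 0.5
```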
---
## ⚠️ Limitations
- Generated names are **semantic approximations**, not original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Operates on a single file at a time (no cross-file context)
---
## 🔐 Ethical Considerations
This model is intended for:
- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing
It should **not** be used to violate software licenses or intellectual property rights.
---
## 🧩 Future Work
- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow-aware modeling
- Integration with decompilers and IR tools
---
## 📜 License
Specify the license here (e.g., Apache-2.0, MIT).
---
## 📖 Citation
```bibtex
@misc{aixcoder_code_variable_renamer,
  title  = {Context-Aware Variable Renaming for Obfuscated Code using aixcoder},
  author = {Your Name},
  year   = {2026},
  url    = {https://huggingface.co/Neo111x/aixcoder-renaming}
}
```