# 🔍 Obfuscated Variable Renaming with Qwen-Code
This repository hosts a **Qwen-Code–based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.
The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.
---
## 🚀 Task Overview
**Task:** Code Deobfuscation / Variable Renaming
**Base Model:** Qwen-Code
**Input:** Source code with obfuscated variable names
**Output:** Semantically equivalent source code with readable variable names
### Example
**Input**
```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```
**Output**
```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```
---
## 🧠 Model Description
- **Architecture:** Qwen-Code (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extendable to others
The model learns to infer meaningful variable names from **usage context**, not from superficial patterns.
---
## 🏗 Training Details
### Dataset
- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)
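
To make the idea of a variable mapping concrete, here is a minimal sketch of how one obfuscated/original pair could be turned into a rename table. It is not the pipeline used to build this dataset: the README describes AST-based analysis, while this stand-in simply aligns identifiers by position using a regular expression. The names `extract_mapping`, `IDENTIFIER`, and `KEYWORDS` are hypothetical, and the sketch assumes both snippets differ only in identifier names and contain no string literals, comments, or hex number literals.

```python
import re

# Matches JavaScript-style identifiers, including a leading "_" or "$".
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Small, incomplete set of reserved words that must not be treated as names.
KEYWORDS = {"function", "let", "const", "var", "return", "if", "else", "for", "while"}


def extract_mapping(obfuscated: str, original: str) -> dict:
    """Pair identifiers by position, assuming both snippets share one token order."""
    obf_names = [n for n in IDENTIFIER.findall(obfuscated) if n not in KEYWORDS]
    orig_names = [n for n in IDENTIFIER.findall(original) if n not in KEYWORDS]
    if len(obf_names) != len(orig_names):
        raise ValueError("snippets do not align identifier-for-identifier")

    mapping = {}
    for obf, orig in zip(obf_names, orig_names):
        if obf == orig:
            continue  # name was left untouched by the obfuscator
        # setdefault keeps the first pairing; a later mismatch means bad alignment
        if mapping.setdefault(obf, orig) != orig:
            raise ValueError(f"inconsistent mapping for {obf!r}")
    return mapping


obfuscated = "function _0x12af(a, b) {\n  let _0x9c3e = a * b;\n  return _0x9c3e + 10;\n}"
original = "function multiplyAndAdd(a, b) {\n  let product = a * b;\n  return product + 10;\n}"
print(extract_mapping(obfuscated, original))
```

A real AST-based extractor would also respect scoping, so that two different variables sharing one obfuscated name in separate scopes are not merged.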
### Training Objectives
- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation
---
## 📦 Installation
```bash
pip install transformers torch accelerate
```
---
## ▶️ Usage
### Inference Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/Variables-Renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

# Move the tokenized prompt to the device the model was placed on
inputs = tokenizer(code, return_tensors="pt").to(model.device)

# Greedy decoding keeps the output deterministic
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
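
For decoder-only models, `generate` returns the prompt token ids followed by the newly generated ids, so the decoded text above may repeat the obfuscated input before the renamed code. The sketch below shows one way to keep only the continuation. The helper name `continuation_ids` is hypothetical, and whether this model echoes the prompt in a useful way depends on how it was fine-tuned, which this README does not specify.

```python
def continuation_ids(output_ids, prompt_length):
    """Return only the token ids that were generated after the prompt.

    Works on any sliceable sequence, such as a list or a 1-D tensor.
    """
    return output_ids[prompt_length:]


# With the objects from the inference example, usage would look like:
#   prompt_length = inputs["input_ids"].shape[1]
#   new_ids = continuation_ids(outputs[0], prompt_length)
#   print(tokenizer.decode(new_ids, skip_special_tokens=True))

# Plain-list demonstration: three prompt ids followed by two generated ids.
print(continuation_ids([101, 102, 103, 7, 8], 3))
```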
---
## 🧪 Evaluation
- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment
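
The first two checks can be approximated in a few lines. This is a sketch under stated assumptions, not the evaluation code behind this model: `exact_match_accuracy` and `structure_signature` are hypothetical names, and the second function is a token-level stand-in for a true AST equivalence check. It replaces each non-keyword identifier with a numbered placeholder and compares the resulting token sequences, so it ignores whitespace but does not understand string literals, comments, or scoping.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Identifiers, runs of digits, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")
# Small, incomplete set of reserved words that are kept as-is.
KEYWORDS = {"function", "let", "const", "var", "return", "if", "else", "for", "while"}


def exact_match_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of reference identifiers whose predicted name matches exactly."""
    if not reference:
        return 0.0
    hits = sum(1 for obf, name in reference.items() if predicted.get(obf) == name)
    return hits / len(reference)


def structure_signature(code: str) -> list:
    """Token list with every non-keyword identifier replaced by ID0, ID1, ..."""
    placeholders = {}
    signature = []
    for token in TOKEN.findall(code):
        if IDENTIFIER.fullmatch(token) and token not in KEYWORDS:
            # Number placeholders in order of first appearance
            token = placeholders.setdefault(token, f"ID{len(placeholders)}")
        signature.append(token)
    return signature


reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "mulAdd", "_0x9c3e": "product"}
print(exact_match_accuracy(predicted, reference))

before = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
after = "function multiplyAndAdd(a, b) {\n  let product = a * b;\n  return product + 10;\n}"
print(structure_signature(before) == structure_signature(after))
```

Because placeholders are numbered by first appearance, the signature comparison also fails when a renaming collapses two distinct variables into one name.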
---
## ⚠️ Limitations
- Generated names are **semantic approximations**, not original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Single-file scope only
---
## 🔐 Ethical Considerations
This model is intended for:
- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing
It should **not** be used to violate software licenses or intellectual property rights.
---
## 🧩 Future Work
- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow–aware modeling
- Integration with decompilers and IR tools
---
## 📜 License
Specify the license here (e.g., Apache-2.0, MIT).
---
## 📖 Citation
```bibtex
@misc{qwen_code_variable_renamer,
title={Context-Aware Variable Renaming for Obfuscated Code using Qwen-Code},
author={Your Name},
year={2026},
url={https://huggingface.co/Neo111x/Variables-Renaming}
}
```