	# 🔍 Obfuscated Variable Renaming with Qwen-Code

	This repository hosts a Qwen-Code–based model fine-tuned to rename obfuscated variables in source code, improving readability while preserving program semantics.

	The model is designed for use cases such as malware analysis, reverse engineering, digital forensics, and general program comprehension.

	---

	## 🚀 Task Overview

	- Task: Code Deobfuscation / Variable Renaming
	- Base Model: Qwen-Code
	- Input: Source code with obfuscated variable names
	- Output: Semantically equivalent source code with readable variable names

	### Example

	Input
	```javascript
	function _0x12af(a, b) {
	  let _0x9c3e = a * b;
	  return _0x9c3e + 10;
	}
	```

	Output
	```javascript
	function multiplyAndAdd(a, b) {
	  let product = a * b;
	  return product + 10;
	}
	```

	---

	## 🧠 Model Description

	- Architecture: Qwen-Code (Transformer-based)
	- Fine-tuning Objective: Context-aware variable renaming
	- Approach: AST-guided identifier alignment + sequence generation
	- Languages: JavaScript (primary), extendable to others

	The model learns to infer meaningful variable names from usage context, not from superficial patterns.

	---

	## 🏗 Training Details

	### Dataset
	- Paired samples of:
	  - Obfuscated code
	  - Original / readable code
	- Variable mappings extracted using AST-based analysis
	- Realistic obfuscation patterns (minifiers, packers, name mangling)

	### Training Objectives
	- Identifier-aware sequence-to-sequence learning
	- Contextual name prediction
	- Syntax preservation
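As an illustration of what one paired sample could look like, the sketch below turns a readable snippet into an obfuscated one and records the variable mapping. It is not the pipeline used to build this dataset: the function name `obfuscate_identifiers`, the `_0x<hex>` naming scheme, and the regex-based substitution are assumptions made for the example. An AST-based pass, as described above, would avoid rewriting matches inside string literals or property names.

```python
import random
import re


def obfuscate_identifiers(source, identifiers, seed=0):
    """Replace each listed identifier with a hex-style name such as _0x9c3e.

    Returns (obfuscated_source, mapping), where mapping goes from each
    obfuscated name back to the original identifier.
    Hypothetical helper for illustration only.
    """
    if not identifiers:
        return source, {}

    rng = random.Random(seed)  # seeded so the same input gives the same pair
    forward = {}               # original name -> obfuscated name
    used = set()
    for name in identifiers:
        while True:
            candidate = f"_0x{rng.randrange(0x1000, 0x10000):04x}"
            if candidate not in used:
                break
        used.add(candidate)
        forward[name] = candidate

    # Word boundaries keep "product" from rewriting part of "productId".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forward)) + r")\b")
    obfuscated = pattern.sub(lambda m: forward[m.group(1)], source)

    mapping = {obf: orig for orig, obf in forward.items()}
    return obfuscated, mapping


readable = (
    "function multiplyAndAdd(a, b) {\n"
    "  let product = a * b;\n"
    "  return product + 10;\n"
    "}\n"
)
obfuscated, mapping = obfuscate_identifiers(readable, ["multiplyAndAdd", "product"])
print(obfuscated)
print(mapping)
```

The triple (`obfuscated`, `readable`, `mapping`) corresponds to the three dataset components listed above: obfuscated code, original code, and the variable mapping.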

	---

	## 📦 Installation

	```bash
	pip install transformers torch accelerate
	```

	---

	## ▶️ Usage

	### Inference Example

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "Neo111x/Variables-Renaming"

	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",  # requires accelerate; places weights on available devices
	    trust_remote_code=True,
	)

	# Obfuscated snippet to rename
	code = '''
	function _0x12af(a, b) {
	  let _0x9c3e = a * b;
	  return _0x9c3e + 10;
	}
	'''

	# Move the inputs to the same device as the model
	inputs = tokenizer(code, return_tensors="pt").to(model.device)
	outputs = model.generate(
	    **inputs,
	    max_new_tokens=256,
	    do_sample=False,  # greedy decoding for deterministic output
	)

	# The decoded sequence contains the prompt followed by the generated text
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
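A quick sanity check on the generated text is to see whether any obfuscated names from the input survived. The sketch below assumes obfuscated identifiers follow the `_0x<hex>` pattern used in the example above; other obfuscators use different schemes, so the regex would need adjusting. The helper names `obfuscated_names` and `remaining_obfuscated` are hypothetical and not part of this repository.

```python
import re

# Assumption: obfuscated names look like _0x12af (underscore, "0x", hex digits).
OBFUSCATED_NAME = re.compile(r"\b_0x[0-9a-fA-F]+\b")


def obfuscated_names(code):
    """Return the distinct _0x... identifiers in first-seen order."""
    seen = {}
    for match in OBFUSCATED_NAME.finditer(code):
        seen.setdefault(match.group(0), None)
    return list(seen)


def remaining_obfuscated(original, generated):
    """Return names from the input that still appear in the generated code."""
    still_present = set(obfuscated_names(generated))
    return [name for name in obfuscated_names(original) if name in still_present]


original = """
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
"""
generated = """
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
"""
print(remaining_obfuscated(original, generated))
```

Because the decoded output of a causal language model includes the prompt, pass only the generated portion as `generated`, otherwise every input name will be reported as remaining.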

	---

	## 🧪 Evaluation

	The model is evaluated on:
	- Identifier exact-match accuracy
	- AST equivalence checks
	- Manual readability assessment
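The first two criteria can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the evaluation code used for this model: `exact_match_accuracy` assumes predictions and references are dictionaries keyed by obfuscated name, and `equal_up_to_renaming` is a token-level stand-in for a real AST equivalence check. It uses a naive regex tokenizer and a deliberately incomplete keyword list, so it does not handle string literals, comments, or property names correctly.

```python
import re

# Naive tokenizer: identifiers, numbers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\S")
IDENTIFIER = re.compile(r"[A-Za-z_$][\w$]*")
# Incomplete set of JavaScript keywords that must match literally.
KEYWORDS = {"function", "let", "const", "var", "return", "if", "else",
            "for", "while", "new", "this"}


def exact_match_accuracy(predicted, reference):
    """Fraction of reference identifiers whose predicted name matches exactly."""
    if not reference:
        return 0.0
    hits = sum(1 for key, name in reference.items() if predicted.get(key) == name)
    return hits / len(reference)


def equal_up_to_renaming(source_a, source_b):
    """True if both token streams match under a consistent one-to-one renaming."""
    tokens_a = TOKEN.findall(source_a)
    tokens_b = TOKEN.findall(source_b)
    if len(tokens_a) != len(tokens_b):
        return False

    a_to_b, b_to_a = {}, {}
    for tok_a, tok_b in zip(tokens_a, tokens_b):
        a_is_name = bool(IDENTIFIER.fullmatch(tok_a)) and tok_a not in KEYWORDS
        b_is_name = bool(IDENTIFIER.fullmatch(tok_b)) and tok_b not in KEYWORDS
        if a_is_name and b_is_name:
            # Each name must map to exactly one name, in both directions.
            if a_to_b.setdefault(tok_a, tok_b) != tok_b:
                return False
            if b_to_a.setdefault(tok_b, tok_a) != tok_a:
                return False
        elif tok_a != tok_b:
            # Keywords, numbers, and punctuation must be identical.
            return False
    return True


obfuscated = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
renamed = "function multiplyAndAdd(a, b) { let product = a * b; return product + 10; }"
print(equal_up_to_renaming(obfuscated, renamed))
```

For production use, a JavaScript parser that compares syntax trees would be more reliable than this token-level comparison.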

	---

	## ⚠️ Limitations

	- Generated names are semantic approximations, not original identifiers
	- Performance degrades on:
	  - Extremely short contexts
	  - Heavy control-flow flattening
	- Single-file scope only

	---

	## 🔐 Ethical Considerations

	This model is intended for:
	- Malware and binary analysis
	- Digital forensics and incident response (DFIR)
	- Code maintenance and auditing

	It should not be used to violate software licenses or intellectual property rights.

	---

	## 🧩 Future Work

	- Multi-language support (C/C++, Python)
	- Function and class renaming
	- Control-flow–aware modeling
	- Integration with decompilers and IR tools

	---

	## 📜 License

	Specify the license here (e.g., Apache-2.0, MIT).

	---

	## 📖 Citation

	```bibtex
	@misc{qwen_code_variable_renamer,
	title={Context-Aware Variable Renaming for Obfuscated Code using Qwen-Code},
	author={Your Name},
	year={2026},
	url={https://huggingface.co/Neo111x/Variables-Renaming}
	}
	```