# 🔍 Obfuscated Variable Renaming with Qwen-Code

This repository hosts a **Qwen-Code–based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.

The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.

---

## 🚀 Task Overview

- **Task:** Code Deobfuscation / Variable Renaming
- **Base Model:** Qwen-Code
- **Input:** Source code with obfuscated variable names
- **Output:** Semantically equivalent source code with readable variable names

### Example

**Input**

```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```

**Output**

```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```

---

## 🧠 Model Description

- **Architecture:** Qwen-Code (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extendable to others

The model infers meaningful variable names from how each variable is **used** (for example, a variable that holds `a * b` is named `product`), not from superficial patterns in the obfuscated name.

---

## 🏗 Training Details

### Dataset

- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)
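
The card does not show how the variable mappings are extracted. As a rough illustration only, the sketch below aligns an obfuscated/readable pair token by token and records which readable name each obfuscated identifier corresponds to. It is a token-level stand-in for the AST-based analysis listed above, not the pipeline used to build this dataset: the regex tokenizer, the partial `KEYWORDS` set, and the `extract_mapping` helper are hypothetical, and unlike a real AST pass it cannot tell variables from property names or respect scopes. It also assumes the input contains no comments or regular-expression literals.

```python
import re

# Partial list of JavaScript reserved words (illustrative, not exhaustive);
# these tokens must be identical in both versions of a pair.
KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "do", "else", "false", "for", "function", "if", "in", "let", "new",
    "null", "return", "switch", "this", "throw", "true", "try", "typeof",
    "var", "while",
}

# Minimal JavaScript tokenizer (assumption: no comments, regex literals,
# or template-literal interpolation). Whitespace is skipped by finditer.
TOKEN_RE = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)
  | (?P<number>0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>\S)
    """,
    re.VERBOSE,
)


def tokenize(code: str) -> list[tuple[str, str]]:
    """Split code into (kind, text) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(code)]


def extract_mapping(obfuscated: str, original: str) -> dict[str, str]:
    """Align two token streams and return {obfuscated_name: readable_name}.

    Raises ValueError if the pair differs in anything other than a
    consistent renaming of identifiers.
    """
    obf_tokens, orig_tokens = tokenize(obfuscated), tokenize(original)
    if len(obf_tokens) != len(orig_tokens):
        raise ValueError("token streams differ in length")
    mapping: dict[str, str] = {}
    for (kind_a, text_a), (kind_b, text_b) in zip(obf_tokens, orig_tokens):
        if kind_a != kind_b:
            raise ValueError(f"token kinds differ: {text_a!r} vs {text_b!r}")
        if kind_a == "ident" and text_a not in KEYWORDS:
            # The same obfuscated name must always map to the same new name.
            previous = mapping.setdefault(text_a, text_b)
            if previous != text_b:
                raise ValueError(f"{text_a!r} maps to {previous!r} and {text_b!r}")
        elif text_a != text_b:
            raise ValueError(f"non-identifier tokens differ: {text_a!r} vs {text_b!r}")
    # Keep only the identifiers that were actually renamed.
    return {old: new for old, new in mapping.items() if old != new}


if __name__ == "__main__":
    obfuscated = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
    original = "function multiplyAndAdd(a, b) { let product = a * b; return product + 10; }"
    print(extract_mapping(obfuscated, original))
    # {'_0x12af': 'multiplyAndAdd', '_0x9c3e': 'product'}
```

Rejecting pairs whose non-identifier tokens differ is a design choice: it keeps only samples where renaming is the sole change, so the extracted mapping is unambiguous.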

### Training Objectives

- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

---

## ▶️ Usage

### Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/Variables-Renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

code = """
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
"""

# Move the input tensors to the device that device_map="auto" placed the model on.
inputs = tokenizer(code, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding for deterministic output
)

# Decode only the newly generated tokens, not the echoed input code.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```

---

## 🧪 Evaluation

- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment
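
The card does not define these metrics precisely. The sketch below shows one plausible reading of the first two, assuming each sample's renaming is available as a dict from obfuscated to readable names. `identifier_exact_match` computes micro-averaged exact-match accuracy, and `is_pure_renaming` is a token-level approximation of an AST equivalence check: it accepts an output only if it differs from the input by a one-to-one identifier renaming. All names here are hypothetical. The check is conservative and approximate: it rejects some valid outputs (for example, one that gives variables in different scopes the same name), it would not flag a consistent rename of a global such as `console`, and it assumes the code has no comments or regular-expression literals.

```python
import re
from typing import Mapping, Sequence

# Partial list of JavaScript reserved words (illustrative, not exhaustive).
KEYWORDS = {
    "break", "case", "catch", "class", "const", "continue", "default",
    "do", "else", "false", "for", "function", "if", "in", "let", "new",
    "null", "return", "switch", "this", "throw", "true", "try", "typeof",
    "var", "while",
}

# Minimal JavaScript tokenizer; whitespace is skipped by finditer.
TOKEN_RE = re.compile(
    r"""
    (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`)
  | (?P<number>0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?)
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<punct>\S)
    """,
    re.VERBOSE,
)


def canonical_form(code: str) -> list[str]:
    """Replace each non-keyword identifier with v0, v1, ... by first occurrence."""
    names: dict[str, str] = {}
    tokens = []
    for m in TOKEN_RE.finditer(code):
        text = m.group()
        if m.lastgroup == "ident" and text not in KEYWORDS:
            text = names.setdefault(text, f"v{len(names)}")
        tokens.append(text)
    return tokens


def is_pure_renaming(source: str, output: str) -> bool:
    """True if output differs from source only by a one-to-one identifier renaming."""
    return canonical_form(source) == canonical_form(output)


def identifier_exact_match(
    predictions: Sequence[Mapping[str, str]],
    references: Sequence[Mapping[str, str]],
) -> float:
    """Micro-averaged share of reference identifiers whose predicted name matches exactly."""
    if len(predictions) != len(references):
        raise ValueError("need one prediction mapping per reference mapping")
    total = hits = 0
    for pred, ref in zip(predictions, references):
        for obfuscated, gold in ref.items():
            total += 1
            if pred.get(obfuscated) == gold:  # a missing prediction counts as a miss
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    src = "function _0x12af(a, b) { let _0x9c3e = a * b; return _0x9c3e + 10; }"
    out = "function multiplyAndAdd(a, b) { let product = a * b; return product + 10; }"
    print(is_pure_renaming(src, out))  # True
    print(is_pure_renaming(src, out.replace("+ 10", "+ 1")))  # False
    refs = [{"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}]
    preds = [{"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}]
    print(identifier_exact_match(preds, refs))  # 0.5
```

Exact match scores `result` for `_0x9c3e` as a miss even though it is a reasonable name, which is why a manual readability assessment is a useful complement to it.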

---

## ⚠️ Limitations

- Generated names are **semantic approximations**, not the original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Only single-file scope is supported (no cross-file context)

---

## 🔐 Ethical Considerations

This model is intended for:

- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing

It should **not** be used to violate software licenses or intellectual property rights.

---

## 🧩 Future Work

- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow–aware modeling
- Integration with decompilers and IR tools

---

## 📜 License

Specify the license here (e.g., Apache-2.0, MIT).

---

## 📖 Citation

```bibtex
@misc{qwen_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using Qwen-Code},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/Variables-Renaming}
}
```