# 🔍 Obfuscated Variable Renaming with aixcoder

This repository hosts an **aixcoder-based model** fine-tuned to **rename obfuscated variables in source code**, improving readability while preserving program semantics.

The model is designed for use cases such as **malware analysis, reverse engineering, digital forensics, and general program comprehension**.

---

## 🚀 Task Overview

- **Task:** Code Deobfuscation / Variable Renaming
- **Base Model:** aixcoder
- **Input:** Source code with obfuscated variable names
- **Output:** Semantically equivalent source code with readable variable names

### Example

**Input**

```javascript
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
```

**Output**

```javascript
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
```

---

## 🧠 Model Description

- **Architecture:** aixcoder (Transformer-based)
- **Fine-tuning Objective:** Context-aware variable renaming
- **Approach:** AST-guided identifier alignment + sequence generation
- **Languages:** JavaScript (primary), extendable to others

The model learns to infer meaningful variable names from **usage context**, not from superficial patterns.
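
The identifier alignment named above pairs each obfuscated name with its readable counterpart. The following is a minimal sketch of that idea, using token-level alignment as a simplified stand-in for real AST analysis. The `align_identifiers` helper and its regular expressions are illustrative assumptions, not code from this repository, and they only handle snippets that differ in identifier names alone.

```python
import re

# Assumption: a rough JavaScript tokenizer (identifiers, numbers, single
# punctuation characters). A real pipeline would walk a proper AST instead.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def align_identifiers(obfuscated: str, readable: str) -> dict[str, str]:
    """Map each renamed identifier in `obfuscated` to its name in `readable`.

    Both snippets must tokenize to the same length and differ only in
    identifier names; otherwise a ValueError is raised.
    """
    obf_tokens = TOKEN_RE.findall(obfuscated)
    read_tokens = TOKEN_RE.findall(readable)
    if len(obf_tokens) != len(read_tokens):
        raise ValueError("snippets do not have the same token structure")

    mapping: dict[str, str] = {}
    for obf, read in zip(obf_tokens, read_tokens):
        if obf == read:
            continue  # keywords, punctuation, and unchanged names
        if not (IDENT_RE.fullmatch(obf) and IDENT_RE.fullmatch(read)):
            raise ValueError(f"non-identifier tokens differ: {obf!r} vs {read!r}")
        # Every occurrence of an obfuscated name must map to the same new name.
        if mapping.setdefault(obf, read) != read:
            raise ValueError(f"inconsistent renaming for {obf!r}")
    return mapping


obfuscated = """
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
"""
readable = """
function multiplyAndAdd(a, b) {
  let product = a * b;
  return product + 10;
}
"""
print(align_identifiers(obfuscated, readable))
# {'_0x12af': 'multiplyAndAdd', '_0x9c3e': 'product'}
```

The same check doubles as a cheap sanity test on model output: if alignment raises, the output changed something other than identifier names.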

---

## 🏗 Training Details

### Dataset

- Paired samples of:
  - Obfuscated code
  - Original / readable code
- Variable mappings extracted using **AST-based analysis**
- Realistic obfuscation patterns (minifiers, packers, name mangling)

### Training Objectives

- Identifier-aware sequence-to-sequence learning
- Contextual name prediction
- Syntax preservation

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

---

## ▶️ Usage

### Inference Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Neo111x/aixcoder-renaming"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True
)

code = '''
function _0x12af(a, b) {
  let _0x9c3e = a * b;
  return _0x9c3e + 10;
}
'''

# Move the inputs to the device the model was placed on by device_map="auto".
inputs = tokenizer(code, return_tensors="pt").to(model.device)

# Greedy decoding keeps the output deterministic.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=False
)

# The decoded sequence contains the prompt followed by the generated tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 🧪 Evaluation

- Identifier exact-match accuracy
- AST equivalence checks
- Manual readability assessment

---

## ⚠️ Limitations

- Generated names are **semantic approximations**, not original identifiers
- Performance degrades on:
  - Extremely short contexts
  - Heavy control-flow flattening
- Single-file scope only

---

## 🔐 Ethical Considerations

This model is intended for:

- Malware and binary analysis
- Digital forensics and incident response (DFIR)
- Code maintenance and auditing

It should **not** be used to violate software licenses or intellectual property rights.

---

## 🧩 Future Work

- Multi-language support (C/C++, Python)
- Function and class renaming
- Control-flow–aware modeling
- Integration with decompilers and IR tools

---

## 📜 License

Specify the license here (e.g., Apache-2.0, MIT).
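
---

## 📏 Evaluation Sketch

The identifier exact-match accuracy listed under Evaluation could be computed along the following lines. This is a sketch under assumptions: the `exact_match_accuracy` function, the mapping format (obfuscated name to readable name), and the rule that missing predictions count as misses are all hypothetical and may differ from the scoring actually used for this model.

```python
def exact_match_accuracy(predicted: dict[str, str], reference: dict[str, str]) -> float:
    """Fraction of reference identifiers whose predicted name matches exactly.

    Both mappings are keyed by the obfuscated name. Identifiers that are
    missing from `predicted` count as misses (an assumption of this sketch).
    """
    if not reference:
        raise ValueError("reference mapping is empty")
    hits = sum(1 for obf, name in reference.items() if predicted.get(obf) == name)
    return hits / len(reference)


# Hypothetical mappings for the example function in this README.
reference = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "product"}
predicted = {"_0x12af": "multiplyAndAdd", "_0x9c3e": "result"}

print(exact_match_accuracy(predicted, reference))  # 0.5
```

Exact match is strict: `result` is a reasonable name for the product but still scores as a miss, which is why the metric is paired with manual readability assessment.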

---

## 📖 Citation

```bibtex
@misc{aixcoder_code_variable_renamer,
  title={Context-Aware Variable Renaming for Obfuscated Code using aixcoder},
  author={Your Name},
  year={2026},
  url={https://huggingface.co/Neo111x/aixcoder-renaming}
}
```