---
license: apache-2.0
tags:
- code
- code-refactoring
- bug-detection
- code-translation
- static-analysis
- transformer
- developer-tools
language:
- code
pipeline_tag: other
model_type: transformer
library_name: transformers
datasets:
- custom
trained_on:
- multi-language code repositories
- refactor pairs
- bugfix pairs
- conversion pairs
---
# 🚀 Universal Code Refactor 32B
Universal Code Refactor 32B is an **AI-driven code engineering system** that automates large-scale refactoring, bug discovery, language-to-language conversion, and code optimization.
The project includes a full toolkit: **model**, **pipelines**, **refactor engine**, **bug detector**, **conversion engine**, **API**, **CLI**, **Gradio UI**, **datasets**, and **training scripts**.
# 🌟 Features
## πŸ”§ 1. Multi-Language Code Refactoring
Supports intelligent transformations for multiple languages:
- **Python**
- **Java**
- **JavaScript**
Includes:
- Automatic formatting (Black + isort)
- Unused import removal
- Inline simple functions
- Java loop modernization → for-each syntax
- JavaScript `var` → `let` transformation
- Structural code cleanup
- Rule-based + AST-based hybrid refactoring
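As an illustration of the AST-based side of this list, here is a minimal sketch of unused-import detection using Python's standard `ast` module. The function name `find_unused_imports` is hypothetical and is not taken from `refactor_engine.py`; the project's actual implementation may work differently.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return names bound by import statements that are never read in `source`.

    Hypothetical helper for illustration only; not the project's actual API.
    """
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names; skip them
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)

    return sorted(imported - used)


sample = "import os\nimport sys\nfrom math import sqrt\n\nprint(sys.argv, sqrt(4))\n"
unused = find_unused_imports(sample)
print(unused)
```

This heuristic ignores names re-exported through `__all__` and names referenced only inside strings, so a real refactoring pass would need extra checks before deleting anything.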
## 🐞 2. Static Bug Detection
AST-based detection, including:
- Possible None/null dereferences
- Unused variables
- Unsafe JavaScript `eval()` usage
- Missing null checks in Java
- Type-based reasoning (planned, not yet supported)
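To make the unused-variable check concrete, the sketch below flags local names that are assigned but never read inside a function. It uses only the standard `ast` module; `find_unused_variables` is a hypothetical name, and the logic in `bug_detector.py` may differ.

```python
import ast


def find_unused_variables(source: str) -> list[tuple[str, str, int]]:
    """Return (function, variable, line) for names assigned but never read.

    Hypothetical helper for illustration only. Nested functions are scanned
    together with their parent, so this is a heuristic, not a scope analysis.
    """
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        stores: dict[str, int] = {}  # name -> line of first assignment
        loads: set[str] = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stores.setdefault(node.id, node.lineno)
                elif isinstance(node.ctx, ast.Load):
                    loads.add(node.id)
        for name, line in stores.items():
            # Names starting with "_" are conventionally intentional throwaways.
            if name not in loads and not name.startswith("_"):
                findings.append((func.name, name, line))
    return sorted(findings, key=lambda item: item[2])


sample = (
    "def area(w, h):\n"
    "    scale = 2\n"
    "    result = w * h\n"
    "    return result\n"
)
findings = find_unused_variables(sample)
print(findings)
```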
## 🔄 3. Multi-Language Code Conversion
Built-in conversions:
- **Python → Java**
- **Java → Python**
Supports:
- Function extraction
- `main()` generation
- Basic block translation
- Extendable conversion rules
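The function-extraction and `main()`-generation steps could look roughly like the sketch below for the Python → Java direction. It only emits method stubs: every type is declared as `Object` and bodies are left untranslated. The function `python_to_java_skeleton` and the default class name `Converted` are assumptions for illustration, not part of `code_converter.py`.

```python
import ast


def python_to_java_skeleton(source: str, class_name: str = "Converted") -> str:
    """Emit a Java class with one static method stub per top-level Python function.

    Hypothetical helper for illustration only; bodies are not translated.
    """
    lines = [f"public class {class_name} {{"]
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            # Python is dynamically typed, so each parameter is declared as Object.
            params = ", ".join(f"Object {arg.arg}" for arg in node.args.args)
            lines.append(f"    static Object {node.name}({params}) {{")
            lines.append("        // TODO: translated body goes here")
            lines.append("        return null;")
            lines.append("    }")
            lines.append("")
    # Generated entry point, left empty in this sketch.
    lines.append("    public static void main(String[] args) {")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


java = python_to_java_skeleton("def add(a, b):\n    return a + b\n")
print(java)
```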
## 📄 4. Patch & Diff Generation
Automated patch engine creates:
- Unified diffs
- Patch previews
- Patch cleanliness scores
- Complexity reduction metrics
Useful for PR automation and CI pipelines.
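A unified diff between the original and refactored source can be produced with the standard library's `difflib`. The sketch below shows only that step; the wrapper name `make_unified_diff` and the `a/`, `b/` path prefixes are assumptions, and the cleanliness and complexity scores are not covered here.

```python
import difflib


def make_unified_diff(before: str, after: str, path: str = "sample.py") -> str:
    """Return a unified diff between two versions of one file.

    Hypothetical wrapper for illustration; not the API of patch_generator.py.
    """
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


before = "import os\nimport sys\n\nprint(sys.argv)\n"
after = "import sys\n\nprint(sys.argv)\n"
patch = make_unified_diff(before, after)
print(patch)
```

An empty string means the two versions are identical, which is a convenient signal for a CI job to skip opening a pull request.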
## 🧠 5. Compact Transformer Code Model
The model includes:
- Token embedding
- Positional encoding
- Transformer encoder stack
- Code-token-aware tokenizer
- Modular upgrade path to LLaMA / CodeGen / StarCoder models
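The list does not say which positional encoding the model uses. As one common choice, the sketch below assumes the sinusoidal encoding from the original Transformer architecture and adds it to a token embedding lookup, in plain Python for readability. The names `sinusoidal_positions` and `embed_tokens` are hypothetical and do not come from `model.py`.

```python
import math
import random


def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Build a seq_len x d_model table: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the same frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def embed_tokens(token_ids: list[int], embedding: list[list[float]]) -> list[list[float]]:
    """Look up each token's vector and add the positional encoding for its index."""
    d_model = len(embedding[0])
    positions = sinusoidal_positions(len(token_ids), d_model)
    return [
        [e + p for e, p in zip(embedding[tok], positions[idx])]
        for idx, tok in enumerate(token_ids)
    ]


rng = random.Random(0)
vocab_size, d_model = 16, 8  # toy sizes chosen for the example
embedding = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(vocab_size)]
encoded = embed_tokens([3, 7, 7, 1], embedding)
print(len(encoded), len(encoded[0]))
```

The resulting vectors are what an encoder stack would consume. Note that the two occurrences of token `7` get different vectors because their positions differ.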
## 🌐 6. Deployment Ecosystem
Included ready-to-run components:
### βœ” FastAPI REST Server
```
uvicorn inference.api_server:app --reload
```
### ✔ CLI Tool
```
python inference/cli.py --mode refactor --file example.py
```
### ✔ Gradio Web UI
```
python inference/gradio_app.py
```
### ✔ Docker Container
```
docker build -t universal-refactor -f deployment/Dockerfile .
docker run -p 8000:8000 universal-refactor
```
### ✔ Hugging Face Spaces App
Located in `deployment/huggingface_spaces/`.
# 📂 Project Structure
```
Universal-Code-Refactor-32B/
│
├── README.md
├── requirements.txt
├── MODEL_CARD.md
│
├── src/universal_refactor/
│   ├── refactor_engine.py
│   ├── bug_detector.py
│   ├── code_converter.py
│   ├── patch_generator.py
│   ├── pipelines.py
│   ├── tokenizer.py
│   ├── model.py
│   ├── long_context_manager.py
│   ├── utils.py
│   └── embeddings/
│
├── inference/
│   ├── api_server.py
│   ├── cli.py
│   └── gradio_app.py
│
├── deployment/
│   ├── Dockerfile
│   └── huggingface_spaces/
│
├── training/
│   ├── pretrain.py
│   ├── finetune_refactor.py
│   ├── finetune_bugfix.py
│   ├── tokenizer_training.py
│   ├── long_context_training.py
│   └── distributed/
│
└── datasets/
    ├── code_repo_raw/
    ├── multilingual_code_clean/
    ├── refactor_pairs/
    ├── bugfix_pairs/
    ├── conversion_pairs/
    └── metadata.json
```
# 🛠 Installation
## 1. Clone Repository
```
git clone https://github.com/YOUR_USERNAME/universal-code-refactor-32b
cd universal-code-refactor-32b
```
## 2. Install Dependencies
```
pip install -r requirements.txt
```
# 🚀 Usage Examples
## 🔧 Refactor Python Code
```
python inference/cli.py --mode refactor --file sample.py --lang python
```
## 🔄 Convert Java → Python
```
python inference/cli.py --mode convert --file MyClass.java --src java --tgt python
```
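The flags used in the commands above (`--mode`, `--file`, `--lang`, `--src`, `--tgt`) fit together as in the following `argparse` sketch. It is a guess at the shape of the interface, not the contents of `inference/cli.py`: the `choices` list covers only the two modes shown in this README, and which flags are required is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown in this README (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    # Only the two modes demonstrated in the README; the real CLI may offer more.
    parser.add_argument("--mode", required=True, choices=["refactor", "convert"])
    parser.add_argument("--file", required=True, help="path to the source file")
    parser.add_argument("--lang", help="language of the file (refactor mode)")
    parser.add_argument("--src", help="source language (convert mode)")
    parser.add_argument("--tgt", help="target language (convert mode)")
    return parser


args = build_parser().parse_args(
    ["--mode", "convert", "--file", "MyClass.java", "--src", "java", "--tgt", "python"]
)
print(args.mode, args.file, args.src, args.tgt)
```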
## 🌐 Run Web UI
```
python inference/gradio_app.py
```
# 📊 Evaluation Tools
The evaluation pipeline computes:
- Cyclomatic complexity reduction
- Patch cleanliness
- Code change metrics
- Structural improvement score
Run evaluation:
```
python evaluation/evaluate.py
```
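For reference, cyclomatic complexity reduction can be approximated by counting decision points in the AST before and after a refactor. The sketch below is a simplified estimate built on the standard `ast` module; `cyclomatic_complexity` is a hypothetical helper, and the metric computed by `evaluation/evaluate.py` may be defined differently.

```python
import ast

# Node types that each add one independent path through the code.
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit branches.
            score += len(node.values) - 1
    return score


before = (
    "def sign(x):\n"
    "    if x > 0:\n"
    "        return 1\n"
    "    else:\n"
    "        if x < 0:\n"
    "            return -1\n"
    "        else:\n"
    "            return 0\n"
)
after = "def sign(x):\n    return (x > 0) - (x < 0)\n"

reduction = cyclomatic_complexity(before) - cyclomatic_complexity(after)
print(reduction)
```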