---
license: apache-2.0
tags:
- code
- code-refactoring
- bug-detection
- code-translation
- static-analysis
- transformer
- developer-tools
language:
- code
pipeline_tag: other
model_type: transformer
library_name: transformers
datasets:
- custom
trained_on:
- multi-language code repositories
- refactor pairs
- bugfix pairs
- conversion pairs
---

# Universal Code Refactor 32B

Universal Code Refactor 32B is a complete **AI-driven code engineering system** designed to automate large-scale refactoring, bug discovery, language-to-language conversion, and code optimization.

The project includes a full toolkit: **model**, **pipelines**, **refactor engine**, **bug detector**, **conversion engine**, **API**, **CLI**, **Gradio UI**, **datasets**, and **training scripts**.

# Features

## 1. Multi-Language Code Refactoring

Supports intelligent transformations for multiple languages:

- **Python**
- **Java**
- **JavaScript**

Includes:

- Automatic formatting (Black + isort)
- Unused import removal
- Inline simple functions
- Java loop modernization → for-each syntax
- JavaScript `var → let` transformation
- Structural code cleanup
- Rule-based + AST-based hybrid refactoring
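
As an illustration of the AST-based side of these transformations, the sketch below removes unused top-level imports from Python source using only the standard-library `ast` module. The function name `remove_unused_imports` is hypothetical and is not taken from `refactor_engine.py`; the sketch also assumes Python 3.9+ (for `ast.unparse`) and does not preserve comments or original formatting.

```python
import ast


def remove_unused_imports(source: str) -> str:
    """Drop top-level imports whose bound names are never referenced.

    Hypothetical helper for illustration only; not the project's API.
    """
    tree = ast.parse(source)

    # Every bare identifier that appears anywhere in the module.
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

    kept = []
    for stmt in tree.body:
        is_future = isinstance(stmt, ast.ImportFrom) and stmt.module == "__future__"
        if isinstance(stmt, (ast.Import, ast.ImportFrom)) and not is_future:
            # Keep star imports and aliases whose local name is used.
            stmt.names = [
                alias
                for alias in stmt.names
                if alias.name == "*"
                or (alias.asname or alias.name.split(".")[0]) in used
            ]
            if not stmt.names:
                continue  # every name in this statement is unused
        kept.append(stmt)

    tree.body = kept
    return ast.unparse(tree)


if __name__ == "__main__":
    example = "import os\nimport sys\nfrom math import sqrt, pi\n\nprint(os.getcwd(), sqrt(4))\n"
    print(remove_unused_imports(example))
```

A production refactoring pass would more likely edit the source text in place so that comments and layout survive; round-tripping through `ast.unparse` is used here only to keep the example short.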

## 2. Static Bug Detection

AST-based detection, including:

- Possible None/null dereferences
- Unused variables
- Unsafe JavaScript `eval()` usage
- Missing null checks in Java
- Type-based reasoning (planned for a future release)
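
The checks listed above can be approximated for Python with a single pass over the syntax tree. The following is a minimal sketch, not the logic in `bug_detector.py`: `find_issues` is a hypothetical name, the whole module is treated as one scope, and the `eval()` check is the Python analogue of the JavaScript rule.

```python
import ast


def find_issues(source: str) -> list[tuple[int, str]]:
    """Report variables assigned but never read, and direct calls to eval().

    Hypothetical helper; scope-insensitive, so it only suits small snippets.
    """
    tree = ast.parse(source)
    assigned: dict[str, int] = {}  # name -> earliest line where it is assigned
    loaded: set[str] = set()
    issues: list[tuple[int, str]] = []

    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                first = assigned.get(node.id, node.lineno)
                assigned[node.id] = min(first, node.lineno)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
        # Flag calls of the form eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        ):
            issues.append((node.lineno, "call to eval()"))

    for name, lineno in assigned.items():
        if name not in loaded:
            issues.append((lineno, f"variable '{name}' is assigned but never used"))

    return sorted(issues)


if __name__ == "__main__":
    snippet = "x = 1\ny = 2\nprint(y)\nresult = eval('1 + 1')\nprint(result)\n"
    for lineno, message in find_issues(snippet):
        print(f"line {lineno}: {message}")
```

Handling function and class scopes separately would be the natural next step before relying on such a check for real code.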

## 3. Multi-Language Code Conversion

Built-in conversions:

- **Python → Java**
- **Java → Python**

Supports:

- Function extraction
- `main()` generation
- Basic block translation
- Extensible conversion rules
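
To make "function extraction" and "`main()` generation" concrete, here is a rough sketch of the Python → Java direction. It is an assumption about how such a step could look, not the behavior of `code_converter.py`: the name `python_to_java_skeleton` is hypothetical, every parameter and return type falls back to `Object`, and function bodies are left as placeholders rather than translated.

```python
import ast


def python_to_java_skeleton(source: str, class_name: str = "Converted") -> str:
    """Emit a Java class with one static method stub per top-level Python function.

    Hypothetical helper for illustration; bodies are not translated.
    """
    tree = ast.parse(source)
    lines = [f"public class {class_name} {{"]

    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Python is dynamically typed, so each parameter is typed as Object.
            params = ", ".join(f"Object {arg.arg}" for arg in node.args.args)
            lines.append(f"    static Object {node.name}({params}) {{")
            lines.append("        // TODO: translate function body")
            lines.append("        return null;")
            lines.append("    }")
            lines.append("")

    # Generated entry point for module-level statements.
    lines.append("    public static void main(String[] args) {")
    lines.append("        // TODO: translate module-level statements")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(python_to_java_skeleton("def add(a, b):\n    return a + b\n", class_name="Demo"))
```

Basic block translation would then fill in the stubs, one statement type at a time, which is where conversion rules become the extension point.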

## 4. Patch & Diff Generation

Automated patch engine creates:

- Unified diffs
- Patch previews
- Patch cleanliness scores
- Complexity reduction metrics

Useful for PR automation and CI pipelines.
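
A unified diff between the original and refactored source can be produced with the standard-library `difflib` module. The sketch below is illustrative only: `make_patch` and `changed_line_count` are hypothetical names, and the line count is a simple stand-in, not the project's patch cleanliness score.

```python
import difflib


def make_patch(before: str, after: str, path: str = "example.py") -> str:
    """Return a unified diff that turns `before` into `after` (empty if identical)."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def changed_line_count(patch: str) -> int:
    """Count added and removed lines in a single-file patch.

    The first two lines are the ---/+++ file headers and are skipped.
    """
    body = patch.splitlines()[2:]
    return sum(1 for line in body if line.startswith(("+", "-")))


if __name__ == "__main__":
    original = "var = 1\nprint(var)\n"
    refactored = "value = 1\nprint(value)\n"
    patch = make_patch(original, refactored)
    print(patch)
    print("changed lines:", changed_line_count(patch))
```

The `a/` and `b/` prefixes follow the convention used by `git diff`, so the output can be previewed in a pull request or applied with standard patch tooling.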

## 5. Compact Transformer Code Model

The model includes:

- Token embedding
- Positional encoding
- Transformer encoder stack
- Code-token-aware tokenizer
- Modular upgrade path to LLaMA / CodeGen / StarCoder models
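
The positional encoding step can be sketched in plain Python. This example assumes the standard sinusoidal scheme from the original Transformer architecture; the source does not say which scheme `model.py` uses, and the function names here are hypothetical.

```python
import math


def sinusoidal_positions(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of sinusoidal positional encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency: sin on even, cos on odd.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings: list[list[float]]) -> list[list[float]]:
    """Add the positional table element-wise to a sequence of token embeddings."""
    d_model = len(embeddings[0])
    table = sinusoidal_positions(len(embeddings), d_model)
    return [
        [e + p for e, p in zip(emb_row, pos_row)]
        for emb_row, pos_row in zip(embeddings, table)
    ]


if __name__ == "__main__":
    # Two tokens with 4-dimensional (all-zero) embeddings.
    for row in add_positions([[0.0] * 4, [0.0] * 4]):
        print([round(v, 4) for v in row])
```

The summed vectors are what an encoder stack would consume; in a tensor library the same table is usually precomputed once and added by broadcasting.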

## 6. Deployment Ecosystem

Included ready-to-run components:

### FastAPI REST Server

```
uvicorn inference.api_server:app --reload
```

### CLI Tool

```
python inference/cli.py --mode refactor --file example.py
```

### Gradio Web UI

```
python inference/gradio_app.py
```

### Docker Container

Run from the repository root; the `Dockerfile` lives under `deployment/`:

```
docker build -f deployment/Dockerfile -t universal-refactor .
docker run -p 8000:8000 universal-refactor
```

### Hugging Face Spaces App

Located in `deployment/huggingface_spaces/`.

# Project Structure

```
Universal-Code-Refactor-32B/
│
├── README.md
├── requirements.txt
├── MODEL_CARD.md
│
├── src/universal_refactor/
│   ├── refactor_engine.py
│   ├── bug_detector.py
│   ├── code_converter.py
│   ├── patch_generator.py
│   ├── pipelines.py
│   ├── tokenizer.py
│   ├── model.py
│   ├── long_context_manager.py
│   ├── utils.py
│   └── embeddings/
│
├── inference/
│   ├── api_server.py
│   ├── cli.py
│   └── gradio_app.py
│
├── deployment/
│   ├── Dockerfile
│   └── huggingface_spaces/
│
├── training/
│   ├── pretrain.py
│   ├── finetune_refactor.py
│   ├── finetune_bugfix.py
│   ├── tokenizer_training.py
│   ├── long_context_training.py
│   └── distributed/
│
└── datasets/
    ├── code_repo_raw/
    ├── multilingual_code_clean/
    ├── refactor_pairs/
    ├── bugfix_pairs/
    ├── conversion_pairs/
    └── metadata.json
```

# Installation

## 1. Clone Repository

```
git clone https://github.com/YOUR_USERNAME/universal-code-refactor-32b
cd universal-code-refactor-32b
```

## 2. Install Dependencies

```
pip install -r requirements.txt
```

# Usage Examples

## Refactor Python Code

```
python inference/cli.py --mode refactor --file sample.py --lang python
```

## Convert Java → Python

```
python inference/cli.py --mode convert --file MyClass.java --src java --tgt python
```

## Run Web UI

```
python inference/gradio_app.py
```

# Evaluation Tools

The evaluation pipeline computes:

- Cyclomatic complexity reduction
- Patch cleanliness
- Code change metrics
- Structural improvement score

Run evaluation:

```
python evaluation/evaluate.py
```
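
For intuition about the first metric, cyclomatic complexity can be approximated as one plus the number of decision points in the syntax tree. The sketch below is a simplified McCabe count written for illustration; `cyclomatic_complexity` and `complexity_reduction` are hypothetical names, and the exact formula used by the evaluation pipeline is not specified in this document.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (
    ast.If,
    ast.For,
    ast.While,
    ast.ExceptHandler,
    ast.IfExp,
    ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths: one per additional operand.
            complexity += len(node.values) - 1
    return complexity


def complexity_reduction(before: str, after: str) -> int:
    """Positive values mean the refactored code has fewer decision points."""
    return cyclomatic_complexity(before) - cyclomatic_complexity(after)


if __name__ == "__main__":
    before = (
        "def f(x):\n"
        "    if x > 0:\n"
        "        if x > 10:\n"
        "            return 2\n"
        "        return 1\n"
        "    return 0\n"
    )
    after = "def f(x):\n    return 2 if x > 10 else int(x > 0)\n"
    print("reduction:", complexity_reduction(before, after))
```

Counting over the whole module keeps the example short; a per-function breakdown would be more useful when reporting results for a large refactor.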