---
title: Code2Pseudo
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert C++ to pseudocode using a Transformer model.
---
# 🚀 Code2Pseudo: Transformer-Based C++ to Pseudocode Converter

[License: MIT](LICENSE) · [Python](https://www.python.org/) · [Hugging Face Space](https://huggingface.co/spaces/asadsandhu/Code2Pseudo) · [GitHub](https://github.com/asadsandhu/Code2Pseudo)

> A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts executable C++ code into high-level pseudocode. Trained on the [SPoC dataset](https://arxiv.org/abs/1906.04908) from Stanford.

---

## 🖼️ Demo

Try it live on **Hugging Face Spaces**:

👉 https://huggingface.co/spaces/asadsandhu/Code2Pseudo

![Demo of the Code2Pseudo web app](assets/demo.png)

---
## 🧠 Model Architecture

- Built from scratch on the **Transformer** encoder-decoder architecture (PyTorch)
- No pre-trained models or off-the-shelf seq2seq libraries; the layers are implemented in custom code
- Token-level sequence generation with greedy decoding
- Custom tokenization and vocabulary building for both C++ and pseudocode

```
Input:  C++ code (line by line)
Model:  Transformer (encoder-decoder)
Output: the corresponding pseudocode line
```
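The tokenization and vocabulary-building step can be sketched as follows. This is an illustration, not the repository's actual code: the special tokens (`<pad>`, `<sos>`, `<eos>`, `<unk>`) and the `build_vocab`/`encode` helpers are assumed names.

```python
from collections import Counter

# Illustrative sketch of whitespace tokenization and vocabulary building.
# Special-token names and helper functions are assumptions, not the repo's code.

def tokenize(line):
    # SPoC-style lines are already space-separated, so splitting suffices.
    return line.strip().split()

def build_vocab(lines, min_freq=1):
    counts = Counter(tok for line in lines for tok in tokenize(line))
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for tok, freq in counts.most_common():
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(line, vocab):
    # Map tokens to ids, falling back to <unk>, and add sentence boundaries.
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokenize(line)]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]

vocab = build_vocab(["cin >> n ;", "cout << ans << endl ;"])
enc = encode("cin >> x ;", vocab)  # "x" is out-of-vocabulary here
```

Building separate vocabularies for the C++ side and the pseudocode side works the same way, just over the two different columns of the dataset.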
---

## 📊 Dataset

We trained on the **SPoC dataset**:

- ✅ Cleanly aligned C++ → pseudocode line pairs
- ✅ High-quality syntactic coverage
- ✅ Multiple test splits available
- ✅ Custom preprocessing and token handling

> 📄 Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
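Loading the aligned line pairs might look like the sketch below. The `text` (pseudocode) and `code` (C++) column names follow the released SPoC TSV files, but verify them against the header of `spoc/train/spoc-train.tsv` before relying on them.

```python
import csv
import io

# Sketch of loading aligned (C++ line, pseudocode line) pairs from a
# SPoC-style TSV. Column names "text" and "code" are assumed from the
# released SPoC files.
def load_pairs(tsv_text):
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pairs = []
    for row in reader:
        # Some SPoC rows have empty pseudocode (e.g. lone braces); skip them.
        if row["text"] and row["code"]:
            pairs.append((row["code"], row["text"]))  # source = C++, target = pseudocode
    return pairs

sample = "text\tcode\nread n\tcin >> n ;\n\tint main ( ) {\n"
pairs = load_pairs(sample)
```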
---

## 📁 Directory Structure

```
.
├── app.py            # Gradio web app (C++ → Pseudocode)
├── train.py          # Training script for the code-to-pseudocode model
├── model.pth         # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png      # Screenshot for README
└── README.md         # This file
```
---

## 🛠️ How to Run Locally

### ⚙️ 1. Clone the Repo

```bash
git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm
```

### 🚀 2. Launch the Web App

Make sure `model.pth` exists (or train it first):

```bash
python app.py
```

The interface will open in your browser.
---

## 🧪 Training the Model

To retrain the Transformer model:

```bash
python train.py
```

By default, the script:

* downloads the SPoC dataset from GitHub
* trains for 10 epochs
* produces `model.pth` containing the weights and vocabulary
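Bundling the weights and both vocabularies into one checkpoint is what lets `app.py` reload everything from a single `model.pth`. A minimal sketch of that pattern (the key names and the tiny stand-in model are assumptions, not `train.py`'s actual code):

```python
import torch
import torch.nn as nn

# Illustrative only: save model weights together with the vocabularies so the
# inference app can restore both from one file. Keys are assumed names.
model = nn.Linear(4, 4)                  # stand-in for the real Transformer
src_vocab = {"<pad>": 0, "cin": 1}       # C++ token -> id
tgt_vocab = {"<pad>": 0, "read": 1}      # pseudocode token -> id

torch.save({"model_state": model.state_dict(),
            "src_vocab": src_vocab,
            "tgt_vocab": tgt_vocab}, "model.pth")

ckpt = torch.load("model.pth")           # reload, e.g. inside app.py
```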
---

## 🧠 Key Hyperparameters

| Parameter      | Value       |
| -------------- | ----------- |
| Model Type     | Transformer |
| Max Length     | 128         |
| Embedding Dim  | 256         |
| FFN Dim        | 512         |
| Heads          | 4           |
| Encoder Layers | 2           |
| Decoder Layers | 2           |
| Batch Size     | 64          |
| Epochs         | 10          |
| Optimizer      | Adam        |
| Learning Rate  | 1e-4        |
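For orientation, the same shape budget expressed with PyTorch's stock `nn.Transformer`. This is only a reference point: the repository implements the encoder/decoder layers from scratch rather than using this module.

```python
import torch
import torch.nn as nn

# The table above mapped onto PyTorch's built-in Transformer, for scale only.
model = nn.Transformer(
    d_model=256,          # Embedding Dim
    nhead=4,              # Heads
    num_encoder_layers=2,
    num_decoder_layers=2,
    dim_feedforward=512,  # FFN Dim
    batch_first=True,
)
src = torch.zeros(2, 128, 256)  # (batch, Max Length, Embedding Dim)
tgt = torch.zeros(2, 128, 256)
out = model(src, tgt)           # same shape as tgt
```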
---

## 🧩 Example Input

```cpp
int main() {
    int n , nn , ans = 0 ;
    cin >> n ;
    for ( int i = 2 ; i <= n - 1 ; i ++ ) {
        nn = n ;
        while ( nn == 0 ) ans += nn % i , nn /= i ;
    }
    int o = gcd ( ans , n - 2 ) ;
    cout << ans / 2 / o << ( n - 2 ) / o << endl ;
    return 0;
}
```
### ⏩ Output Pseudocode

```text
create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o
```
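Each output line above is produced by token-level greedy decoding: at every step the decoder's highest-scoring token is appended until `<eos>` or the length limit is hit. A toy sketch of that loop, where `score_next` is a hard-coded stand-in for the real decoder forward pass:

```python
# Toy illustration of greedy decoding. `score_next` stands in for a real
# decoder forward pass and is NOT the repository's actual model.
SOS, EOS, MAX_LEN = "<sos>", "<eos>", 128

def score_next(src_tokens, generated):
    # Pretend decoder: deterministic scores keyed by the last generated token.
    table = {SOS: {"read": 0.9, EOS: 0.1},
             "read": {"n": 0.8, EOS: 0.2},
             "n": {EOS: 1.0}}
    return table[generated[-1]]

def greedy_decode(src_tokens):
    generated = [SOS]
    while len(generated) < MAX_LEN:
        scores = score_next(src_tokens, generated)
        best = max(scores, key=scores.get)  # greedy: argmax at every step
        if best == EOS:
            break
        generated.append(best)
    return generated[1:]  # drop <sos>

result = greedy_decode("cin >> n ;".split())  # -> ["read", "n"]
```

Greedy decoding is cheap and deterministic; beam search would often score better but at a multiple of the decoding cost.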
---

## 📦 Deployment

Live demo hosted on:

* **Hugging Face Spaces**: [Code2Pseudo](https://huggingface.co/spaces/asadsandhu/Code2Pseudo)
* **GitHub**: [github.com/asadsandhu/Code2Pseudo](https://github.com/asadsandhu/Code2Pseudo)
---

## 🙏 Acknowledgements

* 📘 **SPoC Dataset** by Stanford University
  Kulal, S., Pasupat, P., & Liang, P. (2019). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/1906.04908)
* 🧠 Transformer paper: ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)
---

## 🧑‍💻 Author

**Asad Ali**
[GitHub: asadsandhu](https://github.com/asadsandhu)
[Hugging Face: asadsandhu](https://huggingface.co/asadsandhu)
[LinkedIn: asadxali](https://www.linkedin.com/in/asadxali)
---

## 📄 License

This project is licensed under the MIT License.
Use, remix, and distribute it freely with attribution.