---
title: Pseudo2Code
emoji: 👀
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
license: mit
short_description: Convert pseudocode to C++ using a Transformer model.
---

# 🚀 Pseudo2Code – Transformer-based Pseudocode to C++ Converter

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Spaces-orange)](https://huggingface.co/spaces/asadsandhu/Pseudo2Code)
[![GitHub Repo](https://img.shields.io/badge/GitHub-asadsandhu/Pseudo2Code-black?logo=github)](https://github.com/asadsandhu/Pseudo2Code)

> A fully custom Transformer-based sequence-to-sequence model, built from scratch in PyTorch, that converts human-written pseudocode into executable C++ code. Trained on the [SPoC dataset](https://arxiv.org/abs/2005.04326) from Stanford.

---

## 🖼️ Demo

Try it live on **Hugging Face Spaces**:
👉 https://huggingface.co/spaces/asadsandhu/Pseudo2Code

![App Demo](assets/demo.png)

---

## 🧠 Model Architecture

- Encoder-decoder **Transformer** implemented from scratch in PyTorch
- No pre-trained models (pure from-scratch implementation)
- Token-level sequence generation using greedy decoding
- Custom vocabulary construction for both the pseudocode input and the C++ output

```
Input:  Pseudocode lines (line-by-line)
Model:  Transformer (Encoder-Decoder)
Output: C++ code line for each pseudocode line
```

---

## 📊 Dataset

We use the **SPoC dataset** from Stanford:

- ✅ Clean pseudocode–C++ line pairs
- ✅ Token-level annotations for syntax handling
- ✅ Multiple test splits (generalization to unseen problems/workers)
- ✅ Custom preprocessing and vocabulary building

> 📎 Licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

---

## 📁 Directory Structure

```
.
├── app.py          # Gradio web app for inference
├── train.py        # Transformer training code
├── model.pth       # Trained model weights
├── spoc/           # Dataset directory
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png    # App screenshot
└── README.md       # You're here
```

---

## 🛠️ How to Run Locally

### ⚙️ 1. Clone the Repo & Install Requirements

```bash
git clone https://github.com/asadsandhu/Pseudo2Code.git
cd Pseudo2Code
pip install -r requirements.txt
```

Or install the dependencies manually:

```bash
pip install torch gradio tqdm
```

### 🚀 2. Launch the App

Make sure `model.pth` is present (or train one with `train.py`):

```bash
python app.py
```

The app will open in your browser.

---

## 🧪 Training the Model

You can retrain the model with the `train.py` script:

```bash
python train.py
```

By default it downloads the data from the public repo and trains for 10 epochs, writing a `model.pth` file that contains the learned weights and vocabularies.
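
As a rough illustration of the kind of model `train.py` sets up, the sketch below assembles an encoder-decoder Transformer with the hyperparameters listed in the next section. It is a minimal sketch, not the repository's actual code: the class name, vocabulary sizes, and positional-embedding scheme are assumptions, and it leans on `torch.nn.Transformer` for brevity, whereas the repository implements the architecture from scratch.

```python
# Minimal sketch (not the actual train.py): an encoder-decoder Transformer
# using the hyperparameters from the table below. Vocabulary sizes are
# placeholders; the real values come from the custom SPoC vocabularies.
import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    def __init__(self, src_vocab_size, tgt_vocab_size,
                 d_model=256, nhead=4, num_layers=2,
                 dim_feedforward=512, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab_size, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)   # learned positions (assumption)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_feedforward, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) token-id tensors
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_embed(src) + self.pos_embed(pos_s)
        tgt_x = self.tgt_embed(tgt) + self.pos_embed(pos_t)
        # Causal mask so each target position only attends to earlier tokens
        mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)).to(tgt.device)
        hidden = self.transformer(src_x, tgt_x, tgt_mask=mask)
        return self.out(hidden)   # (batch, seq_len, tgt_vocab_size)

model = Seq2SeqTransformer(src_vocab_size=8000, tgt_vocab_size=8000)  # sizes are placeholders
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)             # Adam, lr = 1e-4
criterion = nn.CrossEntropyLoss()  # typically with ignore_index set to the padding id
```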
--- ## πŸ”§ Key Hyperparameters | Parameter | Value | | -------------- | ----------- | | Model Type | Transformer | | Max Length | 128 | | Embedding Dim | 256 | | FFN Dim | 512 | | Heads | 4 | | Encoder Layers | 2 | | Decoder Layers | 2 | | Batch Size | 64 | | Epochs | 10 | | Optimizer | Adam | | Learning Rate | 1e-4 | --- ## 🧩 Example Input ```text n , nn, ans = integers with ans =0 Read n for i=2 to n-1 execute set nn to n while nn is not equal to 0, set ans to ans + nn%i, and also set nn= nn/i } set o to gcd(ans, n-2) print out ans/o "/" (n-2)/o ``` ### ⏩ Output C++ ```cpp int main() { int n , nn , ans = 0 ; cin > > n ; for ( int i = 2 ; i < = n - 1 ; i + + ) { nn = n ; while ( nn = = 0 ) ans + = nn % i , nn / = i ; } o = gcd ( ans , n - 2 ) ; cout < < ans / 2 / o ( n - 2 ) / o < < endl ; return 0; } ``` --- ## πŸ“¦ Deployment This app is deployed live on: * **Hugging Face Spaces**: [Pseudo2Code](https://huggingface.co/spaces/asadsandhu/Pseudo2Code) * **GitHub**: [github.com/asadsandhu/Pseudo2Code](https://github.com/asadsandhu/Pseudo2Code) --- ## πŸ™Œ Acknowledgements * πŸ“˜ **SPoC Dataset** by Stanford University Kulal, S., Pasupat, P., & Liang, P. (2020). [SPoC: Search-based Pseudocode to Code](https://arxiv.org/abs/2005.04326) * 🧠 Transformer Paper: ["Attention is All You Need"](https://arxiv.org/abs/1706.03762) --- ## πŸ§‘β€πŸ’» Author **Asad Ali** [GitHub: asadsandhu](https://github.com/asadsandhu) [Hugging Face: asadsandhu](https://huggingface.co/asadsandhu) [LinkedIn: asadxali](https://www.linkedin.com/in/asadxali) --- ## πŸ“„ License This project is licensed under the MIT License. Feel free to use, modify, and share with credit.