shng2025/gptesla-small

Model Card for gptesla-small

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: Shi Hao Ng, IB DP Student, Marlborough College Malaysia
Model type: Decoder-only Transformer
Language(s) (NLP): Python, HTML, etc.
License: MIT
Finetuned from model [optional]: No (trained from randomly initialized weights)

Model Sources [optional]

Repository: https://github.com/Ice-Citron/GPTesla

Uses

You give the model half-finished Python code as a prompt, and it generates Python code to continue it.

Direct Use

Some fine-tuning is likely needed, or at least preferable, before using the model directly. However, I won't be working on this myself.

Downstream Use [optional]

This can easily be plugged into IDEs for code completion. It is not ideal for that, though, because its completions are rarely correct, most likely because it is a fairly small model.
Even at this size, training was already a struggle: it took 15 hours on my 4x Nvidia A100 PCIe 80GB.

How to Get Started with the Model

To get started, follow the instructions under "Use this model" on the model's Hugging Face page. That should work. If it doesn't, try contacting me.
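A minimal sketch of what loading the model for code completion could look like, assuming the repository ships a tokenizer and works with the standard transformers text-generation pipeline (not confirmed here). The helper names and the sampling values are illustrative assumptions, not settings taken from this card.

```python
MODEL_ID = "shng2025/gptesla-small"


def build_generation_kwargs(max_new_tokens=64, temperature=0.2, top_p=0.95):
    """Return sampling settings for code completion (illustrative, untuned values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def complete(prompt, **overrides):
    """Generate a continuation for a half-finished Python snippet."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, **build_generation_kwargs(**overrides))
    # The pipeline returns a list of dicts; "generated_text" includes the prompt.
    return outputs[0]["generated_text"]
```

Calling `complete("def fibonacci(n):\n")` downloads the weights on first use and returns the prompt followed by the generated code. A low temperature is a common choice for code, but treat it as a starting point.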

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

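111 million parameters, about 444 megabytes on disk. That size corresponds to 4 bytes per parameter, i.e. FP32 storage (the Hub reports the tensor type as F32); an FP16 copy would be roughly 222 megabytes.
Pretty fast and lightweight when running on a T4 GPU.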
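The relationship between parameter count and checkpoint size can be checked with a little arithmetic. This sketch assumes the file holds only the weights, uses decimal megabytes, and ignores file-format overhead; the function name is made up for illustration.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def checkpoint_size_mb(num_params, dtype):
    """Approximate weight size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


print(checkpoint_size_mb(111_000_000, "fp32"))  # 444.0
print(checkpoint_size_mb(111_000_000, "fp16"))  # 222.0
```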

Evaluation

https://huggingface.co/datasets/shng2025/gptesla-valid

Testing Data, Factors & Metrics

Testing Data

https://huggingface.co/datasets/shng2025/gptesla-train

Factors

The evaluation is perhaps not accurate, because it expects a one-to-one match with the reference code. In reality there are many ways to write code that reaches the same logic, so matching one precise way of coding should not be required.
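The point about one-to-one matching can be illustrated with a toy comparison. This is a sketch only: the two snippets, the helper names, and the test inputs are invented for illustration and are not part of the evaluation used for this model.

```python
# Two different ways of writing the same logic.
REFERENCE = "def total(xs):\n    return sum(xs)\n"
CANDIDATE = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)


def exact_match(a, b):
    """Strict textual comparison, ignoring only surrounding whitespace."""
    return a.strip() == b.strip()


def load_function(source, name):
    """Execute source in a fresh namespace and return the named function.

    exec runs arbitrary code, so real generated code should be sandboxed.
    """
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def same_behaviour(src_a, src_b, name, test_inputs):
    """Compare two implementations by their outputs on shared inputs."""
    fa = load_function(src_a, name)
    fb = load_function(src_b, name)
    return all(fa(x) == fb(x) for x in test_inputs)


inputs = [[], [1], [1, 2, 3], [-5, 5]]
print(exact_match(REFERENCE, CANDIDATE))                      # False
print(same_behaviour(REFERENCE, CANDIDATE, "total", inputs))  # True
```

A text- or token-level metric penalizes the second snippet even though it behaves identically on these inputs, which is why a behaviour-based check can give a fairer picture.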

Results

Final training loss (loss/train) of about 1.1. The model converged after 150,000 steps.
Weights & Biases run: https://wandb.ai/marlborough-college-malaysia/gptesla-small/runs/m9sqzqo3?nw=nwusershng2025
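To make the loss figure easier to interpret, it can be converted to perplexity. This sketch assumes the logged loss is the mean per-token cross-entropy in nats (the usual default for PyTorch language-model training); if it was logged differently, the conversion does not apply.

```python
import math


def perplexity(cross_entropy_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy."""
    return math.exp(cross_entropy_nats)


print(round(perplexity(1.1), 2))  # 3.0
```

Under that assumption, a loss of 1.1 means the model is on average about as uncertain as if it were choosing uniformly among three tokens at each step of the training data.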

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: 4x Nvidia A100 PCIe + 96x AMD CPU
Hours used: 15 hours
Cloud Provider: Azure
Compute Region: Unclear
Carbon Emitted: [More Information Needed]
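The missing emissions figure could be approached along the lines of the calculator mentioned above: energy used multiplied by the carbon intensity of the grid. The sketch below is GPU-only and assumes each A100 PCIe 80GB draws its rated 300 W for the full run, which is an upper bound I am assuming rather than a measurement. The grid intensity is left as a parameter because the compute region is unclear.

```python
def gpu_energy_kwh(num_gpus, watts_per_gpu, hours):
    """Upper-bound GPU energy in kWh, assuming constant draw at the given wattage."""
    return num_gpus * watts_per_gpu * hours / 1000


def emissions_kg_co2eq(energy_kwh, grid_kg_co2eq_per_kwh):
    """Emissions estimate; the grid intensity must be looked up for the actual region."""
    return energy_kwh * grid_kg_co2eq_per_kwh


# 4 GPUs, assumed 300 W each, 15 hours (GPU count and hours from the card).
print(gpu_energy_kwh(4, 300, 15))  # 18.0
```

CPU, memory, and datacenter overhead are not included, so the real energy use may differ in either direction from this bound.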

Model Architecture and Objective

Based on CodeParrot. It uses GPT-2's architecture, but its weights are randomly initialized rather than loaded from a pretrained checkpoint.
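A minimal sketch of building a GPT-2-shaped model with random weights in transformers. The hyperparameters below are assumptions modelled on a small CodeParrot-style setup (32,768-token vocabulary, 1,024 positions, 12 layers, width 768); the actual values used for this model are in the repository, not in this card. The parameter-count helper is plain arithmetic for the GPT-2 layout with tied input and output embeddings.

```python
# Assumed hyperparameters (hypothetical, not confirmed by the model card).
ASSUMED_CONFIG = {
    "vocab_size": 32768,
    "n_positions": 1024,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
}


def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Count parameters of a GPT-2 style decoder with tied embeddings."""
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Per block: attention (4*d^2) + MLP (8*d^2) weights, plus biases and LayerNorms (13*d).
    per_block = 12 * n_embd**2 + 13 * n_embd
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_block + final_layer_norm


def build_random_gpt2(config_values=ASSUMED_CONFIG):
    """Instantiate GPT-2 from a config, which gives randomly initialized weights."""
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers import GPT2Config, GPT2LMHeadModel

    return GPT2LMHeadModel(GPT2Config(**config_values))


print(gpt2_param_count(32768, 1024, 768, 12))  # 111008256
```

With these assumed values the count comes to roughly 111 million, which is consistent with the size reported for this model, though that agreement alone does not confirm the configuration. Constructing the model from a config instead of calling `from_pretrained` is what leaves the weights random.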

Compute Infrastructure

NVMe Link
4x Nvidia A100 PCIe
96x AMD CPU from Azure
900 GB RAM

Software

Python 3.10.14
Latest versions of PyTorch, transformers, wandb, and related libraries at the time of training. Refer to the GitHub repository for the exact versions.
Accelerate

Citation [optional]

This model is based on CodeParrot.

Safetensors: model size 0.1B params, tensor type F32

Paper for shng2025/gptesla-small: Quantifying the Carbon Emissions of Machine Learning (arXiv:1910.09700, published Oct 21, 2019)