---
license: mit
datasets:
- mikasenghaas/wikitext-2
language:
- en
metrics:
- bleu
- rouge
- perplexity
- accuracy
base_model:
- openai-community/gpt2
tags:
- Quantized
- Pruned
- Small
- Nano
- SBC
pipeline_tag: text-generation
---

# Model Card: Pruned & Quantized GPT-2 Fine-Tuned on WikiText-2

## Model Summary

This model is a pruned and quantized version of GPT-2, fine-tuned on the WikiText-2 dataset. Pruning and quantization reduce the model's size and computational requirements, making it suitable for deployment in resource-constrained environments such as edge devices or applications with limited computational power.

## Model Details

### Developed by

- **Developer:** SynSci
- **Contact:** swayam.singal@gmail.com

### Model Description

- **Architecture:** GPT-2 (Generative Pre-trained Transformer 2)
- **Model Type:** Transformer-based language model
- **Base Model:** [openai-community/gpt2](https://huggingface.co/openai-community/gpt2)
- **Language:** English
- **License:** MIT
- **Fine-tuned on:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2)
- **Modifications:**
  - **Pruning:** Redundant weights removed to decrease model size and inference time.
  - **Quantization:** Weights quantized to 8-bit integers to reduce memory footprint and improve efficiency.

### Direct Use

- Text generation
- Language modeling
- Autocomplete suggestions
- Educational use in NLP and model-optimization techniques

### Downstream Use

- Integration into applications requiring efficient language models
- Deployment on devices with limited computational resources

### Out-of-Scope Use

- Generation of misleading or harmful content
- Applications requiring understanding of languages other than English
- Tasks demanding high-precision language understanding beyond the model's capabilities

## Bias, Risks, and Limitations

### Biases

The model inherits biases present in the GPT-2 base model and in the WikiText-2 dataset, which consists of Wikipedia articles. These biases may include underrepresentation of certain topics or perspectives.

### Risks

- Potential generation of biased or inappropriate content
- Misinterpretation of generated text as factual information

### Limitations

- Reduced performance compared to the full-sized GPT-2 model due to pruning and quantization
- Limited to English language understanding and generation
- Not suitable for tasks requiring real-time processing of large-scale data

### Recommendations

Users should:

- Implement content filtering mechanisms to prevent the generation of inappropriate content.
- Avoid using the model for critical applications without thorough evaluation.
- Be aware of the model's limitations in understanding nuanced language and context.

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the pruned & quantized checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("swayamsingal/NanoQuant")
model = AutoModelForCausalLM.from_pretrained("swayamsingal/NanoQuant")

# Generate a short continuation of the prompt
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

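Unless the checkpoint's generation config specifies otherwise, `generate` decodes greedily here; passing sampling arguments such as `do_sample=True`, `top_p`, or `temperature` to the same call produces more varied continuations.
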
## Training Details

### Training Data

- **Dataset:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2)
- **Description:** WikiText-2, roughly 2 million training tokens extracted from verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

### Training Procedure

- **Preprocessing:** Standard tokenization and formatting compatible with GPT-2 requirements.
- **Training Regime:** Fine-tuning performed with mixed-precision training to balance performance and resource utilization.
- **Pruning:** Magnitude-based pruning applied to remove weights below a chosen threshold (see the sketch below).
- **Quantization:** Post-training dynamic quantization of weights to 8-bit integers.

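The exact optimization script is not part of this card; the sketch below shows how the pruning and quantization steps described above can be reproduced with standard PyTorch utilities. The 30% sparsity level, the set of pruned module types, and restricting dynamic quantization to `nn.Linear` layers are illustrative assumptions, not the card's recorded settings.

```python
# Illustrative sketch: magnitude-based pruning followed by post-training
# dynamic int8 quantization (settings are assumptions, see note above).
import torch
import torch.nn.utils.prune as prune
from torch.ao.quantization import quantize_dynamic
from transformers import AutoModelForCausalLM
from transformers.pytorch_utils import Conv1D

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

# Magnitude-based pruning: zero the smallest 30% of weights in each projection.
for module in model.modules():
    if isinstance(module, (torch.nn.Linear, Conv1D)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Post-training dynamic quantization: weights stored as 8-bit integers,
# activations quantized on the fly at inference time. GPT-2's attention/MLP
# projections are Conv1D modules, so quantizing them fully would require
# converting them to nn.Linear first.
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
```
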
### Hyperparameters

- **Learning Rate:** 5e-5
- **Batch Size:** 32
- **Epochs:** 3
- **Optimizer:** AdamW
- **Weight Decay:** 0.01

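For orientation, these settings map onto a Hugging Face `Trainer` configuration roughly as sketched below. This is not the original training script; the output directory, sequence length, and padding choices are placeholders, and the dataset is assumed to expose a `text` column.

```python
# Fine-tuning sketch matching the listed hyperparameters (illustrative only).
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = load_dataset("mikasenghaas/wikitext-2")  # assumed default configuration

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw["train"].column_names)

args = TrainingArguments(
    output_dir="gpt2-wikitext2",        # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,                  # AdamW is the Trainer default optimizer
    fp16=True,                          # mixed precision; adjust for non-CUDA hardware
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
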
### Speeds, Sizes, Times

- **Original Model Size:** ~500 MB
- **Pruned & Quantized Model Size:** ~6 MB
- **Training Time:** Approximately 2 hours on a single Apple Silicon machine (MPS backend)

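The measurement procedure behind these numbers is not specified in the card; one plausible way to produce a comparable report is to serialize the state dict and count zeroed weights, as in the sketch below (the baseline checkpoint and serialization format are assumptions).

```python
# Rough size / sparsity report (not the exact procedure used for the figures above).
import os

import torch
from transformers import AutoModelForCausalLM

def report(model, label, path="state_dict.pt"):
    torch.save(model.state_dict(), path)               # serialized footprint on disk
    size_mb = os.path.getsize(path) / 1e6
    total = sum(p.numel() for p in model.parameters())
    zeros = sum((p == 0).sum().item() for p in model.parameters())
    print(f"{label}: {size_mb:.1f} MB, {zeros / total:.1%} zero weights")

report(AutoModelForCausalLM.from_pretrained("openai-community/gpt2"), "baseline GPT-2")
```
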
## Evaluation

### Testing Data

- **Dataset:** [mikasenghaas/wikitext-2](https://huggingface.co/datasets/mikasenghaas/wikitext-2)
- **Split:** Validation set used for evaluation

### Metrics

- **Perplexity:** 155.43
- **BLEU Score:** 0.0498
- **ROUGE-1 Score:** 0.1836
- **Accuracy:** 93.2%

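The evaluation code is not included in the card; a minimal perplexity check over the validation split might look like the sketch below, where the 512-token block size and the absence of a sliding-window stride are simplifying assumptions.

```python
# Minimal perplexity sketch on the WikiText-2 validation split (illustrative).
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("swayamsingal/NanoQuant")
model = AutoModelForCausalLM.from_pretrained("swayamsingal/NanoQuant").to(device).eval()

# Assumes the dataset exposes a "validation" split with a "text" column.
val = load_dataset("mikasenghaas/wikitext-2", split="validation")
ids = tokenizer("\n\n".join(val["text"]), return_tensors="pt").input_ids.to(device)

block_size, nll_sum, token_count = 512, 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1) - 1, block_size):
        chunk = ids[:, start : start + block_size + 1]
        if chunk.size(1) < 2:
            break
        out = model(chunk, labels=chunk)           # loss = mean NLL over shifted tokens
        nll_sum += out.loss.item() * (chunk.size(1) - 1)
        token_count += chunk.size(1) - 1

print(f"perplexity: {math.exp(nll_sum / token_count):.2f}")
```
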
### Results Summary

The pruned and quantized model achieves competitive performance on the WikiText-2 validation set, with a significant reduction in model size and inference time compared to the original GPT-2 model.

## Model Examination

While specific interpretability analyses were not conducted, the model's architecture remains consistent with GPT-2, and standard transformer interpretability techniques can be applied.

## Environmental Impact

- **Hardware Type:** Apple MacBook (MPS backend; no CUDA GPU was available)
- **Training Duration:** 2 hours
- **Energy Consumption:** Approximately 0.5 kWh
- **Carbon Emitted:** Estimated 0.2 kg CO₂

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer decoder with 12 layers, 12 attention heads, and a hidden size of 768.
- **Objective:** Causal language modeling (predicting the next token in a sequence).

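These figures correspond to the GPT-2 small configuration and can be read directly from the published config, for example:

```python
# Read the architecture hyperparameters from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("swayamsingal/NanoQuant")
print(config.n_layer, config.n_head, config.n_embd)  # 12 layers, 12 heads, hidden size 768
```
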
### Compute Infrastructure

- **Hardware:** Apple MacBook (MPS backend), as noted under Environmental Impact
- **Software:** PyTorch and the Hugging Face Transformers library

## Citation

If you use this model, please cite:

```bibtex
@misc{NanoQuant,
  title={NanoQuant},
  author={swayamsingal},
  year={2025},
  howpublished={\url{https://huggingface.co/swayamsingal/NanoQuant}},
}
```

## Glossary

- **Pruning:** The process of removing weights from a neural network to reduce its size and computational requirements.
- **Quantization:** The process of reducing the precision of the weights in a neural network, typically to 8-bit integers, to decrease model size and increase inference speed.

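As a toy illustration of the quantization definition above (the scale and zero-point values are arbitrary):

```python
# One float32 weight mapped to an 8-bit integer and back.
scale, zero_point = 0.05, 0            # chosen for illustration
w = 1.234                              # original float32 weight
q = round(w / scale) + zero_point      # 25, representable as int8
w_dequant = (q - zero_point) * scale   # 1.25; the 0.016 gap is the quantization error
```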