# README
## Model Summary
This is an instruction-tuned version of the [Starcoder2-3B model](https://huggingface.co/bigcode/starcoder2-3b). It was trained with the same [repository](https://github.com/bigcode-project/starcoder2-self-align) and [dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k) used for Starcoder2-15B, and it uses the same prompt generation technique as the Starcoder2-15B model, so it can serve as a drop-in replacement by simply changing the model path.
* [Paper](https://arxiv.org/abs/2402.19173)
## Intended Use
Running code language models locally. This model can comfortably run on:
* 8 GB and 10 GB VRAM machines with FP16
* 6 GB VRAM machines with INT8
* 4 GB VRAM machines with INT4
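For the INT8 and INT4 cases, the model can be loaded with bitsandbytes quantization. The snippet below is a minimal sketch, not part of the original setup: it assumes the `bitsandbytes` and `accelerate` packages are installed and reuses the checkpoint path from the example further down.

```python
# Minimal sketch: loading the checkpoint with 4-bit (INT4) quantization
# via bitsandbytes. Assumes `bitsandbytes` and `accelerate` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,  # use load_in_8bit=True for INT8 instead
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("outputs_starcoder3b_4e")
model = AutoModelForCausalLM.from_pretrained(
    "outputs_starcoder3b_4e",
    quantization_config=quant_config,
    device_map="auto",
)
```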
## Example
**Using FP16**
```python
import torch
import transformers

# Load the fine-tuned checkpoint as a text-generation pipeline.
pipeline = transformers.pipeline(
    model="outputs_starcoder3b_4e",
    task="text-generation",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)


def respond(instruction: str, response_prefix: str) -> str:
    # Build the prompt with the tokenizer's chat template and append
    # an optional prefix for the model's response.
    messages = [{"role": "user", "content": instruction}]
    prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False)
    prompt += response_prefix

    # Stop generation at either the EOS token or the "###" delimiter.
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("###"),
    ]

    result = pipeline(
        prompt,
        max_length=1024,
        num_return_sequences=1,
        do_sample=False,  # greedy decoding for reproducible output
        eos_token_id=terminators,
        pad_token_id=pipeline.tokenizer.eos_token_id,
        truncation=True,
    )

    # Keep only the new completion, trimming anything after "###".
    response = response_prefix + result[0]["generated_text"][len(prompt):].split("###")[0].rstrip()
    return response


instruction = "Write the Transformer encoder in PyTorch."
response_prefix = ""

print(respond(instruction, response_prefix))
```
*Output:*
````
```python
import torch
import torch.nn as nn


class TransformerEncoder(nn.Module):
    def __init__(self, d_model, nhead, num_layers, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoder, self).__init__()
        self.encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers)

    def forward(self, src):
        return self.transformer_encoder(src)
```
````
## Training
* Epochs: 4
* Training type: full fine-tuning
* Training time: ~4 hours
* Per-device batch size: 2
* Gradient accumulation steps: 256 (effective batch size of 512)
* Sequence length: 1280
### Exact Training Command Used
**See the [repository](https://github.com/bigcode-project/starcoder2-self-align) for setup details.**
```bash
MODEL_KEY=bigcode/starcoder2-3b
LR=1e-5
EPOCH=4
SEQ_LEN=1280
WARMUP_RATIO=0.05
OUTPUT_DIR=outputs_starcoder3b_4e
DATASET_FILE=train_data.jsonl
accelerate launch -m star_align.train \
    --model_key $MODEL_KEY \
    --model_name_or_path $MODEL_KEY \
    --use_flash_attention True \
    --datafile_paths $DATASET_FILE \
    --output_dir $OUTPUT_DIR \
    --bf16 True \
    --num_train_epochs $EPOCH \
    --max_training_seq_length $SEQ_LEN \
    --pad_to_max_length False \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 256 \
    --group_by_length False \
    --ddp_find_unused_parameters False \
    --logging_steps 1 \
    --log_level info \
    --optim adafactor \
    --max_grad_norm -1 \
    --warmup_ratio $WARMUP_RATIO \
    --learning_rate $LR \
    --lr_scheduler_type linear \
    --attention_dropout 0.0 \
    --residual_dropout 0.0 \
    --embedding_dropout 0.0
```
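The command expects `train_data.jsonl` in the working directory. As a rough sketch of how that file could be exported from the Hugging Face dataset (the exact fields expected by `star_align.train` are not documented here, so verify the format against the dataset card and the training repository):

```python
# Hypothetical sketch: exporting the self-align dataset to the
# train_data.jsonl file referenced above. Check the dataset card and the
# starcoder2-self-align repository for the exact fields the trainer expects.
from datasets import load_dataset

ds = load_dataset("bigcode/self-oss-instruct-sc2-exec-filter-50k", split="train")
ds.to_json("train_data.jsonl", lines=True)
```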
### Hardware
* NVIDIA A100 (40 GB)
## Attributions
* [Starcoder2 Self-Align codebase](https://github.com/bigcode-project/starcoder2-self-align)
* [Starcoder2 Self-Align dataset](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k)
* [Starcoder2 paper](https://arxiv.org/abs/2402.19173)
## License
The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/spaces/bigcode/bigcode-model-license-agreement).