---
base_model: NousResearch/Meta-Llama-3-8B
tags:
- generated_from_trainer
model-index:
- name: llama3-8b-redmond-code290k
  results: []
---



[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>


axolotl version: `0.4.0`
```yaml
base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: b-mc2/sql-create-context
    type: context_qa.load_v2
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./artificialguybr/llama3-8b-redmond-code290k

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: artificialguybr/llama3-8b-redmond-code290k
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>

```


</details><br>


# LLAMA 3 8B Redmond CODE 290K


Thanks to [Redmond.ai](https://redmond.ai) for the GPU support!


This model is a fine-tuned version of [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) on the [ajibawa-2023/Code-290k-ShareGPT](https://huggingface.co/datasets/ajibawa-2023/Code-290k-ShareGPT) dataset.


## Model description


LLAMA 3 8B Redmond CODE 290K is a large language model designed to generate code and explanations in various programming languages, including Python, Java, JavaScript, Go, C++, Rust, Ruby, SQL, MySQL, R, Julia, Haskell, and more. It takes a prompt or question as input and outputs a corresponding code snippet with a detailed explanation.


The model was trained on a dataset of approximately 290,000 conversations, each typically consisting of two turns: a user prompt and a model response. The dataset is in the Vicuna/ShareGPT format, which allows for efficient training and fine-tuning of the model.
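
For illustration, a record in this format might look like the sketch below. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention; the content itself is invented, not an actual dataset entry.

```python
# Hypothetical ShareGPT-style record (invented for illustration, not an
# actual entry from Code-290k-ShareGPT). Each record holds a list of
# turns; "from" names the speaker and "value" carries the text.
record = {
    "id": "example-0001",
    "conversations": [
        {
            "from": "human",
            "value": "Write a Python function that checks whether a number is prime.",
        },
        {
            "from": "gpt",
            "value": (
                "def is_prime(n):\n"
                "    if n < 2:\n"
                "        return False\n"
                "    for i in range(2, int(n**0.5) + 1):\n"
                "        if n % i == 0:\n"
                "            return False\n"
                "    return True\n\n"
                "The function rejects numbers below 2, then tests divisibility "
                "only up to the square root of n."
            ),
        },
    ],
}
```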


The model is intended to be used in applications where code generation and explanation are necessary, such as coding assistance, education, and knowledge sharing.
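
As a usage sketch, the model can be queried through the standard `transformers` generation API. This is a minimal example, assuming the checkpoint is published under the repository id `artificialguybr/llama3-8b-redmond-code290k` and accepts plain instruction-style prompts; adjust both to your setup.

```python
# Minimal inference sketch using the standard transformers generation API.
# Assumptions: repository id and plain-text prompt style (see note above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "artificialguybr/llama3-8b-redmond-code290k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bf16 training precision above
    device_map="auto",
)

prompt = "Write a SQL query that returns the top 5 customers by total order value."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.2,
    pad_token_id=tokenizer.eos_token_id,
)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

A low temperature keeps generated code relatively deterministic; raise `max_new_tokens` for longer programs.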


## Intended uses & limitations

Intended uses:

- Generating code and explanations in various programming languages
- Assisting in coding tasks and education
- Providing knowledge sharing and documentation
- Integrating with other language models or tools to provide a more comprehensive coding experience

Limitations:

- The model may not perform well on very rare or niche programming languages
- The model may not generalize well to unseen coding styles or conventions
- The model may not be able to handle extremely complex code or edge cases
- The model may not be able to provide explanations for highly abstract or theoretical concepts
- The model may not be able to handle ambiguous or open-ended prompts

## Training procedure


### Training hyperparameters


The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: paged AdamW 8-bit (per the axolotl config above) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2


### Training results


Training results and evaluation metrics will be published soon.


### Framework versions


- Transformers 4.40.0.dev0
- Pytorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0


### 🌐 Website

You can find more of my models, projects, and information on my official website:

- **[artificialguy.com](https://artificialguy.com/)**

### 💖 Support My Work

If you find this model useful, please consider supporting my work. It helps me cover server costs and dedicate more time to new open-source projects.

- **Patreon:** [Support on Patreon](https://www.patreon.com/user?u=81570187)
- **Ko-fi:** [Buy me a Ko-fi](https://ko-fi.com/artificialguybr)
- **Buy Me a Coffee:** [Buy me a Coffee](https://buymeacoffee.com/jvkape)