SuperbEmphasis/The-Omega-Directive-12B-EVISCERATED-FT

	---
	base_model: SuperbEmphasis/The-Omega-Directive-12B-EVISCERATED
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- mistral
	license: apache-2.0
	language:
	- en
	---

	omg it almost works!

	I stripped out the 5 least-used layers, then ran SFT for 4 epochs at a high learning rate... and it's almost good!

	My goal is to make a new Velvet Eclipse with these "less used" parameters stripped out, reducing the size significantly to allow for higher inference speed and more room for context.
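
To make the "strip the least-used layers" step concrete, here is a minimal sketch of the selection logic only. How "least used" was measured is not stated in this card, so the per-layer `usage_scores` below are hypothetical placeholder values, and `layers_to_keep` is an illustrative helper, not the script actually used.

```python
from typing import List, Sequence


def layers_to_keep(usage_scores: Sequence[float], num_to_drop: int) -> List[int]:
    """Return the indices of the layers that survive after dropping the
    `num_to_drop` lowest-scoring layers, preserving the original order."""
    if not 0 <= num_to_drop < len(usage_scores):
        raise ValueError("num_to_drop must be >= 0 and smaller than the layer count")
    # Rank layer indices from lowest to highest score (ties keep the lower index first).
    ranked = sorted(range(len(usage_scores)), key=lambda i: usage_scores[i])
    dropped = set(ranked[:num_to_drop])
    return [i for i in range(len(usage_scores)) if i not in dropped]


# Hypothetical scores for an 8-layer toy model (NOT measurements from this model).
scores = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.95, 0.6]
print(layers_to_keep(scores, num_to_drop=3))  # [0, 2, 4, 6, 7]
```

With a Transformers Mistral-style model, one plausible way to apply the result is to rebuild `model.model.layers` from the kept indices and set `config.num_hidden_layers` to match. That is an assumption about the workflow, not something this card confirms.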


	NOTES
	```
	per_device_train_batch_size = 10,
	gradient_accumulation_steps = 4,
	num_train_epochs = 4, # 4 full passes over the training data
	learning_rate = 5e-4, # Reduce to 2e-5 for long training runs
	```
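
As a rough worked example of what the settings above imply: the effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below assumes a single GPU, and the dataset size of 8,000 examples is a made-up number for illustration, since this card does not state it. The step count is an approximation; the exact figure depends on how the trainer handles the last partial batch.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer update."""
    return per_device * grad_accum * num_devices


def approx_optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Approximate total optimizer updates over all epochs."""
    return math.ceil(num_examples / batch_size) * epochs


batch = effective_batch_size(per_device=10, grad_accum=4)  # single GPU assumed
print(batch)  # 40

# 8_000 is a hypothetical dataset size, not taken from this model card.
print(approx_optimizer_steps(num_examples=8_000, batch_size=batch, epochs=4))  # 800
```
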
	# Uploaded finetuned model

	- Developed by: SuperbEmphasis
	- License: apache-2.0
	- Finetuned from model: SuperbEmphasis/The-Omega-Directive-12B-EVISCERATED

	This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)