Wade5 / MyModel2


---
library_name: transformers
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- generated_from_trainer
- gguf
- quantized
- inference
model-index:
- name: MyModel2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# MyModel2

This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on an unspecified dataset (the Trainer did not record a dataset name).
It achieves the following results on the evaluation set:
- Loss: 0.1089
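
Assuming the reported loss is the mean per-token cross-entropy in nats (the usual quantity the Trainer reports for causal language models), it can be read as a perplexity via exp(loss). A minimal sketch of that conversion:

```python
import math

# Final evaluation loss reported above; assumed to be the mean token-level
# cross-entropy measured with the natural log (nats).
EVAL_LOSS = 0.1089


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean natural-log cross-entropy into perplexity."""
    return math.exp(mean_cross_entropy)


print(f"eval perplexity ~ {perplexity(EVAL_LOSS):.3f}")
```

Under that assumption the final evaluation perplexity is roughly 1.115.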

## Model description

This is a fine-tuned model available in both SafeTensors and GGUF formats. The GGUF version allows efficient inference with tools like `llama.cpp` and `ctransformers`.

## Intended uses & limitations

Like its base model, this is a causal language model intended for text generation. Its strengths and limitations depend on the fine-tuning dataset, which this card does not document, so evaluate it on your own task before relying on its outputs.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (PyTorch implementation, `adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 5
- mixed_precision_training: Native AMP
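
The hyperparameters above map onto keyword arguments of `transformers.TrainingArguments` roughly as sketched below. This is not the author's training script. It assumes a single device, no gradient accumulation, and no warmup, and it takes `fp16=True` for "Native AMP" (bf16 is equally possible). It also spells out the shape of the linear schedule, which decays the learning rate from 5e-05 to zero over the run. The total of 9285 steps is an estimate inferred from the Step and Epoch columns of the training results table.

```python
# Hypothetical reconstruction of the card's hyperparameters as keyword
# arguments for transformers.TrainingArguments (pass with output_dir=...).
TRAINING_KWARGS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
    "fp16": True,  # assumption: "Native AMP" could also mean bf16=True
}


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to 0.

    With warmup_steps=0 (assumed, since the card lists no warmup) this is
    base_lr * (1 - step / total_steps), clipped at zero.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# 9285 = 1857 steps/epoch x 5 epochs, estimated from the results table.
for step in (0, 4500, 9000):
    print(step, linear_lr(step, total_steps=9285))
```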

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9498 | 0.2693 | 500 | 0.6119 |
| 0.6245 | 0.5385 | 1000 | 0.5831 |
| 0.5931 | 0.8078 | 1500 | 0.5462 |
| 0.561 | 1.0770 | 2000 | 0.5148 |
| 0.5312 | 1.3463 | 2500 | 0.4750 |
| 0.523 | 1.6155 | 3000 | 0.4421 |
| 0.5121 | 1.8848 | 3500 | 0.4096 |
| 0.4059 | 2.1540 | 4000 | 0.3263 |
| 0.3559 | 2.4233 | 4500 | 0.2780 |
| 0.3409 | 2.6925 | 5000 | 0.2367 |
| 0.3352 | 2.9618 | 5500 | 0.1973 |
| 0.1918 | 3.2310 | 6000 | 0.1652 |
| 0.1826 | 3.5003 | 6500 | 0.1507 |
| 0.1762 | 3.7695 | 7000 | 0.1360 |
| 0.168 | 4.0388 | 7500 | 0.1232 |
| 0.1186 | 4.3080 | 8000 | 0.1193 |
| 0.1227 | 4.5773 | 8500 | 0.1134 |
| 0.1273 | 4.8465 | 9000 | 0.1089 |
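
The Step and Epoch columns also pin down the epoch length, which the card does not state, because the logged epoch is approximately step / steps_per_epoch rounded to four decimals. Assuming a single device, no gradient accumulation, and the last partial batch kept (the Trainer's default), each step consumes one batch of 8, which bounds the training set size. A small sketch of that arithmetic:

```python
# (step, epoch) pairs copied from the training results table.
LOGGED = [(500, 0.2693), (1000, 0.5385), (2000, 1.0770), (9000, 4.8465)]
NUM_EPOCHS = 5
TRAIN_BATCH_SIZE = 8


def candidate_steps_per_epoch(logged, max_steps=100_000):
    """Integer steps-per-epoch values consistent with every logged epoch
    (each logged value is assumed rounded to 4 decimals, so within 5e-5)."""
    return [
        n for n in range(1, max_steps + 1)
        if all(abs(step / n - epoch) <= 5e-5 for step, epoch in logged)
    ]


candidates = candidate_steps_per_epoch(LOGGED)
steps_per_epoch = candidates[0]
total_steps = steps_per_epoch * NUM_EPOCHS
# If steps_per_epoch = ceil(N / batch_size), the number of training
# examples N lies in this inclusive range:
min_examples = (steps_per_epoch - 1) * TRAIN_BATCH_SIZE + 1
max_examples = steps_per_epoch * TRAIN_BATCH_SIZE

print(candidates, total_steps, (min_examples, max_examples))
```

Under these assumptions the run used 1857 steps per epoch (9285 steps in total) on roughly 14.85k training examples.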

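## Inference

This model supports inference via GGUF using `llama.cpp` or `ctransformers`.

### Using `llama.cpp` (CLI)
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Current llama.cpp builds with CMake (the Makefile build is deprecated).
cmake -B build
cmake --build build --config Release -j
# The former ./main binary is now llama-cli; first.gguf is assumed to be in the current directory.
./build/bin/llama-cli -m first.gguf -p "Hello, how are you?"
```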

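### Using `ctransformers` (Python)
```python
from ctransformers import AutoModelForCausalLM

# Replace the placeholder repo id with the repository that hosts first.gguf.
model = AutoModelForCausalLM.from_pretrained(
    "your_username/your_model_repo",
    model_file="first.gguf",
    model_type="llama",
)

output = model("Hello, how are you?")
print(output)
```

Note: this model keeps the Qwen2 architecture of its base model, and the last `ctransformers` release predates Qwen2 support in `llama.cpp`. Loading with `model_type="llama"` may therefore fail; if it does, use `llama.cpp` as described above.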

## Framework versions

- Transformers 4.48.2
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
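
To reproduce the training environment, it can help to compare installed packages against the versions listed above. A minimal sketch using only the standard library; it assumes the usual PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) and ignores local build tags such as `+cu124` when comparing:

```python
from importlib import metadata
from typing import Callable, Dict, List

# Versions listed in the card's "Framework versions" section.
PINNED: Dict[str, str] = {
    "transformers": "4.48.2",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def base_version(version: str) -> str:
    """Drop a local build tag such as '+cu124' before comparing."""
    return version.split("+", 1)[0]


def version_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Describe each package whose installed version differs from the pin."""
    problems = []
    for package, wanted in pinned.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed (card used {wanted})")
            continue
        if base_version(installed) != base_version(wanted):
            problems.append(f"{package}: {installed} installed, card used {wanted}")
    return problems


if __name__ == "__main__":
    for line in version_mismatches(PINNED) or ["all versions match"]:
        print(line)
```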