Novaeus-Promptist-7B-Instruct / README.md

Improve language tag

68fe1aa verified 9 months ago

5.22 kB

	---
	license: creativeml-openrail-m
	datasets:
	- prithivMLmods/Prompt-Enhancement-Mini
	- gokaygokay/prompt-enhancement-75k
	- gokaygokay/prompt-enhancer-dataset
	language:
	- zho
	- eng
	- fra
	- spa
	- por
	- deu
	- ita
	- rus
	- jpn
	- kor
	- vie
	- tha
	- ara
	base_model:
	- Qwen/Qwen2.5-7B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- Qwen2.5
	- Prompt_Enhance
	- 7B
	- Instruct
	- safetensors
	- pytorch
	- Promptist-Instruct
	- text-generation-inference
	- art
	---

	### Novaeus-Promptist-7B-Instruct Uploaded Model Files

	The Novaeus-Promptist-7B-Instruct is a fine-tuned large language model derived from the Qwen2.5-7B-Instruct base model. It is optimized for prompt enhancement, text generation, and instruction-following tasks, providing high-quality outputs tailored to various applications.

	\| File Name [ Uploaded Files ] \| Size \| Description \| Upload Status \|
	\|--------------------------------------------\|---------------\|------------------------------------------\|-------------------\|
	\| `.gitattributes` \| 1.57 kB \| Git attributes configuration for LFS. \| Uploaded \|
	\| `README.md` \| 400 Bytes \| Documentation about the model. \| Updated \|
	\| `added_tokens.json` \| 657 Bytes \| Custom tokens for tokenizer. \| Uploaded \|
	\| `config.json` \| 860 Bytes \| Configuration for the model. \| Uploaded \|
	\| `generation_config.json` \| 281 Bytes \| Configuration for text generation. \| Uploaded \|
	\| `merges.txt` \| 1.82 MB \| Byte-pair encoding (BPE) merge rules. \| Uploaded \|
	\| `pytorch_model-00001-of-00004.bin` \| 4.88 GB \| Model weights (split part 1). \| Uploaded (LFS) \|
	\| `pytorch_model-00002-of-00004.bin` \| 4.93 GB \| Model weights (split part 2). \| Uploaded (LFS) \|
	\| `pytorch_model-00003-of-00004.bin` \| 4.33 GB \| Model weights (split part 3). \| Uploaded (LFS) \|
	\| `pytorch_model-00004-of-00004.bin` \| 1.09 GB \| Model weights (split part 4). \| Uploaded (LFS) \|
	\| `pytorch_model.bin.index.json` \| 28.1 kB \| Index file for model weights. \| Uploaded \|
	\| `special_tokens_map.json` \| 644 Bytes \| Map of special tokens for tokenizer. \| Uploaded \|
	\| `tokenizer.json` \| 11.4 MB \| Tokenizer data in JSON format. \| Uploaded (LFS) \|
	\| `tokenizer_config.json` \| 7.73 kB \| Tokenizer configuration file. \| Uploaded \|
	\| `vocab.json` \| 2.78 MB \| Vocabulary for tokenizer. \| Uploaded \|

	---
	![Screenshot 2024-12-07 113150.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/pqFaT-78hssi106bfJwpN.png)
	### Key Features:

	1. Prompt Refinement:
	Designed to enhance input prompts by rephrasing, clarifying, and optimizing for more precise outcomes.

	2. Instruction Following:
	Accurately follows complex user instructions for various generation tasks, including creative writing, summarization, and question answering.

	3. Customization and Fine-Tuning:
	Incorporates datasets specifically curated for prompt optimization, enabling seamless adaptation to specific user needs.

	---
	### Training Details:
	- Base Model: [Qwen2.5-7B-Instruct](#)
	- Datasets Used for Fine-Tuning:
	- gokaygokay/prompt-enhancer-dataset: Focuses on prompt engineering with 17.9k samples.
	- gokaygokay/prompt-enhancement-75k: Encompasses a wider array of prompt styles with 73.2k samples.
	- prithivMLmods/Prompt-Enhancement-Mini: A compact dataset (1.16k samples) for iterative refinement.

	---
	### Capabilities:

	- Prompt Optimization:
	Automatically refines and enhances user-input prompts for better generation results.

	- Instruction-Based Text Generation:
	Supports diverse tasks, including:
	- Creative writing (stories, poems, scripts).
	- Summaries and paraphrasing.
	- Custom Q&A systems.

	- Efficient Fine-Tuning:
	Adaptable to additional fine-tuning tasks by leveraging the model's existing high-quality instruction-following capabilities.

	---

	### Usage Instructions:

	1. Setup:
	- Ensure all necessary model files, including shards, tokenizer configurations, and index files, are downloaded and placed in the correct directory.

	2. Load Model:
	Use PyTorch or Hugging Face Transformers to load the model and tokenizer. Ensure `pytorch_model.bin.index.json` is correctly set for efficient shard-based loading.

	3. Customize Generation:
	Adjust parameters in `generation_config.json` to control aspects such as temperature, top-p sampling, and maximum sequence length.

	---