Update README.md
**QuantumAI: Zero LLM Quantum AI Model**

This is QuantumAI, a text generation model based on Meta-Llama-3.1-8B-Instruct, fine-tuned for conversational tasks using AutoTrain. The model is designed to handle a variety of natural language processing tasks, with a focus on interactive dialogue, text generation, and inference.
**Model Information**

- Base Model: meta-llama/Meta-Llama-3.1-8B
- Fine-tuned Model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Training Framework: AutoTrain
- Training Data: conversational and text-generation-focused dataset
- Tech Stack:
  - Transformers
  - PEFT (Parameter-Efficient Fine-Tuning)
  - TensorBoard (for logging and metrics)
  - Safetensors
- Language Model Task: conversational and text generation
- Usage Type: interactive dialogue and text generation applications
- Quantization: supports 4-bit quantization for efficient inference
**Installation and Usage**

To use this model in your code, follow the example below:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "PATH_TO_THIS_REPO"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype='auto'
).eval()

# Example usage
messages = [
    {"role": "user", "content": "hi"}
]

input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors='pt')
output_ids = model.generate(input_ids.to('cuda'))
response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

# Output
print(response)
```
**Inference API**

This model is not yet deployed to the Hugging Face Inference API. However, you can deploy it to Inference Endpoints for dedicated serverless inference.
**Training Process**

The QuantumAI model was trained using AutoTrain with the following configuration:

- Hardware: CUDA 12.1
- Training Precision: mixed FP16
- Batch Size: 2
- Learning Rate: 3e-05
- Epochs: 5
- Optimizer: AdamW
- PEFT: enabled (LoRA with lora_r=16, lora_alpha=32)
- Quantization: Int4 for efficient deployment
- Scheduler: linear with warmup
- Gradient Accumulation: 4 steps
- Max Sequence Length: 2048 tokens
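The LoRA and batching settings above determine the adapter size and the effective batch seen by the optimizer. A minimal sketch of that arithmetic, assuming a hypothetical 4096×4096 projection matrix (an illustration, not a shape read from the model config):

```python
# Hypothetical weight shape for illustration (e.g. one attention projection);
# the real Llama 3.1 8B shapes are not read from any config here.
d_out, d_in = 4096, 4096

# LoRA replaces the full-rank update with low-rank factors A (r x d_in) and
# B (d_out x r), scaled by lora_alpha / lora_r.
lora_r, lora_alpha = 16, 32
scaling = lora_alpha / lora_r            # update scale applied to B @ A
trainable = lora_r * (d_in + d_out)      # trainable params per adapted matrix
full = d_in * d_out                      # params in the frozen base matrix
print(f"scaling={scaling}, trainable={trainable}, fraction={trainable / full:.4%}")

# Effective batch size = per-device batch size * gradient accumulation steps.
batch_size, grad_accum = 2, 4
effective_batch = batch_size * grad_accum
print(f"effective batch size = {effective_batch}")
```

With these numbers each adapted 4096×4096 matrix trains about 0.78% of its parameters, and the optimizer steps on an effective batch of 8 sequences.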
**Training Metrics**

The model was monitored using TensorBoard during training. Key training metrics included:

- Training Loss: 1.74
- Learning Rate: adjusted per epoch, starting at 3e-05
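As a rough sanity check on the reported loss, a mean cross-entropy loss in nats per token converts to perplexity via `exp(loss)`:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss (nats/token).
train_loss = 1.74  # final training loss reported above
perplexity = math.exp(train_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 5.70
```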
**Model Features**

- Text Generation: handles various types of user queries and produces coherent responses.
- Conversational AI: optimized for dialogue generation.
- Efficient Inference: supports Int4 quantization for faster inference on limited hardware.
**License**

This model is governed by a custom license. Please refer to the QuantumAI License (based on the Llama 3.1 license).