aashish1904 committed on
Commit 3fee8cf · verified · 1 Parent(s): 87158ae

Upload README.md with huggingface_hub

Files changed (1): README.md (+116, -0)
---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/BabyMistral-GGUF
This is a quantized version of [OEvortex/BabyMistral](https://huggingface.co/OEvortex/BabyMistral) created using llama.cpp.
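For local inference, a minimal sketch using the llama.cpp CLI. The exact `.gguf` filename below is an assumption; check this repository's file list for the quantizations actually provided:

```shell
# Download one quantization from this repo (filename is an assumption;
# substitute an actual .gguf file listed in the repository)
huggingface-cli download QuantFactory/BabyMistral-GGUF \
  BabyMistral.Q4_K_M.gguf --local-dir .

# Run it with llama.cpp's CLI
llama-cli -m BabyMistral.Q4_K_M.gguf -p "Hey there! How are you?" -n 128
```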

# Original Model Card

# BabyMistral Model Card

## Model Overview

**BabyMistral** is a compact yet powerful language model designed for efficient text generation tasks. Built on the Mistral architecture, this model offers impressive performance despite its relatively small size.

### Key Specifications

- **Parameters:** 1.5 billion
- **Training Data:** 1.5 trillion tokens
- **Architecture:** Based on Mistral
- **Training Duration:** 70 days
- **Hardware:** 4x NVIDIA A100 GPUs

## Model Details

### Architecture

BabyMistral utilizes the Mistral architecture, which is known for its efficiency and performance. The model scales this architecture to 1.5 billion parameters, striking a balance between capability and computational efficiency.
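As a rough back-of-the-envelope sketch of what that parameter count means in practice, the weight memory at common precisions (weights only; activations, KV cache, and runtime overhead excluded) is:

```python
# Approximate weight memory for a 1.5B-parameter model at common precisions.
# Weights only: activations, KV cache, and framework overhead are excluded.
PARAMS = 1.5e9

def weights_gib(bytes_per_param):
    return PARAMS * bytes_per_param / 2**30

for name, width in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{weights_gib(width):.1f} GiB")
# fp32: ~5.6 GiB, fp16/bf16: ~2.8 GiB, int8: ~1.4 GiB, 4-bit: ~0.7 GiB
```

This is why quantized GGUF builds of a model this size fit comfortably on consumer hardware.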

### Training

- **Dataset Size:** 1.5 trillion tokens
- **Training Approach:** Trained from scratch
- **Hardware:** 4x NVIDIA A100 GPUs
- **Duration:** 70 days of continuous training
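For intuition, the figures above imply an aggregate training throughput on this order (a sanity-check calculation from the stated numbers, not a reported measurement):

```python
# Implied throughput from the stated run: 1.5T tokens over 70 days on 4 GPUs.
tokens = 1.5e12
seconds = 70 * 24 * 3600
gpus = 4

aggregate = tokens / seconds   # tokens per second across all GPUs
per_gpu = aggregate / gpus     # tokens per second per GPU
print(f"~{aggregate:,.0f} tok/s aggregate, ~{per_gpu:,.0f} tok/s per GPU")
# ~248,016 tok/s aggregate, ~62,004 tok/s per GPU
```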

### Capabilities

BabyMistral is designed for a wide range of natural language processing tasks, including:

- Text completion and generation
- Creative writing assistance
- Dialogue systems
- Question answering
- Language understanding tasks

## Usage

### Getting Started

To use BabyMistral with the Hugging Face Transformers library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("OEvortex/BabyMistral")
tokenizer = AutoTokenizer.from_pretrained("OEvortex/BabyMistral")

# Define the chat input
chat = [
    # { "role": "system", "content": "You are BabyMistral" },
    { "role": "user", "content": "Hey there! How are you? 😊" }
]

inputs = tokenizer.apply_chat_template(
    chat,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Generate text
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt
response = outputs[0][inputs.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

# I am doing well! How can I assist you today? 😊
```
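To illustrate what `temperature=0.6` and `top_p=0.9` do during sampling, here is a toy, framework-free sketch of temperature scaling plus nucleus (top-p) filtering; the logit values are made up for illustration:

```python
import math

def nucleus_candidates(logits, temperature=0.6, top_p=0.9):
    """Return token indices kept by temperature scaling + top-p filtering."""
    # Temperature < 1 sharpens the distribution before sampling.
    scaled = [l / temperature for l in logits]
    # Softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return sorted(kept)

# Four hypothetical token logits: only the two most likely survive filtering.
print(nucleus_candidates([2.0, 1.0, 0.5, -1.0]))  # [0, 1]
```

Lower `temperature` or `top_p` narrows the candidate set, making output more deterministic; higher values keep more of the tail and make it more varied.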

### Ethical Considerations

While BabyMistral is a powerful tool, users should be aware of its limitations and potential biases:

- The model may reproduce biases present in its training data
- It should not be used as a sole source of factual information
- Generated content should be reviewed for accuracy and appropriateness

### Limitations

- May struggle with very specialized or technical domains
- Lacks real-time knowledge beyond its training data
- Potential for generating plausible-sounding but incorrect information