FastThink-0.5B-Tiny is a reasoning-focused model based on Qwen2.5.

**Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias, and tied word embeddings.
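
Of the components listed above, RMSNorm is simple enough to sketch inline. The gain vector and epsilon below are illustrative values, not the model's actual parameters:

```python
import math

def rms_norm(x, w, eps=1e-6):
    """Root-mean-square layer norm: scale x by the inverse RMS of its
    elements, then apply a learned per-dimension gain w."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, w)]

# With unit gains, the output has RMS close to 1 regardless of input scale
out = rms_norm([2.0, -4.0, 4.0, 2.0], [1.0, 1.0, 1.0, 1.0])
```

Unlike LayerNorm, RMSNorm skips mean-centering and the bias term, which saves a little compute per layer.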
# **Quickstart with Transformers**

Below is a code snippet showing how to load the tokenizer and model with `apply_chat_template` and generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/FastThink-0.5B-Tiny"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
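
The slicing step that strips the prompt from each generated sequence is worth unpacking: `model.generate` returns prompt + completion, so the comprehension drops the first `len(input_ids)` tokens of each output. A toy illustration with hypothetical token ids:

```python
# Hypothetical token ids standing in for a real tokenizer's output
input_ids_batch = [[101, 7592, 102]]                   # the prompt: 3 tokens
generated_batch = [[101, 7592, 102, 2023, 2003, 999]]  # prompt + 3 new tokens

# Same slicing as in the quickstart: keep only the newly generated tail
completions = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(input_ids_batch, generated_batch)
]
```

Without this step, decoding would echo the full prompt back at the start of every response.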

# **Dataset Preparation**

This script loads, processes, and combines multiple datasets into a single, standardized format suitable for training conversational AI models. It uses the `datasets` library to load and manipulate the datasets and the `chat_templates` library to standardize the conversation format.

```python
# (dataset loading and the formatting_prompts_func definition are omitted here)
combined_dataset = combined_dataset.map(formatting_prompts_func, batched=True)

# Print the first few examples to verify the output
print(combined_dataset[:50000])
```
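
The `formatting_prompts_func` passed to `map` is not shown in this excerpt. A minimal sketch of what such a batched mapping function might look like, assuming each example stores a `conversations` list of role/content dicts and using an illustrative ChatML-style template (not necessarily the model's actual chat template):

```python
def formatting_prompts_func(batch):
    """Batched map function: render each conversation into one training string.
    The <|im_start|>/<|im_end|> markers mirror ChatML-style templates."""
    texts = []
    for conversation in batch["conversations"]:
        rendered = ""
        for turn in conversation:
            rendered += f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        texts.append(rendered)
    return {"text": texts}

# Example batch containing a single two-turn conversation
batch = {"conversations": [[
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]]}
out = formatting_prompts_func(batch)
```

With `batched=True`, `Dataset.map` passes columns of examples (lists) rather than single rows, which is why the function loops over `batch["conversations"]`.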

# **Intended Use**

1. **Reasoning Tasks**: FastThink-0.5B-Tiny is optimized for reasoning-focused applications, such as logical problem-solving, decision-making, and analytical workflows.
2. **Instruction Following**: Ideal for scenarios where precise adherence to instructions is required, including generating structured outputs like JSON or tables.
3. **Multilingual Support**: Suitable for multilingual environments, supporting over 29 languages, making it versatile for global applications.
4. **Coding and Mathematics**: Effective in tasks involving coding, debugging, or solving mathematical problems.
5. **Role-play Scenarios**: Can simulate conversational agents or personas for role-playing, enhancing chatbot and virtual assistant implementations.
6. **Long-form Content Creation**: Designed to generate and manage long-form text (up to 8K tokens) while maintaining context, making it suitable for tasks like report writing or storytelling.
7. **Structured Data Processing**: Efficient at interpreting and working with structured data, such as tables or hierarchical formats.
8. **Low-Resource Applications**: With a smaller parameter count (0.5B), it is well suited to applications with limited computational resources or edge deployment.

# **Limitations**

1. **Limited Model Size**: As a 0.5B-parameter model, its reasoning and comprehension capabilities are less advanced than those of larger models, particularly on highly complex tasks.
2. **Contextual Limitations**: Although it supports a context length of up to 128K tokens, its ability to use such a long context effectively may vary, particularly in tasks requiring intricate cross-referencing of earlier inputs.
3. **Accuracy in Domain-Specific Tasks**: While capable in coding and mathematics, it may struggle with highly specialized or esoteric domain knowledge compared to models fine-tuned for those areas.
4. **Ambiguity Handling**: May misinterpret vague or poorly structured prompts, leading to less accurate or unintended results.
5. **Long-Output Tradeoffs**: Generating or processing very long outputs (e.g., close to the 8K-token limit) can lose coherence or relevance toward the end.
6. **Multilingual Performance**: Although it supports 29 languages, proficiency and fluency may vary across them, with underrepresented languages possibly seeing reduced performance.
7. **Resource-Intensive Long Contexts**: Using the full 128K-token context is computationally demanding, requiring significant memory and processing power.
8. **Dependence on Fine-Tuning**: Highly specialized tasks or domains may require additional fine-tuning for optimal performance.