---
license: creativeml-openrail-m
library_name: transformers
tags:
- deep_think
- reasoning
- chain_of_thought
- chain_of_thinking
- prev_2
- self_reasoning
language:
- en
base_model:
- prithivMLmods/Llama-Thinker-3B-Preview
pipeline_tag: text-generation
---

# **Llama-Thinker-3B-Preview2**

Llama-Thinker-3B-Preview2 is a pretrained and instruction-tuned generative model designed for multilingual applications. It is trained on synthetic datasets built from long chains of thought, which enables it to perform complex reasoning tasks effectively.

Model Architecture: Llama-Thinker-3B-Preview2 is based on Llama 3.2, an autoregressive language model that uses an optimized transformer architecture. The tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety.

# **Use with transformers**

Starting with `transformers >= 4.43.0`, you can run conversational inference with either the Transformers `pipeline` abstraction or the Auto classes and the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
import torch
from transformers import pipeline

model_id = "prithivMLmods/Llama-Thinker-3B-Preview2"

# Build a text-generation pipeline. bfloat16 halves memory versus float32,
# and device_map="auto" places the weights on the available GPU(s) or CPU.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Chat-format input: the pipeline applies the model's chat template.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)

# generated_text holds the whole conversation; the last entry is the model's reply.
print(outputs[0]["generated_text"][-1])
```
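
The Auto-class route mentioned above can look like the following minimal sketch. It assumes the model's tokenizer ships a chat template, as Llama 3.2-based checkpoints generally do. `build_messages` and `generate_reply` are hypothetical helper names introduced for illustration. The heavy imports sit inside the function so the message helper can be used without torch installed.

```python
# Minimal sketch of the Auto-class route: load tokenizer and model, format the
# conversation with the tokenizer's chat template, then call generate().
# build_messages and generate_reply are hypothetical helper names.

MODEL_ID = "prithivMLmods/Llama-Thinker-3B-Preview2"


def build_messages(user_prompt, system_prompt=None):
    """Return a chat-format message list (optional system turn, then user turn)."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def generate_reply(messages, max_new_tokens=256):
    """Generate one assistant reply; downloads the model on first use."""
    # Imported lazily so build_messages works even without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # add_generation_prompt=True appends the assistant header, so the model
    # starts a new reply instead of continuing the user's turn.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt tokens so only the newly generated text is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, `print(generate_reply(build_messages("Who are you?")))` prints a single reply. Reloading the model on every call keeps the sketch short. In practice you would load the tokenizer and model once and reuse them.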

Note: You can also find detailed recipes for using the model locally, with `torch.compile()`, assisted generation, quantization, and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).

# **Use with `llama`**

Please follow the instructions in the [repository](https://github.com/meta-llama/llama).

To download the original checkpoints, see the example command below, which uses `huggingface-cli`:

```bash
huggingface-cli download prithivMLmods/Llama-Thinker-3B-Preview2 --include "original/*" --local-dir Llama-Thinker-3B-Preview2
```
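
You can also script the download from Python. `huggingface_hub.snapshot_download` accepts the same include filter through its `allow_patterns` parameter. This sketch assumes, like the command above, that the repository contains an `original/` folder. `download_original_checkpoints` is a hypothetical helper name.

```python
# Python equivalent of the huggingface-cli command above (a sketch).
# Assumes the repository has an "original/" folder, as the CLI example implies.

REPO_ID = "prithivMLmods/Llama-Thinker-3B-Preview2"
INCLUDE = ["original/*"]  # same filter as --include "original/*"
LOCAL_DIR = "Llama-Thinker-3B-Preview2"


def download_original_checkpoints(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download only the files matching INCLUDE and return the local folder path."""
    # Imported lazily so the constants above can be inspected without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=INCLUDE,
        local_dir=local_dir,
    )
```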

---

# **How to Run Llama-Thinker-3B-Preview2 on Ollama Locally**

This guide demonstrates how to run the **Llama-Thinker-3B-Preview2-GGUF** model locally using Ollama. The model is instruction-tuned for multilingual tasks and complex reasoning. The same workflow applies to other open-source models available for Ollama.

---

## Example 1: How to Run the Llama-Thinker-3B-Preview2 Model

The **Llama-Thinker-3B-Preview2** model is a pretrained and instruction-tuned LLM designed for complex reasoning tasks across multiple languages. In this guide, we'll interact with it locally using Ollama, which can run quantized models.

### Step 1: Download the Model

First, download and start the **Llama-Thinker-3B-Preview2-GGUF** model with the following command:

```bash
ollama run llama-thinker-3b-preview2.gguf
```
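
`ollama run` takes a model name rather than a file path. If you have already downloaded the GGUF file yourself, one option is to register it through a Modelfile whose `FROM` line points at the file, and then run it by name. The sketch below assumes the file is saved as `llama-thinker-3b-preview2.gguf` in the working directory (a hypothetical file name). `modelfile_text`, `create_command` and `register_model` are hypothetical helper names.

```python
# Sketch: register a locally downloaded GGUF file with Ollama via a Modelfile.
# Assumption: the GGUF file name below is hypothetical; adjust it to your download.
import subprocess
from pathlib import Path

GGUF_PATH = Path("llama-thinker-3b-preview2.gguf")  # hypothetical local file
MODEL_NAME = "llama-thinker-3b-preview2"            # name to register with Ollama


def modelfile_text(gguf_path):
    """Return a minimal Modelfile whose FROM line points at the GGUF weights."""
    # An absolute path avoids ambiguity about where Ollama should look.
    return f"FROM {Path(gguf_path).resolve().as_posix()}\n"


def create_command(model_name, modelfile):
    """Build the `ollama create NAME -f MODELFILE` command line."""
    return ["ollama", "create", model_name, "-f", str(modelfile)]


def register_model(gguf_path=GGUF_PATH, model_name=MODEL_NAME, modelfile=Path("Modelfile")):
    """Write the Modelfile, then run `ollama create` (requires Ollama on PATH)."""
    modelfile.write_text(modelfile_text(gguf_path))
    subprocess.run(create_command(model_name, modelfile), check=True)
```

After `register_model()` succeeds, `ollama run llama-thinker-3b-preview2` starts an interactive session with the imported model.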

### Step 2: Model Initialization and Download

Once the command is executed, Ollama will initialize and download the necessary model files. You should see output similar to this:

```plaintext
pulling manifest
pulling a12cd3456efg... 100% ████████████████████████████████ 3.2 GB
pulling 9f87ghijklmn... 100% ████████████████████████████████ 6.5 KB
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
```

### Step 3: Interact with the Model

Once the model is fully loaded, you can interact with it by sending prompts. For example, let's ask:

```plaintext
>>> How can you assist me today?
```
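
Besides the interactive prompt, a running Ollama server also exposes a local HTTP API, by default on `http://localhost:11434`. The sketch below sends the same question through the `/api/chat` endpoint using only the Python standard library. It assumes the server is running and that the model is available under the name used in Step 1. `build_chat_request` and `ask` are hypothetical helper names.

```python
# Sketch: query a local Ollama server through its /api/chat endpoint.
# Assumptions: default host/port, and the model name used in Step 1.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL_NAME = "llama-thinker-3b-preview2.gguf"  # replace with the name shown by `ollama list`


def build_chat_request(prompt, model=MODEL_NAME, url=OLLAMA_URL):
    """Build a POST request for a single, non-streamed chat turn."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With the server running, `print(ask("How can you assist me today?"))` returns a reply comparable to the interactive session, though the wording will vary between runs.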

A sample response might look like this (your output may differ):

```plaintext
I am Llama-Thinker-3B-Preview2, an advanced AI language model designed to assist with complex reasoning, multilingual tasks, and general-purpose queries. Here are a few things I can help you with:

1. Answering complex questions in multiple languages.
2. Assisting with creative writing, content generation, and problem-solving.
3. Providing detailed summaries and explanations.
4. Translating text across different languages.
5. Generating ideas for personal or professional use.
6. Offering insights on technical topics.

Feel free to ask me anything you'd like assistance with!
```

### Step 4: Exit the Program

To exit the program, type:

```plaintext
/exit
```

---

## Example 2: Using Multi-Modal Models (Future Use)

In the future, Ollama may support multi-modal models where you can input both text and images for advanced interactions. This section will be updated as new capabilities become available.

---

## Notes on Using Quantized Models

Quantized models like **llama-thinker-3b-preview2.gguf** are optimized for efficient performance on local systems with limited resources. Here are some key points to ensure smooth operation:

1. **VRAM/CPU Requirements**: Ensure your system has enough GPU memory (VRAM) or, for CPU inference, enough system RAM to hold the model weights plus the context (KV cache).
2. **Model Format**: Use the `.gguf` model format for compatibility with Ollama.
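
As a rough rule of thumb, the memory needed for the weights alone is the parameter count times the bits per weight, divided by 8. The sketch below applies this to about 3.2 billion parameters, which is approximately the size of the Llama 3.2 3B base. It ignores the KV cache and runtime overhead. Real GGUF quantization types such as Q4_K_M also use somewhat more than their nominal bit width, so treat the numbers as lower bounds.

```python
# Back-of-the-envelope weight-memory estimate: params * bits / 8 bytes.
# Assumption: ~3.2e9 parameters; excludes KV cache and runtime overhead.

PARAMS = 3.2e9


def weight_memory_gb(params, bits_per_weight):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("16-bit (F16/BF16)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>18}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

For 3.2 billion parameters this prints about 6.4 GB at 16 bits, 3.2 GB at 8 bits and 1.6 GB at 4 bits. Leave headroom beyond these figures for the context window.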

---

# **Conclusion**

Running the **Llama-Thinker-3B-Preview2** model locally with Ollama offers a practical way to use an open-source LLM for complex reasoning and multilingual tasks. You can reuse the same steps to try other models as they become available.

---