---
library_name: transformers
base_model:
- Qwen/Qwen2.5-3B-Instruct
license: apache-2.0
datasets:
- amphora/QwQ-LongCoT-130K
- amphora/QwQ-LongCoT-130K-2
- amphora/verfiable-25k
- amphora/m-math500
language:
- en
- zh
pipeline_tag: text-generation
tags:
- Math
- Code
- Thinker
- Reasoning
- 3B
- QwQ
- Mini
- text-generation-inference
- SFT
- trl
---

![8.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Faj50x1HAODJAIy_R94se.png)

# **PocketThinker-QwQ-3B-Instruct**

> PocketThinker-QwQ-3B-Instruct is based on the Qwen2.5-3B-Instruct architecture and is designed as a lightweight, efficient reasoning assistant. It is the pocket-sized version of QwQ-LCoT-7B-Instruct, optimized for fast inference while maintaining strong problem-solving and computational capabilities. The model is fine-tuned for structured reasoning, minimal token wastage, and high-quality technical responses.

## **Key Improvements**

1. **Optimized for Coding**: Specializes in generating structured, efficient code with minimal redundancy for smooth execution.
2. **Compact yet Powerful**: Maintains strong problem-solving capabilities within a smaller 3B-parameter architecture, ensuring accessibility on resource-limited devices.
3. **Advanced Reasoning Capabilities**: Excels in algorithmic problem-solving, mathematical reasoning, and structured technical explanations.
4. **Efficient Memory Utilization**: Reduces computational overhead while maintaining high-quality outputs.
5. **Focused Output Generation**: Avoids unnecessary token generation, ensuring concise and relevant responses.

## **Quickstart with transformers**

Here is a code snippet showing how to load the tokenizer and model and use `apply_chat_template` for structured input formatting:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/PocketThinker-QwQ-3B-Instruct"

# Load the model in its native precision and spread it across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python function to find the Fibonacci sequence."
messages = [
    {"role": "system", "content": "You are an advanced coding assistant."},
    {"role": "user", "content": prompt}
]

# Render the chat messages into the model's expected prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=6090
)

# Strip the prompt tokens so only the newly generated completion remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

## **Intended Use**

1. **Code Generation & Optimization**: Supports developers in writing, refining, and optimizing code across multiple programming languages.
2. **Algorithm & Mathematical Problem Solving**: Delivers precise solutions and structured explanations for complex problems.
3. **Technical Documentation & Explanation**: Assists in generating well-structured documentation for libraries, APIs, and coding concepts.
4. **Debugging Assistance**: Helps identify and correct errors in code snippets.
5. **Educational Support**: Simplifies programming topics for students and learners with clear explanations.
6. **Structured Data Processing**: Generates structured outputs such as JSON, XML, and tables for data science applications (see the prompting sketch after this list).
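For structured data processing (item 6 above), here is a minimal prompting sketch that reuses the `model` and `tokenizer` loaded in the Quickstart; the system message and prompt are illustrative assumptions, not fixed formats required by the model:

```python
# Minimal structured-output sketch (assumes `model` and `tokenizer` from the
# Quickstart are already loaded). System message and prompt are illustrative.
messages = [
    {"role": "system", "content": "You are an assistant that replies with valid JSON only."},
    {"role": "user", "content": "Describe Python as a JSON object with keys 'language', 'paradigms', and 'typing'."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# A smaller token budget is usually sufficient for compact structured outputs.
generated_ids = model.generate(**model_inputs, max_new_tokens=256)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```

Validating the decoded text with `json.loads` before downstream use is a reasonable safeguard, since the model may occasionally emit malformed JSON.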
## **Limitations**

1. **Hardware Constraints**: Although lighter than larger models, it still requires a moderately powerful GPU or TPU for optimal performance (see the quantization sketch after this list).
2. **Potential Bias in Responses**: Outputs may reflect biases present in the training data.
3. **Limited Creativity**: May produce variable results in non-technical, creative tasks.
4. **No Real-Time Awareness**: Lacks knowledge of real-world events beyond its training cutoff.
5. **Error Propagation in Long Responses**: Minor mistakes early in an output may affect the overall coherence of lengthy responses.
6. **Prompt Sensitivity**: The quality of responses depends on well-structured prompts.
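As a possible mitigation for the hardware constraint in item 1, below is a minimal sketch of loading the model in 4-bit precision with `BitsAndBytesConfig`. This assumes `bitsandbytes` is installed and a CUDA GPU is available; it is an illustrative option, not an officially validated configuration for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "prithivMLmods/PocketThinker-QwQ-3B-Instruct"

# 4-bit NF4 quantization to lower the memory footprint (assumes bitsandbytes
# is installed; quality/speed trade-offs are untested for this specific model).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

The quantized model is a drop-in replacement in the Quickstart snippet above; expect a smaller memory footprint at some cost in output quality.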