---
license: mit
tags:
- gpt2
- distilled
- small-llm
- llama3
- alpaca
- reasoning
- chatbot
- conversational
---

# Lazarus — The Best Small LLM

**Lazarus** is a next-generation small LLM based on `gpt2-medium`, distilled from LLaMA 3 3B using the [`vicgalle/alpaca-gpt4`](https://huggingface.co/datasets/vicgalle/alpaca-gpt4) dataset.

Lazarus is tuned for question understanding and step-by-step reasoning, and performs strongly for its size against other small LLMs. Larger versions are in development.

---

## 🔍 Example Responses

**User:** What is AI?  
**Lazarus:** Artificial Intelligence (AI) is a technology that helps humans solve complex problems and communicate with each other. It can help humans solve complex problems by analyzing and modeling data, and can help them understand complex information.

**User:** What is the Pythagorean Theorem?  
**Lazarus:** The Pythagorean theorem is a theorem used to find the hypotenuse of a triangle.

---

## 🚀 Usage

You can interact with Lazarus using the script below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

print("CUDA Available:", torch.cuda.is_available())

# Local path to the fine-tuned checkpoint; point this at your own copy of the model.
model_name = "./gpt2-lora-magpie-mistral"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

system_prompt = (
    "Your name is Lazarus. You are an intelligent AI assistant. You help users with whatever they need. "
    "You always think before answering, and explain your reasoning out loud step by step.\n"
)

chat_history = []

def chat():
    print("Chatting with Lazarus (type 'exit' to quit)\n")

    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == "exit":
            break

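        chat_history.append(f"You: {user_input}")
        # Keep only the last six lines of dialogue so the prompt stays short.
        recent_history = chat_history[-6:]
        # The trailing "AI:" cues the model to write the assistant's turn.
        full_prompt = system_prompt + "\n".join(recent_history) + "\nAI:"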

        # Tokenize the prompt; truncation keeps it within the tokenizer's maximum length.
        inputs = tokenizer(full_prompt, return_tensors="pt", truncation=True).to(device)

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=150,  # cap the reply length independently of the prompt length
                pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse EOS
                do_sample=True,
                top_k=100,
                top_p=0.92,
                temperature=0.7,
                eos_token_id=tokenizer.eos_token_id,
            )

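        # Decode only the newly generated tokens, not the echoed prompt.
        prompt_length = inputs["input_ids"].shape[1]
        response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

        # Discard empty or filler replies and drop the unanswered turn so the
        # history does not accumulate user lines without a matching reply.
        bad_responses = {"I hope that", "I don't know", "", "I'm excited"}
        if response in bad_responses:
            chat_history.pop()
            print("AI: [Low-quality response discarded, please try again]")
            continue

        print(f"AI: {response}")
        chat_history.append(f"AI: {response}")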

if __name__ == "__main__":
    chat()
```
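
The prompt handling in the script above can be isolated into two small helpers, which makes it easier to test without loading the model. This is a minimal sketch under stated assumptions: `build_prompt` and `trim_response` are hypothetical names that do not appear in the original script, and cutting the reply at the next `You:` marker is an assumption about how the model may continue the dialogue, not something the script currently does.

```python
from typing import List


def build_prompt(system_prompt: str, history: List[str], max_lines: int = 6) -> str:
    """Join the system prompt, the most recent history lines, and an 'AI:' cue.

    Mirrors the prompt layout used in the chat script: history lines are
    already prefixed with 'You:' or 'AI:'.
    """
    recent = history[-max_lines:] if max_lines > 0 else []
    return system_prompt + "\n".join(recent) + "\nAI:"


def trim_response(text: str, stop_marker: str = "\nYou:") -> str:
    """Cut generated text where the model starts writing the user's next turn.

    Assumption: the model may keep generating past its own reply and emit a
    new 'You:' line; everything from that marker onward is dropped.
    """
    head, _, _ = text.partition(stop_marker)
    return head.strip()


if __name__ == "__main__":
    history = ["You: What is AI?"]
    print(build_prompt("Your name is Lazarus.\n", history))
    print(trim_response(" AI is a field of computer science.\nYou: Thanks"))
```

In the chat loop, `build_prompt` could stand in for the line that assembles `full_prompt`, and `trim_response` could be applied to the decoded text before it is printed and appended to the history.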