Game of Thrones Q&A Model (PEFT / QLoRA fine-tuned)

🧠 Model Overview

Model name: hash-map/got_model
Base model: google/gemma-2-2b-it
Fine-tuning method: QLoRA (via PEFT)
Task: Contextual Question Answering on Game of Thrones
Summary: A lightweight instruction-tuned question-answering model specialized in the Game of Thrones / A Song of Ice and Fire universe. It generates concise, faithful answers when given relevant context + a question.

Description:
This model was fine-tuned on the hash-map/got_qa_pairs dataset using QLoRA (4-bit quantization + Low-Rank Adaptation) to keep memory usage low while adapting the powerful gemma-2-2b-it model to answer questions about characters, events, houses, lore, battles, and plot points — only when provided with relevant context.

It is not a general-purpose LLM and performs poorly on questions without appropriate context or outside the GoT domain.

🧩 Intended Use

Direct Use

  • Answering factual questions about Game of Thrones when supplied with relevant book/show text chunks
  • Building simple RAG-style (Retrieval-Augmented Generation) applications for GoT fans, wikis, quizzes, chatbots, etc.
  • Educational tools, reading comprehension demos, or lore-exploration apps

Out-of-Scope Use

  • General-purpose chat or open-domain QA
  • Questions about real history, other fictional universes, current events, politics, etc.
  • High-stakes applications (legal, medical, safety-critical decisions)
  • Generating creative fan-fiction or long-form narrative text (it is optimized for short factual answers)

📥 Context Retrieval Strategy (included in repo)

A simple keyword-based lexical retrieval system is provided to help select relevant context chunks:

import re
import json
from collections import defaultdict, Counter

CHUNKS_FILE = "/kaggle/input/got-dataset/contexts.json"   # list of {text, source, chunk_id}

def tokenize(text):
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

contexts = []
token_to_ctx = defaultdict(list)

with open(CHUNKS_FILE, "r", encoding="utf-8") as f:
    data = json.load(f)

for idx, item in enumerate(data):
    text = item["text"]
    contexts.append(item)

    for tok in tokenize(text):
        token_to_ctx[tok].append(idx)

print(f"Indexed {len(contexts)} chunks")

def retrieve_2_contexts(question, token_to_ctx, contexts):
    q_tokens = tokenize(question)
    scores = Counter()
    for tok in q_tokens:
        for ctx_id in token_to_ctx.get(tok, []):
            scores[ctx_id] += 1
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join([contexts[cid]["text"] for cid in top_ids])

This is a basic sparse retrieval method (similar to TF-IDF, but without the IDF weighting). For better retrieval quality, you can build a FAISS index over dense embeddings of these same contexts.
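Before moving to dense retrieval, a cheap improvement is to add the missing IDF weighting to the lexical scorer, so rare query terms count more than common ones. A minimal self-contained sketch (the toy `chunks` list and `retrieve_idf` name are illustrative, standing in for the contexts.json data and the repo's retriever):

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

# Toy corpus standing in for the contexts.json chunks (illustrative only)
chunks = [
    "Joffrey Baratheon died at his own wedding feast",
    "The wedding of Joffrey and Margaery was held in King's Landing",
    "Arya Stark trained with the Faceless Men in Braavos",
]

# Inverted index as before, but with sets so we also get document frequencies
token_to_ctx = defaultdict(set)
for idx, text in enumerate(chunks):
    for tok in tokenize(text):
        token_to_ctx[tok].add(idx)

def retrieve_idf(question, k=2):
    """Score chunks by the summed IDF of matching query tokens."""
    n = len(chunks)
    scores = Counter()
    for tok in set(tokenize(question)):
        ids = token_to_ctx.get(tok, set())
        if not ids:
            continue
        idf = math.log(n / len(ids))  # rarer term -> higher weight
        for cid in ids:
            scores[cid] += idf
    return [chunks[cid] for cid, _ in scores.most_common(k)]

print(retrieve_idf("Who killed Joffrey at the wedding?"))
```

Unlike the raw-count scorer, this version stops very frequent words from dominating the ranking, at no extra indexing cost.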

🧑‍💻 How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# PEFT adapter repo on the Hugging Face Hub
model_name = "hash-map/got_model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, model_name)

def answer_question(context: str, question: str, max_new_tokens=96) -> str:
    prompt = f"""Context:
{context}

Question:
{question}

Answer:"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for deterministic answers
            eos_token_id=tokenizer.eos_token_id
        )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the answer part after "Answer:"
    return answer.split("Answer:")[-1].strip()

# Example
context = retrieve_2_contexts("Who killed Joffrey Baratheon?", token_to_ctx, contexts)
print(answer_question(context, "Who killed Joffrey Baratheon?"))

⚠️ Bias, Risks & Limitations

  • Domain limitation: Extremely poor performance on non-GoT topics
  • Retrieval dependency: Answers are only as good as the retrieved context — lexical method can miss semantically similar but lexically different passages
  • Hallucinations: Can still invent facts when context is ambiguous, incomplete or contradictory
  • Toxicity & bias: Inherits biases present in the base Gemma model + any biases in the GoT dataset (e.g. gender roles, violence portrayal typical of the series)
  • No safety tuning: No built-in refusal or content filtering
  • Gated model access: google/gemma-2-2b-it is a gated repository, so downloading the base model requires a Hugging Face access token with approved Gemma access

Recommendations:

  • The bundled lexical retriever is adequate; you can substitute another retriever, but keep the total context length under roughly 200 tokens
  • Manually verify outputs for important use cases
  • Consider adding a guardrail/moderation step in applications
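To stay under the ~200-token context budget recommended above without loading a tokenizer, a rough word-count trim is often enough. A sketch (the `trim_context` name and the 0.75 words-per-token heuristic are assumptions, not part of the repo):

```python
def trim_context(context: str, max_tokens: int = 200) -> str:
    """Trim a retrieved context to a rough token budget.

    Word count only approximates the tokenizer's count, so the
    conversion factor (~0.75 words per token) is deliberately
    conservative. For exact budgeting, count with the model tokenizer.
    """
    max_words = int(max_tokens * 0.75)
    words = context.split()
    if len(words) <= max_words:
        return context
    return " ".join(words[:max_words])
```

Calling `trim_context(...)` on the output of `retrieve_2_contexts(...)` before building the prompt keeps concatenated chunks from silently blowing past the budget.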

📚 Citation

@misc{got-qa-gemma2-2026,
  author       = {Appala Sai Sumanth},
  title        = {Gemma-2-2b-it Fine-tuned for Game of Thrones Question Answering},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hash-map/got_model}}
}

Framework versions

  • transformers >= 4.42
  • peft 0.13.2
  • torch >= 2.1
  • bitsandbytes >= 0.43 (for 4-bit inference if desired)
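If you want the 4-bit inference path that the bitsandbytes entry refers to, the base model can be loaded with a quantization config before attaching the adapter. A sketch of the loading configuration (NF4 with bfloat16 compute is a common choice for QLoRA-trained adapters, not a requirement stated by this card):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization, matching the QLoRA training setup in spirit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "hash-map/got_model")
```

This drops peak memory enough to run the model on a single consumer GPU, at a small cost in generation quality.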
