---
license: mit
datasets:
- hash-map/got_qa_pairs
language:
- en
base_model:
- google/gemma-2-2b-it
pipeline_tag: question-answering
library_name: peft
tags:
- got
- q&a
- RAG
- transformers
- peft
- bitsandbytes
---
# Game of Thrones Q&A Model (PEFT / QLoRA fine-tuned)
## 🧠 Model Overview
**Model name:** hash-map/got_model
**Base model:** `google/gemma-2-2b-it`
**Fine-tuning method:** QLoRA (via PEFT)
**Task:** Contextual Question Answering on *Game of Thrones*
**Summary:** A lightweight instruction-tuned question-answering model specialized in the *Game of Thrones* / *A Song of Ice and Fire* universe. It generates concise, faithful answers when given relevant context + a question.
**Description:**
This model was fine-tuned on the `hash-map/got_qa_pairs` dataset using QLoRA (4-bit quantization + Low-Rank Adaptation) to keep memory usage low while adapting the powerful `gemma-2-2b-it` model to answer questions about characters, events, houses, lore, battles, and plot points — **only when provided with relevant context**.
It is **not** a general-purpose LLM and performs poorly on questions without appropriate context or outside the GoT domain.
## 🧩 Intended Use
### Direct Use
- Answering factual questions about *Game of Thrones* when supplied with relevant book/show text chunks
- Building simple RAG-style (Retrieval-Augmented Generation) applications for GoT fans, wikis, quizzes, chatbots, etc.
- Educational tools, reading comprehension demos, or lore-exploration apps
### Out-of-Scope Use
- General-purpose chat or open-domain QA
- Questions about real history, other fictional universes, current events, politics, etc.
- High-stakes applications (legal, medical, safety-critical decisions)
- Generating creative fan-fiction or long-form narrative text (it is optimized for short factual answers)
## 📥 Context Retrieval Strategy (included in repo)
A simple **keyword-based lexical retrieval** system is provided to help select relevant context chunks:
```python
import re
import json
from collections import defaultdict, Counter

CHUNKS_FILE = "/kaggle/input/got-dataset/contexts.json"  # list of {text, source, chunk_id}

def tokenize(text):
    # Lowercased alphabetic tokens of length >= 3
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

contexts = []
token_to_ctx = defaultdict(list)

with open(CHUNKS_FILE, "r", encoding="utf-8") as f:
    data = json.load(f)

# Build an inverted index: token -> list of chunk ids containing it
for idx, item in enumerate(data):
    text = item["text"]
    contexts.append(item)
    for tok in tokenize(text):
        token_to_ctx[tok].append(idx)

print(f"Indexed {len(contexts)} chunks")

def retrieve_2_contexts(question, token_to_ctx, contexts):
    # Score each chunk by the number of overlapping question tokens
    q_tokens = tokenize(question)
    scores = Counter()
    for tok in q_tokens:
        for ctx_id in token_to_ctx.get(tok, []):
            scores[ctx_id] += 1
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)
```
This is a basic sparse retrieval method: term-frequency overlap, similar to TF-IDF without the IDF weighting. For better retrieval, you can build a dense index (e.g. FAISS over sentence embeddings) from these same context chunks.
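Before moving to a dense index, one lightweight step up from the overlap score above is adding the missing IDF weighting, so that rare, informative tokens (character names, place names) count for more than common words. A minimal sketch using only the standard library; the function names `build_idf` and `retrieve_2_contexts_idf` are illustrative, not part of the repo:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

def build_idf(contexts):
    # Document frequency: number of chunks containing each token
    df = Counter()
    for item in contexts:
        df.update(set(tokenize(item["text"])))
    n = len(contexts)
    # Smoothed IDF so tokens appearing in every chunk still get a small weight
    return {tok: math.log((n + 1) / (c + 1)) + 1.0 for tok, c in df.items()}

def retrieve_2_contexts_idf(question, contexts, idf):
    # Score chunks by the summed IDF of matched question tokens
    scores = Counter()
    q_tokens = tokenize(question)
    for idx, item in enumerate(contexts):
        ctx_tokens = set(tokenize(item["text"]))
        for tok in q_tokens:
            if tok in ctx_tokens:
                scores[idx] += idf.get(tok, 0.0)
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)
```

With IDF, a question containing "joffrey" will rank a chunk mentioning Joffrey above one that merely shares stopword-like tokens such as "the" or "was".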
## 🧑‍💻 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Adapter repo on the Hugging Face Hub
model_name = "hash-map/got_model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, model_name)

def answer_question(context: str, question: str, max_new_tokens=96) -> str:
    prompt = f"""Context:
{context}
Question:
{question}
Answer:"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for short factual answers
            eos_token_id=tokenizer.eos_token_id,
        )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the final "Answer:" marker
    return answer.split("Answer:")[-1].strip()

# Example: retrieve context first, then answer
context = retrieve_2_contexts("Who killed Joffrey Baratheon?", token_to_ctx, contexts)
print(answer_question(context, "Who killed Joffrey Baratheon?"))
```
## ⚠️ Bias, Risks & Limitations
- **Domain limitation:** Extremely poor performance on non-GoT topics
- **Retrieval dependency:** Answers are only as good as the retrieved context — lexical method can miss semantically similar but lexically different passages
- **Hallucinations:** Can still invent facts when context is ambiguous, incomplete or contradictory
- **Toxicity & bias:** Inherits biases present in the base Gemma model + any biases in the GoT dataset (e.g. gender roles, violence portrayal typical of the series)
- **No safety tuning:** No built-in refusal or content filtering
- **Hugging Face access token required:** `google/gemma-2-2b-it` is a gated repo, so you need a Hugging Face access token (with the Gemma license terms accepted) to download the base model
**Recommendations:**
- The bundled lexical retriever works adequately; you can substitute another retriever, but keep the total retrieved context under ~200 tokens
- Manually verify outputs for important use cases
- Consider adding a guardrail/moderation step in applications
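Since the model expects short contexts (per the ~200-token recommendation above), a small guard that trims retrieved text to a token budget can help. This sketch is illustrative, not part of the repo: it accepts any token-counting function, e.g. `len(tokenizer.encode(...))` from the loaded tokenizer, or a whitespace split as a rough fallback.

```python
def trim_to_budget(context: str, count_tokens, budget: int = 200) -> str:
    """Greedily keep leading sentences of `context` until adding the
    next sentence would exceed `budget` tokens."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    kept, used = [], 0
    for sent in sentences:
        cost = count_tokens(sent)
        if used + cost > budget:
            break
        kept.append(sent)
        used += cost
    return ". ".join(kept) + ("." if kept else "")

# Rough fallback token counter: whitespace-separated words
rough_count = lambda text: len(text.split())
```

Trimming on sentence boundaries keeps the surviving context readable, at the cost of occasionally dropping a relevant trailing sentence.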
## 📚 Citation
```bibtex
@misc{got-qa-gemma2-2026,
author = {Appala Sai Sumanth},
title = {Gemma-2-2b-it Fine-tuned for Game of Thrones Question Answering},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/hash-map/got_model}}
}
```
## Framework versions
- `transformers` >= 4.42
- `peft` 0.13.2
- `torch` >= 2.1
- `bitsandbytes` >= 0.43 (for 4-bit inference if desired)
---