---
license: mit
datasets:
- hash-map/got_qa_pairs
language:
- en
base_model:
- google/gemma-2-2b-it
pipeline_tag: question-answering
library_name: peft
tags:
- got
- q&a
- RAG
- transformers
- peft
- bitsandbytes
---
# Game of Thrones Q&A Model (PEFT / QLoRA fine-tuned)
## 🧠 Model Overview
**Model name:** hash-map/got_model
**Base model:** `google/gemma-2-2b-it`
**Fine-tuning method:** QLoRA (via PEFT)
**Task:** Contextual Question Answering on *Game of Thrones*
**Summary:** A lightweight instruction-tuned question-answering model specialized in the *Game of Thrones* / *A Song of Ice and Fire* universe. It generates concise, faithful answers when given relevant context + a question.
**Description:**
This model was fine-tuned on the `hash-map/got_qa_pairs` dataset using QLoRA (4-bit quantization + Low-Rank Adaptation) to keep memory usage low while adapting the powerful `gemma-2-2b-it` model to answer questions about characters, events, houses, lore, battles, and plot points — **only when provided with relevant context**.
It is **not** a general-purpose LLM and performs poorly on questions without appropriate context or outside the GoT domain.
## 🧩 Intended Use
### Direct Use
- Answering factual questions about *Game of Thrones* when supplied with relevant book/show text chunks
- Building simple RAG-style (Retrieval-Augmented Generation) applications for GoT fans, wikis, quizzes, chatbots, etc.
- Educational tools, reading comprehension demos, or lore-exploration apps
### Out-of-Scope Use
- General-purpose chat or open-domain QA
- Questions about real history, other fictional universes, current events, politics, etc.
- High-stakes applications (legal, medical, safety-critical decisions)
- Generating creative fan-fiction or long-form narrative text (it is optimized for short factual answers)
## 📥 Context Retrieval Strategy (included in repo)
A simple **keyword-based lexical retrieval** system is provided to help select relevant context chunks:
```python
import re
import json
from collections import defaultdict, Counter

CHUNKS_FILE = "/kaggle/input/got-dataset/contexts.json"  # list of {text, source, chunk_id}

def tokenize(text):
    # Lowercased alphabetic tokens of length >= 3
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

contexts = []
token_to_ctx = defaultdict(list)

with open(CHUNKS_FILE, "r", encoding="utf-8") as f:
    data = json.load(f)

# Build an inverted index: token -> list of chunk ids containing it
for idx, item in enumerate(data):
    text = item["text"]
    contexts.append(item)
    for tok in tokenize(text):
        token_to_ctx[tok].append(idx)

print(f"Indexed {len(contexts)} chunks")

def retrieve_2_contexts(question, token_to_ctx, contexts):
    # Score each chunk by the number of overlapping question tokens
    q_tokens = tokenize(question)
    scores = Counter()
    for tok in q_tokens:
        for ctx_id in token_to_ctx.get(tok, []):
            scores[ctx_id] += 1
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)
```
This is a basic sparse retrieval method: term-frequency overlap, similar to TF-IDF without the IDF weighting. For better retrieval, you can build a dense index (e.g. FAISS over sentence embeddings) from these same context chunks.
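Before moving to a dense index, one lightweight step up from the overlap score above is adding the missing IDF weighting, so that rare, informative tokens (character names, place names) count for more than common words. A minimal sketch using only the standard library; the function names `build_idf` and `retrieve_2_contexts_idf` are illustrative, not part of the repo:

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

def build_idf(contexts):
    # Document frequency: number of chunks containing each token
    df = Counter()
    for item in contexts:
        df.update(set(tokenize(item["text"])))
    n = len(contexts)
    # Smoothed IDF so tokens appearing in every chunk still get a small weight
    return {tok: math.log((n + 1) / (c + 1)) + 1.0 for tok, c in df.items()}

def retrieve_2_contexts_idf(question, contexts, idf):
    # Score chunks by the summed IDF of matched question tokens
    scores = Counter()
    q_tokens = tokenize(question)
    for idx, item in enumerate(contexts):
        ctx_tokens = set(tokenize(item["text"]))
        for tok in q_tokens:
            if tok in ctx_tokens:
                scores[idx] += idf.get(tok, 0.0)
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)
```

With IDF, a question containing "joffrey" will rank a chunk mentioning Joffrey above one that merely shares stopword-like tokens such as "the" or "was".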
## 🧑‍💻 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Adapter repo on the Hugging Face Hub
model_name = "hash-map/got_model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, model_name)

def answer_question(context: str, question: str, max_new_tokens=96) -> str:
    prompt = f"""Context:
{context}
Question:
{question}
Answer:"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for short factual answers
            eos_token_id=tokenizer.eos_token_id,
        )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text after the final "Answer:" marker
    return answer.split("Answer:")[-1].strip()

# Example: retrieve context first, then answer
context = retrieve_2_contexts("Who killed Joffrey Baratheon?", token_to_ctx, contexts)
print(answer_question(context, "Who killed Joffrey Baratheon?"))
```
## ⚠️ Bias, Risks & Limitations
- **Domain limitation:** Extremely poor performance on non-GoT topics
- **Retrieval dependency:** Answers are only as good as the retrieved context — lexical method can miss semantically similar but lexically different passages
- **Hallucinations:** Can still invent facts when context is ambiguous, incomplete or contradictory
- **Toxicity & bias:** Inherits biases present in the base Gemma model + any biases in the GoT dataset (e.g. gender roles, violence portrayal typical of the series)
- **No safety tuning:** No built-in refusal or content filtering
- **Hugging Face access token required:** `google/gemma-2-2b-it` is a gated repo, so you need a Hugging Face access token (with the Gemma license terms accepted) to download the base model
**Recommendations:**
- The bundled lexical retriever works adequately; you can substitute another retriever, but keep the total retrieved context under ~200 tokens
- Manually verify outputs for important use cases
- Consider adding a guardrail/moderation step in applications
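Since the model expects short contexts (per the ~200-token recommendation above), a small guard that trims retrieved text to a token budget can help. This sketch is illustrative, not part of the repo: it accepts any token-counting function, e.g. `len(tokenizer.encode(...))` from the loaded tokenizer, or a whitespace split as a rough fallback.

```python
def trim_to_budget(context: str, count_tokens, budget: int = 200) -> str:
    """Greedily keep leading sentences of `context` until adding the
    next sentence would exceed `budget` tokens."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    kept, used = [], 0
    for sent in sentences:
        cost = count_tokens(sent)
        if used + cost > budget:
            break
        kept.append(sent)
        used += cost
    return ". ".join(kept) + ("." if kept else "")

# Rough fallback token counter: whitespace-separated words
rough_count = lambda text: len(text.split())
```

Trimming on sentence boundaries keeps the surviving context readable, at the cost of occasionally dropping a relevant trailing sentence.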
## 📚 Citation
```bibtex
@misc{got-qa-gemma2-2026,
author = {Appala Sai Sumanth},
title = {Gemma-2-2b-it Fine-tuned for Game of Thrones Question Answering},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/hash-map/got_model}}
}
```
## Framework versions
- `transformers` >= 4.42
- `peft` 0.13.2
- `torch` >= 2.1
- `bitsandbytes` >= 0.43 (for 4-bit inference if desired)
---