---
license: mit
datasets:
- hash-map/got_qa_pairs
language:
- en
base_model:
- google/gemma-2-2b-it
pipeline_tag: question-answering
library_name: peft
tags:
- got
- q&a
- RAG
- transformers
- peft
- bitsandbytes
---

# Game of Thrones Q&A Model (PEFT / QLoRA fine-tuned)

## 🧠 Model Overview

**Model name:** hash-map/got_model
**Base model:** `google/gemma-2-2b-it`
**Fine-tuning method:** QLoRA (via PEFT)
**Task:** Contextual Question Answering on *Game of Thrones*

**Summary:**
A lightweight instruction-tuned question-answering model specialized in the *Game of Thrones* / *A Song of Ice and Fire* universe. It generates concise, faithful answers when given relevant context plus a question.

**Description:**
This model was fine-tuned on the `hash-map/got_qa_pairs` dataset using QLoRA (4-bit quantization + Low-Rank Adaptation) to keep memory usage low while adapting the powerful `gemma-2-2b-it` model to answer questions about characters, events, houses, lore, battles, and plot points — **only when provided with relevant context**. It is **not** a general-purpose LLM and performs poorly on questions without appropriate context or outside the GoT domain.

## 🧩 Intended Use

### Direct Use

- Answering factual questions about *Game of Thrones* when supplied with relevant book/show text chunks
- Building simple RAG-style (Retrieval-Augmented Generation) applications for GoT fans, wikis, quizzes, chatbots, etc.
- Educational tools, reading comprehension demos, or lore-exploration apps

### Out-of-Scope Use

- General-purpose chat or open-domain QA
- Questions about real history, other fictional universes, current events, politics, etc.
- High-stakes applications (legal, medical, safety-critical decisions)
- Generating creative fan-fiction or long-form narrative text (it is optimized for short factual answers)

## 📥 Context Retrieval Strategy (included in repo)

A simple **keyword-based lexical retrieval** system is provided to help select relevant context chunks:

```python
import re
import json
from collections import defaultdict, Counter

CHUNKS_FILE = "/kaggle/input/got-dataset/contexts.json"  # list of {text, source, chunk_id}

def tokenize(text):
    # Lowercase words of 3+ letters only
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

contexts = []
token_to_ctx = defaultdict(list)

# Build an inverted index: token -> list of chunk ids containing it
with open(CHUNKS_FILE, "r", encoding="utf-8") as f:
    data = json.load(f)

for idx, item in enumerate(data):
    contexts.append(item)
    for tok in tokenize(item["text"]):
        token_to_ctx[tok].append(idx)

print(f"Indexed {len(contexts)} chunks")

def retrieve_2_contexts(question, token_to_ctx, contexts):
    # Score each chunk by keyword overlap with the question
    q_tokens = tokenize(question)
    scores = Counter()
    for tok in q_tokens:
        for ctx_id in token_to_ctx.get(tok, []):
            scores[ctx_id] += 1
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)
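As a quick sanity check, the retriever can be exercised on a small in-memory index without the dataset file (the chunk texts below are illustrative stand-ins, not entries from `contexts.json`):

```python
import re
from collections import defaultdict, Counter

def tokenize(text):
    # Same tokenizer as above: lowercase words of 3+ letters
    return re.findall(r"\b[a-zA-Z]{3,}\b", text.lower())

# Toy chunks standing in for contexts.json entries
contexts = [
    {"text": "Joffrey Baratheon was poisoned at his own wedding feast.", "chunk_id": 0},
    {"text": "The Wall is defended by the Night's Watch.", "chunk_id": 1},
    {"text": "Olenna Tyrell admitted to poisoning Joffrey.", "chunk_id": 2},
]

# Inverted index: token -> chunk ids
token_to_ctx = defaultdict(list)
for idx, item in enumerate(contexts):
    for tok in tokenize(item["text"]):
        token_to_ctx[tok].append(idx)

def retrieve_2_contexts(question, token_to_ctx, contexts):
    # Score chunks by keyword overlap, return top-2 concatenated
    scores = Counter()
    for tok in tokenize(question):
        for ctx_id in token_to_ctx.get(tok, []):
            scores[ctx_id] += 1
    if not scores:
        return ""
    top_ids = [cid for cid, _ in scores.most_common(2)]
    return " ".join(contexts[cid]["text"] for cid in top_ids)

print(retrieve_2_contexts("Who poisoned Joffrey Baratheon?", token_to_ctx, contexts))
```

Note that purely lexical scoring ranks the first chunk highest here (three overlapping keywords) even though the third chunk names the culprit — which is exactly why a semantic retriever can help.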
A FAISS index can also be built over these contexts for better semantic retrieval.

## 🧑‍💻 How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Replace with your actual repo
model_name = "hash-map/got_model"

tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
model = PeftModel.from_pretrained(base_model, model_name)

def answer_question(context: str, question: str, max_new_tokens=96) -> str:
    prompt = f"""Context:
{context}

Question: {question}

Answer:"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding (temperature is ignored when sampling is off)
            eos_token_id=tokenizer.eos_token_id
        )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract only the answer part after "Answer:"
    return answer.split("Answer:")[-1].strip()

# Example, using the retriever defined above
question = "Who killed Joffrey Baratheon?"
context = retrieve_2_contexts(question, token_to_ctx, contexts)
print(answer_question(context, question))
```

## ⚠️ Bias, Risks & Limitations

- **Domain limitation:** extremely poor performance on non-GoT topics
- **Retrieval dependency:** answers are only as good as the retrieved context — the lexical method can miss semantically similar but lexically different passages
- **Hallucinations:** can still invent facts when context is ambiguous, incomplete, or contradictory
- **Toxicity & bias:** inherits biases present in the base Gemma model plus any biases in the GoT dataset (e.g. the gender roles and violence typical of the series)
- **No safety tuning:** no built-in refusal or content filtering
- **Hugging Face token required:** the gated `google/gemma-2-2b-it` base model needs a Hugging Face access token
To access the gated Gemma repository, you need a Hugging Face access token.

**Recommendations:**

- The provided contexts work well; you can try another retriever, but keep the total context length under ~200 tokens
- Manually verify outputs for important use cases
- Consider adding a guardrail/moderation step in applications

## 📚 Citation

```bibtex
@misc{got-qa-gemma2-2026,
  author       = {Appala Sai Sumanth},
  title        = {Gemma-2-2b-it Fine-tuned for Game of Thrones Question Answering},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/hash-map/got_model}}
}
```

## Framework versions

- `transformers` >= 4.42
- `peft` 0.13.2
- `torch` >= 2.1
- `bitsandbytes` >= 0.43 (for 4-bit inference if desired)

---
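With `bitsandbytes` installed, the base model can also be loaded in 4-bit for lower-memory inference. A minimal configuration sketch — the NF4 settings below are common QLoRA defaults assumed for illustration, not taken from the actual training run:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Assumed NF4 quantization settings (common QLoRA defaults); adjust for your hardware
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "hash-map/got_model")
```

Loading this way requires a CUDA-capable GPU and the gated model weights; the rest of the usage code above is unchanged.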