Andreas Varvarigos committed on
Commit
34dc5af
·
verified ·
1 Parent(s): 908351f

Delete src/no_UI

src/no_UI/README_eval.md DELETED
@@ -1,54 +0,0 @@
# **Evaluation for Literature-Based Tasks (Without User Interface)**

## **Overview**
`eval_noUI.py` is a script designed to **evaluate the performance of LLMs** on literature-related tasks, including **citation sentence generation, link prediction, abstract completion, title generation, paper retrieval, and introduction-to-abstract generation**. It provides a **batch evaluation pipeline** for models trained on **LitBench** datasets or other domain-specific literature datasets.

It loads a **citation graph dataset** and constructs evaluation prompts for each task. The script then uses the specified LLM to generate predictions and compares them against ground-truth outputs using **BERTScore and accuracy metrics**.

---

## **Usage**
To run `eval_noUI.py`, execute the following command:

```bash
CUDA_VISIBLE_DEVICES=0 python3.10 src/no_UI/eval_noUI.py \
    -config_path=configs/config_noUI.yaml \
    -model=lora \
    -lorapath=models/llama_1b_qlora_uncensored_1_adapter_test_graph
```

## **Command-Line Arguments**

- `config_path`: Path to the evaluation configuration file.
- `model`: Model type (e.g., `lora`).
- `lorapath`: Path to the LoRA adapter checkpoint to merge into the base model.
- `prompt_num`: Optional numeric index (default: 1; not currently used by the script).

---

## **Supported Evaluation Tasks**
The script evaluates model performance across six key literature-based tasks:

1. **Citation Sentence Generation** (`test_sentence`)
   * Generates a citation sentence describing how Paper A cites Paper B in the related work section.
   * Evaluates output coherence using BERTScore.
2. **Citation Link Prediction** (`test_LP`)
   * Determines whether Paper A is likely to cite Paper B based on their titles and abstracts.
   * Evaluates binary classification accuracy.
3. **Abstract Completion** (`test_abs_completion`)
   * Completes a partially given abstract.
   * Evaluates precision, recall, and F1 using BERTScore.
4. **Title Generation** (`test_title_generate`)
   * Predicts a paper's title based on its abstract.
   * Evaluates BERTScore similarity with the ground-truth title.
5. **Citation Recommendation** (`test_retrival_e`)
   * Given a paper and a set of candidate papers, selects the one most likely to be cited.
   * Evaluates retrieval accuracy.
6. **Introduction to Abstract** (`test_intro_2_abs`)
   * Predicts a paper's abstract based on its introduction section.
   * Evaluates BERTScore similarity.

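For the accuracy-based tasks, the evaluator only checks whether the first few characters of the model's reply contain "yes" or "no". A minimal sketch of that scoring logic (`parse_yes_no` is an illustrative name, not a function in the script):

```python
def parse_yes_no(generated: str) -> int:
    """Score 1 if the first few characters of the model reply say 'yes'."""
    head = generated.strip()[:4].lower()
    return 1 if "yes" in head else 0

# toy replies: two true-citation pairs (expect "yes") and one negative pair
answers = ["Yes, paper A is likely to cite paper B.", "yes", "No."]
labels = [1, 1, 0]
correct = [int(parse_yes_no(a) == lab) for a, lab in zip(answers, labels)]
accuracy = sum(correct) / len(correct)
print(accuracy)  # → 1.0
```

Accuracy is then just the mean of these per-pair scores over positive and negative pairs.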
---

## Dependencies

Install the required Python libraries by following the instructions in [README.md](../../README.md).
src/no_UI/README_finetune.md DELETED
@@ -1,40 +0,0 @@
# **Fine-Tuning for Literature-Based LLMs (Without User Interface)**

## **Overview**
`finetune_noUI.py` is a script designed to **fine-tune large language models (LLMs) on literature-based tasks** using **QLoRA**. It supports **training domain-specific models** for citation reasoning, abstract generation, retrieval, and more, and leverages **LoRA adapters** to enable efficient fine-tuning on consumer-grade GPUs.

The script reads a **citation graph dataset**, constructs training prompts for multiple tasks, and fine-tunes an LLM using the QLoRA framework. The resulting **LoRA-adapted model** can then be used for inference or further training.

---

## **Usage**
To run `finetune_noUI.py`, execute the following command:

```bash
python3.10 src/no_UI/finetune_noUI.py configs/config_noUI.yaml --index 1
```

## **Command-Line Arguments**

- `config_path`: Path to the YAML configuration file.
- `index`: Run index; also used as the gradient accumulation step count and in the saved adapter name (default: 1).
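Because `index` is reused as the gradient accumulation step count in the trainer, it directly scales the effective batch size. A quick illustration (the numbers below are illustrative, not values from the shipped config):

```python
# illustrative values, not from the shipped config_noUI.yaml
per_device_train_batch_size = 4
gradient_accumulation_steps = 2  # what running with --index 2 would set

# each optimizer step effectively sees this many training examples
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 8
```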
## **Supported Fine-Tuning Tasks**
The script fine-tunes LLMs on six key literature-based tasks, generating instruction-tuned training data:

1. **Citation Sentence Generation:** Trains the model to generate citation sentences describing how Paper A cites Paper B in the related work section.

2. **Citation Link Prediction:** Trains the model to predict whether Paper A is likely to cite Paper B based on their titles and abstracts.

3. **Abstract Completion:** Trains the model to complete an abstract given a partial abstract and a paper title.

4. **Title Generation:** Trains the model to generate a paper's title based on its abstract.

5. **Citation Recommendation:** Trains the model to select the most relevant paper from a set of candidates that Paper A is likely to cite.

6. **Introduction to Abstract:** Trains the model to generate an abstract based on a paper's introduction.

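Each task is converted into an Alpaca-style instruction prompt before tokenization. A minimal sketch of that step, where the inline `template` dict is a simplified stand-in for the real `configs/alpaca.json` (whose exact wording may differ):

```python
# simplified stand-in for the template loaded from configs/alpaca.json;
# the real file's wording may differ
template = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context.\n\n"
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n"
    )
}

instruction = "Please generate the title of paper based on its abstract"
prompt_input = "Abstract: We study link prediction on citation graphs ...\n"
prompt = template["prompt_input"].format(instruction=instruction, input=prompt_input)
print(prompt)
```

The ground-truth answer is appended after `### Response:` to form one training example.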
---

## Dependencies

Install the required Python libraries by following the instructions in [README.md](../../README.md).
src/no_UI/eval_noUI.py DELETED
@@ -1,564 +0,0 @@
import argparse
import json
import os
import random

import networkx as nx
import numpy as np
import torch
from bert_score import score
from peft import PeftModel
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

from utils.utils import read_yaml_file

"""
Ad-hoc sanity check to see if the model outputs something coherent.
Not a robust inference platform!
"""

def get_bert_score(candidate, reference):
    # BERTScore between a single candidate/reference pair
    P, R, F1 = score([candidate], [reference], lang="en")
    return P, R, F1

def _generate_LP_prompt(data_point: dict):
    instruction = "Determine if paper A will cite paper B."

    prompt_input = "Title of Paper A: " + (data_point['s_title'] if data_point['s_title'] is not None else 'Unknown') + "\n"
    prompt_input += "Abstract of Paper A: " + (data_point['s_abs'] if data_point['s_abs'] is not None else 'Unknown') + "\n"
    prompt_input += "Title of Paper B: " + (data_point['t_title'] if data_point['t_title'] is not None else 'Unknown') + "\n"
    prompt_input += "Abstract of Paper B: " + (data_point['t_abs'] if data_point['t_abs'] is not None else 'Unknown') + "\n"
    prompt_input += " Give me a direct answer of yes or no."

    return template["prompt_input"].format(instruction=instruction, input=prompt_input)

def _generate_retrival_prompt(data_point: dict):
    instruction = "Please select the paper that is more likely to be cited by paper A from candidate papers."

    prompt_input = "Title of the Paper A: " + data_point['s_title'] + "\n"
    prompt_input += "Abstract of the Paper A: " + data_point['s_abs'] + "\n"
    prompt_input += "candidate papers: " + "\n"
    for i, title in enumerate(data_point['nei_titles']):
        prompt_input += str(i) + '. ' + title + "\n"
    prompt_input += "Give me the title of your selected paper."

    res = template["prompt_input"].format(instruction=instruction, input=prompt_input)
    return res, str(data_point['t_title'])

def _generate_abstrat_2_title_prompt(data_point: dict):
    instruction = "Please generate the title of paper based on its abstract"

    prompt_input = "Abstract: " + data_point['abs'] + "\n"
    return template["prompt_input"].format(instruction=instruction, input=prompt_input)

def _generate_sentence_prompt(data_point):
    instruction = "Please generate the citation sentence of how Paper A cites paper B in its related work section. \n"

    # Parenthesize the conditionals so the 'Unknown' fallback applies to the
    # field, not to the whole concatenation
    prompt_input = "Title of Paper A: " + (data_point['s_title'] if data_point['s_title'] is not None else 'Unknown') + "\n"
    prompt_input += "Abstract of Paper A: " + (data_point['s_abs'] if data_point['s_abs'] is not None else 'Unknown') + "\n"
    prompt_input += "Title of Paper B: " + (data_point['t_title'] if data_point['t_title'] is not None else 'Unknown') + "\n"
    prompt_input += "Abstract of Paper B: " + (data_point['t_abs'] if data_point['t_abs'] is not None else 'Unknown') + "\n"

    return template["prompt_input"].format(instruction=instruction, input=prompt_input)

def _generate_abstrat_completion_prompt(data_point: dict):
    instruction = "Please complete the abstract of a paper."

    # Keep only the first 10% of the abstract as the visible prefix
    split_abs = data_point['abs'][: int(0.1 * len(data_point['abs']))]

    prompt_input = "Title: " + data_point['title'] + "\n"
    prompt_input += "Part of abstract: " + split_abs
    return template["prompt_input"].format(instruction=instruction, input=prompt_input)

def get_llm_response(prompt, task):
    # Dispatch to the task-specific generation pipeline
    pipes = {
        'sentence': pipe_sentence,
        'LP': pipe_LP,
        'abstract': pipe_abstract,
        'title': pipe_title,
        'retrieval': pipe_retrieval,
        'intro': pipe_intro,
    }
    return pipes[task](prompt)

def test_sentence():
    Bert_p_list = []
    Bert_r_list = []
    Bert_f_list = []

    result_dict = {}
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]
        source_title, source_abs = raw_id_2_tile_abs[source]
        target_title, target_abs = raw_id_2_tile_abs[target]

        s_nei = list(nx.all_neighbors(raw_graph, source))
        s_nei_list = list(set(s_nei) - {source} - {target})[:10]
        s_nei_titles = [raw_id_2_tile_abs[n][0] for n in s_nei_list]

        t_nei = list(nx.all_neighbors(raw_graph, target))
        t_nei_list = list(set(t_nei) - {source} - {target})[:10]
        t_nei_titles = [raw_id_2_tile_abs[n][0] for n in t_nei_list]

        t_nei_sentence = []
        for nei in t_nei_list:
            tmp_sentence = raw_id_pair_2_sentence.get((nei, target), '')
            if len(tmp_sentence) != 0:
                t_nei_sentence.append(tmp_sentence)

        citation_sentence = raw_id_pair_2_sentence[(source, target)] if (source, target) in raw_id_pair_2_sentence else raw_id_pair_2_sentence[(target, source)]

        datapoint = {'s_title': source_title, 's_abs': source_abs, 't_title': target_title, 't_abs': target_abs,
                     's_nei': s_nei_titles, 't_nei': t_nei_titles, 't_nei_sentence': t_nei_sentence,
                     'sentence': citation_sentence}

        prompt = _generate_sentence_prompt(datapoint)
        ans = get_llm_response(prompt, 'sentence')[0]['generated_text']
        res = ans.strip().split(human_instruction[1])[-1]

        result_dict[(source, target)] = [source_title, source_abs, target_title, target_abs, citation_sentence, res]
        Bert_p, Bert_r, Bert_f = get_bert_score(res, citation_sentence)

        print("Answer is:", ans)
        print("Stripped result is:", res)
        print("Citation sentence:", citation_sentence)

        Bert_p_list.append(Bert_p.item())
        Bert_r_list.append(Bert_r.item())
        Bert_f_list.append(Bert_f.item())
        print([len(Bert_p_list), np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)])

    return np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)

def test_LP():
    result_list = []
    # positive pairs (true edges): expect "yes"
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]
        source_title, source_abs = raw_id_2_tile_abs[source]
        target_title, target_abs = raw_id_2_tile_abs[target]

        s_nei = list(nx.all_neighbors(raw_graph, source))
        s_nei_list = list(set(s_nei) - {source} - {target})[:5]
        s_nei_titles = [raw_id_2_tile_abs[n][0] for n in s_nei_list]

        t_nei = list(nx.all_neighbors(raw_graph, target))
        t_nei_list = list(set(t_nei) - {source} - {target})[:5]
        t_nei_titles = [raw_id_2_tile_abs[n][0] for n in t_nei_list]

        datapoint = {'s_title': source_title, 's_abs': source_abs, 't_title': target_title, 't_abs': target_abs,
                     's_nei': s_nei_titles, 't_nei': t_nei_titles, 'label': 'yes'}

        prompt = _generate_LP_prompt(datapoint)
        ans = get_llm_response(prompt, 'LP')[0]['generated_text']

        res = ans.strip().split(human_instruction[1])[-1]
        print("Answer is:", res)
        if 'yes' in res[:4].lower():
            result_list.append(1)
        else:
            result_list.append(0)
        print("Current value:", np.mean(result_list))

    # negative pairs (random targets): expect "no"
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], random.sample(list(graph_data.nodes()), 1)[0]
        source_title, source_abs = raw_id_2_tile_abs[source]
        target_title, target_abs = raw_id_2_tile_abs[target]

        s_nei = list(nx.all_neighbors(raw_graph, source))
        s_nei_list = list(set(s_nei) - {source} - {target})[:5]
        s_nei_titles = [raw_id_2_tile_abs[n][0] for n in s_nei_list]

        try:
            t_nei = list(nx.all_neighbors(raw_graph, target))
        except nx.NetworkXError:
            t_nei = []
        t_nei_list = list(set(t_nei) - {source} - {target})[:5]
        t_nei_titles = [raw_id_2_tile_abs[n][0] for n in t_nei_list]

        datapoint = {'s_title': source_title, 's_abs': source_abs, 't_title': target_title, 't_abs': target_abs,
                     's_nei': s_nei_titles, 't_nei': t_nei_titles, 'label': 'no'}

        prompt = _generate_LP_prompt(datapoint)
        ans = get_llm_response(prompt, 'LP')[0]['generated_text']

        res = ans.strip().split(human_instruction[1])[-1]
        print("Answer is:", res)

        if 'no' in res[:4].lower():
            result_list.append(1)
        else:
            result_list.append(0)
        print("Current value:", np.mean(result_list))

    return np.mean(result_list)

def test_title_generate():
    result_dict = {}
    Bert_p_list = []
    Bert_r_list = []
    Bert_f_list = []
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]
        title, abstract = raw_id_2_tile_abs[source]
        if title is None or abstract is None:
            continue

        retrieval_nei = list(nx.all_neighbors(raw_graph, source))
        retrieval_nei_list = list(set(retrieval_nei) - {source} - {target})[:5]
        retrieval_nei_titles = [raw_id_2_tile_abs[n][0] for n in retrieval_nei_list]

        datapoint = {'title': title, 'abs': abstract, 'retrieval_nei_titles': retrieval_nei_titles}

        prompt = _generate_abstrat_2_title_prompt(datapoint)
        ans = get_llm_response(prompt, 'title')[0]['generated_text']

        res = ans.strip().split(human_instruction[1])[-1]

        result_dict[source] = [title, abstract, res]

        print(ans)
        print(res)
        print(title)

        Bert_p, Bert_r, Bert_f = get_bert_score(res, title)
        Bert_p_list.append(Bert_p.item())
        Bert_r_list.append(Bert_r.item())
        Bert_f_list.append(Bert_f.item())
        print([len(Bert_p_list), np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)])

    return np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)

def test_abs_completion():
    result_dict = {}
    Bert_p_list = []
    Bert_r_list = []
    Bert_f_list = []
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]
        title, abstract = raw_id_2_tile_abs[source]
        if title is None or abstract is None:
            continue

        retrieval_nei = list(nx.all_neighbors(raw_graph, source))
        retrieval_nei_list = list(set(retrieval_nei) - {source} - {target})[:5]
        retrieval_nei_abs = [raw_id_2_tile_abs[n][1] for n in retrieval_nei_list]

        datapoint = {'title': title, 'abs': abstract, 'nei_abs': retrieval_nei_abs}

        prompt = _generate_abstrat_completion_prompt(datapoint)
        ans = get_llm_response(prompt, 'abstract')[0]['generated_text']

        res = ans.strip().split(human_instruction[1])[-1]

        result_dict[source] = [title, abstract, res]

        print(ans)
        print(res)
        print(abstract)

        Bert_p, Bert_r, Bert_f = get_bert_score(res, abstract)

        Bert_p_list.append(Bert_p.item())
        Bert_r_list.append(Bert_r.item())
        Bert_f_list.append(Bert_f.item())
        print([len(Bert_p_list), np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)])

    return np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)

def test_retrival_e():
    result_list = []
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]
        source_title, source_abs = raw_id_2_tile_abs[source]
        target_title, _ = raw_id_2_tile_abs[target]

        # 5 random non-neighbors plus the true target, shuffled
        neighbors = list(nx.all_neighbors(raw_graph, source))
        sample_node_list = list(set(raw_graph.nodes()) - set(neighbors) - {source} - {target})
        sampled_neg_nodes = random.sample(sample_node_list, 5) + [target]
        random.shuffle(sampled_neg_nodes)

        retrieval_nei = list(nx.all_neighbors(raw_graph, source))
        retrieval_nei_list = list(set(retrieval_nei) - {source} - {target})[:3]
        retrieval_nei_titles = [raw_id_2_tile_abs[n][0] for n in retrieval_nei_list]

        datapoint = {'s_title': source_title, 's_abs': source_abs, 't_title': target_title,
                     'nei_titles': [raw_id_2_tile_abs[node][0] for node in sampled_neg_nodes],
                     'retrieval_nei_title': retrieval_nei_titles}
        prompt, _ = _generate_retrival_prompt(datapoint)
        ans = get_llm_response(prompt, 'retrieval')[0]['generated_text']

        res = ans.strip().split(human_instruction[1])[-1].lower()
        target_title = target_title.lower()

        print(ans)
        print("###GT: " + target_title)
        print(res)
        if target_title in res or res in target_title:
            result_list.append(1)
        else:
            result_list.append(0)
        print([sum(result_list), len(result_list)])

    print([sum(result_list), len(result_list)])
    return np.mean(result_list)

def _generate_intro_2_abstract_prompt(data_point: dict, context_window):
    instruction = "Please generate the abstract of paper based on its introduction section."

    prompt_input = "Introduction: " + data_point['intro'] + "\n"

    # Truncate so the prompt fits in the model's context window
    prompt_input = prompt_input[:int(context_window * 2)]

    return template["prompt_input"].format(instruction=instruction, input=prompt_input)

def test_intro_2_abs():
    result_dict = {}
    Bert_p_list = []
    Bert_r_list = []
    Bert_f_list = []
    for i in tqdm(range(len(test_data))):
        source, target = test_data[i][0], test_data[i][1]

        # Fall back to the target paper if the source has no introduction
        if source not in raw_id_2_intro:
            source = target
        if source not in raw_id_2_intro:
            continue

        title, abstract = raw_id_2_tile_abs[source]
        intro = raw_id_2_intro[source]

        datapoint = {'abs': abstract, 'intro': intro}
        prompt = _generate_intro_2_abstract_prompt(datapoint, tokenizer.model_max_length)
        ans = get_llm_response(prompt, 'intro')[0]['generated_text']

        res = ans.strip().split(human_instruction[1] + '\n')[-1]

        result_dict[source] = [title, abstract, res]

        print(ans)
        print(res)
        print(abstract)

        Bert_p, Bert_r, Bert_f = get_bert_score(res, abstract)

        Bert_p_list.append(Bert_p.item())
        Bert_r_list.append(Bert_r.item())
        Bert_f_list.append(Bert_f.item())
        print([len(Bert_p_list), np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)])

    return np.mean(Bert_p_list), np.mean(Bert_r_list), np.mean(Bert_f_list)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-config_path", help="Path to the config YAML file")
    parser.add_argument("-model", help="Model type (e.g., lora)")
    parser.add_argument("-lorapath", help="Path to the LoRA adapter checkpoint")
    parser.add_argument("-prompt_num", help="Optional prompt index (currently unused)", default=1)
    args = parser.parse_args()

    config = read_yaml_file(args.config_path)
    random.seed(42)
    print("Load model")
    model_path = config["eval"]["base_model"]

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # 8-bit weights are placed on the GPU via device_map; do not call .to('cuda')
    base_model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, load_in_8bit=True, device_map="auto"
    )
    tokenizer.model_max_length = 2048
    tokenizer.pad_token = tokenizer.eos_token

    adapter_save_path = args.lorapath
    model = PeftModel.from_pretrained(base_model, adapter_save_path)
    model = model.merge_and_unload()

    def _make_pipe(max_new_tokens):
        # All tasks share the same sampling settings; only the output budget differs
        return pipeline(
            "text-generation",
            model=model,
            tokenizer=tokenizer,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.95,
            repetition_penalty=1.15,
        )

    pipe_LP = _make_pipe(2)
    pipe_title = _make_pipe(100)
    pipe_abstract = _make_pipe(256)
    pipe_intro = _make_pipe(256)
    pipe_sentence = _make_pipe(64)
    pipe_retrieval = _make_pipe(64)

    graph_path = config["eval"]["graph_path"]
    graph_data = nx.read_gexf(graph_path, node_type=None, relabel=False, version='1.2draft')

    raw_graph = graph_data

    test_set_size = 50
    all_test_nodes = set(list(graph_data.nodes())[:test_set_size])
    all_train_nodes = set(list(graph_data.nodes())[test_set_size:])

    raw_id_2_tile_abs = dict()
    for paper_id in list(graph_data.nodes()):
        title = graph_data.nodes()[paper_id]['title']
        abstract = graph_data.nodes()[paper_id]['abstract']
        raw_id_2_tile_abs[paper_id] = [title, abstract]

    raw_id_pair_2_sentence = dict()
    for edge in list(graph_data.edges()):
        raw_id_pair_2_sentence[edge] = graph_data.edges()[edge].get('sentence', '')

    raw_id_2_intro = dict()
    for paper_id in list(graph_data.nodes())[test_set_size:]:
        if graph_data.nodes[paper_id]['introduction'] != '':
            raw_id_2_intro[paper_id] = graph_data.nodes[paper_id]['introduction']

    # Edges touching a held-out test node become test pairs
    test_data = []
    edge_list = []
    for edge in list(raw_graph.edges()):
        src, tar = edge
        if src not in all_test_nodes and tar not in all_test_nodes:
            edge_list.append(edge)
        else:
            test_data.append(edge)

    with open('configs/alpaca.json') as fp:
        template = json.load(fp)
    human_instruction = ['### Input:', '### Response:']

    LP_score = test_LP()
    retrieval_score = test_retrival_e()
    title_p, title_r, title_f = test_title_generate()
    sentence_p, sentence_r, sentence_f = test_sentence()
    abstract_p, abstract_r, abstract_f = test_abs_completion()
    intro_p, intro_r, intro_f = test_intro_2_abs()

    print("Retrieval Score:", retrieval_score)
    print("LP Score:", LP_score)
    print("Title:", [title_p, title_r, title_f])
    print("Sentence:", [sentence_p, sentence_r, sentence_f])
    print("Abstract:", [abstract_p, abstract_r, abstract_f])
    print("Intro:", [intro_p, intro_r, intro_f])

    results = {
        "LP_score": LP_score,
        "retrieval_score": retrieval_score,
        "title": {"precision": title_p, "recall": title_r, "f1": title_f},
        "sentence": {"precision": sentence_p, "recall": sentence_r, "f1": sentence_f},
        "abstract": {"precision": abstract_p, "recall": abstract_r, "f1": abstract_f},
        "intro": {"precision": intro_p, "recall": intro_r, "f1": intro_f},
    }

    graph_name = graph_path.split('/')[-1].split('.')[0]
    name_save = config["eval"]["model_name"]

    os.makedirs("eval", exist_ok=True)

    with open(f"eval/{name_save}_{graph_name}_results.json", "w") as f:
        json.dump(results, f)
src/no_UI/finetune_noUI.py DELETED
@@ -1,382 +0,0 @@
import argparse
import json
import random

import networkx as nx
import torch
import transformers
import wandb
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from utils.utils import read_yaml_file


class QloraTrainer_CS:
    def __init__(self, config: dict, index, use_predefined_graph=False):
        self.config = config
        self.tokenizer = None
        self.base_model = None
        self.adapter_model = None
        self.merged_model = None
        self.index = index
        self.transformer_trainer = None
        self.test_data = None
        self.use_predefined_graph = use_predefined_graph

        template_file_path = 'configs/alpaca.json'
        with open(template_file_path) as fp:
            self.template = json.load(fp)

    def load_base_model(self):
        model_id = self.config['training']['base_model']
        print(model_id)

        # Standard QLoRA quantization: 4-bit NF4 with double quantization
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16
        )
        print('load llama 3')
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        tokenizer.model_max_length = self.config['training']['tokenizer']["max_length"]
        if not tokenizer.pad_token:
            tokenizer.pad_token = tokenizer.eos_token
        # Quantized weights are placed on the GPU via device_map
        model = AutoModelForCausalLM.from_pretrained(
            model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16, device_map="auto"
        )

        model.gradient_checkpointing_enable()
        model = prepare_model_for_kbit_training(model)

        self.tokenizer = tokenizer
        self.base_model = model

    def train(self):
        # Set up LoRA config
        config = LoraConfig(
            r=self.config['training']['qlora']['rank'],
            lora_alpha=self.config['training']['qlora']['lora_alpha'],
            target_modules=self.config['training']['qlora']['target_modules'],
            lora_dropout=self.config['training']['qlora']['lora_dropout'],
            bias="none",
            task_type="CAUSAL_LM",
        )
        model = get_peft_model(self.base_model, config)
        self._print_trainable_parameters(model)

        print("Start data preprocessing")
        train_data = self._process_data_instruction()

        print('Length of dataset: ', len(train_data))

        print("Start training")
        self.transformer_trainer = transformers.Trainer(
            model=model,
            train_dataset=train_data,
            args=transformers.TrainingArguments(
                per_device_train_batch_size=self.config['training']['trainer_args']["per_device_train_batch_size"],
                gradient_accumulation_steps=int(self.index),
                warmup_steps=self.config['training']['trainer_args']["warmup_steps"],
                num_train_epochs=self.config['training']['trainer_args']["num_train_epochs"],
                learning_rate=self.config['training']['trainer_args']["learning_rate"],
                lr_scheduler_type=self.config['training']['trainer_args']["lr_scheduler_type"],
                fp16=self.config['training']['trainer_args']["fp16"],
                logging_steps=self.config['training']['trainer_args']["logging_steps"],
                output_dir=self.config['training']['trainer_args']["trainer_output_dir"],
                report_to="wandb",
                save_steps=self.config['training']['trainer_args']["save_steps"],
            ),
            data_collator=transformers.DataCollatorForLanguageModeling(self.tokenizer, mlm=False),
        )

        model.config.use_cache = False

        self.transformer_trainer.train()

        model_save_path = f"{self.config['training']['model_saving']['model_output_dir']}/{self.config['training']['model_saving']['model_name']}_{str(self.index)}_adapter_test_graph"
        self.transformer_trainer.save_model(model_save_path)

        self.adapter_model = model
        print(f"Training complete, adapter model saved in {model_save_path}")

    def _print_trainable_parameters(self, model):
        """
        Prints the number of trainable parameters in the model.
        """
        trainable_params = 0
        all_param = 0
        for _, param in model.named_parameters():
            all_param += param.numel()
            if param.requires_grad:
                trainable_params += param.numel()
        print(
            f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
        )

-    def _process_data_instruction(self):
-        context_window = self.tokenizer.model_max_length
-        graph_data = nx.read_gexf(self.config["training"]["graph_path"], node_type=None, relabel=False, version='1.2draft')
-        raw_graph = graph_data
-
-        test_set_size = len(graph_data.nodes()) // 10
-
-        all_test_nodes = set(list(graph_data.nodes())[:test_set_size])
-        all_train_nodes = set(list(graph_data.nodes())[test_set_size:])
-
-        raw_id_2_title_abs = dict()
-        for paper_id in list(graph_data.nodes())[test_set_size:]:
-            title = graph_data.nodes()[paper_id]['title']
-            abstract = graph_data.nodes()[paper_id]['abstract']
-            raw_id_2_title_abs[paper_id] = [title, abstract]
-
-        raw_id_2_title_abs_test = dict()
-        for paper_id in list(graph_data.nodes()):
-            title = graph_data.nodes()[paper_id]['title']
-            abstract = graph_data.nodes()[paper_id]['abstract']
-            raw_id_2_title_abs_test[paper_id] = [title, abstract]
-
-        raw_id_2_intro = dict()
-        for paper_id in list(graph_data.nodes())[test_set_size:]:
-            if graph_data.nodes[paper_id]['introduction'] != '':
-                intro = graph_data.nodes[paper_id]['introduction']
-                raw_id_2_intro[paper_id] = intro
-
-        raw_id_pair_2_sentence = dict()
-        for edge in list(graph_data.edges()):
-            sentence = graph_data.edges()[edge]['sentence']
-            raw_id_pair_2_sentence[edge] = sentence
-
-
-        test_data = []
-        edge_list = []
-        for edge in list(raw_graph.edges()):
-            src, tar = edge
-            if src not in all_test_nodes and tar not in all_test_nodes:
-                edge_list.append(edge)
-            else:
-                test_data.append(edge)
-        train_num = int(len(edge_list))
-
-        data_LP = []
-        data_abstract_2_title = []
-        data_paper_retrieval = []
-        data_citation_sentence = []
-        data_abs_completion = []
-        data_intro_2_abs = []
-
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            source_title, source_abs = raw_id_2_title_abs[source]
-            target_title, target_abs = raw_id_2_title_abs[target]
-            # LP prompt
-            rand_ind = random.choice(list(raw_id_2_title_abs.keys()))
-            neg_title, neg_abs = raw_id_2_title_abs[rand_ind]
-            data_LP.append({'s_title':source_title, 's_abs':source_abs, 't_title':target_title, 't_abs':target_abs, 'label':'yes'})
-            data_LP.append({'s_title':source_title, 's_abs':source_abs, 't_title':neg_title, 't_abs':neg_abs, 'label':'no'})
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            source_title, source_abs = raw_id_2_title_abs[source]
-            target_title, target_abs = raw_id_2_title_abs[target]
-            # abs_2_title prompt
-            data_abstract_2_title.append({'title':source_title, 'abs':source_abs})
-            data_abstract_2_title.append({'title':target_title, 'abs':target_abs})
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            source_title, source_abs = raw_id_2_title_abs[source]
-            target_title, target_abs = raw_id_2_title_abs[target]
-            # paper_retrieval prompt
-            neighbors = list(nx.all_neighbors(raw_graph, source))
-            sample_node_list = list(all_train_nodes - set(neighbors) - set([source]) - set([target]))
-            sampled_neg_nodes = random.sample(sample_node_list, 5) + [target]
-            random.shuffle(sampled_neg_nodes)
-            data_paper_retrieval.append({'title':source_title, 'abs':source_abs, 'sample_title': [raw_id_2_title_abs[node][0] for node in sampled_neg_nodes], 'right_title':target_title})
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            source_title, source_abs = raw_id_2_title_abs[source]
-            target_title, target_abs = raw_id_2_title_abs[target]
-            # citation_sentence prompt
-            citation_sentence = raw_id_pair_2_sentence[(source, target)] if (source, target) in raw_id_pair_2_sentence.keys() else raw_id_pair_2_sentence[(target, source)]
-            data_citation_sentence.append({'s_title':source_title, 's_abs':source_abs, 't_title':target_title, 't_abs':target_abs, 'sentence': citation_sentence})
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            source_title, source_abs = raw_id_2_title_abs[source]
-            target_title, target_abs = raw_id_2_title_abs[target]
-            # abs_complete prompt
-            data_abs_completion.append({'title':source_title, 'abs':source_abs})
-            data_abs_completion.append({'title':target_title, 'abs':target_abs})
-
-        for sample in tqdm(random.sample(edge_list, train_num)):
-            source, target = sample[0], sample[1]
-            if source in raw_id_2_intro:
-                source_intro = raw_id_2_intro[source]
-                _, source_abs = raw_id_2_title_abs[source]
-                data_intro_2_abs.append({'intro':source_intro, 'abs':source_abs})
-            if target in raw_id_2_intro:
-                target_intro = raw_id_2_intro[target]
-                _, target_abs = raw_id_2_title_abs[target]
-                data_intro_2_abs.append({'intro':target_intro, 'abs':target_abs})
-
-        data_prompt = []
-        data_prompt += [self._generate_paper_retrieval_prompt(data_point) for data_point in data_paper_retrieval]
-        data_prompt += [self._generate_LP_prompt(data_point) for data_point in data_LP]
-        data_prompt += [self._generate_abstract_2_title_prompt(data_point) for data_point in data_abstract_2_title]
-        data_prompt += [self._generate_citation_sentence_prompt(data_point) for data_point in data_citation_sentence]
-        data_prompt += [self._generate_abstract_completion_prompt(data_point) for data_point in data_abs_completion]
-        data_prompt += [self._generate_intro_2_abstract_prompt(data_point, context_window) for data_point in data_intro_2_abs]
-
-        print("Total prompts:", len(data_prompt))
-        random.shuffle(data_prompt)
-        if self.tokenizer.chat_template is None:
-            data_tokenized = [self.tokenizer(sample, max_length=context_window, truncation=True) for sample in tqdm(data_prompt)]
-        else:
-            data_tokenized = [self.tokenizer.apply_chat_template(sample, max_length=context_window, truncation=True, tokenize=False) for sample in tqdm(data_prompt)]
-
-        return data_tokenized
-
-
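`_process_data_instruction` holds out the first 10% of graph nodes as a test set and keeps an edge for training only when neither endpoint is a test node. A toy sketch of that split (paper IDs and edges here are made up for illustration):

```python
# Toy citation graph: 20 node IDs and a few directed citation edges.
nodes = [f"paper_{i}" for i in range(20)]
edges = [("paper_0", "paper_15"), ("paper_12", "paper_18"), ("paper_3", "paper_5")]

# First 10% of nodes are held out for evaluation; the rest are train nodes.
test_set_size = len(nodes) // 10
all_test_nodes = set(nodes[:test_set_size])

# Training edges must have both endpoints among the train nodes; any edge
# touching a test node is reserved for the evaluation split.
edge_list = [(s, t) for s, t in edges if s not in all_test_nodes and t not in all_test_nodes]
test_data = [(s, t) for s, t in edges if s in all_test_nodes or t in all_test_nodes]
print(len(all_test_nodes), len(edge_list), len(test_data))
```

Here two nodes are held out, so the one edge touching `paper_0` lands in the evaluation split and the other two edges remain for training.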
-    def _generate_LP_prompt(self, data_point: dict):
-        instruction = "Determine if paper A will cite paper B."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Title of Paper A: " + (data_point['s_title'] if data_point['s_title'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Abstract of Paper A: " + (data_point['s_abs'] if data_point['s_abs'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Title of Paper B: " + (data_point['t_title'] if data_point['t_title'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Abstract of Paper B: " + (data_point['t_abs'] if data_point['t_abs'] is not None else 'Unknown') + "\n"
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['label']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['label']}
-            ]
-
-        return res
-
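For tokenizers with a chat template, `_generate_LP_prompt` emits a two-turn conversation whose assistant turn is just the `yes`/`no` label. A self-contained sketch, with `TEMPLATE` as a hypothetical stand-in for the `self.template["prompt_input"]` string loaded from the config:

```python
# Hypothetical Alpaca-style template; the real one comes from the config file.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"

def generate_lp_prompt(data_point: dict) -> list:
    instruction = "Determine if paper A will cite paper B."
    fields = [
        ("Title of Paper A", data_point["s_title"]),
        ("Abstract of Paper A", data_point["s_abs"]),
        ("Title of Paper B", data_point["t_title"]),
        ("Abstract of Paper B", data_point["t_abs"]),
    ]
    # Missing fields fall back to "Unknown", mirroring the trainer's None checks.
    prompt_input = "".join(
        f"{name}: {value if value is not None else 'Unknown'}\n" for name, value in fields
    )
    return [
        {"role": "user", "content": TEMPLATE.format(instruction=instruction, input=prompt_input)},
        {"role": "assistant", "content": data_point["label"]},
    ]

messages = generate_lp_prompt({
    "s_title": "Graph LLMs", "s_abs": "We fine-tune LLMs on citation graphs.",
    "t_title": None, "t_abs": "A survey of link prediction.", "label": "yes",
})
print(messages[1]["content"])  # the supervision target is just the label
```

Keeping the target to a single token-level label is what lets the evaluation side score this task with plain accuracy rather than BERTScore.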
-    def _generate_abstract_2_title_prompt(self, data_point: dict):
-        instruction = "Please generate the title of paper based on its abstract."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Abstract: " + data_point['abs'] + "\n"
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['title']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['title']}
-            ]
-
-        return res
-
-    def _generate_paper_retrieval_prompt(self, data_point: dict):
-        instruction = "Please select the paper that is more likely to be cited by paper A from candidate papers."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Title of the Paper A: " + data_point['title'] + "\n"
-        prompt_input = prompt_input + "Abstract of the Paper A: " + data_point['abs'] + "\n"
-        prompt_input = prompt_input + "candidate papers: " + "\n"
-        for i in range(len(data_point['sample_title'])):
-            prompt_input = prompt_input + str(i) + '. ' + data_point['sample_title'][i] + "\n"
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['right_title']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['right_title']}
-            ]
-
-        return res
-
-    def _generate_citation_sentence_prompt(self, data_point: dict):
-        instruction = "Please generate the citation sentence of how Paper A cites paper B in its related work section."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Title of Paper A: " + (data_point['s_title'] if data_point['s_title'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Abstract of Paper A: " + (data_point['s_abs'] if data_point['s_abs'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Title of Paper B: " + (data_point['t_title'] if data_point['t_title'] is not None else 'Unknown') + "\n"
-        prompt_input = prompt_input + "Abstract of Paper B: " + (data_point['t_abs'] if data_point['t_abs'] is not None else 'Unknown') + "\n"
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['sentence']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['sentence']}
-            ]
-
-        return res
-
-    def _generate_abstract_completion_prompt(self, data_point: dict):
-        instruction = "Please complete the abstract of a paper."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Title: " + (data_point['title'] if data_point['title'] is not None else 'Unknown') + "\n"
-
-        split_abs = data_point['abs'][: int(0.3*len(data_point['abs']))]
-        prompt_input = prompt_input + "Part of abstract: " + split_abs + "\n"
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['abs']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['abs']}
-            ]
-
-        return res
-
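The abstract-completion task shows the model the title plus roughly the first 30% of the abstract (a character-level slice) and supervises on the full abstract. The slicing step in isolation, with a toy abstract:

```python
# Toy abstract; the real ones come from the citation-graph node attributes.
abstract = "We study citation graphs. " * 4  # 104 characters

# Keep the first 30% of characters as the visible part of the prompt.
split_abs = abstract[: int(0.3 * len(abstract))]

prompt_input = "Title: A Toy Paper\n" + "Part of abstract: " + split_abs + "\n"
target = abstract  # the model is trained to emit the complete abstract
print(len(split_abs), len(abstract))
```

Note the slice is by characters, not tokens, so the fraction of the abstract the model actually sees in tokens varies with the tokenizer.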
-    def _generate_intro_2_abstract_prompt(self, data_point: dict, context_window):
-        instruction = "Please generate the abstract of paper based on its introduction section."
-
-        prompt_input = ""
-        prompt_input = prompt_input + "Introduction: " + data_point['intro'] + "\n"
-
-        # Reduce it to make it fit the context window
-        prompt_input = prompt_input[:int(context_window*2)]
-
-        if self.tokenizer.chat_template is None:
-            res = self.template["prompt_input"].format(instruction=instruction, input=prompt_input)
-            res = f"{res}{data_point['abs']}"
-        else:
-            res = [
-                {"role": "user", "content": self.template["prompt_input"].format(instruction=instruction, input=prompt_input)},
-                {"role": "assistant", "content": data_point['abs']}
-            ]
-
-        return res
-
-
-if __name__ == "__main__":
-    wandb.init(project='qlora_train')
-    parser = argparse.ArgumentParser()
-    parser.add_argument("config_path", help="Path to the config YAML file")
-    parser.add_argument("--index", type=int, default=1, help="Index to specify the GPU or task number")
-    args = parser.parse_args()
-
-    config = read_yaml_file(args.config_path)
-    trainer = QloraTrainer_CS(config, args.index, True)
-
-    print("Load base model")
-    trainer.load_base_model()
-
-    print("Start training")
-    trainer.train()