	26.11.2021 ###################################################################
	negative sample reduction http://ceur-ws.org/Vol-2007/LEARNER2017_short_1.pdf
	BERT for ranking, latest review https://arxiv.org/abs/2010.06467
	new sampling approach (USEFUL) https://arxiv.org/abs/2104.06967
	multitask learning https://github.com/CAMTL/CA-MTL
	distillation https://arxiv.org/pdf/2111.09645.pdf

	22.09.2022 ###################################################################
	New search paradigm
	https://arxiv.org/pdf/2204.10628.pdf
	https://arxiv.org/pdf/2206.02743.pdf
	https://arxiv.org/pdf/2202.06991.pdf

	Auto prompting

	Iryna Gurevych
	TU Darmstadt


	#useful#######################################################################
	videos about foundation models
	https://www.youtube.com/playlist?list=PL9t0xVFP90GD8hox0KipBkJcLX_C3ja67


	09.10.2022 #############################################################################
	From "Autoregressive Search Engines: Generating Substrings as Document Identifiers"
	"Query likelihood models" --
	Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, and Bing Xiang. 2020. Beyond [CLS] through ranking by generation.
	Shengyao Zhuang and Guido Zuccon. 2021. TILDE: Term independent likelihood model for passage re-ranking.
	Oleg Lesota, Navid Rekabsaz, Daniel Cohen, Klaus Antonius Grasserbauer, Carsten Eickhoff, and Markus Schedl. 2021. A modern perspective on query likelihood with deep generative retrieval models.
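	The common idea behind these "query likelihood" papers is to rank documents by log P(q | d). The cited works estimate that probability with neural generative models; the sketch below swaps in a classical Dirichlet-smoothed unigram language model (not the method of any paper listed here) only to illustrate the ranking principle. The value of `mu`, the whitespace tokenizer, the helper names and the toy documents are all assumptions.

```python
import math
from collections import Counter


def dirichlet_prob(term, doc_tf, doc_len, coll_tf, coll_len, mu):
    """P(term | d) under a Dirichlet-smoothed unigram language model."""
    p_coll = coll_tf[term] / coll_len
    return (doc_tf[term] + mu * p_coll) / (doc_len + mu)


def rank_by_query_likelihood(query, docs, mu=10.0):
    """Return document indices sorted by log P(query | doc), best first."""
    tokenized = [d.lower().split() for d in docs]
    coll_tf = Counter(t for toks in tokenized for t in toks)
    coll_len = sum(coll_tf.values())
    # Terms unseen in the whole collection would give log(0); dropping them
    # shifts every document's score equally, so the ranking is unaffected.
    q_terms = [t for t in query.lower().split() if t in coll_tf]
    scores = []
    for toks in tokenized:
        doc_tf = Counter(toks)
        scores.append(sum(
            math.log(dirichlet_prob(t, doc_tf, len(toks), coll_tf, coll_len, mu))
            for t in q_terms))
    # sorted() is stable, so tied documents keep their original order.
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = ["neural ranking with bert", "cooking pasta at home", "bert for passage ranking"]
print(rank_by_query_likelihood("bert ranking", docs))  # [0, 2, 1] (docs 0 and 2 tie)
```

	A neural query likelihood model keeps the same `rank_by_query_likelihood` loop and only replaces `dirichlet_prob` with token probabilities from a generative model conditioned on the document.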

	Prompting to generate queries --
	Angeliki Lazaridou, Elena Gribovskaya, Wojciech Stokowiec, and Nikolai Grigorev. 2022. Internet-augmented language models through few-shot prompting for open-domain question answering.

	11.10.2022 #############################################################################



	18.10.2022 ############################################################################
	Articles with BEIR:

	Researcher: Gautier Izacard

	################################################################################
	###################################################################################
	#####################################################################################

	23.02.2023 ############################################################################
	Sparse CLIP (STAIR paper from Apple) https://arxiv.org/pdf/2301.13081.pdf

	#########################################################################################################
	Chain of thought reasoning

	Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/pdf/2201.11903.pdf NIPS 2022
	(In short: the authors simply took a few examples from the datasets and wrote step-by-step prompts for them (in-context learning).
	This greatly improved the metrics on math and on various logic tasks.)
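	Few-shot chain-of-thought prompting, as summarized above, changes only the prompt: each in-context exemplar carries a written-out reasoning chain before its answer. A minimal sketch of the prompt assembly, assuming a plain `Q:`/`A:` layout; the `Exemplar` type and `build_cot_prompt` helper are hypothetical, and the exemplar is modeled on the tennis-ball example from the paper.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    rationale: str  # hand-written step-by-step reasoning
    answer: str


def build_cot_prompt(exemplars, question):
    """Worked exemplars first, then the new question with an open answer slot."""
    blocks = [f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
              for ex in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


demos = [Exemplar(
    question="Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
             "Each can has 3 tennis balls. How many tennis balls does he have now?",
    rationale="Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11.",
    answer="11",
)]
print(build_cot_prompt(demos, "A juggler can juggle 16 balls. Half of the balls are golf balls. "
                              "How many golf balls are there?"))
```

	The model then continues after the final `A:`, and because every exemplar shows reasoning before the answer, it tends to produce its own reasoning chain first.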

	Large Language Models are Zero-Shot Reasoners https://arxiv.org/pdf/2205.11916.pdf NIPS 2022
	(The authors add the prompt "Let's think step by step" and use it to generate a step-by-step solution to the problem,
	then feed this solution back into the model as part of the prompt to get the answer. This also boosts metrics on arithmetic
	and commonsense tasks. In other words, the model can generate a solution for itself.) (need to read in more detail)
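	The two-stage pipeline described above (reasoning extraction, then answer extraction) can be sketched as follows. `generate` stands for any text-completion call and is an assumption, as is `fake_generate`, a canned stub so the example runs offline. The answer trigger "Therefore, the answer is" is an assumed simplification; the paper adapts the wording to the expected answer format.

```python
from typing import Callable

REASONING_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer is"  # assumed, simplified wording


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> str:
    # Stage 1: elicit a reasoning chain with the trigger phrase.
    stage1 = f"Q: {question}\nA: {REASONING_TRIGGER}"
    reasoning = generate(stage1)
    # Stage 2: feed the prompt plus the generated reasoning back in and ask for the final answer.
    stage2 = f"{stage1} {reasoning}\n{ANSWER_TRIGGER}"
    return generate(stage2).strip()


def fake_generate(prompt: str) -> str:
    """Offline stand-in for an LLM call (hypothetical canned outputs)."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 8."
    return "There are 16 balls and half of them are golf balls, so 16 / 2 = 8."


print(zero_shot_cot("A juggler has 16 balls. Half are golf balls. How many golf balls?", fake_generate))  # 8.
```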

	AUTOMATIC CHAIN OF THOUGHT PROMPTING IN LARGE LANGUAGE MODELS https://arxiv.org/pdf/2210.03493.pdf
	(The authors want to build Auto-CoT automatically. They split all questions into several clusters (using Sentence-BERT!!!),
	then take a representative question from each cluster and generate an auto-CoT for it.
	Auto-CoT generation is imperfect: you may end up with a cluster in which all the generated chains are bad.
	(Ask Dima how they use the clusters.))
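	The clustering step above can be sketched as plain k-means over question embeddings, picking from each cluster the question closest to its centroid. In the paper the embeddings come from Sentence-BERT and candidates are additionally filtered with simple heuristics; here the 2-D vectors are made-up stand-ins, the first-k initialization is a simplistic assumption, and the helper names are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic, simplistic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids


def pick_representatives(questions, embeddings, k):
    """One question per cluster: the one closest to its cluster centroid."""
    labels, centroids = kmeans(embeddings, k)
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: math.dist(embeddings[i], centroids[c]))
            reps.append(questions[best])
    return reps


questions = ["What is 2 + 3?", "What is 7 - 4?", "What is 6 * 2?",
             "What is the capital of France?", "What is the capital of Spain?"]
embeddings = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [5.0, 5.0], [5.2, 5.0]]  # pretend sentence embeddings
print(pick_representatives(questions, embeddings, k=2))
```

	Each representative would then get a zero-shot "Let's think step by step" chain, and those chains become the few-shot demonstrations; sampling across clusters is what keeps one bad cluster from dominating.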

	TO READ Multimodal Chain-of-Thought Reasoning in Language Models https://arxiv.org/pdf/2302.00923.pdf
	(The simplest way to implement multimodal CoT is to convert images into text and apply ordinary CoT.
	LLMs below 100B parameters can produce hallucinated rationales.)

	27.02.2023 ################################################################################
	Choosing collocations
	https://nlp.stanford.edu/fsnlp/promo/colloc.pdf

	Large Language models
	TO READ Scaling Laws for Neural Language Models https://arxiv.org/pdf/2001.08361.pdf

	LLAMA https://scontent-ams4-1.xx.fbcdn.net/v/t39.2365-6/333007794_1182140292435357_4481174526219500228_n.pdf?_nc_cat=101&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=Z5B8LP9penMAX_SWEqj&_nc_ht=scontent-ams4-1.xx&oh=00_AfAogQwG27t4J0ui35Jxwf1G31cgj2HiZGtw8v3cHk3szA&oe=6401D9D1
	The authors simply took a lot of cleaned data and trained models smaller than GPT-3 and PaLM, showing
	that large models need more data. Their results suggest that even the estimate in the Hoffmann et al. paper, which showed
	that training large models requires more data, was not accurate enough.
	The model is better than or comparable to the 175B GPT-3 or the 540B PaLM. (It does not beat code-davinci-002 on MMLU.)

	TO READ Training compute optimal large language models https://arxiv.org/pdf/2203.15556.pdf

	Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/pdf/2302.04761.pdf
	Here the authors took GPT-J, used it to augment a dataset with API calls, and then fine-tuned it on that data.
	As a result, GPT-J learned to call a calculator, Wikipedia search, and
	a translator, and to beat the much larger GPT-3 and OPT on some tasks.
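	Toolformer writes API calls inline in text, e.g. `[Calculator(400 / 1400) → 0.29]`. The sketch below shows only the execution side in a simplified form: it finds calculator calls of the hypothetical form `[Calculator(expr)]` and splices the result in after `→`. The call syntax, `safe_calc` and `execute_calls` are assumptions, not the paper's code; sampling candidate calls, filtering useful ones, and fine-tuning are out of scope.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def safe_calc(expr: str):
    """Evaluate +, -, *, / and parentheses over numbers without using eval()."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))


# Non-greedy match up to the first ")]" so nested parentheses inside expr still work.
CALL = re.compile(r"\[Calculator\((.*?)\)\]")


def execute_calls(text: str) -> str:
    """Replace each [Calculator(expr)] with [Calculator(expr) → result]."""
    def fill(m):
        result = round(safe_calc(m.group(1)), 2)
        return f"[Calculator({m.group(1)}) → {result}]"
    return CALL.sub(fill, text)


print(execute_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed the test."))
# Out of 1400 participants, 400 [Calculator(400 / 1400) → 0.29] passed the test.
```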

	TO READ Generating Datasets with Pretrained Language Models https://aclanthology.org/2021.emnlp-main.555.pdf

	28.02.2023 ###########################################################################################################################

	TO READ Atlas: Few-shot Learning with Retrieval Augmented Language Models https://arxiv.org/pdf/2208.03299.pdf

	TO READ GPT-J

	TO READ Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks https://arxiv.org/pdf/1908.10084.pdf

	TO READ SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes https://arxiv.org/abs/2302.06587

	TO READ LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval https://arxiv.org/pdf/2302.02908.pdf

	TO READ InPars-v2: Large Language Models as Efficient Dataset Generators for Information Retrieval https://arxiv.org/pdf/2301.01820.pdf

	TO READ ExaRanker: Explanation-Augmented Neural Ranker https://arxiv.org/abs/2301.10521

	01.03.2023 #######################################################################################################

	Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1 from microsoft) https://arxiv.org/pdf/2302.14045.pdf
	The authors combine image embeddings from ViT-L/14 with text, then train an LLM on the result.

	03.03.2023 #######################################################################################################
	DEMONSTRATE–SEARCH–PREDICT: Composing retrieval and language models for knowledge-intensive NLP https://arxiv.org/pdf/2212.14024.pdf
	GPT-3 interacts with ColBERTv2. Examples of the interaction: https://colab.research.google.com/github/stanfordnlp/dsp/blob/main/intro.ipynb#scrollTo=773rwc-aMuVD
	(TODO: finish reading the last part of the notebook (qa-v2))

	TO READ Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval https://cs.stanford.edu/~matei/papers/2021/neurips_baleen.pdf

	10.03.2023 #########################################################################
	Scaling Language-Image Pre-training via Masking https://arxiv.org/pdf/2212.00794.pdf
	(The authors present FLIP, a new way to train CLIP faster: they simply mask images during pretraining.
	This allows a larger batch size (not all patches of an image are used) and also lets the model
	learn the image-text distribution faster.)
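	FLIP's core trick, as noted above, is to drop a random subset of image patches before the image encoder, so the encoder processes fewer tokens per image. A minimal sketch over patch indices, assuming a 14x14 patch grid (ViT-B/16 at 224 px) and a 50% mask ratio; the function name and the string placeholders for patches are hypothetical.

```python
import random


def keep_indices(num_patches, mask_ratio, rng):
    """Sorted indices of the patches that survive random masking."""
    num_keep = int(num_patches * (1 - mask_ratio))
    return sorted(rng.sample(range(num_patches), num_keep))


rng = random.Random(0)
patches = [f"patch_{i}" for i in range(14 * 14)]  # 14x14 grid, e.g. ViT-B/16 at 224 px
keep = keep_indices(len(patches), mask_ratio=0.5, rng=rng)
visible = [patches[i] for i in keep]  # only these go through the image encoder
print(len(visible))  # 98
```

	A fresh mask is drawn per image and per step, so over training the encoder still sees every region; the saved compute per image is what makes room for the larger batch.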

	TO READ Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

	TO READ How to avoid machine learning pitfalls: a guide for academic researchers

	14.03.2023 ##########################################################################
	TO READ Less is more: Pretrain a strong Siamese encoder for dense text
	retrieval using a weak decoder. https://aclanthology.org/2021.emnlp-main.220.pdf
	"We hypothesize that to perform robust retrieval, the [CLS] vector used for computing
	matching scores should encode all the essential information in the passage. "


	SIMLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval https://arxiv.org/pdf/2207.02578.pdf
	The authors claim that a better GLUE score does not translate into better retrieval performance.
	Main idea: the authors jointly train an encoder and a shallow decoder on an LM-like task.
	The decoder has only two layers and takes as input, besides the text, the CLS embedding from the encoder.
	This way the CLS embeddings are learned better. The encoder is then trained Contriever-style.
	(TODO: look at the ablation. Perhaps they did not verify that their pretraining actually helps.)

	TO READ LEXMAE: LEXICON-BOTTLENECKED PRETRAINING FOR LARGE-SCALE RETRIEVAL https://arxiv.org/pdf/2208.14754.pdf

	17.03.2023 ##########################################################################
	ART: Automatic multi-step reasoning and tool-use for large language models https://arxiv.org/pdf/2303.09014v1.pdf

	19.03.2023 #########################################################################
	How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

	04.04.2023 ########################################################################
	TOKEN MERGING: YOUR VIT BUT FASTER https://arxiv.org/pdf/2210.09461.pdf
	The authors propose speeding up a vision transformer by merging tokens.
	In each layer, after attention, they split the tokens into two sets (A and B) and compute similarity scores between A and B.
	Then they merge the tokens with the highest similarity scores (they also propose a normalization involving Q and K).
	This gives a 2x speedup with only a 0.4% drop in quality.
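	A simplified, single-sequence sketch of the bipartite matching step described above: tokens alternate into sets A and B, each A token is linked to its most similar B token, and the r strongest links are merged by a size-weighted average. This is an illustrative pure-Python version, not the paper's implementation; ToMe computes similarity on attention keys and protects the class token from merging, both of which are omitted here, and `bipartite_merge` is a hypothetical name.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def bipartite_merge(tokens, r, sizes=None):
    """Merge r tokens with bipartite soft matching (simplified ToMe step).

    tokens: list of feature vectors; sizes: how many original patches each token
    already stands for. Returns (new_tokens, new_sizes), ordered unmerged A, then B.
    """
    sizes = sizes or [1] * len(tokens)
    a_idx = list(range(0, len(tokens), 2))  # set A: even positions
    b_idx = list(range(1, len(tokens), 2))  # set B: odd positions
    # Each A token's most similar B token and the corresponding score.
    best = {i: max(b_idx, key=lambda j: cosine(tokens[i], tokens[j])) for i in a_idx}
    score = {i: cosine(tokens[i], tokens[best[i]]) for i in a_idx}
    # The r A-tokens with the strongest matches get merged into their B partner.
    merged = set(sorted(a_idx, key=score.get, reverse=True)[:r])
    # Size-weighted sums, starting from the B tokens themselves.
    acc = {j: ([x * sizes[j] for x in tokens[j]], sizes[j]) for j in b_idx}
    for i in merged:
        vec, s = acc[best[i]]
        acc[best[i]] = ([v + x * sizes[i] for v, x in zip(vec, tokens[i])], s + sizes[i])
    kept_a = [i for i in a_idx if i not in merged]
    new_tokens = [tokens[i] for i in kept_a] + [[v / acc[j][1] for v in acc[j][0]] for j in b_idx]
    new_sizes = [sizes[i] for i in kept_a] + [acc[j][1] for j in b_idx]
    return new_tokens, new_sizes


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
new_tokens, new_sizes = bipartite_merge(tokens, r=1)
print(new_tokens, new_sizes)  # [[0.0, 1.0], [0.9, 0.1], [1.0, 0.0]] [1, 1, 2]
```

	Returning `new_sizes` is what lets later layers weight merged tokens by how many patches they represent.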

	SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking https://arxiv.org/pdf/2107.05720.pdf
	Questions: weight tying (using the input embeddings as the output embeddings of the MLM head). Does the original BERT use weight tying?
	Improvements: log saturation effect, FLOPS regularizer
	0.322 MRR@10 on MSMARCO, 0.665 on TREC DL 2019

	SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
	Pooling changed from sum (original SPLADE) to max
	A variant of the model without query expansion (SPLADE-doc)
	Distillation (I did not understand the pipeline)
	SPLADE-doc: 0.368 on MSMARCO
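	The SPLADE term weight for vocabulary entry j pools log(1 + ReLU(w_ij)) over input positions i, using sum pooling in SPLADE and max pooling in SPLADE v2; the FLOPS regularizer sums, over the vocabulary, the squared mean weight of each term across a batch. A pure-Python sketch under those definitions; `logits` stands in for the MLM-head outputs, and the helper names and toy numbers are made up.

```python
import math


def splade_weights(logits, pooling="sum"):
    """logits[i][j]: MLM-head logit for vocabulary term j at input position i.

    Returns one non-negative weight per vocabulary term.
    """
    saturated = [[math.log1p(max(0.0, x)) for x in row] for row in logits]  # log(1 + ReLU(w_ij))
    per_term = list(zip(*saturated))  # regroup values by vocabulary term
    if pooling == "sum":  # SPLADE
        return [sum(col) for col in per_term]
    if pooling == "max":  # SPLADE v2
        return [max(col) for col in per_term]
    raise ValueError(f"unknown pooling: {pooling}")


def flops_regularizer(batch):
    """batch[k][j]: weight of term j in the k-th representation of the batch."""
    n = len(batch)
    return sum((sum(col) / n) ** 2 for col in zip(*batch))


logits = [[2.0, -1.0, 0.0],   # position 1 (toy 3-term vocabulary)
          [1.0, 3.0, -2.0]]   # position 2
print(splade_weights(logits, "sum"))  # ≈ [log 6, log 4, 0.0]
print(splade_weights(logits, "max"))  # ≈ [log 3, log 4, 0.0]
print(flops_regularizer([[1.0, 0.0], [3.0, 2.0]]))  # 5.0
```

	Squaring the batch mean, rather than summing absolute values as an L1 penalty would, pushes hardest on terms that are active in many documents, which is what drives down the expected number of posting-list operations.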



	TO READ
	Learning to retrieve prompts for in-context learning.
	Selective annotation makes language models better few-shot learners.
	Rethinking the role of demonstrations: What makes in-context learning work?
	Language Model Crossover: Variation through Few-Shot Prompting
	Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
	Active Prompting with Chain-of-Thought for Large Language Models
	ControlNet
	How Does In-Context Learning Help Prompt Tuning?
	BLEU metric

	TO READ!!!!!
	1) Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval (UHD-BERT) https://aclanthology.org/2021.emnlp-main.78.pdf
	2) (query likelihood) TILDE https://espace.library.uq.edu.au/data/UQ_b024b10/arvin2021tilde.pdf?Expires=1680013702&Key-Pair-Id=APKAJKNBJ4MJBJNC6NLQ&Signature=bDdC3xFxyJngCdV69kr3J99~UsnjdFEH6jzRgwy7KkRAZFhbZNTRBJSp6p5cC3hz8dp7lc85-flXx00sBVRd1DqP9sG73-sI6aPNNEDoNxc0eBcZafmbzQ7ARBCAPmpybc4Z2F1RnH29eGW1AExWyQKquBBLQE8li-iLT~jILV5p3YCt-Shzt9HBV7pNUB7zJA3R~GTYVlCiFfLZhy7PvyQ6KH~rJHukWua5ULsuJcicdHg01SKviH2nt9YPuFVV6SDECMJVaALgiZYhCo9GzftC-Sh1BgZLlLFIpGYxU4C1M1xwGykzQUkHKx0CPJu56DtrZGNQGqDWzXIkyvaBPA__
	3) DeepCT: term weighting as a regression problem on query term recall !!!
	4) Learning to Tokenize for Generative Retrieval
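	For DeepCT (item 3 above), the regression target for a document term is its query term recall: QTR(t, d) = |Q_{d,t}| / |Q_d|, where Q_d is the set of queries relevant to d and Q_{d,t} the subset that contains t. A sketch of the target computation only, assuming lowercase whitespace tokenization; the BERT regressor trained on these targets is not shown, and `query_term_recall` is a hypothetical helper.

```python
def query_term_recall(doc, relevant_queries):
    """DeepCT-style target QTR(t, d) for every distinct term t of doc."""
    doc_terms = set(doc.lower().split())
    query_sets = [set(q.lower().split()) for q in relevant_queries]
    if not query_sets:
        return {t: 0.0 for t in doc_terms}
    # Fraction of relevant queries that mention each document term.
    return {t: sum(t in q for q in query_sets) / len(query_sets) for t in doc_terms}


targets = query_term_recall("the cat sat on the mat", ["cat mat", "where did the cat sit"])
print(sorted(targets.items()))
# [('cat', 1.0), ('mat', 0.5), ('on', 0.0), ('sat', 0.0), ('the', 0.5)]
```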

	RELEVANT DATASETS
	Social media conversations

	TASKS
	WikiHow
	history.stackexchange.com
	*.stackexchange.com
	Make a list of QA sources with links and long answers. Identify the topics.
	Check which links are cited in the answers.

	METRICS
	for long-form QA: ROUGE-L
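	ROUGE-L scores a candidate by the longest common subsequence (LCS) it shares with the reference. A minimal sketch of the F1 form (beta = 1), assuming lowercase whitespace tokens and no stemming; real toolkits add their own tokenization rules and optional stemming. Because any in-order word overlap counts, a candidate that merely echoes the question can still collect credit.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (O(len(b)) memory)."""
    row = [0] * (len(b) + 1)
    for x in a:
        diag = 0  # previous row's value at column j - 1
        for j, y in enumerate(b, start=1):
            above = row[j]
            row[j] = diag + 1 if x == y else max(above, row[j - 1])
            diag = above
    return row[-1]


def rouge_l_f1(candidate, reference):
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat on the mat", "the cat sat on the mat"))  # 10/11 ≈ 0.909
```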

	PROBLEMS

	ELI5 dataset: data leak (paper "Hurdles to Progress in Long-form Question Answering", https://arxiv.org/pdf/2103.06332v2.pdf)
	"Our analysis reveals that this result is partially due to significant train / validation overlap in the ELI5 dataset"
	"A human study shows that at least 81% of validation questions have a paraphrase in the training set, and almost all validation questions are topically similar to a training set question."
	"While Fan et al. (2019) attempted to identify and remove question overlap using TF-IDF similarity, more complex semantic matching methods & human verification is needed to address this issue in future LFQA datasets."
	"Digging deeper, we identify fundamental issues with using ROUGE-L to evaluate generated answer quality (Figure 1b). Simple baselines such as just repeatedly copying the question, or choosing a random training set answer, can outperform LFQA systems such as RAG (Lewis et al., 2020c) in terms of ROUGE-L. On the other hand, our system achieves higher ROUGE-L than reference human-written answers, which is misleading since human A/B testers strongly prefer reference answers to our system’s."
	"We conclude that ROUGE-L is not a reliable metric to evaluate LFQA due to its large and relatively unconstrained output space (e.g., compared to translation or summarization), and we offer suggestions for better automatic & human evaluations to enable meaningful progress on this task."
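	The quoted ELI5 point is that lexical filtering misses paraphrases. A minimal sketch of TF-IDF overlap filtering makes this concrete: a near-copy of a training question is flagged, while a paraphrase sharing almost no words is not. The smoothed IDF, whitespace tokenization, 0.5 threshold, helper names and example questions are assumptions, not Fan et al.'s actual setup.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dict term -> weight) with a smoothed IDF."""
    tokenized = [t.lower().split() for t in texts]
    n = len(texts)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [{term: c * idf[term] for term, c in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def flag_overlaps(train, val, threshold=0.5):
    """Validation questions whose best TF-IDF cosine to any train question is >= threshold."""
    vecs = tfidf_vectors(train + val)
    train_vecs, val_vecs = vecs[:len(train)], vecs[len(train):]
    return [q for q, v in zip(val, val_vecs)
            if max(cosine(v, t) for t in train_vecs) >= threshold]


train = ["why is the sky blue", "how do vaccines work"]
val = ["why is the sky blue during the day",  # near-copy: flagged
       "what makes the heavens look azure"]   # paraphrase of train[0]: missed
print(flag_overlaps(train, val))  # ['why is the sky blue during the day']
```

	Catching the second question would need semantic matching (e.g. embedding similarity) or human review, which is what the quoted paper recommends.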
	##################################################################################################################



	TO FIND:
	"Soft Prompt Decoding for Multilingual Dense Retrieval" (first author @huang_zhiqi, along with collaborators James Allen and @HamedZamani)
	"Smooth Operators (for Effective Systematic Review Queries)", accepted at SIGIR 2023, with @fschlatt1 and @martinpotthast

	Webis group
	Universität Tübingen
	AIHannover