Major update on the Talking to Chatbots dataset! Expanded the 'wrapped' dataset (one row per chat) to 2.86k records, and the 'unwrapped' version (one row per conversation turn) to 11k records. The main source is my ChatGPT archive with nearly 2 years of chats. It is still a work in progress as I incorporate chats from other sources and qualitative metrics (SCBN) for responses.
Thought it would only make sense to share this here. Lately, one of my favorite activities has been annotating prompts and putting them into datasets (reddgr/tl-test-learn-promptsreddgr/rq-request-question-promptsreddgr/nli-chatbot-prompt-categorization), which I then use to classify and select chatbot conversations for my website. It's quite fun to use this widget on the lmsys/lmsys-chat-1m, but I also use it on my 2 years of talking to chatbots (soon to be dataset, but still a lot of web scraping and ETL work left)... This one in the picture was probably one of the first prompts I wrote to an LLM: