EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria Paper • 2309.13633 • Published Sep 24, 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023
Aligning Large Language Models through Synthetic Feedback Paper • 2305.13735 • Published May 23, 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning Paper • 2305.14045 • Published May 23, 2023
Who Wrote this Code? Watermarking for Code Generation Paper • 2305.15060 • Published May 24, 2023
Dialogue Summaries as Dialogue States (DS2), Template-Guided Summarization for Few-shot Dialogue State Tracking Paper • 2203.01552 • Published Mar 3, 2022
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2, 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models Paper • 2406.05761 • Published Jun 9, 2024
EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records Paper • 2301.07695 • Published Jan 16, 2023
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes Paper • 2309.00237 • Published Sep 1, 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images Paper • 2310.18652 • Published Oct 28, 2023
Exploration into Translation-Equivariant Image Quantization Paper • 2112.00384 • Published Dec 1, 2021