GenDec: A robust generative Question-decomposition method for Multi-hop reasoning Paper • 2402.11166 • Published Feb 17, 2024 • 1
Exploring the Efficacy of Automatically Generated Counterfactuals for Sentiment Analysis Paper • 2106.15231 • Published Jun 29, 2021
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning? Paper • 2510.06036 • Published 17 days ago • 6
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature Paper • 2310.05130 • Published Oct 8, 2023
Detoxifying Large Language Models via Knowledge Editing Paper • 2403.14472 • Published Mar 21, 2024 • 3
USB: A Unified Semi-supervised Learning Benchmark for Classification Paper • 2208.07204 • Published Aug 12, 2022
CycleResearcher: Improving Automated Research via Automated Review Paper • 2411.00816 • Published Oct 28, 2024 • 1
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models Paper • 2410.09671 • Published Oct 12, 2024 • 1
An Empirical Analysis of Uncertainty in Large Language Model Evaluations Paper • 2502.10709 • Published Feb 15, 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning Paper • 2503.09501 • Published Mar 12, 2025 • 1
Deep Research Agents: A Systematic Examination And Roadmap Paper • 2506.18096 • Published Jun 22, 2025 • 3
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 151
Direct Preference Optimization Using Sparse Feature-Level Constraints Paper • 2411.07618 • Published Nov 12, 2024 • 17
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity Paper • 2310.07521 • Published Oct 11, 2023
Supervised Knowledge Makes Large Language Models Better In-context Learners Paper • 2312.15918 • Published Dec 26, 2023 • 10
PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts Paper • 2306.04528 • Published Jun 7, 2023 • 3
A Survey on Evaluation of Large Language Models Paper • 2307.03109 • Published Jul 6, 2023 • 42
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization Paper • 2306.05087 • Published Jun 8, 2023 • 6