A curated collection of resources, papers, tools, and implementations that bridge the gap between Retrieval-Augmented Generation (RAG) and Reasoning in Large Language Models and Agents. This repository brings together traditionally separate research domains to enable more powerful Agentic AI systems.
📖 Related Survey: This repository is based on the taxonomy and framework presented in "Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs", featured 🏆 in Hugging Face Daily Papers.
🔍 Dive Deeper: For researchers interested in the latest developments in Agentic Deep Research, including cutting-edge papers and industry-leading deep research products, we recommend exploring our comprehensive collection at Awesome-Deep-Research 🔥🔥🔥.
If you find this repository useful, please cite our papers:
@article{li2025towards,
title={Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs},
author={Li, Yangning and Zhang, Weizhi and Yang, Yuyao and Huang, Wei-Chieh and Wu, Yaozu and Luo, Junyu and Bei, Yuanchen and Zou, Henry Peng and Luo, Xiao and Zhao, Yusheng and others},
journal={arXiv preprint arXiv:2507.09477},
year={2025}
}
@article{zhang2025web,
title={From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents},
author={Zhang, Weizhi and Li, Yangning and Bei, Yuanchen and Luo, Junyu and Wan, Guancheng and Yang, Liangwei and Xie, Chenxuan and Yang, Yuyao and Huang, Wei-Chieh and Miao, Chunyu and others},
journal={arXiv preprint arXiv:2506.18959},
year={2025}
}
🔍 Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm that combines the strengths of large language models with external knowledge retrieval. By augmenting language models with relevant information from external sources, RAG systems can provide more accurate, up-to-date, and factual responses while maintaining the generative capabilities of modern LLMs.
- Limitations:
  - May retrieve irrelevant or inaccurate information
  - Limited by the quality and coverage of external knowledge bases
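The retrieve-augment-generate flow described above can be illustrated with a minimal Python sketch. It is not drawn from any specific paper in this list: the retriever is a toy token-overlap ranker standing in for BM25 or a dense retriever, and `tokenize`, `retrieve`, `build_prompt`, and `rag_answer` are hypothetical helper names. The `llm` argument is assumed to be any callable that maps a prompt string to generated text.

```python
import re
from typing import Callable


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever (hypothetical): rank documents by the number of
    distinct query tokens they contain, as a stand-in for BM25 or dense retrieval."""
    query_tokens = set(tokenize(query))
    scored = [(len(query_tokens & set(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    # Highest overlap first; ties keep the original corpus order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Keep the top-k, dropping documents that share no tokens with the query.
    return [corpus[i] for score, i in scored[:k] if score > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    """Augment the query with numbered retrieved passages so the answer can be grounded."""
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def rag_answer(query: str, corpus: list[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve, augment, then generate: the basic RAG pipeline."""
    passages = retrieve(query, corpus, k)
    return llm(build_prompt(query, passages))


if __name__ == "__main__":
    corpus = [
        "HotpotQA is a multi-hop question answering dataset.",
        "TriviaQA contains over 650,000 question-answer-evidence triples.",
        "ReAct interleaves reasoning traces with actions.",
    ]
    # Placeholder "LLM" that echoes the prompt; replace with a real model call.
    print(rag_answer("Which dataset targets multi-hop question answering?", corpus, llm=lambda p: p))
```

Because this toy retriever only counts shared words, a passage can rank highly while being irrelevant to the question, which is the first limitation listed above; the quality of the answer is likewise bounded by what the corpus contains.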
🧠 Reasoning has recently gained significant popularity as a complementary approach to enhance LLM performance. Reasoning techniques focus on improving the model's ability to process information, perform logical analysis, and arrive at conclusions through structured thinking processes. These methods enable LLMs to tackle complex problems that require multi-step inference, causal understanding, and systematic problem-solving.
- Limitations:
  - Often hallucinates or mis-grounds facts
  - Struggles with up-to-date or domain-specific information
Although RAG and Reasoning address different aspects of a model's capabilities, they have been developed largely independently, with separate research communities, methodologies, and evaluation benchmarks.
This repository serves as a comprehensive collection that bridges these traditionally separate domains, providing resources for researchers and practitioners interested in combining the strengths of both approaches.
Large Language Models (LLMs) serve as the foundation for modern AI systems, but they face significant limitations in both knowledge access and reasoning capabilities. While RAG excels at providing factual knowledge and reasoning excels at logical processing, real-world problems often require both capabilities simultaneously. Complex queries demand not just access to relevant information, but also the ability to reason through that information systematically.
Real-World Impact: This combination enables AI systems to tackle complex problems that require both knowledge retrieval and sophisticated reasoning, such as scientific research, legal analysis, medical diagnosis, and strategic planning.
Reasoning-Enhanced RAG and RAG-Enhanced Reasoning methods are one-way enhancements: reasoning is used to improve retrieval, or retrieval is used to improve reasoning. In contrast, Synergized RAG-Reasoning systems interleave retrieval and reasoning iteratively, so that each step of one informs the next step of the other.
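To make the contrast concrete, here is a minimal Python sketch of an interleaved retrieve-and-reason loop. It is loosely in the spirit of the iterative methods collected in this repository but does not reproduce any specific paper. The `reason` callback stands in for an LLM that either issues the next search query or commits to an answer; the knowledge base `KB`, `kb_retrieve`, and `scripted_reason` are hypothetical toy stand-ins with fictional entities.

```python
from typing import Callable, Optional

# An action is ("search", query) or ("answer", final_answer).
Action = tuple[str, str]


def synergized_loop(
    question: str,
    retrieve: Callable[[str], Optional[str]],
    reason: Callable[[str, list[str]], Action],
    max_steps: int = 4,
) -> tuple[Optional[str], list[Action]]:
    """Alternate reasoning and retrieval until the reasoner answers or the step budget runs out."""
    evidence: list[str] = []
    trace: list[Action] = []
    for _ in range(max_steps):
        # Reasoning step: decide what to look up next, or answer from the evidence so far.
        kind, payload = reason(question, evidence)
        trace.append((kind, payload))
        if kind == "answer":
            return payload, trace
        # Retrieval step: new evidence feeds the next round of reasoning.
        result = retrieve(payload)
        if result is not None:
            evidence.append(result)
    return None, trace  # budget exhausted without an answer


# Hypothetical toy knowledge base keyed by sub-question (fictional entities).
KB = {
    "lead of Project Aurora": "Dr. Lin",
    "affiliation of Dr. Lin": "Example University",
}


def kb_retrieve(query: str) -> Optional[str]:
    """Exact-match lookup standing in for a real retriever."""
    return KB.get(query)


def scripted_reason(question: str, evidence: list[str]) -> Action:
    """Hard-coded stand-in for an LLM that decomposes a two-hop question."""
    if not evidence:
        return ("search", "lead of Project Aurora")
    if len(evidence) == 1:
        # The second hop depends on what the first retrieval returned.
        return ("search", f"affiliation of {evidence[0]}")
    return ("answer", evidence[-1])


if __name__ == "__main__":
    answer, trace = synergized_loop(
        "Which institution is the lead of Project Aurora affiliated with?",
        kb_retrieve,
        scripted_reason,
    )
    print(answer)
    for step in trace:
        print(step)
```

The point of the sketch is that the second query ("affiliation of Dr. Lin") only exists after the first retrieval returns; a one-way pipeline that retrieves once up front has no way to issue it.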
Below you will find a curated selection of research papers, open-source implementations, and benchmarking datasets that drive progress in RAG and Reasoning.
Latest academic publications and open-source implementations that advance the integration of RAG and Reasoning.
The table linked below covers a diverse range of tasks. Each benchmark is annotated with its domain, knowledge type, reasoning capability, and dataset size.
- Single-hop QA
- Multi-hop QA
- Multi-choice QA
- Multi-step QA
- Multimodal QA
- Long-form QA
- Graph QA
- Code
- Dialog
- Fact Checking
- Text Summarization
Guidelines for contributing to this repository and adding citation information.
📚 Research Papers and Frameworks: This section is organized according to the taxonomy in our research paper, providing resources for researchers and practitioners to explore, implement, and motivate new methods in the field.
- (AAAI 2025) MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models [Paper] [Code]
- (ArXiv 2025) Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration [Paper] [Code]
- (ArXiv 2025) DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning [Paper] [Code]
- (ArXiv 2025) Credible Plan-Driven RAG Method for Multi-Hop Question Answering [Paper]
- (ArXiv 2025) FIND: Fine-grained Information Density Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis [Paper]
- (ArXiv 2025) LLM-Independent Adaptive RAG: Let the Question Speak for Itself [Paper] [Code]
- (ACL 2024) Chain-of-Verification Reduces Hallucination in Large Language Models [Paper]
- (EMNLP 2024) Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [Paper] [Code]
- (EMNLP 2024) Retrieval and Reasoning on KGs: Integrate Knowledge Graphs into Large Language Models for Complex Question Answering [Paper] [Code]
- (NAACL 2024) Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [Paper] [Code]
- (SIGIR 2024) Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers? [Paper]
- (LREC-COLING 2024) RADCoT: Retrieval-Augmented Distillation to Specialization Models for Generating Chain-of-Thoughts in Query Expansion [Paper] [Code]
- (ArXiv 2024) GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning [Paper] [Code]
- (ArXiv 2024) RuleRAG: Rule-Guided Retrieval-Augmented Generation with Language Models for Question Answering [Paper] [Code]
- (ArXiv 2025) DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering [Paper]
- (EMNLP 2024) SEER: Self-Aligned Evidence Extraction for Retrieval-Augmented Generation [Paper] [Code]
- (ICLR 2024) Making Retrieval-Augmented Language Models Robust to Irrelevant Context [Paper] [Code]
- (ACL 2024) BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering [Paper]
- (AAAI 2025) Improving Retrieval Augmented Language Model with Self-Reasoning [Paper]
- (ArXiv 2025) RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models [Paper]
- (ArXiv 2025) AlignRAG: Leveraging Critique Learning for Evidence-Sensitive Retrieval-Augmented Reasoning [Paper] [Code]
- (EMNLP 2024) Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models [Paper] [Code]
- (EMNLP 2024) TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation [Paper] [Code]
- (ICLR 2025) KBLaM: Knowledge Base augmented Language Model [Paper] [Code]
- (ArXiv 2025) Assisting Mathematical Formalization with A Learning-based Premise Retriever [Paper] [Code]
- (ArXiv 2025) ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation [Paper] [Code]
- (ArXiv 2025) Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding [Paper]
- (ArXiv 2025) PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation [Paper] [Code]
- (SIGIR 2024) Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering [Paper]
- (ICCBR 2024) CBR-RAG: Case-Based Reasoning for Retrieval Augmented Generation in LLMs for Legal Question Answering [Paper] [Code]
- (LLM4Code 2024) LLM-based and Retrieval-Augmented Control Code Generation [Paper]
- (ArXiv 2024) MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries [Paper] [Code]
- (MDPI 2024) CRP-RAG: A Retrieval-Augmented Generation Framework for Supporting Complex Logical Reasoning and Knowledge Planning [Paper]
- (ICTIR 2025) Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking [Paper] [Code]
- (NAACL 2025) Step-by-Step Fact Verification System for Medical Claims with Explainable Reasoning [Paper] [Code]
- (COLM 2024) Web Retrieval Agents for Evidence-Based Misinformation Detection [Paper] [Code]
- (EMNLP 2024) Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models [Paper] [Code]
- (ACL 2024) FRVA: Fact-Retrieval and Verification Augmented Entailment Tree Generation for Explainable Question Answering [Paper]
- (FEVER 2024) RAGAR, Your Falsehood Radar: RAG-Augmented Reasoning for Political Fact-Checking using Multimodal Large Language Models [Paper]
- (LREC-COLING 2024) PACAR: Automated Fact-Checking with Planning and Customized Action Reasoning using Large Language Models [Paper]
- (COLING 2025) Efficient Tool Use with Chain-of-Abstraction Reasoning [Paper]
- (NAACL 2025) Meta-Reasoning Improves Tool Use in Large Language Models [Paper] [Code]
- (ArXiv 2025) Self-Training Large Language Models for Tool-Use Without Demonstrations [Paper] [Code]
- (ICLR 2024) Large Language Models As Tool Makers [Paper] [Code]
- (ICLR 2024) ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs [Paper] [Code]
- (NeurIPS 2024) AVATAR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [Paper] [Code]
- (EMNLP 2024) Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval [Paper]
- (EMNLP 2024) SCIAGENT: Tool-augmented Language Models for Scientific Reasoning [Paper]
- (EMNLP 2024) RAR: Retrieval-Augmented Retrieval for Code Generation in Low-Resource Languages [Paper]
- (ACL 2024) MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning [Paper] [Code]
- (LREC-COLING 2024) Towards Autonomous Tool Utilization in Language Models: A Unified, Efficient and Scalable Framework [Paper]
- (NAACL 2024) Making Language Models Better Tool Learners with Execution Feedback [Paper] [Code]
- (NeurIPS 2023) ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings [Paper] [Code]
- (ICLR 2025) Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning [Paper] [Code]
- (ICLR 2025) Human-like Episodic Memory for Infinite Context LLMs [Paper]
- (IEEE TPAMI 2025) JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models [Paper]
- (ArXiv 2025) Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models [Paper]
- (ArXiv 2025) Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration [Paper]
- (NeurIPS 2024) CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [Paper] [Code]
- (CHI EA 2024) "My agent understands me better": Integrating Dynamic Human-like Memory Recall and Consolidation in LLM-Based Agents [Paper]
- (ArXiv 2024) Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level [Paper]
- (ArXiv 2024) RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents [Paper]
- (ICLR 2025) OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning [Paper]
- (COLING 2025) PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation [Paper]
- (IJCAI 2024) Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction [Paper]
- (NeurIPS 2024) Mixture of Demonstrations for In-Context Learning [Paper] [Code]
- (EACL 2024) Learning to Retrieve In-Context Examples for Large Language Models [Paper] [Code]
- (EMNLP 2023) UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation [Paper] [Code]
- (ArXiv 2023) Dr.ICL: Demonstration-Retrieved In-context Learning [Paper]
- (ICLR 2025) Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG [Paper]
- (ArXiv 2025) Chain-of-Retrieval Augmented Generation [Paper] [Code]
- (ArXiv 2025) CoT-RAG: Integrating Chain of Thought and Retrieval-Augmented Generation to Enhance Reasoning in Large Language Models [Paper]
- (ArXiv 2025) RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts [Paper] [Code]
- (EMNLP 2024) Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation [Paper]
- (EMNLP 2024) Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models [Paper]
- (COLM 2024) RAFT: Adapting Language Model to Domain Specific RAG [Paper] [Code]
- (ArXiv 2024) RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation [Paper] [Code]
- (ArXiv 2024) TRACE the Evidence: Constructing Knowledge-Grounded Reasoning Chains for Retrieval-Augmented Generation [Paper] [Code]
- (ACL 2023) Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions [Paper] [Code]
- (ACL 2025) ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search [Paper] [Code]
- (AAAI 2025) RATT: A Thought Structure for Coherent and Correct LLM Reasoning [Paper] [Code]
- (ArXiv 2025) MCTS-RAG: Enhance Retrieval-Augmented Generation with Monte Carlo Tree Search [Paper] [Code]
- (ArXiv 2025) AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation via Tree-based Search [Paper]
- (ArXiv 2025) Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data [Paper]
- (ArXiv 2024) SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [Paper]
- (ArXiv 2024) CORAG: A Cost-Constrained Retrieval Optimization System for Retrieval-Augmented Generation [Paper]
- (ACL 2023) Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models [Paper] [Code]
- (EMNLP 2023) Grove: A Retrieval-Augmented Complex Story Generation Framework with a Forest of Evidence [Paper]
- (ICLR 2025) Reasoning of Large Language Models over Knowledge Graphs with Super-Relations [Paper] [Code]
- (ICLR 2025) Simple is Effective: The Roles of Graphs and LLMs in Knowledge-Graph-Based RAG [Paper] [Code]
- (ICLR 2025) StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [Paper] [Code]
- (ArXiv 2025) From RAG to Memory: Non-Parametric Continual Learning for Large Language Models [Paper] [Code]
- (ArXiv 2025) From Local to Global: A GraphRAG Approach to Query-Focused Summarization [Paper] [Code]
- (NeurIPS 2024) G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [Paper] [Code]
- (NeurIPS 2024) HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models [Paper] [Code]
- (ArXiv 2024) DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature [Paper] [Code]
- (ArXiv 2024) GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning [Paper] [Code]
- (ArXiv 2024) LightRAG: Simple and Fast Retrieval-Augmented Generation [Paper] [Code]
- (ArXiv 2023) Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering [Paper] [Code]
- (ICLR 2022) GreaseLM: Graph REASoning Enhanced Language Models for Question Answering [Paper] [Code]
- (ACL 2022) Subgraph Retrieval Enhanced Model for Multi-hop KBQA [Paper] [Code]
- (NAACL 2021) QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering [Paper] [Code]
- (ACL 2019) PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text [Paper] [Code]
- (ICLR 2025) Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-guided Retrieval Augmented Generation [Paper] [Code]
- (ICLR 2024) Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [Paper] [Code]
- (ICLR 2024) Reasoning on Graphs: Faithful and Interpretable LLM Reasoning (RoG) [Paper]
- (ACL 2024) Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [Paper] [Code]
- (ACL 2024) GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models [Paper]
- (AAAI 2024) Knowledge Graph Prompting for Multi-Document Question Answering [Paper] [Code]
- (WWW 2025) KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation [Paper] [Code]
- (EMNLP 2022) Empowering Language Models with Knowledge Graph Reasoning for Question Answering [Paper]
- (CIS 2024) KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph [Paper] [Code]
- (ArXiv 2024) HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses [Paper] [Code]
- (ArXiv 2024) KG-RAG: Bridging the Gap Between Knowledge and Creativity [Paper]
- (ArXiv 2024) Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval [Paper]
- (ArXiv 2025) Search-o1: Agentic Search-Enhanced Large Reasoning Models [Paper] [Code]
- (ArXiv 2025) Plan∗RAG: Efficient Test-Time Planning for Retrieval Augmented Generation [Paper]
- (ArXiv 2025) Open Deep Search: Democratizing Search with Open-source Reasoning Agents [Paper] [Code]
- (ArXiv 2025) DeepRAG: Thinking to Retrieval Step by Step for Large Language Models [Paper]
- (ArXiv 2025) Enhancing Retrieval Systems with Inference-Time Logical Reasoning [Paper]
- (ArXiv 2025) Self-Taught Agentic Long-Context Understanding [Paper]
- (ICLR 2024) Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [Paper] [Code]
- (KDD Cup 2024) A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning [Paper]
- (ICLR 2023) ReAct: Synergizing Reasoning and Acting in Language Models [Paper] [Code]
- (EMNLP 2023) Measuring and Narrowing the Compositionality Gap in Language Models [Paper]
- (ACL 2023) Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions [Paper]
- (EMNLP 2024) REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering [Paper]
- (EMNLP 2024) RAG-Studio: Towards In-Domain Adaptation of Retrieval Augmented Generation Through Self-Alignment [Paper]
- (ICML 2024) InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining [Paper]
- (ICML 2024) INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [Paper] [Code]
- (ICLR 2024) RA-DIT: Retrieval-Augmented Dual Instruction Tuning [Paper]
- (COLM 2024) RAFT: Adapting Language Model to Domain Specific RAG [Paper]
- (SR 2024) A Fine-Tuning Enhanced RAG System with Quantized Influence Measure as AI Judge [Paper]
- (ArXiv 2024) SFR-RAG: Towards Contextually Faithful LLMs [Paper] [Code]
- (NeurIPS 2023) Toolformer: Language Models Can Teach Themselves to Use Tools [Paper]
- (ArXiv 2025) DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [Paper] [Code]
- (ArXiv 2025) Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [Paper] [Code]
- (ArXiv 2025) RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning [Paper]
- (ArXiv 2025) R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Paper] [Code]
- (ArXiv 2025) ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning [Paper] [Code]
- (ArXiv 2025) ZeroSearch: Incentivize the Search Capability of LLMs without Searching [Paper] [Code]
- (ArXiv 2025) ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [Paper]
- (ArXiv 2022) WebGPT: Browser-assisted question-answering with human feedback [Paper]
- (ACL 2025) Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research [Paper] [Code]
- (ArXiv 2025) Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration [Paper]
- (ArXiv 2025) Knowledge-Aware Iterative Retrieval for Multi-Agent Systems [Paper]
- (ArXiv 2025) SLA Management in Reconfigurable Multi-Agent RAG: A Systems Approach to Question Answering [Paper]
- (ArXiv 2025) SurgRAW: Multi-Agent Workflow with Chain of Thought Reasoning for Surgical Intelligence [Paper]
- (ArXiv 2025) HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [Paper] [Code]
- (ArXiv 2025) RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration [Paper]
- (ArXiv 2025) MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding [Paper] [Code]
- (ArXiv 2025) MANTRA: Enhancing Automated Method-Level Refactoring with Contextual RAG and Multi-Agent LLM Collaboration [Paper]
- (ArXiv 2025) Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering [Paper]
- (ArXiv 2025) Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning [Paper]
- (ArXiv 2025) Agentic Information Retrieval [Paper]
- (ACL 2024) M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions [Paper]
- (NeurIPS 2024) Chain of Agents: Large Language Models Collaborating on Long-Context Tasks [Paper] [Code]
- (ArXiv 2024) A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [Paper]
- (ArXiv 2024) MindSearch: Mimicking Human Minds Elicits Deep AI Searcher [Paper] [Code]
📊 Benchmarks and Datasets: These resources enable standardized evaluation and comparison of RAG and Reasoning methods across various real-world scenarios, supporting research progress and practical deployment.
| Benchmark | Venue & Code | Benchmark Task | Domain | Knowledge Type | Reasoning Capability | Size |
|---|---|---|---|---|---|---|
| TriviaQA | ACL'17 | Single-hop QA | General | Commonsense, Logical | Deductive | 650,000+ |
| NQ | TACL'19 | Single-hop QA | General | Commonsense, Logical | Deductive | 307,373 |
| SimpleQA | ArXiv'24 | Single-hop QA | General | Commonsense | Deductive | 4,326 |
| HotpotQA | EMNLP'18 | Multi-hop QA | General | Commonsense | Deductive | 113,000 |
| CWQ | NAACL'18 | Multi-hop QA | General | Commonsense | Deductive | 34,689 |
| IIRC | EMNLP'20 | Multi-hop QA | General | Commonsense, Logical | Deductive | 13,000+ |
| 2WikiMultiHopQA | COLING'20 | Multi-hop QA | General | Commonsense, Logical | Deductive | 192,606 |
| MuSiQue | TACL'22 | Multi-hop QA | General | Commonsense, Logical | Deductive | 25,000 |
| TopiOCQA | TACL'22 | Multi-hop QA | General | Commonsense, Logical | Deductive | 3,920 + 50,574 |
| FRAMES | ArXiv'24 | Multi-hop QA | General | Commonsense, Logical, Arithmetic | Deductive | 824 |
| MINTQA | ArXiv'24 | Multi-hop QA | General | Commonsense, Logical | Deductive | 10,479 |
| GPQA | COLM'24 | Multi-hop QA | Science | Logical | Deductive, Abductive | 448 |
| HLE | ArXiv'25 | Multi-hop QA | Science | Arithmetic, Logical, Multimodal | Deductive, Abductive | 2,500 |
| QuALITY | NAACL'22 | Multi-choice QA | Narrative | Commonsense, Logical | Deductive, Abductive | 6,737 |
| CC/Bamboogle | EMNLP'23 | Multi-choice QA | General | Logical | Deductive, Abductive | 125 |
| BIG-Bench | TMLR'23 | Multi-choice QA | General | Commonsense, Logical | Deductive, Abductive, Inductive, Analogical | 204 tasks |
| ADQA | EMNLP'24 | Multi-choice QA | Health | Commonsense, Logical | Deductive, Abductive | 446 |
| MMLU-Pro | NeurIPS'24 | Multi-choice QA | Science | Arithmetic, Commonsense, Logical | Deductive, Inductive | 12,032 |
| StrategyQA | TACL'21 | Multi-step QA | General | Commonsense, Logical | Deductive | 2,780 |
| CrisisMMD | ArXiv'18 | Multimodal QA | Crisis Response | Commonsense, Multimodal | Abductive | 16,097 |
| ALFWorld | ICLR'21 | Multimodal QA | Game | Multimodal | Deductive, Abductive | 3,827 |
| ScienceQA | NeurIPS'22 | Multimodal QA | Science | Logical, Multimodal | Deductive | 21,000+ |
| WebShop | NeurIPS'22 | Multimodal QA | E-commerce | Multimodal | Inductive, Abductive | 12,087 |
| MMLongBench-Doc | NeurIPS'24 | Multimodal QA | Narrative | Multimodal | Deductive, Abductive | 1,082 |
| UDA | NeurIPS'24 | Multimodal QA | Narrative | Multimodal | Deductive | 29,590 |
| LongDocURL | ArXiv'24 | Multimodal QA | Narrative | Multimodal | Deductive, Abductive | 2,325 |
| SurgCoTBench | ArXiv'25 | Multimodal QA | Health | Multimodal, Logical | Abductive, Deductive | 14,176 |
| ∞Bench | ACL'24 | Long-form QA | Narrative, General | Multimodal, Logical | Inductive, Abductive | 3,946 |
| GRBench | ACL'24 | Graph QA | Narrative, E-commerce, Health | Logical | Deductive, Inductive | 1,740 |
| GraphQA | NeurIPS'24 | Graph QA | Textual Graph Understanding | Commonsense, Multimodal | Deductive, Abductive | 107,503 |
| Refactoring Oracle | IEEE'22 | Code | Software | Logical | Deductive | 7,226 |
| LiveCodeBench | ICLR'25 | Code | General | Logical | Deductive, Abductive | 1,055 |
| ColBench | ArXiv'25 | Code | Software | Logical | Abductive, Inductive | 10,000+ |
| DailyDialog | IJCNLP'17 | Dialog | General | Commonsense | – | 13,118 |
| FEVER | NAACL'18 | Fact Checking | General | Logical | Deductive, Abductive | 185,445 |
| PubHealth | EMNLP'20 | Fact Checking | Health | Commonsense, Logical | Abductive, Deductive | 11,800 |
| XSum | EMNLP'18 | Text Summarization | Narrative | Logical, Commonsense | Abductive | 226,711 |
Contributions are welcome! Please feel free to submit pull requests or open issues to suggest new resources.
We welcome contributions to expand this collection! To add your work, please:
- Submit a Pull Request or open an Issue with the following information:
  - Paper Title: Your paper's full title
  - Paper Link: DOI, arXiv, or conference link
  - GitHub Repository: Link to your open-source implementation (if available)
  - Category: Specify which category of our taxonomy your work belongs to:
    - Reasoning-Enhanced RAG: Retrieval Optimization / Integration Enhancement / Generation Enhancement
    - RAG-Enhanced Reasoning: External Knowledge Retrieval (Knowledge Base/Web Retrieval/Tool Using) / In-context Retrieval (Prior Experience/Example or Training Data)
    - Synergized RAG and Reasoning: Reasoning Workflow (Chain-based/Tree-based/Graph-based) / Agentic Orchestration (Single-Agent/Multi-Agent)
- Format: Follow the existing format in the README for consistency.
- Quality: Ensure your work is relevant to the integration of RAG and Reasoning.
Your contributions help build a comprehensive resource for the research community!


