Session 3 Recap: AI for Literature Review and Research Discovery


February 28, 2026

[Header image: a stack of academic papers connected by thin flowing lines to a cluster of glowing nodes, representing a knowledge graph or citation network.]

From Tools and Projects to Research Workflows

Our third AI Academy session turned from foundations to practice, taking on one of the most universal tasks in academic life: the literature review. Building on Session 1's workflow thinking and Session 2's persistent knowledge bases, we asked a more pointed question: what does it look like to integrate AI responsibly into the discovery, prioritization, and deep engagement stages of scholarly research?

By early 2026, AI is no longer a peripheral aid for drafting or proofreading; surveys from Elsevier and Wiley both report that a majority of researchers now use AI tools regularly, with Wiley's number jumping from 57% to 84% in a single year. The question is no longer whether to engage, but how to do so with rigor.

The Semantic Discovery Landscape

We toured the category of specialized, AI-enhanced literature tools that have matured over the past year. Elicit handles structured data extraction from papers indexed in Semantic Scholar. ResearchRabbit handles citation network visualization and iterative chaining. Consensus answers empirical "yes/no" questions with its signature consensus meter. Paperguide works as an all-in-one assistant. The common thread is semantic discovery: instead of Boolean keyword matching, these platforms use vector embeddings to retrieve conceptually relevant work even when terminology varies across disciplines.
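To make the keyword-versus-semantic contrast concrete, here is a minimal Python sketch. It assumes toy, hand-made vectors in place of a real embedding model, and the titles, vectors, and function names are all hypothetical. It does not show how any of the tools above is actually implemented. A Boolean AND search for "machine learning for fluid dynamics" finds nothing, because no title contains the word "machine". Cosine similarity over the vectors still ranks the two machine-learning-for-flow papers first.

```python
import math

# Toy "embeddings": hand-made 3-dimensional vectors standing in for the
# output of a real embedding model (hypothetical values, illustration only).
# Dimensions loosely mean: (fluid flow, heat transfer, machine learning).
papers = {
    "Turbulence closure with neural networks": [0.8, 0.1, 0.9],
    "Deep learning surrogates for Navier-Stokes": [0.9, 0.2, 0.8],
    "Convective heat transfer in microchannels": [0.4, 0.9, 0.0],
}

query_text = "machine learning for fluid dynamics"
query_vec = [0.85, 0.1, 0.85]  # hypothetical embedding of the query


def keyword_match(query: str, title: str) -> bool:
    """Boolean AND search: every query term must appear in the title."""
    terms = query.lower().split()
    return all(t in title.lower() for t in terms)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_rank(qvec: list[float], corpus: dict[str, list[float]]):
    """Return (score, title) pairs sorted by cosine similarity, best first."""
    scored = [(cosine(qvec, vec), title) for title, vec in corpus.items()]
    return sorted(scored, reverse=True)


keyword_hits = [t for t in papers if keyword_match(query_text, t)]
ranking = semantic_rank(query_vec, papers)

print("Keyword hits:", keyword_hits)
for score, title in ranking:
    print(f"{score:.3f}  {title}")
```

In a production system the vectors would come from a trained embedding model and the search would typically run over millions of papers with an approximate nearest-neighbor index. The scoring idea, ranking by closeness in vector space instead of by shared words, is the part this sketch is meant to show.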

Alongside these specialized tools, the major AI providers have rolled out their own deep research capabilities: OpenAI Deep Research, Gemini Deep Research, Claude's research mode, and Perplexity Deep Research. These are not literature tools per se; they are general-purpose agentic research systems that can autonomously plan, search, backtrack, and synthesize over dozens of steps.

When should you reach for which? The answer depends on where you sit in the research process. Specialized tools excel at systematic, reproducible searches and structured data extraction across many papers. Deep research agents excel when you are exploring a new area and do not yet know what to search for, or when your question spans heterogeneous sources: academic papers, preprints, technical documentation, industry reports.

A Hybrid Workflow

The recommended workflow we walked through is iterative and hybrid:

  1. Orientation via deep research: start broad, let an agent map the landscape and surface candidate seminal papers.
  2. Systematic expansion: feed those seeds into ResearchRabbit to build a citation network, or Elicit to extract structured data from promising candidates.
  3. Prioritization with NotebookLM: upload abstracts or full texts of your candidate papers and use categorization and relevance prompts to distill a focused reading list.
  4. Deep engagement: read the top few papers yourself, top to bottom, with highlighter in hand.
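The expansion in step 2 is essentially snowballing. You follow a seed paper's references backward and its citing papers forward, then repeat from whatever you found. Below is a minimal Python sketch of that idea as a breadth-first search. It assumes a small, hypothetical citation map, and the paper names, `snowball`, and `max_depth` are illustrative only. This is not ResearchRabbit's API or its actual algorithm.

```python
from collections import deque

# Hypothetical citation data: each paper maps to the papers it cites
# (backward links). Forward links ("cited by") are derived below.
references = {
    "Seed A": ["P1", "P2"],
    "Seed B": ["P2", "P3"],
    "P1": ["P4"],
    "P2": [],
    "P3": ["P4", "P5"],
    "P4": [],
    "P5": [],
    "P6": ["Seed A"],  # P6 cites Seed A, so only forward chaining finds it
}


def cited_by(refs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the reference map to get forward (citing) links."""
    inverse: dict[str, list[str]] = {paper: [] for paper in refs}
    for paper, cited in refs.items():
        for c in cited:
            inverse.setdefault(c, []).append(paper)
    return inverse


def snowball(seeds: list[str], refs: dict[str, list[str]], max_depth: int = 2) -> dict[str, int]:
    """Breadth-first backward + forward citation chaining from seed papers.

    Returns a dict mapping each discovered paper to its hop count from the
    nearest seed (seeds are at depth 0)."""
    forward = cited_by(refs)
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if depth[paper] == max_depth:
            continue  # do not expand beyond the depth limit
        neighbors = refs.get(paper, []) + forward.get(paper, [])
        for n in neighbors:
            if n not in depth:
                depth[n] = depth[paper] + 1
                queue.append(n)
    return depth


found = snowball(["Seed A", "Seed B"], references, max_depth=1)
print(found)
```

The depth limit is the main design choice. Each hop can multiply the candidate pool, so a few seeds can quickly grow into hundreds of candidates. That is why step 3's prioritization matters.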

NotebookLM earned particular emphasis here. Because it is grounded exclusively in the sources you provide, its responses come with inline citations to specific passages. As a result, hallucinations are rare and easy to catch by checking the cited text. Its recent integration with Gemini Gems is a genuine step forward. You can now attach an entire NotebookLM corpus to a Gem without re-uploading, which gives the Gem a living knowledge base that grows as the notebook does.

Discussion: The Wicked Question

The conversation that followed the presentation was the real heart of the session, and it opened up the tension that every faculty member in the room is wrestling with.

Erica raised what Professor Barba called "the wicked question": with these tools, a student could write a competent literature review without understanding anything in it. From a doctoral mentoring perspective, the problem is acute. A lit review is not just a deliverable; it is how students come to know a field: who is working on what, which threads of argument matter, where the disagreements lie. That understanding is forged by struggling through the actual language of the papers. Can the tools deliver their benefits without eroding that deeper learning?

Professor Barba connected this directly to an earlier exchange about a draft policy in biomedical engineering that had proposed banning AI tools from qualifying exams outright. Her position was unambiguous: as a researcher, she wants to use these tools and wants to teach students to use them fully and responsibly. Banning is a knee-jerk reaction; the harder, more honest task is designing for responsible use.

Several threads emerged from the group. One proposal was to assess process rather than product: ask students to document and articulate how they moved from 350 candidate papers down to five, what judgments they made at each stage, and why. That articulation is something the tool cannot supply. Another suggestion was to lean on oral examinations and demands for novelty as complementary checks. A third was the arithmetic analogy: we all use calculators now, but we learned long division first. Is there an equivalent foundational phase for literature work, and if so, what does it look like?

A complementary view, offered from the capstone teaching experience, pressed from the other direction: the power these tools bring is simply too great to look away from. Undergraduates who graduate without AI fluency will be behind the curve. The answer is both: automate and amplify the discovery phase, then roll up your sleeves and read the papers that matter. The threshold where automation ends and deep reading begins is where the real learning lives.

The paywall problem surfaced too, as a practical limit on what any of these tools can do. None of the current deep research agents can reach behind institutional subscriptions, which excludes much of the journal-of-record literature in most engineering fields. This remains an open gap, and one worth pushing vendors and libraries to close.


Tools of the Trade

Elicit — An AI-powered literature review assistant built on Semantic Scholar that helps researchers find papers and extract structured data (methods, samples, outcomes) across batches of articles. Best for systematic reviews and empirical domains where comparable fields can be pulled from many papers at once.

ResearchRabbit — A visual citation-network tool that takes seed papers and iteratively maps related and citing work, letting you discover literature through relationship chaining rather than keyword search. Strong for tracing research genealogies and expanding coverage once you have a starting point.

Consensus — A semantic search engine focused on empirical "yes/no" research questions, with a signature consensus meter that summarizes where the evidence converges or conflicts across studies. Useful for rapid scoping of well-defined empirical claims.

Paperguide — An all-in-one AI research assistant that combines discovery, summarization, note-taking, and writing support in a single workspace, aiming to cover the full lifecycle from search to draft.

ChatGPT Deep Research — An agentic research mode inside ChatGPT that autonomously searches, reads, and synthesizes across web-accessible technical and scholarly sources, producing analyst-style reports with inline citations and optional quantitative analysis via Python.

Gemini Deep Research — Google's agentic research system, tightly linked with Google Search and Workspace, with a large context window for processing many full papers and direct export to Google Docs for collaborative editing.

Claude Research Mode — Anthropic's extended autonomous research capability that decomposes complex questions and runs multi-step investigations lasting up to roughly 45 minutes, returning nuanced, citation-backed syntheses well suited to careful reasoning tasks.

Perplexity Deep Research — A fast multi-step research agent with transparent source tracking, good for rapid evidence gathering and scoping when you need a quick but well-cited overview of a topic.

NotebookLM — Google's source-grounded knowledge workspace. You upload up to 50 sources (300 in the Pro tier) and interact with them through chat, summaries, mind maps, audio overviews, and slide decks, with every response citing specific passages from your uploaded materials. It is uniquely positioned for the prioritization and deep-engagement stages of a literature review because it answers from your uploaded sources rather than from the open web.

Gemini Gems (with NotebookLM integration) — Custom Gemini assistants that can now attach an entire NotebookLM corpus as a living knowledge base, so your Gem improves dynamically as you add sources to the underlying notebook: a recent integration that meaningfully changes how persistent research assistants can be built.


A Note from the Broader Conversation

A colleague reported briefly on a university-level AI discussion held the day before, which had stayed largely at the level of "what can you do with the chat interface?" The contrast with what we do in the Academy, Professor Barba noted, is exactly the point: moving beyond the chatbox into workflows, knowledge bases, and agentic tools is where the leverage is. Other institutions are asking the same questions about assessment in the age of AI, and nobody has clean answers yet. We are, collectively, stumbling toward wisdom.


What's Next

Session 4 will take the qualitative shift implicit in deep research and name it directly: the move from AI as conversational partner to AI as task-executing agent. We will look at agent skills, connectors, and the Model Context Protocol (MCP), and at what it means to delegate real work to these systems while keeping human judgment at the center.


Watch the Session

An edited recording of the live demonstrations from Session 3 is available on YouTube.


The GW Engineering AI Academy is a strategic initiative to position SEAS as an AI-forward institution through systematic faculty development, anchored in the Entrepreneurial Mindset framework of our KEEN partnership.