AI Search Is Confidently Wrong: What the Columbia Study Means for Researchers
A Columbia Journalism Review study found AI search tools get citations wrong 60%+ of the time. Grok 3 failed 94%. Here's what it means for researchers.
Rabbit Hole Team
On March 6, 2025, the Columbia Journalism Review's Tow Center dropped a study that should unsettle anyone using AI search tools for research. They tested eight major AI search engines—including ChatGPT Search, Perplexity, Gemini, and Grok—on a straightforward task: identify the source of a quoted news article passage.
The results were brutal. Collectively, these tools answered incorrectly more than 60% of the time. Grok 3, Elon Musk's much-hyped "truth-seeking" AI, failed 94% of tests and returned fabricated or broken URLs in 154 of its 200 responses. Not slightly wrong. Not partially correct. Completely, confidently wrong.
If you're a consultant, analyst, journalist, or researcher who relies on AI search tools, this study demands your attention. Not because AI search is useless, but because the way most people use it is dangerous.
What the Study Actually Tested
The methodology was elegant in its simplicity. Researchers from Columbia selected 200 news articles from 20 publishers (ranging from the New York Times to niche outlets). They took direct excerpts from each article—passages that, if pasted into Google, would return the correct source within the first three results.
Then they fed these excerpts to eight AI search tools:
- ChatGPT Search
- Perplexity (free)
- Perplexity Pro ($20/month)
- DeepSeek Search
- Microsoft Copilot
- Grok 2
- Grok 3 ($40/month)
- Google Gemini
The task: identify the article's headline, original publisher, publication date, and URL. Something any competent researcher could do in 30 seconds with traditional search.
Across 1,600 total queries, the AI tools failed more often than they succeeded. And they failed with disturbing confidence.
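To make the grading concrete, here's a rough sketch of the kind of comparison the researchers performed for each response. The field names, matching logic, and category labels below are my own simplification for illustration, not the Tow Center's actual rubric or code:

```python
# Illustrative sketch only -- not the study's actual grading rubric.
from dataclasses import dataclass

@dataclass
class Citation:
    headline: str
    publisher: str
    date: str
    url: str

def grade(answer: Citation, truth: Citation) -> str:
    """Compare a tool's answer to ground truth, field by field."""
    fields = ["headline", "publisher", "date", "url"]
    hits = sum(
        getattr(answer, f).strip().lower() == getattr(truth, f).strip().lower()
        for f in fields
    )
    if hits == len(fields):
        return "completely correct"
    if hits > 0:
        return "partially correct"
    return "completely wrong"

# Placeholder values: the tool gets headline, publisher, and date right
# but invents the URL -- the failure mode the study saw over and over.
truth = Citation("Example Headline", "Example Daily", "2025-01-15",
                 "https://example.com/real-story")
answer = Citation("Example Headline", "Example Daily", "2025-01-15",
                  "https://example.com/made-up-slug")
print(grade(answer, truth))  # partially correct
```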
The Specific Damage by Platform
Let's look at the error rates:
Grok 3: 94% error rate. Out of 200 queries, Grok 3 got only 12 completely correct. It returned fabricated or broken URLs in 154 responses. When it did identify an article correctly, it often linked to a made-up URL. This isn't "imperfect." This is non-functional for any research purpose where accuracy matters.
ChatGPT Search: 67% error rate. Wrong two-thirds of the time. What's worse: it rarely acknowledged uncertainty. Out of 200 responses, ChatGPT signaled a lack of confidence only 15 times. It never declined to answer. When it didn't know, it guessed, with the same authoritative tone as when it was correct.
Google Gemini: ~60% error rate. More than half of Gemini's citations were fabricated or broken URLs. Google, the company that built its empire on search relevance, produced an AI search tool that hallucinates links more than half the time.
Perplexity: 37% error rate. The "best" performer was still wrong more than one-third of the time. Perplexity markets itself as the "answer engine" that synthesizes reliable information. A 37% failure rate on basic citation tasks suggests that marketing outpaces reality.
Perplexity Pro: Higher confidence, same problems. Here's the most disturbing finding: premium versions of these tools were more confidently wrong than free versions. Perplexity Pro answered more queries correctly than the free version, but when it was wrong, it was more certain about it. Users paying $20/month got more authoritative-sounding misinformation.
The pattern is consistent across platforms: these tools present guesses as facts. They don't know what they don't know. And they cost $20-40/month for the privilege of being confidently misled.
The Fabrication Problem
The study revealed a specific failure mode that should terrify researchers: URL fabrication.
When Grok 3 cited a source, 77% of the time the link led to a 404 error or a URL that never existed. Gemini and Grok 3 both cited broken or invented URLs in more than half of their responses.
This isn't a minor technical glitch. This is the foundation of research integrity crumbling. A citation without a working link is unverifiable. When the link never existed, the citation is fictional. Researchers who paste these citations into their work are building on quicksand.
Consider the downstream effects:
- A consultant cites a non-existent McKinsey study in a client memo
- A journalist references a fabricated New York Times article
- An analyst includes phantom market data in an investment thesis
- A grad student submits a paper with bogus citations
In each case, the AI tool provided what looked like a legitimate citation. The DOI looked right. The URL structure seemed plausible. But it was invented. And the user, trusting the tool, used it without verification.
Ignoring Publisher Boundaries
The study uncovered another troubling pattern: AI search tools ignore robots.txt and access content publishers have explicitly blocked.
Five of the eight tested tools have publicly known crawlers, meaning publishers can block them via robots.txt. The study found evidence that these tools accessed content anyway.
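For readers unfamiliar with the mechanism: robots.txt is a plain-text file at a site's root that tells named crawlers which paths they may not fetch. Here's a minimal sketch of how it works, using Python's standard library. The file contents and article URL are invented for illustration; PerplexityBot and GPTBot are the publicly declared user agents for Perplexity's and OpenAI's crawlers:

```python
# Minimal sketch of robots.txt blocking. The robots.txt content and URL
# below are illustrative, not any real publisher's actual file.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: PerplexityBot
Disallow: /

User-agent: GPTBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

article = "https://publisher.example/2025/03/some-article"
for bot in ("PerplexityBot", "GPTBot", "Googlebot"):
    verdict = "allowed" if rp.can_fetch(bot, article) else "blocked"
    print(bot, verdict)

# Note: robots.txt is an honor system. Nothing technically stops a
# crawler that chooses to ignore it -- which is what the study's
# findings suggest happened.
```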
Perplexity's access violations: National Geographic has blocked Perplexity's crawlers. Yet Perplexity correctly identified all 10 excerpts from National Geographic articles in the test. It shouldn't have had access to this content. It did anyway.
The New York Times has also blocked Perplexity's crawler. Yet Press Gazette reported that NYT was Perplexity's top-referred news site in January 2025, with 146,000 visits. The tool is accessing content that publishers have explicitly prohibited it from crawling.
What this means: Even when AI search tools return correct citations, the underlying data may have been obtained unethically or illegally. Researchers using these citations may unknowingly be benefiting from content scraping that violates publisher policies and potentially copyright law.
Licensing Deals Don't Fix the Problem
OpenAI and Perplexity have both pursued licensing deals with news publishers. OpenAI has 17+ deals with major outlets. Perplexity has its "Publishers Program" with revenue-sharing.
The Columbia study tested whether these partnerships improved accuracy. They didn't.
Time magazine has deals with both OpenAI and Perplexity. While it was among the more accurately identified publishers, none of the models got it right 100% of the time.
The San Francisco Chronicle is part of a "strategic content partnership" with OpenAI. ChatGPT correctly identified only 1 of 10 excerpts from the Chronicle. In that one correct instance, it still failed to provide a working URL.
Licensing deals provide legal cover for the AI companies. They don't provide accuracy for the users. A tool can have permission to access content and still cite it incorrectly.
Why This Matters for Serious Research
The defenders of AI search will say: "These tools are starting points, not final sources. Always verify."
This response misses the point. The problem isn't that AI search is imperfect. The problem is that it's confidently imperfect in ways that defeat human verification.
Verification at scale is impossible: If you're doing deep research, you might need 50+ citations. If the tool is wrong 60% of the time, roughly 30 of those citations are bad, and you can't know which ones in advance, so you have to check all 50. At 5 minutes per citation, that's over four hours of verification work, wiping out the time savings of using AI search in the first place.
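A quick back-of-envelope calculation makes the cost concrete. The numbers below are the illustrative assumptions from the paragraph above, not figures from the study:

```python
# Back-of-envelope check, using the assumptions above (illustrative only).
citations_needed = 50
error_rate = 0.60          # collective error rate reported by the CJR study
minutes_per_check = 5

expected_bad = citations_needed * error_rate          # ~30 bad citations
total_minutes = citations_needed * minutes_per_check  # must check all 50

print(f"Expected bad citations: {expected_bad:.0f}")
print(f"Verification time: {total_minutes / 60:.1f} hours")
```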
Confidence defeats skepticism: Human psychology research shows we trust confident sources more. When an AI presents information with certainty—"According to a 2024 Harvard Business Review study..."—we're less likely to question it. The authoritative tone is a feature that undermines the verification process.
Spot-checking doesn't work: Many users verify 2-3 citations, find them correct, and assume the rest are too. But AI errors aren't uniformly distributed. A tool might get all WSJ citations right and completely hallucinate citations from smaller outlets. Spot-checking creates false confidence.
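Here's a tiny illustration of why clustered errors defeat spot-checking. The citation pool below is invented for the example: every major-outlet citation happens to be real, every small-outlet citation is fabricated, and the spot-checker, like most of us, checks the names they recognize:

```python
# Hypothetical citation pool -- numbers invented for illustration.
# 20 citations from major outlets (all real), 30 from smaller outlets
# (all fabricated): a 60% overall error rate, clustered by outlet.
citations = (
    [("major outlet", True)] * 20 +
    [("small outlet", False)] * 30
)

# Spot-checkers gravitate toward the names they recognize.
checked = [ok for outlet, ok in citations if outlet == "major outlet"][:3]
print("Spot-check passed:", all(checked))          # True

bad = sum(1 for _, ok in citations if not ok)
print(f"Actual bad citations: {bad}/{len(citations)}")  # 30/50
```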
The citation is just the start: Even when a citation is real, the AI might misrepresent what the source says. The Columbia study only tested whether tools could identify sources—not whether they accurately summarized them. A real citation with a false summary is arguably more dangerous than a fabricated citation, because it's harder to catch.
What the Study Doesn't Capture
The Columbia study was rigorous but narrow. It tested one task: source identification from direct quotes. It didn't test:
- Whether AI search accurately summarizes complex research
- Whether it properly contextualizes findings within a field
- Whether it distinguishes between high-quality and low-quality sources
- Whether it understands the difference between correlation and causation
- Whether it can trace the evolution of ideas through multiple papers
These are the tasks that matter for serious research. And there's little reason to believe AI search tools perform better on these more complex tasks than they do on basic citation.
The Real Risk: Erosion of Verification Culture
The most insidious effect of AI search tools isn't the errors themselves. It's the cultural shift they enable.
Before AI search, researchers developed verification habits. You found a source, you checked it, you read the surrounding context. The process was slow but reliable.
AI search promises to eliminate the slow part. But in doing so, it threatens to eliminate the reliable part too. When you can generate 20 citations in 30 seconds, the mental model shifts from "I need to verify everything" to "I'll spot-check a few." That's a dangerous shift when the error rate is 60%.
The tools are training a generation of researchers to trust speed over accuracy. To prefer confident answers over correct ones. To value volume of citations over quality of sources.
This isn't a technology problem. It's a workflow problem. And it's one that gets harder to fix the more embedded these tools become.
What to Do Instead
If you're doing research where accuracy matters—consulting reports, investment memos, journalism, academic papers—you need a different approach.
1. Use AI for discovery, not citation. AI tools can help you find what to read. They can identify relevant papers, suggest search terms, map out research areas. But the actual citation should come from you reading the source and confirming it says what you think it says.
2. Build verification into your workflow. Don't treat verification as a final step. Verify as you go. When you find a citation, open it immediately. If the link is broken or the content doesn't match, discard it then, not three days later when you're finalizing the report. (A minimal link-check sketch follows this list.)
3. Prefer tools that show their work. Some research tools provide confidence ratings, source diversity metrics, and audit trails. These features aren't just nice-to-haves; they're essential for research integrity. A citation without provenance is a liability.
4. Download reports, don't rely on chat history. If you use AI for research, get the output in a downloadable, shareable format. Chat histories get lost. Ephemeral responses can't be audited. Research you can't defend isn't research; it's guesswork with formatting.
5. Know when not to use AI search. For quick lookups where being slightly wrong doesn't matter, AI search is fine. For research that informs decisions, spending, or public claims, use traditional search and primary sources. The time AI search saves isn't worth the credibility risk.
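Here's the link-check sketch promised above, using only Python's standard library. The URL below is a placeholder, not a real citation. Two caveats: some servers reject HEAD requests, so a "broken" result is worth a second look in a browser, and a link that resolves still doesn't prove the page says what the AI claims. You still have to read it.

```python
import urllib.request
from urllib.error import HTTPError, URLError

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    req = urllib.request.Request(
        url, method="HEAD",
        headers={"User-Agent": "citation-check/0.1"},  # arbitrary UA string
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (HTTPError, URLError, TimeoutError):
        return False

# Check each AI-supplied link the moment you collect it, not at the end.
for url in ["https://example.com/some-cited-article"]:  # placeholder URL
    status = "ok" if url_resolves(url) else "broken -- discard now"
    print(url, status)
```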
The Bottom Line
The Columbia study isn't a takedown of AI. It's a reality check. These tools are impressive technology with genuine use cases. But they are not research-grade citation systems. Using them as if they are will damage your work and your credibility.
The 60%+ error rate isn't a bug that will be fixed in the next update. It's a fundamental limitation of systems that generate plausible-sounding text based on pattern matching rather than understanding. The tools don't know when they're wrong because they don't know what "right" means in any meaningful sense.
For researchers, the implication is clear: AI search can be part of your workflow, but it can't be the foundation. The tools that save you time on discovery will cost you dearly if you trust them for verification.
The study's authors put it well: "Chatbots' authoritative tone masks their flaws, potentially eroding trust in credible journalism." The same applies to research. Confident misinformation is more dangerous than admitted ignorance.
If you're building research that matters—reports clients will act on, journalism the public will read, analysis that informs decisions—build it on sources you can verify, defend, and stand behind. AI search can point you toward those sources. But only you can make them trustworthy.
Rabbit Hole delivers confidence-rated research with verified citations and downloadable reports. See the difference at gorabbithole.ai.