ChatGPT Acquires Autonomous Skills for Advanced Research Tasks


OpenAI Unveils Deep Research: A Groundbreaking Tool for Complex Online Research

OpenAI has introduced Deep Research, a feature that lets ChatGPT carry out intricate, multi-step research tasks online in a fraction of the time a human researcher would need. OpenAI presents the capability as a significant step toward artificial general intelligence (AGI).

Transforming Research with Agentic AI

Deep Research equips ChatGPT with the ability to autonomously gather, analyze, and synthesize information from numerous online resources. Users start a detailed research report simply by entering a prompt, and the resulting output is pitched at the level of a skilled research analyst.

Built on a variant of OpenAI’s upcoming “o3” model, Deep Research aims to relieve users of time-consuming information retrieval. Requests can range from competitive assessments of streaming services to comprehensive policy reviews or tailored suggestions for new commuter bicycles, with accuracy and reliability as stated goals.

Every output generated by Deep Research is accompanied by full citations and documentation, allowing users to easily verify the information presented.

This tool proves especially proficient at revealing niche or unexpected insights, which can be beneficial to a wide array of sectors including finance, scientific research, policy-making, and engineering. However, OpenAI also sees the potential for general consumers to benefit, such as shoppers seeking highly personalized product recommendations.

“People will post a lot of great examples, but here is a fun one: I am in Japan right now and looking for an old NSX. I spent hours searching unsuccessfully for the perfect one. I was about to give up and Deep Research just… found it.” — Sam Altman

February 3, 2025

How Deep Research Works

Users can access Deep Research through the ChatGPT interface by selecting the “Deep Research” option. They can also upload supporting documents or spreadsheets for added context.

Once activated, the AI works through a methodical, multi-step research process that typically takes between 5 and 30 minutes. A sidebar in the interface shows the steps taken and the resources consulted so far, so users can turn to other activities while they wait for the final report.

The results are conveyed within the chat interface as sophisticated, well-documented reports. In the near future, OpenAI plans to enhance these reports by including images, data visualizations, and graphs, thereby providing greater clarity and context.

Whereas GPT-4o excels at dynamic, multi-modal conversation, Deep Research prioritizes depth and detail. Its rigorous citations and thorough analysis set it apart, shifting the focus from quick summaries to research-quality insights.

Tackling Real-World Problems

Deep Research employs advanced training techniques honed through real-world browsing and reasoning tasks across various fields. It utilizes reinforcement learning to autonomously plan and execute complex research tasks, allowing it to backtrack and adapt its methods based on newfound information.
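The plan, execute, and backtrack cycle described above can be pictured with a toy loop. The sketch below is not OpenAI's implementation and involves no reinforcement learning or real browsing; the names `FAKE_WEB`, `ResearchLog`, and `research` are hypothetical, and the "web" is a small hard-coded dictionary with placeholder notes. It only illustrates the control flow: follow a plan, record dead ends, fall back to a pending alternative, and extend the plan when a result suggests a new lead.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the web: query -> (finding, follow-up queries).
# An empty finding marks a dead end. All entries are placeholders.
FAKE_WEB = {
    "topic": ("overview note", ["angle A", "angle B"]),
    "angle A": ("", []),
    "angle B": ("note on angle B", ["angle B detail"]),
    "angle B detail": ("note on detail", []),
}


@dataclass
class ResearchLog:
    findings: list = field(default_factory=list)  # (query, finding) pairs
    steps: list = field(default_factory=list)     # (status, query) pairs


def research(query, web, max_steps=10):
    """Run a toy plan/execute/backtrack loop over a dictionary 'web'."""
    log = ResearchLog()
    plan = [query]  # pending queries, used as a stack
    seen = set()
    while plan and len(log.steps) < max_steps:
        current = plan.pop()
        if current in seen:
            continue  # never repeat a query
        seen.add(current)
        finding, follow_ups = web.get(current, ("", []))
        if finding:
            log.steps.append(("found", current))
            log.findings.append((current, finding))
        else:
            # Dead end: nothing is recorded as a finding, and the next
            # iteration falls back to whatever is still pending in the plan.
            log.steps.append(("dead end", current))
        # Adapt the plan: new leads are explored in the order they were given.
        plan.extend(reversed(follow_ups))
    return log


log = research("topic", FAKE_WEB)
for status, visited in log.steps:
    print(f"{status}: {visited}")
```

In this toy run, "angle A" turns out to be a dead end, so the loop falls back to "angle B" and then follows the new lead that result suggests. The `max_steps` cap is an assumption added here to keep the loop bounded.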

The tool is capable of analyzing user-uploaded files, creating and refining graphs via Python, embedding multimedia such as generated images and web pages into its responses, and citing specific sentences from the sources. This extensive training produces a highly efficient agent for addressing multifaceted real-world challenges.

Performance Metrics and Benchmarks

OpenAI evaluated Deep Research on “Humanity’s Last Exam,” a benchmark of more than 3,000 questions spanning diverse subjects, including rocket science, linguistics, ecology, and classical studies, that probes an AI’s ability to tackle complex problems.

The results have been notable, with Deep Research achieving an unprecedented 26.6% accuracy across all tested domains, significantly outperforming other models:

  • GPT-4o: 3.3%
  • Grok-2: 3.8%
  • Claude 3.5 Sonnet: 4.3%
  • OpenAI o1: 9.1%
  • DeepSeek-R1: 9.4%
  • Deep Research: 26.6% (with browsing + Python tools)
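To put the gap in perspective, the short Python sketch below works out how far each model trails Deep Research, both in percentage points and as a ratio. The scores are copied from the list above; the function name `compare_to_leader` is a hypothetical helper introduced only for this illustration.

```python
# Accuracy on Humanity's Last Exam, in percent, copied from the list above.
scores = {
    "GPT-4o": 3.3,
    "Grok-2": 3.8,
    "Claude 3.5 Sonnet": 4.3,
    "OpenAI o1": 9.1,
    "DeepSeek-R1": 9.4,
    "Deep Research": 26.6,
}


def compare_to_leader(scores, leader):
    """Return (model, point_gap, ratio) for every model except the leader."""
    top = scores[leader]
    rows = []
    for model, score in scores.items():
        if model == leader:
            continue
        # Gap in percentage points, and how many times higher the leader scores.
        rows.append((model, round(top - score, 1), round(top / score, 2)))
    return rows


for model, gap, ratio in compare_to_leader(scores, "Deep Research"):
    print(f"{model}: {gap} points behind; Deep Research scores {ratio}x as high")
```

By this arithmetic, Deep Research scores roughly 2.8 times as high as the next-best listed model, DeepSeek-R1, and about 8 times as high as GPT-4o. The list marks only the Deep Research figure as obtained with browsing and Python tools, so the comparison may not be like-for-like.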

Additionally, Deep Research achieved an industry-leading score of 72.57% on the GAIA benchmark, which evaluates AI models on real-world questions that require reasoning, multimodal fluency, and adept use of various tools.

Understanding Limitations and Challenges

Despite these capabilities, OpenAI acknowledges that Deep Research still has limitations as the technology continues to evolve.

The system has shown a reduced but still present tendency to “hallucinate” facts or form erroneous conclusions. Additionally, it can struggle to distinguish between credible and speculative sources and may exhibit inconsistent confidence levels, often appearing overly certain about uncertain information.

Users may experience minor formatting issues in reports and citations, along with potential delays in task initiation. OpenAI anticipates that these challenges will diminish as usage increases and refinements are made to the system.

Deep Research will roll out gradually. It is initially available to Pro users, with a limit of 100 queries per month, and will soon extend to the Plus and Team tiers before ultimately reaching Enterprise users.

Currently, residents of the UK, Switzerland, and the European Economic Area are unable to access this feature, but OpenAI is actively working to expand availability to these regions.

In the coming weeks, accessibility will broaden to ChatGPT’s mobile and desktop platforms, with a long-term aim to connect to subscription-based or proprietary data repositories to enhance output robustness and personalization.

Looking ahead, OpenAI plans to integrate Deep Research with “Operator,” a feature designed to take real-world actions, so that ChatGPT can handle tasks that require both asynchronous online research and real-world execution.

FAQs about Deep Research

1. What types of research tasks can Deep Research assist with?

Deep Research can handle a range of complex queries, from competitive analysis and policy reviews to personalized product recommendations.

2. How does Deep Research ensure the accuracy of its information?

Deep Research includes full citations and documentation alongside its outputs, allowing users to verify findings and assess the credibility of the information gathered.

3. Are there any current limitations of Deep Research users should be aware of?

Yes. Deep Research may occasionally produce inaccurate information and can struggle to recognize authoritative sources. OpenAI expects these issues to diminish as the system is refined.
