When to Use DeepResearch
Use DeepResearch when you need:
- In-depth analysis - Complex research across multiple sources
- Report generation - Markdown or PDF output with citations
- Structured data extraction - Research results in custom JSON formats
- Background processing - Long-running research without blocking your application
Features
Multi-Source Research
Searches web, academic, and proprietary sources in a single task.
Research Modes
Choose fast for instant answers, standard for quick research, or heavy for complex analysis.
Multiple Outputs
Get results as markdown, PDF, or structured JSON.
File Analysis
Attach PDFs, images, and documents for analysis.
URL Extraction
Include specific URLs to analyze as part of research.
Webhooks
Get notified when research completes.
Quick Start
Create a Research Task
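A minimal sketch of task creation over HTTP. The base URL, endpoint path, and field names here are placeholders, not the documented API; check the API reference (or use the Python/TypeScript SDK) for the real values.

```python
# Sketch: create a DeepResearch task via a POST request.
# API_BASE and the /deepresearch/tasks path are assumptions, not documented values.
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # placeholder base URL


def build_task_payload(query: str, mode: str = "standard") -> dict:
    """Assemble a minimal task-creation body (field names assumed)."""
    if mode not in {"fast", "standard", "heavy", "max"}:
        raise ValueError(f"unknown mode: {mode}")
    return {"query": query, "mode": mode}


def create_task(api_key: str, query: str, mode: str = "standard") -> dict:
    req = urllib.request.Request(
        f"{API_BASE}/deepresearch/tasks",  # assumed path
        data=json.dumps(build_task_payload(query, mode)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The response contains the task's id and initial status, which you then poll (or receive via webhook).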
Wait for Completion
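If you are polling rather than using the SDK's wait method or webhooks, the loop looks like this. `fetch_status` is any callable that returns the current task dict (for example, a GET on the task's status endpoint); the terminal statuses match the table below.

```python
# Sketch: poll until the task reaches a terminal status or a timeout expires.
import time

TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_completion(fetch_status, interval_s: float = 5.0,
                        timeout_s: float = 1800.0) -> dict:
    """Poll fetch_status() until the task is terminal; raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        task = fetch_status()
        if task["status"] in TERMINAL:
            return task
        time.sleep(interval_s)
    raise TimeoutError("task did not finish within the timeout")
```

Pick `interval_s` and `timeout_s` per research mode; see the polling guidance under Best Practices.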
Task Statuses
When you create a task, it goes through the following statuses:
| Status | Description |
|---|---|
| queued | Task is waiting to start due to rate limits or capacity |
| running | Task is actively researching |
| completed | Research finished successfully |
| failed | Research failed (check error field) |
| cancelled | Task was cancelled by user |
| awaiting_input | HITL checkpoint active, waiting for user response |
| paused | HITL checkpoint timed out, state saved to S3 |
Queued Tasks
Tasks may be queued when:
- Your organization has multiple concurrent tasks running
- System capacity is temporarily limited
The wait method handles queued tasks automatically. It continues polling until the task completes, fails, or is cancelled.
Research Modes
| Mode | Price | Best For | Max Steps |
|---|---|---|---|
| fast | $0.10 | Quick queries, batch processing | 10 |
| standard | $0.50 | Balanced research | 15 |
| heavy | $2.50 | Complex topics requiring fact verification | 15 |
| max | $15.00 | Exhaustive research with maximum quality | 25 |
Output Formats
Markdown (Default)
Markdown + PDF
Structured JSON
Get research results in a custom schema using the JSON Schema specification:
The schema must be a valid JSON Schema. Use type, properties, required, items, and other standard JSON Schema keywords.
Guiding research and output
Two parameters let you control how the agent researches and what the final report looks like:
| Parameter | Controls |
|---|---|
| research_strategy | The research phase — what to search for, which sources to prioritise, methodology |
| report_format | The output — structure, style, length, presentation (overrides default formatting) |
The older strategy parameter is deprecated. Use research_strategy instead. If both are provided, research_strategy takes precedence.
Research strategy
Use research_strategy to tell the agent how to conduct research — what angles to explore, which sources matter most, and what methodology to follow.
Report format
Use report_format to control the structure and style of the final output. This overrides default formatting.
Using both together
research_strategy and report_format work independently — you can use one or both. When combined, the agent follows your research strategy during the search phase, then formats the output according to your report format instructions.
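The parameters above can be combined with a structured-output schema in a single request body. The key names below mirror the parameters described in this section, but the exact wire format (in particular the name of the schema field, shown here as output_schema) is an assumption to verify against the API reference.

```python
# Sketch of a request body combining research_strategy, report_format,
# and a JSON Schema for structured output. Field names are assumptions.
payload = {
    "query": "Compare solid-state battery manufacturers",
    "mode": "heavy",
    "research_strategy": (
        "Prioritise peer-reviewed papers and patent filings; "
        "cross-check manufacturer claims against SEC filings."
    ),
    "report_format": (
        "Executive summary, then one section per manufacturer, "
        "ending with a comparison table."
    ),
    "output_schema": {  # standard JSON Schema keywords
        "type": "object",
        "properties": {
            "manufacturers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "key_patents": {
                            "type": "array",
                            "items": {"type": "string"},
                        },
                    },
                    "required": ["name"],
                },
            }
        },
        "required": ["manufacturers"],
    },
}
```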
Search configuration
Search parameters control which data sources are queried, what content is included/excluded, and how results are filtered by date or category. These parameters are specified in the search object within the request.
Search Type
Controls which backend search systems are queried:
- "all" (default): Searches both web and proprietary data sources
- "web": Searches only web sources (general web search, news, articles)
- "proprietary": Searches only proprietary data sources (academic papers, finance data, patents, etc.)
Included Sources
Restricts search to only the specified source types. When specified, only these sources will be searched; if the AI agent attempts to use other sources, they will be ignored. Available source types:
- "web": General web search results (news, articles, websites)
- "academic": Academic papers and research databases (ArXiv, PubMed, BioRxiv/MedRxiv, clinical trials, FDA drug labels, WHO health data, NIH grants, Wikipedia)
- "finance": Financial and economic data (stock/crypto/FX prices, SEC filings, company financial statements, economic indicators, prediction markets)
- "patent": Patent and intellectual property data (USPTO patent database; patent abstracts, claims, descriptions)
- "transportation": Transit and transportation data (UK National Rail schedules, maritime vessel tracking)
- "politics": Government and parliamentary data (UK Parliament members, bills, votes)
- "legal": Case law and legal data (UK court judgments, legislation text)
Excluded Sources
Excludes specific source types from search results. Uses the same source type values as included_sources. Cannot be used simultaneously with included_sources (use one or the other).
Source Biases
Soft ranking hints that influence (but don’t hard-filter) which sources appear in results. Keys are domains or URL paths; values are integers from -5 (strong demotion) to +5 (strong boost). Unlike included_sources/excluded_sources, biases adjust ranking without removing any sources entirely.
Start Date
Format: ISO date format (YYYY-MM-DD)
Filters search results to only include content published or dated on or after this date. Applied to both publication dates and event dates when available. Works across all source types.
End Date
Format: ISO date format (YYYY-MM-DD)
Filters search results to only include content published or dated on or before this date. Applied to both publication dates and event dates when available. Works across all source types.
Category
Filters results by a specific category. Category values are source-dependent: the exact categories available depend on the data source, and a given category may not apply to all source types.
Country Code
Format: ISO 3166-1 alpha-2 code (e.g., "US", "GB", "DE")
Filters web search results to prioritize content from a specific country or region. This affects web search results by biasing towards content relevant to the specified location.
Country code filtering primarily affects web search results. Academic and proprietary data sources may not support location-based filtering.
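Putting the parameters above together, a search object might look like the following. The values come straight from the descriptions in this section; the exact key names are assumptions to confirm against the API reference.

```python
# Sketch of the `search` object described above; key names are assumptions.
search = {
    "search_type": "proprietary",          # "all" | "web" | "proprietary"
    "included_sources": ["academic", "patent"],
    # Note: excluded_sources cannot be combined with included_sources.
    "source_biases": {                     # -5 (demote) .. +5 (boost)
        "arxiv.org": 3,
        "example-content-farm.com": -5,    # hypothetical domain
    },
    "start_date": "2023-01-01",            # ISO YYYY-MM-DD
    "end_date": "2024-12-31",
    "country_code": "GB",                  # ISO 3166-1 alpha-2
}
```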
Important Notes
Parameter Enforcement
Request-level parameters are enforced and cannot be overridden by the AI agent during research. This ensures consistent search behavior throughout the research process. Tool-level source specifications are ignored if request-level sources are specified.
Date Filtering
Dates are applied to both publication dates and event dates when available. ISO format (YYYY-MM-DD) is required. Date filtering works across all source types. If only start_date is provided, results include all content from that date forward. If only end_date is provided, results include all content up to that date. Both dates can be combined for a specific date range.
File Attachments
Analyze documents as part of research:
File Uploads
Deep Research accepts file attachments via the files array in the request body. Files are validated on upload and rejected with a 400 status if they violate any constraints.
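As a sketch, each entry in the files array can be built by base64-encoding the file into a data URL (the validation errors below mention the data URL format). The exact field names (filename, data, context) are assumptions to confirm against the API reference; the per-file context field is described under Limitations.

```python
# Sketch: build one entry of the `files` array as a base64 data URL.
# Field names are assumptions; size/type limits follow the table below.
import base64
import mimetypes
import os


def to_file_entry(path: str, context: str = "") -> dict:
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot determine MIME type for {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "filename": os.path.basename(path),
        "data": f"data:{mime};base64,{encoded}",
        "context": context,  # optional per-file guidance (<= 10,000 chars)
    }
```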
Supported File Types
| Type | MIME Type | Extensions | Max Size |
|---|---|---|---|
| PDF | application/pdf | .pdf | 50 MB |
| PNG | image/png | .png | 20 MB |
| JPEG | image/jpeg | .jpg, .jpeg | 20 MB |
| GIF | image/gif | .gif | 20 MB |
| WebP | image/webp | .webp | 20 MB |
| Plain text | text/plain | .txt, .md, .log | 10 MB |
| CSV | text/csv | .csv | 10 MB |
| Markdown | text/markdown | .md | 10 MB |
| Word | application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx | 50 MB |
| Excel | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | .xlsx | 20 MB |
| PowerPoint | application/vnd.openxmlformats-officedocument.presentationml.presentation | .pptx | 50 MB |
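A client-side pre-check mirroring the limits in the table above lets oversized or unsupported files fail fast instead of with a 400 from the server. The limit values are copied from the table (plus the 100 MB combined cap noted under the error responses); the helper itself is illustrative, not part of any SDK.

```python
# Illustrative client-side validation against the documented limits.
MB = 1024 * 1024
MAX_BYTES = {
    "application/pdf": 50 * MB,
    "image/png": 20 * MB,
    "image/jpeg": 20 * MB,
    "image/gif": 20 * MB,
    "image/webp": 20 * MB,
    "text/plain": 10 * MB,
    "text/csv": 10 * MB,
    "text/markdown": 10 * MB,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": 50 * MB,
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": 20 * MB,
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": 50 * MB,
}
TOTAL_LIMIT = 100 * MB  # combined cap across all files


def check_files(files: list) -> None:
    """files: (mime_type, size_bytes) pairs; raises ValueError on violation."""
    total = 0
    for mime, size in files:
        if mime not in MAX_BYTES:
            raise ValueError(f"unsupported file type: {mime}")
        if size > MAX_BYTES[mime]:
            raise ValueError(f"{mime} exceeds per-file limit")
        total += size
    if total > TOTAL_LIMIT:
        raise ValueError("combined size exceeds 100 MB")
```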
How Files Are Processed
Most file types are passed directly to the LLM as native file content parts. The exception is PPTX, which is not natively supported by Claude/Gemini; PPTX files are automatically converted to markdown text (slide by slide) before being sent to the model. Extracted text is truncated at 500K characters to prevent context overflow.
Error Responses
All validation errors return HTTP 400 with a JSON body:
Unsupported file type
Returned when the MIME type is not in the whitelist.
Extension mismatch
Returned when the file extension doesn’t match the declared MIME type.
Per-file size exceeded
Returned when a single file exceeds the limit for its type.
Total size exceeded
Returned when the combined size of all files exceeds 100 MB.
Structural errors
Returned when the file object is malformed (missing fields, wrong types, invalid data URL format).
Too many files
Returned when the request includes more files than the per-request maximum (10).
Tools
The tools parameter controls which optional capabilities the research agent can use during a task. All tools are off by default and must be explicitly enabled.
| Tool | Description |
|---|---|
| code_execution | Run Python code in a sandboxed environment. Required for XLSX/PPTX/DOCX deliverable generation. |
| screenshots | Capture visual screenshots of web pages. The agent decides when screenshots add value (charts, dashboards, infographics). |
You enable tools — the agent decides when to use them. For example, enabling screenshots does not screenshot every page; the agent selects pages where visual context is valuable.
Screenshots
When screenshots is enabled, the agent can capture visual screenshots of web pages during research. Screenshots appear in the images array alongside charts with image_type: "screenshot".
The agent autonomously decides when to screenshot — typically for pages with charts, dashboards, infographics, or other visual content that adds context to the report. Screenshots are embedded in the markdown report and rendered in PDF/PPTX/DOCX deliverables.
Screenshot-specific fields on ImageMetadata:
| Field | Type | Description |
|---|---|---|
| source_url | string | The original web page URL that was screenshotted |
| captured_at | int | Unix timestamp (ms) when the screenshot was taken |
Limits:
| Limit | Value |
|---|---|
| Max screenshots per task | 15 |
| Max download size | 5 MB |
| Report image cap | 1280 × 4000 px |
Code execution
When code_execution is enabled, the agent can run Python code in a sandboxed environment to process data, perform calculations, or generate deliverables.
Limits:
| Limit | Value |
|---|---|
| Language | Python only |
| Timeout per execution | 5–60 seconds (default 30s) |
| Network access | None (sandbox is isolated) |
| Output | Text only via print() |
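Enabling tools in the request body might look like the following. Whether tools is an object of boolean flags (as sketched here) or a list of tool names is an assumption to verify against the API reference.

```python
# Sketch: enabling optional tools (all off by default). The shape of the
# `tools` field is an assumption.
payload = {
    "query": "Quarterly revenue trends for major cloud providers",
    "tools": {
        "code_execution": True,  # needed for XLSX/PPTX/DOCX deliverables
        "screenshots": True,     # agent decides when a screenshot adds value
    },
}
```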
URL Extraction
Include specific URLs to analyze:
Task Management
Check Status
Add Follow-up Instructions
While a task is running, you can add instructions that refine or adjust the scope of the research and shape the final report. Submit instructions as early as possible during the research phase; check the task status to know when research has completed.
Cancel a Task
Delete a Task
List All Tasks
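The management operations above map naturally onto per-task HTTP calls. The endpoint paths and methods below are assumptions modelled on a conventional REST layout, not the documented routes; check the API reference (or use an SDK) for the real ones.

```python
# Sketch of the task-management calls; paths and methods are assumptions.
import json
import urllib.request

API_BASE = "https://api.example.com/v1/deepresearch"  # placeholder


def _call(api_key: str, method: str, path: str, body=None) -> dict:
    req = urllib.request.Request(
        f"{API_BASE}{path}",
        data=None if body is None else json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def get_status(key, task_id):
    return _call(key, "GET", f"/tasks/{task_id}")

def add_instructions(key, task_id, text):
    return _call(key, "POST", f"/tasks/{task_id}/instructions",
                 {"instructions": text})

def cancel_task(key, task_id):
    return _call(key, "POST", f"/tasks/{task_id}/cancel")

def delete_task(key, task_id):
    return _call(key, "DELETE", f"/tasks/{task_id}")

def list_tasks(key):
    return _call(key, "GET", "/tasks")
```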
Webhooks
Webhooks provide real-time notifications when a DeepResearch task completes or fails, eliminating the need for polling.
When to Use Webhooks
| Approach | Best For |
|---|---|
| Webhooks | Event-driven workflows |
| Polling | Simple scripts, real-time progress tracking |
Setting Up Webhooks
When you provide a webhook_url, the server generates a cryptographic secret for signature verification:
Webhook URLs must use HTTPS. HTTP URLs are rejected for security.
Webhook Payload
When the task completes or fails, your endpoint receives a POST request with the full task data.
Request Headers
Each webhook request includes headers for verification:
| Header | Description |
|---|---|
| X-Webhook-Signature | HMAC-SHA256 signature in format sha256=<hex_signature> |
| X-Webhook-Timestamp | Unix timestamp (milliseconds) when the request was sent |
| Content-Type | application/json |
| User-Agent | Valyu-DeepResearch/1.0 |
Verifying Webhook Signatures
Always verify the signature to ensure the webhook is authentic.
Retry Behavior
The webhook service automatically retries failed deliveries:
| Property | Value |
|---|---|
| Maximum retries | 5 attempts |
| Timeout per request | 15 seconds |
| Backoff strategy | Exponential: 1s → 2s → 4s → 8s → 16s |
| 4xx errors | No retry (client error) |
| 5xx errors | Will retry (server error) |
Return a 2xx status code quickly to acknowledge receipt. Process the webhook payload asynchronously to avoid timeouts.
Webhook Events
Webhooks are triggered for:
| Event | When |
|---|---|
| completed | Research finished successfully |
| failed | Research encountered an error |
Webhooks are not sent for cancelled tasks. If you need to track cancellations, use the status endpoint or list endpoint to check task states.
Complete Webhook Flow
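A sketch of the receiving side: verify the HMAC-SHA256 signature from the headers above, acknowledge quickly, and process the payload asynchronously. The exact signed-message format (assumed here to be "<timestamp>.<body>") must be confirmed against the webhook documentation before relying on this.

```python
# Sketch: verify an X-Webhook-Signature header (sha256=<hex>).
# The "<timestamp>.<body>" message format is an assumption.
import hashlib
import hmac


def verify_signature(secret: str, timestamp: str, body: bytes,
                     signature_header: str) -> bool:
    """Recompute the HMAC and compare in constant time."""
    expected = hmac.new(
        secret.encode(), f"{timestamp}.".encode() + body, hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(signature_header, f"sha256={expected}")


# Inside your HTTP handler (framework-agnostic pseudo-usage):
#   if not verify_signature(secret, headers["X-Webhook-Timestamp"],
#                           raw_body, headers["X-Webhook-Signature"]):
#       respond(401); return
#   enqueue(json.loads(raw_body))  # process asynchronously
#   respond(200)                   # acknowledge fast to avoid retries
```

Comparing timestamps against the current time (e.g., rejecting requests older than a few minutes) additionally guards against replay attacks.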
Progress Callbacks
Track progress in real time:
Response Structure
Completed Task
Error Handling
Human-in-the-loop
DeepResearch supports optional checkpoints that pause execution for human review. Enable planning questions, plan review, source filtering, or outline review to guide the research process.
HITL Guide
Learn how to configure and respond to HITL checkpoints
Best Practices
Polling strategy
- Fast mode: Poll every 2-5 seconds, timeout after 10 minutes
- Standard mode: Poll every 5-10 seconds, timeout after 30 minutes
- Heavy mode: Poll every 10-30 seconds, timeout after 120 minutes
- Max mode: Poll every 30-60 seconds, timeout after 180 minutes
- Use webhooks for production to avoid polling overhead
Recommended timeouts
| Mode | Recommended Timeout |
|---|---|
| fast | 10 minutes (600 seconds) |
| standard | 30 minutes (1800 seconds) |
| heavy | 120 minutes (7200 seconds) |
| max | 180 minutes (10800 seconds) |
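The polling guidance above can be encoded as a lookup table of (poll interval seconds, timeout seconds) per mode; the intervals here take the midpoint of each recommended range.

```python
# Recommended polling configuration per mode: (interval_s, timeout_s).
POLLING = {
    "fast":     (3,  600),    # poll every ~2-5 s, give up after 10 min
    "standard": (8,  1800),   # poll every ~5-10 s, 30 min
    "heavy":    (20, 7200),   # poll every ~10-30 s, 120 min
    "max":      (45, 10800),  # poll every ~30-60 s, 180 min
}


def polling_config(mode: str) -> tuple:
    try:
        return POLLING[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode}") from None
```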
Cost optimization
- Use fast mode for quick lookups and simple questions
- Use standard mode for moderate research needs
- Use heavy mode for complex topics requiring fact verification
- Use max mode only for exhaustive research requiring maximum quality
- Filter sources with the search config to focus on relevant content
- Set start_date and end_date to limit scope
Research quality
- Provide clear, specific research queries
- Use research_strategy to guide the research approach — tell the agent what sources, methodology, and angles to prioritise
- Use report_format to control the output — specify structure, length, style, and any tables or sections you want
- Add context to file attachments
Limitations
| Limit | Value |
|---|---|
| Query length | 25,000 characters |
| research_strategy + report_format combined | 15,000 characters |
| File attachment context | 10,000 characters per file |
| Maximum files per request | 10 |
| Maximum URLs per request | 10 |
| Maximum MCP servers | 5 |
| Maximum previous reports for context | 3 |
| Recommended file size | < 10MB per file |
Next Steps
Batch Processing
Process multiple research tasks efficiently
API Reference
Complete endpoint documentation
Python SDK
Python SDK reference
TypeScript SDK
TypeScript SDK reference

