The Contents API enables you to extract clean, structured content from web pages with optional AI-powered processing, including summarization and structured data extraction.
Basic Usage
import { Valyu } from "valyu-js";
const valyu = new Valyu();
const response = await valyu.contents([
"https://en.wikipedia.org/wiki/Machine_learning"
]);
console.log(`Processed ${response.urls_processed} of ${response.urls_requested} URLs`);
response.results?.forEach(result => {
console.log(`Title: ${result.title}`);
console.log(`Content length: ${result.length} characters`);
const text = typeof result.content === "string" ? result.content : JSON.stringify(result.content);
console.log(`Content preview: ${text.substring(0, 200)}...`);
});
Parameters
URLs (Required)
| Parameter | Type | Description |
|---|---|---|
| urls | string[] | Array of URLs to process (max 10 in sync mode, max 50 in async mode) |
Options (Optional)
| Parameter | Type | Description | Default |
|---|---|---|---|
| summary | boolean \| string \| object | AI processing configuration: false (none), true (automatic summary), a string (custom instructions), or an object (JSON schema) | false |
| extractEffort | "normal" \| "high" \| "auto" | Processing effort level for content extraction | "normal" |
| responseLength | string \| number | Content length per URL: "short" (25k characters), "medium" (50k characters), "large" (100k characters), "max" (no limit), or a custom character count | "short" |
| screenshot | boolean | Request page screenshots. When true, each result includes a screenshot_url field | false |
Response Format
interface ContentsResponse {
success: boolean;
error?: string | null;
tx_id?: string;
urls_requested?: number;
urls_processed?: number;
urls_failed?: number;
results?: ContentResult[];
total_cost_dollars?: number;
total_characters?: number;
}
interface ContentResult {
url: string;
title: string;
content: string | object; // string for raw content, object for structured
description?: string;
length: number;
price: number;
source: string;
summary_success?: boolean;
data_type?: string;
image_url?: Record<string, string>;
screenshot_url?: string; // Only present when screenshot=true
citation?: string;
}
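Because content is typed as string | object, code that reads it has to narrow the type before calling string methods. The following is a minimal sketch of one way to do that. The helpers contentToText and previewContent are hypothetical (they are not part of the SDK), and ResultLike is a local stand-in that mirrors only the fields used here.

```typescript
// Local mirror of the ContentResult fields used below (not the SDK's own type).
interface ResultLike {
  url: string;
  content: string | object;
}

// Hypothetical helper: returns raw content unchanged and serializes structured content.
function contentToText(item: ResultLike): string {
  return typeof item.content === "string"
    ? item.content
    : JSON.stringify(item.content, null, 2);
}

// Hypothetical helper: a short preview that works for both content shapes.
function previewContent(item: ResultLike, maxChars = 200): string {
  const text = contentToText(item);
  return text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
}

// Sample values standing in for entries of response.results
const rawExample: ResultLike = { url: "https://example.com", content: "Hello world" };
const structuredExample: ResultLike = {
  url: "https://example.com",
  content: { company_name: "Example Inc" },
};

console.log(previewContent(rawExample));
console.log(previewContent(structuredExample));
```

Serializing structured content with JSON.stringify is only one option; code that needs individual fields would typically cast or validate the object instead.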
Parameter Examples
Basic Content Extraction
Extract clean content without AI processing:
const response = await valyu.contents([
"https://www.python.org",
"https://nodejs.org"
]);
response.results?.forEach(result => {
console.log(`${result.title}: ${result.length} characters`);
});
AI Summary (Boolean)
Get automatic AI summaries of the extracted content:
const response = await valyu.contents([
"https://en.wikipedia.org/wiki/Artificial_intelligence"
], {
summary: true,
responseLength: "medium"
});
if (response.results?.[0]?.content) {
console.log("AI Summary:", response.results[0].content);
}
Custom Summary Instructions
Provide specific instructions for AI summarization:
const response = await valyu.contents([
"https://en.wikipedia.org/wiki/Artificial_intelligence"
], {
summary: "Summarize the main AI trends mentioned in exactly 3 bullet points",
responseLength: "medium",
extractEffort: "high"
});
Structured Data Extraction
Extract specific data points using a JSON schema:
const response = await valyu.contents([
"https://www.openai.com"
], {
extractEffort: "high",
responseLength: "large",
summary: {
type: "object",
properties: {
company_name: {
type: "string",
description: "The name of the company"
},
industry: {
type: "string",
enum: ["tech", "finance", "healthcare", "retail", "other"],
description: "Primary industry sector"
},
key_products: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Main products or services"
},
founded_year: {
type: "number",
description: "Year the company was founded"
}
},
required: ["company_name", "industry"]
},
});
if (response.results?.[0]?.content) {
console.log("Extracted data:", response.results[0].content);
}
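Before relying on extracted fields, it can help to confirm that the keys listed under required in the schema actually came back. The sketch below shows one way to do that at runtime. missingRequiredFields is a hypothetical helper, not an SDK function, and it checks key presence only, not value types or enum membership.

```typescript
// Hypothetical helper: lists the required schema keys that are absent
// (undefined or null) in the extracted content.
function missingRequiredFields(content: unknown, required: string[]): string[] {
  if (typeof content !== "object" || content === null || Array.isArray(content)) {
    // Raw string content (or any non-object value) carries no fields at all.
    return [...required];
  }
  const record = content as Record<string, unknown>;
  return required.filter((key) => record[key] === undefined || record[key] === null);
}

// Sample value standing in for response.results[0].content
const extracted: unknown = { company_name: "Example Inc", key_products: ["Widgets"] };
const missing = missingRequiredFields(extracted, ["company_name", "industry"]);

if (missing.length > 0) {
  console.warn(`Missing required fields: ${missing.join(", ")}`);
}
```

For stricter guarantees, a full JSON Schema validator could replace this presence check.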
Response Length Control
Control the amount of content extracted per URL:
const response = await valyu.contents([
"https://arxiv.org/abs/2301.00001",
"https://arxiv.org/abs/1706.03762",
"https://www.science.org/doi/10.1126/science.1234567"
], {
responseLength: "large", // More content for academic papers
summary: "Extract the main research findings and methodology",
extractEffort: "high"
});
Extract Effort
Control the extraction quality and processing intensity:
// Normal (default) - Fast
const normalResponse = await valyu.contents(urls, {
extractEffort: "normal"
});
// High - Enhanced quality for complex layouts and JS heavy pages
const highQualityResponse = await valyu.contents(urls, {
extractEffort: "high"
});
// Auto - Intelligent effort selection
const autoResponse = await valyu.contents(urls, {
extractEffort: "auto"
});
Response Length Options
Control content length with predefined or custom limits:
// Predefined lengths
const shortResponse = await valyu.contents(urls, {
responseLength: "short" // 25k characters
});
const mediumResponse = await valyu.contents(urls, {
responseLength: "medium" // 50k characters
});
const largeResponse = await valyu.contents(urls, {
responseLength: "large" // 100k characters
});
const fullResponse = await valyu.contents(urls, {
responseLength: "max" // No limit
});
// Custom length
const customResponse = await valyu.contents(urls, {
responseLength: 15000 // Custom character limit
});
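To make the mapping between presets and character counts explicit, here is a small sketch that resolves a responseLength value to a numeric limit. resolveCharLimit is a hypothetical helper, and the rule that a custom value must be a positive integer is an assumption rather than documented SDK behavior.

```typescript
type ResponseLength = "short" | "medium" | "large" | "max" | number;

// Character limits for the presets, taken from the comments in the example above.
const PRESET_LIMITS: Record<"short" | "medium" | "large", number> = {
  short: 25000,
  medium: 50000,
  large: 100000,
};

// Hypothetical helper: resolves a responseLength value to a character limit.
// Returns null for "max", which has no limit.
function resolveCharLimit(responseLength: ResponseLength = "short"): number | null {
  if (typeof responseLength === "number") {
    // Assumption: custom limits must be positive whole numbers.
    if (!Number.isInteger(responseLength) || responseLength <= 0) {
      throw new RangeError("Custom responseLength must be a positive integer");
    }
    return responseLength;
  }
  return responseLength === "max" ? null : PRESET_LIMITS[responseLength];
}

console.log(resolveCharLimit("medium"));
console.log(resolveCharLimit(15000));
```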
Use Case Examples
Research Paper Analysis
Build an AI-powered academic research assistant that extracts and analyzes research papers:
async function analyzeResearchPaper(paperUrl: string) {
const response = await valyu.contents([paperUrl], {
summary: {
type: "object",
properties: {
title: { type: "string" },
authors: {
type: "array",
items: { type: "string" }
},
abstract: { type: "string" },
key_contributions: {
type: "array",
items: { type: "string" },
maxItems: 5,
description: "Main contributions of the research"
},
methodology: {
type: "string",
description: "Research methodology and approach"
},
results_summary: {
type: "string",
description: "Summary of key findings and results"
},
implications: {
type: "string",
description: "Broader implications and significance"
},
citations_count: { type: "number" },
publication_date: { type: "string" }
},
required: ["title", "abstract", "key_contributions", "methodology"]
},
responseLength: "max",
extractEffort: "high"
});
if (response.success && response.results?.[0]?.content) {
const analysis = response.results[0].content as any;
console.log("=== Research Paper Analysis ===");
console.log(`Title: ${analysis.title}`);
console.log(`Authors: ${analysis.authors?.join(", ")}`);
console.log(`\nAbstract: ${analysis.abstract}`);
console.log("\nKey Contributions:");
analysis.key_contributions?.forEach((contrib: string, i: number) => {
console.log(`${i + 1}. ${contrib}`);
});
console.log(`\nMethodology: ${analysis.methodology}`);
console.log(`\nResults: ${analysis.results_summary}`);
console.log(`\nImplications: ${analysis.implications}`);
return analysis;
}
return null;
}
// Usage
const paperAnalysis = await analyzeResearchPaper(
"https://arxiv.org/abs/2024.01234"
);
E-commerce Product Intelligence
Create a product research tool that extracts comprehensive product data:
async function analyzeProducts(productUrls: string[]) {
const response = await valyu.contents(productUrls, {
summary: {
type: "object",
properties: {
products: {
type: "array",
items: {
type: "object",
properties: {
product_name: { type: "string" },
brand: { type: "string" },
price: { type: "string" },
original_price: { type: "string" },
discount_percentage: { type: "string" },
description: { type: "string" },
key_features: {
type: "array",
items: { type: "string" },
maxItems: 8
},
specifications: {
type: "object",
description: "Technical specifications"
},
customer_rating: { type: "number" },
review_count: { type: "number" },
availability: {
type: "string",
enum: ["in_stock", "out_of_stock", "limited", "pre_order"]
},
shipping_info: { type: "string" },
warranty_info: { type: "string" }
},
required: ["product_name", "price", "description"]
}
},
comparison_summary: {
type: "string",
description: "Overall comparison of the products"
}
}
},
extractEffort: "high",
responseLength: "large"
});
if (response.success && response.results?.[0]?.content) {
const analysis = response.results[0].content as any;
console.log("=== Product Analysis ===");
analysis.products?.forEach((product: any, i: number) => {
console.log(`\n${i + 1}. ${product.product_name}`);
console.log(` Brand: ${product.brand}`);
console.log(` Price: ${product.price}`);
console.log(` Rating: ${product.customer_rating}/5 (${product.review_count} reviews)`);
console.log(` Availability: ${product.availability}`);
if (product.key_features?.length > 0) {
console.log(" Key Features:");
product.key_features.forEach((feature: string) => {
console.log(` • ${feature}`);
});
}
});
console.log(`\n=== Comparison Summary ===`);
console.log(analysis.comparison_summary);
return analysis;
}
return null;
}
// Usage
const productComparison = await analyzeProducts([
"https://amazon.com/product1",
"https://bestbuy.com/product2",
"https://target.com/product3"
]);
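analyzeProducts reads only results[0], while ContentsResponse.results is an array whose entries each carry their own url. When several URLs are submitted, it may be safer to walk every entry. The sketch below assumes one result entry per successfully processed URL; collectStructured and ResultLike are hypothetical names used for illustration.

```typescript
// Local mirror of the ContentResult fields used below (not the SDK's own type).
interface ResultLike {
  url: string;
  content: string | object;
  summary_success?: boolean;
}

// Hypothetical helper: maps each URL to its structured content, skipping
// entries whose content is a raw string or whose summary step reported failure.
function collectStructured(results: ResultLike[] = []): Map<string, object> {
  const byUrl = new Map<string, object>();
  for (const item of results) {
    const isObject = typeof item.content === "object" && item.content !== null;
    if (isObject && item.summary_success !== false) {
      byUrl.set(item.url, item.content as object);
    }
  }
  return byUrl;
}

// Sample values standing in for response.results
const sampleResults: ResultLike[] = [
  { url: "https://example.com/a", content: { product_name: "A" }, summary_success: true },
  { url: "https://example.com/b", content: "raw page text" },
  { url: "https://example.com/c", content: { product_name: "C" }, summary_success: false },
];

const structuredByUrl = collectStructured(sampleResults);
structuredByUrl.forEach((content, url) => {
  console.log(url, JSON.stringify(content));
});
```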
Technical Documentation Processor
Build a documentation analysis tool that extracts API information and technical details:
async function processDocumentation(docUrls: string[]) {
const response = await valyu.contents(docUrls, {
summary: {
type: "object",
properties: {
documentation_overview: {
type: "string",
description: "Overview of what the documentation covers"
},
api_endpoints: {
type: "array",
items: {
type: "object",
properties: {
method: { type: "string" },
path: { type: "string" },
description: { type: "string" },
parameters: {
type: "array",
items: {
type: "object",
properties: {
name: { type: "string" },
type: { type: "string" },
required: { type: "boolean" },
description: { type: "string" }
}
}
},
response_format: { type: "string" }
}
}
},
authentication: {
type: "object",
properties: {
method: { type: "string" },
description: { type: "string" },
example: { type: "string" }
}
},
rate_limits: { type: "string" },
code_examples: {
type: "array",
items: {
type: "object",
properties: {
language: { type: "string" },
example: { type: "string" },
description: { type: "string" }
}
}
},
common_errors: {
type: "array",
items: { type: "string" }
}
},
required: ["documentation_overview", "api_endpoints", "authentication"]
},
extractEffort: "high",
responseLength: "large"
});
if (response.success && response.results?.[0]?.content) {
const docs = response.results[0].content as any;
console.log("=== API Documentation Analysis ===");
console.log(`\nOverview: ${docs.documentation_overview}`);
console.log("\n=== Authentication ===");
console.log(`Method: ${docs.authentication?.method}`);
console.log(`Description: ${docs.authentication?.description}`);
console.log("\n=== API Endpoints ===");
docs.api_endpoints?.forEach((endpoint: any, i: number) => {
console.log(`\n${i + 1}. ${endpoint.method} ${endpoint.path}`);
console.log(` Description: ${endpoint.description}`);
if (endpoint.parameters?.length > 0) {
console.log(" Parameters:");
endpoint.parameters.forEach((param: any) => {
const required = param.required ? "(required)" : "(optional)";
console.log(` • ${param.name} (${param.type}) ${required}: ${param.description}`);
});
}
});
if (docs.rate_limits) {
console.log(`\n=== Rate Limits ===`);
console.log(docs.rate_limits);
}
return docs;
}
return null;
}
// Usage
const apiDocs = await processDocumentation([
"https://docs.example.com/api-reference",
"https://developers.service.com/guide"
]);
Async Processing
Use async mode for large-scale extraction (11 to 50 URLs) or for non-blocking workflows.
Async mode is required when submitting more than 10 URLs, and a single request accepts at most 50. URLs are processed in batches of 5 with a 120-second timeout per URL (compared with 25 seconds in sync mode). Jobs expire after 7 days.
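The limits above can be turned into a small pre-flight check that decides whether a request needs async mode and how many batches to expect. This is a sketch only: planRequest and RequestPlan are hypothetical names, and the constants simply restate the numbers given in this section.

```typescript
const SYNC_MAX_URLS = 10;   // more than this requires async mode
const ASYNC_MAX_URLS = 50;  // hard cap per request
const ASYNC_BATCH_SIZE = 5; // async jobs are processed in batches of 5

interface RequestPlan {
  asyncMode: boolean;
  batches: number | null; // only meaningful for async jobs
}

// Hypothetical helper: validates the URL count and picks a mode.
function planRequest(urlCount: number, preferAsync = false): RequestPlan {
  if (!Number.isInteger(urlCount) || urlCount < 1) {
    throw new RangeError("At least one URL is required");
  }
  if (urlCount > ASYNC_MAX_URLS) {
    throw new RangeError(`At most ${ASYNC_MAX_URLS} URLs per request; split the list first`);
  }
  const asyncMode = preferAsync || urlCount > SYNC_MAX_URLS;
  return {
    asyncMode,
    batches: asyncMode ? Math.ceil(urlCount / ASYNC_BATCH_SIZE) : null,
  };
}

console.log(planRequest(8));
console.log(planRequest(23));
```

The asyncMode value from the plan can then be passed straight into the options object of valyu.contents().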
Submit and wait
The simplest approach is to call waitForJob(), which blocks until the job completes:
// Submit — returns immediately
const job = await valyu.contents([
"https://example.com/page1",
"https://example.com/page2",
// ... up to 50 URLs
], {
asyncMode: true,
webhookUrl: "https://your-app.com/webhooks/valyu", // optional
});
console.log(`Job ID: ${job.job_id}`);
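// Store the webhook_secret immediately; it is ONLY returned in this response.
// saveWebhookSecret is a placeholder for your own persistence logic, not an SDK function.
if (job.webhook_secret) {
await saveWebhookSecret(job.job_id, job.webhook_secret);
}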
// Wait with progress tracking (SDK handles polling)
const result = await valyu.waitForJob(job.job_id, {
pollInterval: 5000, // ms between polls (default: 5000)
maxWaitTime: 7200000, // max ms to wait (default: 7200000)
onProgress: (s) => console.log(` ${s.status} — batch ${s.current_batch ?? "?"}/${s.total_batches ?? "?"}`),
});
result.results?.forEach(r => {
console.log(`${r.title}: ${r.length} characters`);
});
console.log(`Total cost: $${result.actual_cost_dollars}`);
Manual polling
If you prefer full control over the polling loop:
let status;
do {
status = await valyu.getContentsJob(job.job_id);
console.log(`Status: ${status.status}`);
if (!["completed", "partial", "failed"].includes(status.status)) {
await new Promise((r) => setTimeout(r, 2000));
}
} while (!["completed", "partial", "failed"].includes(status.status));
Async parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| asyncMode | boolean | Process URLs asynchronously. Required for more than 10 URLs. | false |
| webhookUrl | string | HTTPS URL to receive results via webhook POST. | - |
waitForJob() options:
| Parameter | Type | Description | Default |
|---|---|---|---|
| pollInterval | number | Milliseconds between polls. | 5000 |
| maxWaitTime | number | Max milliseconds to wait before timing out. | 7200000 |
| onProgress | function | Callback invoked on each poll with the current status object. | - |
Webhook verification
The SDK exports a verifyContentsWebhookSignature() helper:
import { verifyContentsWebhookSignature } from "valyu-js";
const isValid = verifyContentsWebhookSignature(
rawBody, // raw request body string
signatureHeader, // X-Webhook-Signature header
timestampHeader, // X-Webhook-Timestamp header
webhookSecret // secret from initial 202 response
);
See the Content Extraction guide for manual verification examples.
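One way to wire the verifier into a webhook endpoint is sketched below. It is framework-agnostic and takes the verifier as a parameter so that the signature scheme itself stays inside the SDK helper. handleContentsWebhook, Verifier, and WebhookOutcome are hypothetical names, the status codes are a design choice, and the sketch assumes the verifier returns a boolean synchronously.

```typescript
// Shape of the verifier, matching the argument order shown above.
// Assumption: it returns a plain boolean.
type Verifier = (rawBody: string, signature: string, timestamp: string, secret: string) => boolean;

interface WebhookOutcome {
  statusCode: number;
  payload?: unknown;
}

// Hypothetical handler core: checks headers, verifies, and only then parses the body.
function handleContentsWebhook(
  rawBody: string,
  headers: Record<string, string | undefined>,
  secret: string,
  verify: Verifier,
): WebhookOutcome {
  // HTTP header names are case-insensitive, so normalize before lookup.
  const lower: Record<string, string | undefined> = {};
  for (const key of Object.keys(headers)) {
    lower[key.toLowerCase()] = headers[key];
  }
  const signature = lower["x-webhook-signature"];
  const timestamp = lower["x-webhook-timestamp"];
  if (!signature || !timestamp) return { statusCode: 400 };

  // Verify against the raw body string, before any JSON parsing.
  if (!verify(rawBody, signature, timestamp, secret)) return { statusCode: 401 };

  try {
    return { statusCode: 200, payload: JSON.parse(rawBody) };
  } catch (_err) {
    return { statusCode: 400 };
  }
}

// Stub verifier for demonstration only; real code would pass the SDK helper instead.
const stubVerify: Verifier = (_body, signature) => signature === "good";
const outcome = handleContentsWebhook(
  '{"job_id":"abc"}',
  { "X-Webhook-Signature": "good", "X-Webhook-Timestamp": "1700000000" },
  "stored-secret",
  stubVerify,
);
console.log(outcome.statusCode);
```

In a real endpoint, verifyContentsWebhookSignature would be passed as the verify argument, and rawBody must be the unparsed request body, since re-serialized JSON may not match the signed bytes.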
Async response types
// Initial response (HTTP 202)
interface ContentsAsyncResponse {
success: boolean;
job_id: string;
status: "pending";
urls_total: number;
poll_url: string;
tx_id: string;
webhook_secret?: string; // ONLY returned here — store immediately
}
// Job status response (polling / waitForJob result)
interface ContentsJobResponse {
success: boolean;
job_id: string;
status: "pending" | "processing" | "completed" | "partial" | "failed";
urls_total: number;
urls_processed: number;
urls_failed: number;
created_at: number; // Milliseconds since epoch
updated_at: number;
current_batch?: number; // Present during "processing"
total_batches?: number; // Present during "processing"
results?: ContentResult[]; // Present when completed/partial
actual_cost_dollars?: number; // Present when completed/partial
error?: string; // Present when partial/failed
}
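The status union above lends itself to two small helpers: one that detects terminal states (useful in a polling loop) and one that renders a progress line. Both are hypothetical sketches built on a local JobLike type that mirrors a subset of ContentsJobResponse.

```typescript
type JobStatus = "pending" | "processing" | "completed" | "partial" | "failed";

// Local mirror of the ContentsJobResponse fields used below.
interface JobLike {
  status: JobStatus;
  urls_total: number;
  urls_processed: number;
  urls_failed: number;
  current_batch?: number;
  total_batches?: number;
  error?: string;
}

const TERMINAL_STATUSES: ReadonlySet<JobStatus> = new Set<JobStatus>([
  "completed",
  "partial",
  "failed",
]);

// True once the job will no longer change state.
function isTerminal(job: JobLike): boolean {
  return TERMINAL_STATUSES.has(job.status);
}

// One-line human-readable description of the job's current state.
function describeJob(job: JobLike): string {
  switch (job.status) {
    case "pending":
      return "queued";
    case "processing":
      return `processing batch ${job.current_batch ?? "?"}/${job.total_batches ?? "?"}`;
    case "completed":
      return `done: ${job.urls_processed}/${job.urls_total} URLs`;
    case "partial":
      return `partial: ${job.urls_failed} of ${job.urls_total} URLs failed`;
    case "failed":
      return `failed: ${job.error ?? "unknown error"}`;
  }
}

const inFlight: JobLike = {
  status: "processing",
  urls_total: 20,
  urls_processed: 5,
  urls_failed: 0,
  current_batch: 2,
  total_batches: 4,
};
console.log(describeJob(inFlight), isTerminal(inFlight));
```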
Error Handling
// Assumes this code runs inside an async function; the early return below requires one
const response = await valyu.contents(urls, options);
if (!response.success) {
console.error("Contents extraction failed:", response.error);
return;
}
// Check for partial failures
if (response.urls_failed && response.urls_failed > 0) {
console.warn(`${response.urls_failed} of ${response.urls_requested} URLs failed`);
}
// Process successful results
response.results?.forEach((result, index) => {
console.log(`Result ${index + 1}:`);
console.log(` Title: ${result.title}`);
console.log(` URL: ${result.url}`);
console.log(` Length: ${result.length} characters`);
if (result.summary_success) {
const summary = typeof result.content === "string" ? result.content : JSON.stringify(result.content);
console.log(` Summary: ${summary}`);
}
});
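When urls_failed is greater than zero, it is often useful to know which URLs failed so they can be retried. The sketch below compares the requested list against the returned results. findMissingUrls is a hypothetical helper, and it rests on two assumptions that are not confirmed here: failed URLs are absent from results, and result.url echoes the requested URL unchanged.

```typescript
// Hypothetical helper: returns the requested URLs that have no entry in results.
// Redirects or URL normalization on the server side would make this inexact.
function findMissingUrls(requested: string[], results: { url: string }[] = []): string[] {
  const returned = new Set(results.map((entry) => entry.url));
  return requested.filter((url) => !returned.has(url));
}

// Sample values standing in for the request and response.results
const requestedUrls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
];
const returnedResults = [
  { url: "https://example.com/a" },
  { url: "https://example.com/c" },
];

const retryCandidates = findMissingUrls(requestedUrls, returnedResults);
console.log("Retry candidates:", retryCandidates);
```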