API Documentation
Welcome to the Radia Web Scraping API documentation. This API provides powerful endpoints for retrieving and parsing content from any website.
Getting Started
Overview of Radia's Comprehensive Web Scraping Platform
Platform Overview
Radia provides a comprehensive suite of tools to extract both clean Markdown content and structured JSON data from websites. Our platform is designed to handle everything from simple single-page scraping to complex multi-site data extraction workflows.
Key Features
Dynamic Scraping
Use the /scrape and /extract endpoints for on-the-fly scraping with custom parameters. They are well suited to sites with dynamic content or one-off scraping needs.
Predefined Tools
Create and manage reusable scrapers through our Dashboard >> Scrapers. Each scraper is defined by three key components:
- A target URL or URL pattern
- A prompt that guides the extraction process
- A structured output schema using OpenAI's format
Our visual schema editor helps you create and validate your output schemas without writing JSON manually. Try it out in our Playground.
Automated Scheduling
Schedule recurring scraping jobs using cron expressions directly from the Dashboard >> Tasks. Access job results through our API at:
- /tasks/{task_id}/runs: View all runs for a task
- /tasks/{task_id}/runs/latest: Get most recent results
Authentication
Important: All API endpoints require authentication via an API key. Generate your key in the Dashboard >> API Keys and include it in every request header as:
Authorization: Bearer YOUR_API_KEY
Authorization
All API endpoints require authentication
Include an Authorization header with your API key in all requests.
You can obtain your API key from the Dashboard >> API Keys page.
Example
const apiKey = 'YOUR_API_KEY';
const response = await fetch(
  'https://api.radia.io/api/v2/scrape?url=www.example.com&format=json',
  {
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    }
  }
);
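Because every endpoint expects the same header, it can help to build it in one place. The following is a minimal sketch; `buildAuthHeaders` is a hypothetical helper name, not something the Radia API provides, and the empty-key check is an illustrative safeguard rather than documented behavior.

```javascript
// Hypothetical helper (not part of the Radia API): returns the
// Authorization header described above for a given API key.
function buildAuthHeaders(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('An API key is required');
  }
  return { Authorization: `Bearer ${apiKey.trim()}` };
}

// Example: the headers object can be passed straight to fetch().
const headers = buildAuthHeaders('YOUR_API_KEY');
console.log(headers.Authorization); // prints "Bearer YOUR_API_KEY"
```

Keeping the key out of source files (for example, reading it from server-side configuration) fits the advice to never expose API keys in client-side code.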
Scrape API
Retrieve cleaned HTML content from a specified URL
Scrape URL Content
Retrieve cleaned Markdown content from a URL. The response is an HTML string by default; pass format=json to receive a JSON object containing the Markdown.
Parameters
Name | Type | Required | Description |
---|---|---|---|
url | string | Yes | The URL of the website to scrape. |
format | string | No | Output format. Use "json" to receive a JSON object containing the markdown. Default: html |
scroll | boolean | No | Scroll the page to reveal additional content. If false, the page is loaded as is, which gives a quicker response. Default: true |
Response
Cleaned Markdown content as an HTML string (default) or a JSON object. Type: string | object
Example
/v2/scrape
GET /v2/scrape?url=https://example.com&format=json&should_click_elements=true
Authorization: Bearer YOUR_API_KEY
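A target URL can itself contain characters such as `?` and `&`, so it is safer to build the query string programmatically than by hand. The sketch below assumes the base URL `https://api.radia.io/api` taken from the Authorization example; `buildScrapeUrl` is a hypothetical helper, and sending `scroll` as the string `true` or `false` is an assumption about how the API reads boolean query parameters.

```javascript
// Base URL assumed from the Authorization example in this document.
const BASE_URL = 'https://api.radia.io/api';

// Hypothetical helper: builds a /v2/scrape request URL with every
// parameter value percent-encoded by URLSearchParams.
function buildScrapeUrl(targetUrl, { format, scroll } = {}) {
  const endpoint = new URL(`${BASE_URL}/v2/scrape`);
  endpoint.searchParams.set('url', targetUrl);
  // Optional parameters are only added when the caller sets them,
  // so the API's documented defaults apply otherwise.
  if (format !== undefined) endpoint.searchParams.set('format', format);
  if (scroll !== undefined) endpoint.searchParams.set('scroll', String(scroll));
  return endpoint.toString();
}

const scrapeUrl = buildScrapeUrl('https://example.com/list?page=2&sort=asc', {
  format: 'json',
  scroll: false,
});
console.log(scrapeUrl);
```

The resulting string can be passed to `fetch` together with the Authorization header.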
Extract API
Scrape and parse content from a specified URL into structured data
Extract Structured Data
Scrape content and extract structured data based on a prompt and JSON schema.
Parameters
Name | Type | Required | Description |
---|---|---|---|
url | string | Yes | The URL to scrape and parse. |
prompt | string | Yes | Prompt to guide data extraction (e.g., 'Extract product details'). |
response_format | object | Yes | JSON object describing the desired output structure (OpenAI function calling format). |
scroll | boolean | No | Scroll the page to reveal additional content. If false, the page is loaded as is, which gives a quicker response. Default: true |
include_markdown | boolean | No | Include the cleaned Markdown content in the response. Default: false |
Response
Extracted structured data, token count, and optionally the Markdown content. Type: object
Example
/v2/extract
POST /v2/extract
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "url": "https://example-product-page.com",
  "prompt": "Extract the product name, price, and description.",
  "response_format": {
    "type": "object",
    "properties": {
      "product_name": { "type": "string", "description": "Name of the product" },
      "price": { "type": "number", "description": "Price of the product" },
      "description": { "type": "string", "description": "Product description" }
    },
    "required": ["product_name", "price"]
  },
  "should_click_elements": false,
  "include_markdown": true
}
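The same request can be assembled in JavaScript. This is a sketch under stated assumptions: the base URL comes from the Authorization example, `buildExtractRequest` is a hypothetical helper, and the client-side check for required parameters only mirrors the parameter table above, not any server-side validation.

```javascript
// Base URL assumed from the Authorization example in this document.
const BASE_URL = 'https://api.radia.io/api';

// Hypothetical helper: returns the endpoint and fetch() options for
// a POST /v2/extract call, failing early if a required field is missing.
function buildExtractRequest(apiKey, { url, prompt, responseFormat, includeMarkdown = false }) {
  const required = { url, prompt, response_format: responseFormat };
  for (const [name, value] of Object.entries(required)) {
    if (value === undefined || value === null || value === '') {
      throw new Error(`Missing required parameter: ${name}`);
    }
  }
  return {
    endpoint: `${BASE_URL}/v2/extract`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        url,
        prompt,
        response_format: responseFormat,
        include_markdown: includeMarkdown,
      }),
    },
  };
}

const extractRequest = buildExtractRequest('YOUR_API_KEY', {
  url: 'https://example-product-page.com',
  prompt: 'Extract the product name, price, and description.',
  responseFormat: {
    type: 'object',
    properties: {
      product_name: { type: 'string', description: 'Name of the product' },
      price: { type: 'number', description: 'Price of the product' },
    },
    required: ['product_name', 'price'],
  },
});
// To send it: const response = await fetch(extractRequest.endpoint, extractRequest.options);
```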
Scrapers API
Manage scraper configurations
Available Endpoints
GET /v2/scrapers
POST /v2/scrapers
GET /v2/scrapers/{scraper_id}
POST /v2/scrapers/{scraper_id}/run
Get Scrapers
Get all scraper configurations for the authenticated user.
Response
A list of scraper configuration objects. Type: array
Example
/v2/scrapers
GET /v2/scrapers
Authorization: Bearer YOUR_API_KEY
Create Scraper
Create a new scraper configuration.
Parameters
Name | Type | Required | Description |
---|---|---|---|
scraper_name | string | Yes | A name for this scraper configuration. |
schema_id | string | Yes | ID of the schema defining the desired output structure. |
scraped_url | string | Yes | The target URL or URL pattern for the scraper. |
prompt | string | Yes | The prompt guiding the data extraction process. |
should_click_elements | boolean | No | Whether the scraper should click elements. Default: false |
headless | boolean | No | Whether the scraper runs headlessly. Default: false |
Response
The newly created scraper configuration object. Type: object
Example
/v2/scrapers
POST /v2/scrapers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "scraper_name": "News Headline Scraper",
  "schema_id": "sch-def-456",
  "scraped_url": "https://news.example.com",
  "prompt": "Extract the main headlines from the homepage.",
  "should_click_elements": false,
  "headless": true
}
Get Scraper by ID
Get details of a specific scraper configuration.
Parameters
Name | Type | Required | Description |
---|---|---|---|
scraper_id | string | Yes | The ID of the scraper to retrieve (path parameter). |
Response
The requested scraper configuration object. Type: object
Example
/v2/scrapers/{scraper_id}
GET /v2/scrapers/s1c2r3p4-e5f6-7890-1234-abcdef123456
Authorization: Bearer YOUR_API_KEY
Run Scraper
Manually trigger a run of a specific scraper configuration.
Parameters
Name | Type | Required | Description |
---|---|---|---|
scraper_id | string | Yes | The ID of the scraper to run (path parameter). |
headless | boolean | No | Overrides scraper's default headless setting for this run (query parameter). |
Response
The result of the scraper run. Type: object
Example
/v2/scrapers/{scraper_id}/run
POST /v2/scrapers/s1c2r3p4-e5f6-7890-1234-abcdef123456/run?headless=true
Authorization: Bearer YOUR_API_KEY
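This endpoint combines a path parameter with an optional query parameter, which the sketch below keeps separate. It assumes the base URL from the Authorization example; `buildRunScraperRequest` is a hypothetical helper, and passing `headless` as the string `true` or `false` follows the example above.

```javascript
// Base URL assumed from the Authorization example in this document.
const BASE_URL = 'https://api.radia.io/api';

// Hypothetical helper: builds the endpoint and fetch() options for
// POST /v2/scrapers/{scraper_id}/run.
function buildRunScraperRequest(apiKey, scraperId, { headless } = {}) {
  // The ID goes into the path, so it is encoded as a path segment.
  const endpoint = new URL(`${BASE_URL}/v2/scrapers/${encodeURIComponent(scraperId)}/run`);
  // Only override the scraper's stored headless setting when asked to.
  if (headless !== undefined) endpoint.searchParams.set('headless', String(headless));
  return {
    endpoint: endpoint.toString(),
    options: {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}` },
    },
  };
}

const runRequest = buildRunScraperRequest(
  'YOUR_API_KEY',
  's1c2r3p4-e5f6-7890-1234-abcdef123456',
  { headless: true }
);
console.log(runRequest.endpoint);
```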
Schemas API
Manage scraping output schemas
Available Endpoints
GET /v2/schemas
POST /v2/schemas
GET /v2/schemas/{schema_id}
Get Schemas
Get all extraction schemas for the authenticated user.
Response
A list of schema objects. Type: array
Example
/v2/schemas
GET /v2/schemas
Authorization: Bearer YOUR_API_KEY
Create Schema
Create a new extraction schema.
Parameters
Name | Type | Required | Description |
---|---|---|---|
schema_name | string | Yes | A descriptive name for the schema. |
schema_json | object | Yes | JSON object defining the extraction structure (OpenAI function call format). |
Response
The newly created schema object. Type: object
Example
/v2/schemas
POST /v2/schemas
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "schema_name": "Article Details",
  "schema_json": {
    "type": "object",
    "properties": {
      "title": { "type": "string", "description": "Article title" },
      "author": { "type": "string", "description": "Author name" },
      "publish_date": { "type": "string", "description": "Publication date" }
    },
    "required": ["title", "author"]
  }
}
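Before sending a schema, a quick local sanity check can catch simple mistakes such as a `required` entry that has no matching property. The sketch below is illustrative only: `findSchemaProblems` is a hypothetical helper, and its rules mirror the shape of the example above rather than the validation the API itself performs.

```javascript
// Hypothetical helper: returns a list of human-readable problems found
// in a schema_json object. An empty list means no problems were found
// by these local checks (the API may apply further rules).
function findSchemaProblems(schemaJson) {
  if (schemaJson === null || typeof schemaJson !== 'object' || Array.isArray(schemaJson)) {
    return ['schema_json must be a JSON object'];
  }
  const problems = [];
  if (schemaJson.type !== 'object') {
    problems.push('top-level "type" should be "object"');
  }
  const properties = schemaJson.properties;
  if (properties === null || typeof properties !== 'object' || Array.isArray(properties)) {
    problems.push('"properties" must be an object');
    return problems;
  }
  // Every name listed in "required" should be declared in "properties".
  for (const name of schemaJson.required ?? []) {
    if (!Object.prototype.hasOwnProperty.call(properties, name)) {
      problems.push(`required field "${name}" is not defined in "properties"`);
    }
  }
  return problems;
}

const articleSchema = {
  type: 'object',
  properties: {
    title: { type: 'string', description: 'Article title' },
    author: { type: 'string', description: 'Author name' },
  },
  required: ['title', 'author'],
};
console.log(findSchemaProblems(articleSchema)); // prints []
```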
Get Schema by ID
Get details of a specific extraction schema.
Parameters
Name | Type | Required | Description |
---|---|---|---|
schema_id | string | Yes | The ID of the schema to retrieve (path parameter). |
Response
The requested schema object. Type: object
Example
/v2/schemas/{schema_id}
GET /v2/schemas/sch-abc-123
Authorization: Bearer YOUR_API_KEY
Tasks API
Manage scheduled scraping jobs
Available Endpoints
GET /v2/tasks
POST /v2/tasks
GET /v2/tasks/{task_id}
DELETE /v2/tasks/{task_id}
POST /v2/tasks/{task_id}/run
GET /v2/tasks/{task_id}/runs
GET /v2/tasks/{task_id}/runs/latest
Get Tasks
Get all scheduled tasks for the authenticated user.
Response
A list of task objects. Type: array
Example
/v2/tasks
GET /v2/tasks
Authorization: Bearer YOUR_API_KEY
Create Task
Create a new scheduled task to run a scraper.
Parameters
Name | Type | Required | Description |
---|---|---|---|
scraper_id | string | Yes | ID of the scraper to run. |
task_name | string | Yes | A descriptive name for the task. |
cron_minute | string | Yes | Cron expression: minute (0-59 or *). |
cron_hour | string | Yes | Cron expression: hour (0-23 or *). |
cron_day_of_month | string | Yes | Cron expression: day of month (1-31 or *). |
cron_month | string | Yes | Cron expression: month (1-12 or *). |
cron_day_of_week | string | Yes | Cron expression: day of week (0-6 or *, Sunday=0). |
cron_timezone | string | No | Timezone for the schedule (e.g., 'America/New_York'). Default: UTC |
Response
The newly created task object. Type: object
Example
/v2/tasks
POST /v2/tasks
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "scraper_id": "s1c2r3p4-e5f6-7890-1234-abcdef123456",
  "task_name": "Hourly Price Check",
  "cron_minute": "0",
  "cron_hour": "*",
  "cron_day_of_month": "*",
  "cron_month": "*",
  "cron_day_of_week": "*",
  "cron_timezone": "America/Los_Angeles"
}
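Schedules are often written as a single five-field cron string, while this endpoint takes each field separately. The sketch below splits such a string into the `cron_*` parameters from the table above. `cronToTaskFields` is a hypothetical helper; it only checks the field count, and whether the API accepts values beyond plain numbers and `*` (such as ranges or steps) is not stated here.

```javascript
// Hypothetical helper: splits a five-field cron expression
// ("minute hour day-of-month month day-of-week") into the separate
// cron_* parameters used when creating a task.
function cronToTaskFields(expression, timezone = 'UTC') {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  return {
    cron_minute: minute,
    cron_hour: hour,
    cron_day_of_month: dayOfMonth,
    cron_month: month,
    cron_day_of_week: dayOfWeek,
    cron_timezone: timezone,
  };
}

// "0 * * * *" runs at minute 0 of every hour, matching the
// "Hourly Price Check" example above.
const hourlyFields = cronToTaskFields('0 * * * *', 'America/Los_Angeles');
const taskBody = {
  scraper_id: 's1c2r3p4-e5f6-7890-1234-abcdef123456',
  task_name: 'Hourly Price Check',
  ...hourlyFields,
};
console.log(JSON.stringify(taskBody, null, 2));
```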
Get Task By ID
Get details of a specific task by its ID.
Parameters
Name | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task to retrieve (path parameter). |
Response
The requested task object. Type: object
Example
/v2/tasks/{task_id}
GET /v2/tasks/a1b2c3d4-e5f6-7890-1234-567890abcdef
Authorization: Bearer YOUR_API_KEY
Delete Task
Delete a specific scheduled task.
Parameters
Name | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task to delete (path parameter). |
Response
Confirmation message. Type: object
Example
/v2/tasks/{task_id}
DELETE /v2/tasks/a1b2c3d4-e5f6-7890-1234-567890abcdef
Authorization: Bearer YOUR_API_KEY
Run Task
Manually trigger a run of the specified task.
Parameters
Name | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task to run (path parameter). |
Response
The result of the scraper run initiated by the task. Type: object
Example
/v2/tasks/{task_id}/run
POST /v2/tasks/a1b2c3d4-e5f6-7890-1234-567890abcdef/run
Authorization: Bearer YOUR_API_KEY
Get Task Runs
Get a list of all historical runs for a specific task.
Parameters
Name | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task (path parameter). |
Response
A list of task run preview objects, ordered by start time descending. Type: array
Example
/v2/tasks/{task_id}/runs
GET /v2/tasks/a1b2c3d4-e5f6-7890-1234-567890abcdef/runs
Authorization: Bearer YOUR_API_KEY
Get Latest Task Run
Get the result and details of the most recent run for a specific task.
Parameters
Name | Type | Required | Description |
---|---|---|---|
task_id | string | Yes | The ID of the task (path parameter). |
Response
The latest task run object including the result. Type: object
Example
/v2/tasks/{task_id}/runs/latest
GET /v2/tasks/a1b2c3d4-e5f6-7890-1234-567890abcdef/runs/latest
Authorization: Bearer YOUR_API_KEY
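The two run endpoints differ only in their final path segment, so one small function can cover both. This is a sketch under assumptions: the base URL comes from the Authorization example, `taskRunsUrl` and `getLatestRun` are hypothetical helpers, and the response is assumed to be a JSON body. `getLatestRun` relies on a global `fetch`, which is available in browsers and in Node.js 18 or later.

```javascript
// Base URL assumed from the Authorization example in this document.
const BASE_URL = 'https://api.radia.io/api';

// Hypothetical helper: URL for all runs of a task, or only the latest.
function taskRunsUrl(taskId, { latestOnly = false } = {}) {
  const path = `/v2/tasks/${encodeURIComponent(taskId)}/runs`;
  return BASE_URL + (latestOnly ? `${path}/latest` : path);
}

// Hypothetical helper: fetches the most recent run and parses it,
// assuming the API responds with JSON.
async function getLatestRun(apiKey, taskId) {
  const response = await fetch(taskRunsUrl(taskId, { latestOnly: true }), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(taskRunsUrl('a1b2c3d4-e5f6-7890-1234-567890abcdef'));
console.log(taskRunsUrl('a1b2c3d4-e5f6-7890-1234-567890abcdef', { latestOnly: true }));
```

Calls like `getLatestRun` belong in server-side code, where the API key is not visible to end users.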
Authentication Required
Remember to include your API key in the request headers for authentication with all endpoints. Keep your API keys secure and never expose them in client-side code.