Tags: Content Extraction, Web Scraping, Text Processing

Article Extractor

Extracts clean article content from a webpage URL. The API automatically removes ads, navigation, and other non-content elements and returns readable article text along with the title and metadata. To extract content from a raw HTML string instead of a URL, use the /v1/article/extractFromHtml endpoint.

Data & API Features

Extract clean article content from any webpage URL;
Automatic removal of ads, navigation, and non-content elements;
Extract article title, content, author, publication date, and metadata;
Separate endpoint available for HTML string extraction (/v1/article/extractFromHtml);
High-quality content extraction with intelligent parsing;
Full API support for HTTPS (TLS v1.0 / v1.1 / v1.2 / v1.3);
Fully compatible with Apple ATS;
Nationwide multi-node CDN deployment;
Fast responses, with API load balancing across multiple servers.
Pricing: Article Extractor, annual subscription, $49 (regular price $99).
A free trial key for testing all APIs is available after signing in. Payments are processed by Stripe.

API Document

HTTP Protocol: HTTPS

HTTP Method: POST

HTTP Endpoint: https://api.gugudata.io/v1/article/extract

Response Type: application/json; charset=utf-8

DEMO Endpoint: https://api.gugudata.io/v1/article/extract/demo

Live Demo:Try Interactive Demo

API Request Parameters

Name | Type | Required | Default Value | Remark
appkey | string | true | YOUR_APPKEY | API key obtained after payment; can be passed as a query parameter or in the request body
url | string | true | N/A | URL of the webpage to extract article content from
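A request with the two parameters above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it assumes both parameters are sent form-encoded in the POST body (the documentation allows `appkey` in the query string or the body but does not state the body encoding), it assumes the demo endpoint accepts the same parameters, and `build_extract_request` and `extract_article` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Endpoints as documented above.
API_ENDPOINT = "https://api.gugudata.io/v1/article/extract"
DEMO_ENDPOINT = "https://api.gugudata.io/v1/article/extract/demo"


def build_extract_request(appkey: str, page_url: str, demo: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for the article extraction API.

    Assumption: both parameters are sent form-encoded in the request body.
    Hypothetical helper; not part of any official SDK.
    """
    if not appkey:
        raise ValueError("appkey is required")
    if not page_url:
        raise ValueError("url is required")
    body = urllib.parse.urlencode({"appkey": appkey, "url": page_url}).encode("utf-8")
    return urllib.request.Request(
        DEMO_ENDPOINT if demo else API_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_article(appkey: str, page_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the UTF-8 JSON response (performs network I/O)."""
    request = build_extract_request(appkey, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or unit-test the request without network access; `extract_article("YOUR_APPKEY", "https://example.com/some-article")` performs the actual call.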

API Response Parameters

Name | Type | Remark
DataStatus.StatusCode | int | API response status code
DataStatus.StatusDescription | string | API response status description
DataStatus.ResponseDateTime | string | API response timestamp
DataStatus.DataTotalCount | int | Total number of records matching the request; typically used for pagination
Data.url | string | Source URL of the article
Data.title | string | Extracted article title
Data.description | string | Article description/summary
Data.links | array | Links contained in the article
Data.image | string | URL of the main article image
Data.content | string | Extracted article content (HTML, with ads and navigation removed)
Data.author | string | Article author (may be an empty string if unavailable)
Data.favicon | string | Website favicon URL
Data.source | string | Source website domain (e.g., sohu.com)
Data.published | string | Article publication date/time (format: YYYY-MM-DD HH:MM)
Data.ttr | int | Estimated reading time (Time to Read, in minutes)
Data.type | string | Article type (e.g., news, article)
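Because `Data.published` uses a fixed `YYYY-MM-DD HH:MM` format and `Data.ttr` is an integer number of minutes, a client can convert the response into typed values. The Python sketch below assumes success is indicated by `DataStatus.StatusCode == 200` and that `Data` is a single JSON object rather than a list; `Article`, `parse_published`, and `parse_article_response` are hypothetical names, and missing fields fall back to empty values.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    """The documented Data.* fields, with light type conversion."""
    url: str
    title: str
    description: str
    content: str          # HTML with ads and navigation removed
    author: str           # may be an empty string
    image: str
    favicon: str
    source: str           # e.g. "sohu.com"
    article_type: str     # Data.type, e.g. "news"
    ttr_minutes: int      # Data.ttr, estimated reading time
    published: Optional[datetime]
    links: list = field(default_factory=list)


def parse_published(value: Optional[str]) -> Optional[datetime]:
    """Parse the documented 'YYYY-MM-DD HH:MM' format; return None if missing or malformed."""
    try:
        return datetime.strptime(value, "%Y-%m-%d %H:%M")
    except (TypeError, ValueError):
        return None


def parse_article_response(payload: dict) -> Article:
    """Convert a decoded JSON response into an Article.

    Assumptions: success is signalled by DataStatus.StatusCode == 200,
    and Data is a single JSON object.
    """
    status = payload.get("DataStatus") or {}
    code = status.get("StatusCode")
    if code != 200:
        raise RuntimeError(f"API error {code}: {status.get('StatusDescription', '')}")
    data = payload.get("Data") or {}
    return Article(
        url=data.get("url", ""),
        title=data.get("title", ""),
        description=data.get("description", ""),
        content=data.get("content", ""),
        author=data.get("author") or "",
        image=data.get("image", ""),
        favicon=data.get("favicon", ""),
        source=data.get("source", ""),
        article_type=data.get("type", ""),
        ttr_minutes=int(data.get("ttr") or 0),
        published=parse_published(data.get("published")),
        links=list(data.get("links") or []),
    )
```

Returning `None` for an unparseable `published` value keeps one unusual date string from failing the whole response; raise instead if strict validation matters more.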

API Response Status Codes

Status Code | Meaning | Remarks
200 | API responding normally | For business status codes, see below under API Custom Status Codes.
400 | Parameter error |
402 | APPKEY error | Check that the APPKEY provided is the one obtained from the developer center.
403 | Account overdue | Watch for the email reminders about order expiration.
429 | Request rate limited | Requests must not exceed 5 per second. The CDN layer applies this limit based on per-IP request frequency; ordinary high-frequency use does not normally trigger this status code.
500 | API response error |
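Client code can treat these codes differently: 429 (and plausibly 500) is worth retrying after a pause, while 400, 402, and 403 point to a problem the caller must fix. The Python sketch below assumes the code is read from `DataStatus.StatusCode`, treats retrying 500 as a judgment call rather than documented behavior, and adds a simple client-side throttle that spaces calls to stay within the documented limit of 5 requests per second. `describe_status`, `is_retryable`, and `Throttle` are hypothetical names.

```python
import threading
import time

# Meanings copied from the status code table above.
STATUS_MESSAGES = {
    200: "API responding normally",
    400: "Parameter error",
    402: "APPKEY error",
    403: "Account overdue",
    429: "Request rate limited",
    500: "API response error",
}

# Assumption: 429 and 500 are transient; the others need a fix on the caller's side.
RETRYABLE = {429, 500}


def describe_status(code: int) -> str:
    """Return the documented meaning of a status code, or a generic message."""
    return STATUS_MESSAGES.get(code, f"Unknown status code {code}")


def is_retryable(code: int) -> bool:
    """True if a retry after a pause is reasonable for this code (assumed policy)."""
    return code in RETRYABLE


class Throttle:
    """Space calls at least 1 / max_per_second seconds apart (0.2 s for 5/s).

    clock and sleep are injectable so the timing logic can be tested.
    """

    def __init__(self, max_per_second: float = 5.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0
        self._lock = threading.Lock()  # safe to share across threads

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.interval
```

Call `throttle.wait()` before each request; when `is_retryable(code)` is true, pause before trying again rather than retrying immediately.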
