Intelligent Extraction of Article Elements

Webpage Readable Content Extraction

Intelligently extracts key elements of articles

Data & API Features

Intelligently extracts readable content from webpages;
Returns the HTML of the webpage's readable content;
Accepts either the webpage HTML or the webpage URL as input;
Extracts article elements including title, author, text direction, language, content, plain-text content (HTML tags removed, split into paragraphs), article length, excerpt, website name, and publication time;
Parsing completes within seconds, with support for high concurrency;
Supports HTTPS (TLS v1.0 / v1.1 / v1.2 / v1.3) for all interfaces;
Fully compatible with Apple ATS;
Nationwide multi-node CDN deployment;
Fast interface responses, with load balancing across multiple API servers.
Annual Subscription
$19 (regular price $49)
Try it for free!
Sign in to get a trial key and test all APIs.
Secure payment by Stripe

API Documentation

HTTP Protocol: HTTPS

HTTP Method: POST

HTTP Endpoint: https://api.gugudata.io/v1/websitetools/readability?appkey={{appkey}}

Response Type: application/json; charset=utf-8

DEMO Endpoint: https://api.gugudata.io/v1/websitetools/readability/demo

Live Demo: Try Interactive Demo

Full API Docs: developers.gugudata.io
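A minimal Python sketch of building a request to the production endpoint above, using only the standard library. The appkey goes in the query string, as in the endpoint template. This page does not state how the body is encoded, so sending the fields as application/x-www-form-urlencoded is an assumption; `build_request` is a hypothetical helper name, not part of the API.

```python
import urllib.parse
import urllib.request

# Production endpoint from the documentation; appkey is a query parameter.
BASE_URL = "https://api.gugudata.io/v1/websitetools/readability"


def build_request(appkey: str, form: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST request to the readability endpoint.

    Assumption: the body is form-encoded; the documentation does not
    specify the body encoding.
    """
    query = urllib.parse.urlencode({"appkey": appkey})
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_request("YOUR_APPKEY", {"url": "https://example.com/article"})
    # Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the endpoint and parameter handling easy to check without network access.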

API Request Parameters

Name | Type | Required | Default Value | Remark
appkey | string | true | YOUR_APPKEY | Obtained after payment
html | string | false | YOUR_VALUE | HTML content of the webpage to extract. Provide either this parameter or url.
url | string | false | YOUR_VALUE | URL of the webpage to extract. Provide either this parameter or html. Failures caused by the source site's anti-crawling measures, which prevent the page from being fetched for processing, are not handled.
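Because html and url are alternatives, a client can enforce the choice before sending anything. This sketch rejects calls that pass both or neither; that strictness is a client-side decision, since the page does not say how the server reacts to both parameters being present. `build_form` is a hypothetical helper.

```python
from typing import Optional


def build_form(html: Optional[str] = None, url: Optional[str] = None) -> dict[str, str]:
    """Return the request fields for one extraction call.

    Exactly one of html or url must be given (client-side rule).
    """
    if html is not None and url is None:
        return {"html": html}
    if url is not None and html is None:
        return {"url": url}
    raise ValueError("pass exactly one of html or url")
```

For example, `build_form(url="https://example.com/article")` yields the fields for a URL-based request, while passing both arguments raises `ValueError`.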

API Response Parameters

Name | Type | Remark
DataStatus.RequestParameter | string | API request parameters
DataStatus.StatusCode | int | API return status code
DataStatus.StatusDescription | string | API return status description
DataStatus.ResponseDateTime | string | Time at which the API returned the data
DataStatus.DataTotalCount | int | Total number of records matching the request; generally used for pagination
Data.Title | string | Article title
Data.Byline | string | Article author
Data.Dir | string | Article text direction
Data.Lang | string | Article language
Data.Content | string | Article content
Data.TextContent | string | Article content (HTML tags removed, split into paragraphs)
Data.Length | int | Article length
Data.Excerpt | string | Article excerpt
Data.SiteName | string | Website name
Data.PublishedTime | string[] | Article publication time
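A sketch of turning a response body into a typed object. It assumes the body is a JSON object with top-level DataStatus and Data objects keyed exactly as in the table; no sample response appears on this page, so absent or null fields fall back to empty values. `Article` and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Article:
    # Fields mirror the documented Data.* response parameters.
    title: str = ""
    byline: str = ""
    direction: str = ""      # Data.Dir
    lang: str = ""
    content: str = ""        # readable content as HTML
    text_content: str = ""   # HTML tags removed, split into paragraphs
    length: int = 0
    excerpt: str = ""
    site_name: str = ""
    published_time: list[str] = field(default_factory=list)  # documented as string[]


def parse_response(raw: str) -> tuple[int, Article]:
    """Return (DataStatus.StatusCode, Article) from a JSON response body."""
    doc: dict[str, Any] = json.loads(raw)
    status = int(doc["DataStatus"]["StatusCode"])
    data: dict[str, Any] = doc.get("Data") or {}
    published = data.get("PublishedTime") or []
    if isinstance(published, str):  # tolerate a single string as well
        published = [published]
    article = Article(
        title=data.get("Title") or "",
        byline=data.get("Byline") or "",
        direction=data.get("Dir") or "",
        lang=data.get("Lang") or "",
        content=data.get("Content") or "",
        text_content=data.get("TextContent") or "",
        length=int(data.get("Length") or 0),
        excerpt=data.get("Excerpt") or "",
        site_name=data.get("SiteName") or "",
        published_time=list(published),
    )
    return status, article
```

Falling back to empty values rather than raising keeps partially extracted pages usable; a stricter client could raise on a missing Title instead.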

API Response Status Codes

Status Code | Meaning | Remarks
200 | API responding normally | For business status codes, see API Custom Status Codes below.
400 | Parameter error |
402 | APPKEY error | Check that the APPKEY you supplied is the one obtained from the developer center.
403 | Account overdue (subscription expired) | Watch for the email reminders about order expiration.
429 | Request rate limited | No more than 5 requests per second. The CDN layer decides based on each IP's request frequency; ordinary high-frequency usage does not trigger this status code.
500 | API response error |
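The status codes and the 5-requests-per-second limit can be handled on the client side. This sketch maps the documented codes to short messages and spaces requests at least 0.2 s apart, a conservative way for a single client to stay below the limit. `describe_status` and `RateLimiter` are hypothetical helpers, not part of the GuGuData API, and the page does not say whether these codes arrive as HTTP statuses or as DataStatus.StatusCode values.

```python
import time
from typing import Callable, Optional

# Short descriptions paraphrasing the documented status-code table.
STATUS_MESSAGES = {
    200: "API responding normally",
    400: "Parameter error",
    402: "APPKEY error: check the key from the developer center",
    403: "Account overdue: the subscription has expired",
    429: "Request rate limited: more than 5 requests per second",
    500: "API response error",
}


def describe_status(code: int) -> str:
    """Map a status code to a message; unknown codes are reported as such."""
    return STATUS_MESSAGES.get(code, f"Undocumented status code {code}")


class RateLimiter:
    """Spaces successive calls at least 1/rate seconds apart.

    Even spacing is stricter than "5 per second", so a single client
    using it stays under the documented limit. clock and sleep are
    injectable so the behavior can be checked without real waiting.
    """

    def __init__(
        self,
        rate: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `limiter.wait()` before each request. On a 429 response, backing off before retrying is a reasonable client policy, though the page does not prescribe one.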
