Webpage Readable Content Extraction
Intelligently extracts key elements of articles
Data & API Features
Intelligently extracts readable content from webpages;
Provides HTML code of the webpage's readable content;
Supports passing either webpage HTML or webpage URL parameters;
Supports extraction of multiple article elements, including title, author, text direction, language, content, content (without HTML tags, divided by paragraphs), article length, excerpt, website name, and publication time;
Parses pages within seconds and supports high concurrency;
Supports HTTPS (TLS v1.0 / v1.1 / v1.2 / v1.3) for all interfaces;
Fully compatible with Apple ATS;
Nationwide multi-node CDN deployment;
Fast interface responses, with API load balancing across multiple servers.

Annual Subscription
$19$49
API Document
HTTP Protocol: HTTPS
HTTP Method: POST
HTTP Endpoint: https://api.gugudata.io/v1/websitetools/readability?appkey={{appkey}}
Response Type: application/json; charset=utf-8
DEMO Endpoint: https://api.gugudata.io/v1/websitetools/readability/demo
Live Demo: Try Interactive Demo
Full API Docs: developers.gugudata.io
API Request Parameters
| Name | Type | Is Required | Default Value | Remark |
|---|---|---|---|---|
| appkey | string | true | YOUR_APPKEY | Obtained after payment |
| html | string | false | YOUR_VALUE | HTML content of the webpage to extract. Provide either this parameter or url. |
| url | string | false | YOUR_VALUE | URL of the webpage to extract. Provide either this parameter or html. Failures caused by the source site's anti-crawling measures, which prevent the page from being fetched for processing, are not handled. |
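As a minimal sketch of the request described by the table above, the following Python builds (without sending) a POST request using only the standard library. The endpoint and the parameter names `appkey`, `html`, and `url` come from this page; the helper name `build_request` is hypothetical, and the form-encoded body is an assumption, since the encoding of the POST body is not stated here.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.gugudata.io/v1/websitetools/readability"


def build_request(
    appkey: str, url: Optional[str] = None, html: Optional[str] = None
) -> Request:
    """Build a POST request carrying exactly one of `url` or `html`."""
    if (url is None) == (html is None):
        raise ValueError("provide exactly one of url or html")
    # Assumption: the body is form-encoded; the page does not state the encoding.
    field = {"url": url} if url is not None else {"html": html}
    body = urlencode(field).encode("utf-8")
    # The appkey goes in the query string, as in the documented endpoint.
    return Request(
        f"{ENDPOINT}?{urlencode({'appkey': appkey})}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. If the service expects a JSON body instead, only the `body` and `Content-Type` lines would need to change.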
API Response Parameters
| Name | Type | Remark |
|---|---|---|
| DataStatus.RequestParameter | string | API request parameter |
| DataStatus.StatusCode | int | API return status code |
| DataStatus.StatusDescription | string | API return status description |
| DataStatus.ResponseDateTime | string | API data return time |
| DataStatus.DataTotalCount | int | Total data count under this condition, generally used for pagination |
| Data.Title | string | Article title |
| Data.Byline | string | Article author |
| Data.Dir | string | Article text direction |
| Data.Lang | string | Article language |
| Data.Content | string | Article content |
| Data.TextContent | string | Article content (without HTML tags, divided by paragraphs) |
| Data.Length | int | Article length |
| Data.Excerpt | string | Article excerpt |
| Data.SiteName | string | Website name |
| Data.PublishedTime | string[] | Article publication time |
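The response fields above could be read as follows. This is a sketch under two assumptions: that the dotted names in the table map to nested JSON objects (`DataStatus` and `Data`), and that a `DataStatus.StatusCode` of 200 indicates success. The `Article` class, the `parse_response` helper, and the sample payload are hypothetical and written by hand for illustration; they are not real API output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Article:
    title: str
    byline: str
    text_content: str
    length: int
    published_time: List[str]


def parse_response(raw: str) -> Article:
    """Extract a few of the documented fields from a JSON response body."""
    payload = json.loads(raw)
    status = payload["DataStatus"]
    # Assumption: 200 in DataStatus.StatusCode means the call succeeded.
    if status["StatusCode"] != 200:
        raise RuntimeError(f"{status['StatusCode']}: {status['StatusDescription']}")
    data = payload["Data"]
    return Article(
        title=data["Title"],
        byline=data["Byline"],
        text_content=data["TextContent"],
        length=data["Length"],
        published_time=data["PublishedTime"],
    )


# Hand-written sample shaped after the table above, for illustration only.
SAMPLE = json.dumps({
    "DataStatus": {"StatusCode": 200, "StatusDescription": "OK"},
    "Data": {
        "Title": "Hello",
        "Byline": "A. Writer",
        "TextContent": "Hello world",
        "Length": 11,
        "PublishedTime": ["2024-01-01T00:00:00Z"],
    },
})

article = parse_response(SAMPLE)
print(article.title, article.length)
```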
API Response Status Codes
| Status Code | Explanation of Status Code | Remarks |
|---|---|---|
| 200 | API responding normally | For business status codes, see below under API Custom Status Codes. |
| 400 | Parameter error | |
| 402 | APPKEY error | Please check if the APPKEY provided is the one obtained from the developer center. |
| 403 | Account overdue | Watch for the email reminders about order expiration. |
| 429 | Request rate limited | Requests must not exceed 5 per second. The CDN layer decides based on the request frequency per IP address. Ordinary high-frequency usage generally does not trigger this status code. |
| 500 | API response error | |
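One way a client might react to these status codes is sketched below. The descriptions are copied from the table; the retry policy (retrying only 429 and 500, with a doubling delay that starts at 0.2 s to stay under 5 requests per second) is an assumption of this sketch, not documented behavior. The helper `call_with_retry` and its `send` callback are hypothetical names.

```python
import time
from typing import Callable

# Descriptions copied from the status code table above.
STATUS_MEANING = {
    200: "API responding normally",
    400: "Parameter error",
    402: "APPKEY error",
    403: "Account overdue",
    429: "Request rate limited",
    500: "API response error",
}

# Assumption: only rate limiting and server errors are worth retrying.
RETRYABLE = {429, 500}


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE:
            break
        # Wait 0.2 s, then 0.4 s, and so on, before trying again.
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status
```

Here `send` stands for any function that performs one request and returns the HTTP status code. Codes such as 400, 402, and 403 are returned immediately, since repeating the same request would not fix a bad parameter, key, or expired account.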


