
Webpage Readable Content Extraction

Intelligently extracts key elements of articles

Intelligent Extraction | Multi-element Information
Web Tools | Text Processing

Data & API Features

  • Intelligently extracts the readable content of a webpage;
  • Returns the HTML code of the extracted readable content;
  • Accepts either the webpage HTML or the webpage URL as input;
  • Extracts multiple elements: article title, author, text direction, language, content (HTML), content as plain text (HTML tags removed, split into paragraphs), article length, excerpt, website name, and publication time;
  • Parses pages within seconds and supports high concurrency;
  • Supports HTTPS (TLS v1.0 / v1.1 / v1.2 / v1.3) on all endpoints;
  • Fully compatible with Apple ATS (App Transport Security);
  • Deployed on a nationwide multi-node CDN;
  • Fast responses, with requests load-balanced across multiple API servers.

API Documentation

HTTP Protocol: HTTPS

HTTP Method: POST

HTTP Endpoint: https://api.gugudata.io/v1/websitetools/readability

Response Type: application/json; charset=utf-8

DEMO Endpoint: https://api.gugudata.io/v1/websitetools/readability/demo
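A minimal sketch of calling the endpoint with Python's standard library. It assumes the parameters are sent as an `application/x-www-form-urlencoded` POST body; the page does not state the body encoding, so check this against the official code snippets. `build_request` and `fetch` are hypothetical helper names, and `use_demo=True` targets the DEMO endpoint listed above.

```python
import json
import urllib.parse
import urllib.request

# Endpoints as listed in the API documentation above.
ENDPOINT = "https://api.gugudata.io/v1/websitetools/readability"
DEMO_ENDPOINT = "https://api.gugudata.io/v1/websitetools/readability/demo"


def build_request(params: dict, use_demo: bool = False) -> urllib.request.Request:
    """Build a POST request for the readability API.

    Assumption: parameters travel as a form-encoded body; the documentation
    only says the method is POST.
    """
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        DEMO_ENDPOINT if use_demo else ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch(params: dict, use_demo: bool = False, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    req = build_request(params, use_demo)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with a placeholder key and article URL: `fetch({"appkey": "YOUR_APPKEY", "url": "https://example.com/article"})`. Building the request separately from sending it keeps the encoding step testable without network access.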

API Request Parameters

Name | Type | Required | Default Value | Remarks
appkey | string | true | YOUR_APPKEY | Obtained after payment
html | string | false | YOUR_VALUE | HTML content of the webpage to extract; pass either this parameter or url
url | string | false | YOUR_VALUE | URL of the webpage to extract; pass either this parameter or html. If the source site's anti-crawling measures prevent the page from being fetched, the resulting failure is not handled
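Because `html` and `url` are alternatives, a client can reject ambiguous calls before sending anything. The sketch below uses a hypothetical helper, `readability_params`, that requires `appkey` and exactly one of `url` or `html`. Passing `html` that you fetched yourself is one way to work around sites whose anti-crawling measures block the API's own request, as noted in the `url` remark.

```python
from typing import Optional


def readability_params(appkey: str, *, url: Optional[str] = None,
                       html: Optional[str] = None) -> dict:
    """Return the request fields for one extraction call.

    Hypothetical helper: enforces the "choose either html or url" rule
    from the parameter table by rejecting calls that pass both or neither.
    """
    if not appkey:
        raise ValueError("appkey is required")
    if (url is None) == (html is None):
        raise ValueError("pass exactly one of url or html")
    params = {"appkey": appkey}
    if url is not None:
        params["url"] = url
    else:
        params["html"] = html
    return params
```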

API Response Parameters

Name | Type | Remarks
DataStatus.RequestParameter | string | API request parameters
DataStatus.StatusCode | int | API status code
DataStatus.StatusDescription | string | Description of the API status
DataStatus.ResponseDateTime | string | Time the API returned the data
DataStatus.DataTotalCount | int | Total number of records matching the request; generally used for pagination
Data.Title | string | Article title
Data.Byline | string | Article author
Data.Dir | string | Article text direction
Data.Lang | string | Article language
Data.Content | string | Article content (HTML)
Data.TextContent | string | Article content as plain text (HTML tags removed, split into paragraphs)
Data.Length | int | Article length
Data.Excerpt | string | Article excerpt
Data.SiteName | string | Website name
Data.PublishedTime | string[] | Article publication time
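A sketch of turning a decoded response into a typed object. It assumes the dotted names in the table map to nested JSON objects (a top-level `DataStatus` object and a single `Data` object, not a list) and that absent fields may be missing or null; neither detail is confirmed by the page. `Article` and `parse_response` are hypothetical names. Since `PublishedTime` is typed `string[]`, a bare string is normalised to a one-element list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    """Mirrors the Data.* fields listed in the response table."""
    title: Optional[str]
    byline: Optional[str]
    dir: Optional[str]
    lang: Optional[str]
    content: Optional[str]
    text_content: Optional[str]
    length: Optional[int]
    excerpt: Optional[str]
    site_name: Optional[str]
    published_time: List[str] = field(default_factory=list)


def parse_response(payload: dict) -> Article:
    """Convert a decoded JSON response into an Article.

    Raises RuntimeError when DataStatus.StatusCode is not the business
    success code 200.
    """
    status = payload.get("DataStatus") or {}
    if status.get("StatusCode") != 200:
        raise RuntimeError(
            f"API error {status.get('StatusCode')}: {status.get('StatusDescription')}"
        )
    data = payload.get("Data") or {}
    published = data.get("PublishedTime") or []
    if isinstance(published, str):
        published = [published]
    return Article(
        title=data.get("Title"),
        byline=data.get("Byline"),
        dir=data.get("Dir"),
        lang=data.get("Lang"),
        content=data.get("Content"),
        text_content=data.get("TextContent"),
        length=data.get("Length"),
        excerpt=data.get("Excerpt"),
        site_name=data.get("SiteName"),
        published_time=list(published),
    )
```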

API Response Status Codes

Status Code | Meaning | Remarks
200 | API responding normally | For business status codes, see API Custom Status Codes below
403 | Request rate exceeded | The CDN layer judges request frequency per IP; ordinary high-frequency requests will not trigger this status code

API Custom Status Codes

Status Code | Meaning | Remarks
200 | Normal return |
400 | Parameter error |
429 | Request rate limited | No more than 100 requests per second
403 | Account expired | Watch for email reminders about order expiration
402 | APPKEY error | Check that the APPKEY is the one obtained from the developer center
500 | API response error |
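The business codes above arrive in `DataStatus.StatusCode` of a normal HTTP 200 response, so a client can branch on them. The sketch maps each code to a suggested action and retries with exponential backoff; treating 429 and 500 as retryable is this sketch's own choice, not documented behaviour. `ACTIONS`, `action_for`, and `call_with_retry` are hypothetical names, and an HTTP-level 403 from the CDN (which `urllib` raises as `urllib.error.HTTPError`) is not handled here.

```python
import time
from typing import Callable, Optional

# Business status codes from the table above, mapped to suggested actions.
ACTIONS = {
    200: "ok",
    400: "fix_request",    # Parameter error
    429: "retry",          # More than 100 requests per second
    403: "renew_account",  # Account expired
    402: "check_appkey",   # APPKEY error
    500: "retry",          # API response error (retrying is an assumption)
}


def action_for(status_code: Optional[int]) -> str:
    return ACTIONS.get(status_code, "unknown")


def call_with_retry(call: Callable[[], dict], max_attempts: int = 3,
                    base_delay: float = 0.5) -> dict:
    """Invoke `call` (returning a decoded response) and retry retryable codes.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    returns the last payload once it is not retryable or attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        payload = call()
        code = (payload.get("DataStatus") or {}).get("StatusCode")
        attempt += 1
        if action_for(code) != "retry" or attempt >= max_attempts:
            return payload
        time.sleep(base_delay * 2 ** (attempt - 1))
```

Passing a zero-argument callable keeps the retry logic independent of how the request is actually sent.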


Others also bought

  • Website Tools (rating 4.93): parses a site's title and favicon
  • Image Recognition (rating 4.92): converts web pages to PDF
  • Website Tools (rating 4.97): parses a domain's SSL certificate information