Content Extraction, Web Scraping, Text Processing
Article Extractor
Extract clean article content from any webpage URL. Automatically removes ads, navigation, and other non-content elements to provide clean, readable article text with title, content, and metadata. For HTML string extraction, please use the /v1/article/extractFromHtml endpoint.
Data & API Features
Extract clean article content from any webpage URL;
Automatic removal of ads, navigation, and non-content elements;
Extract article title, content, author, publication date, and metadata;
Separate endpoint available for HTML string extraction (/v1/article/extractFromHtml);
High-quality content extraction with intelligent parsing;
Full API support for HTTPS (TLS v1.0 / v1.1 / v1.2 / v1.3);
Fully compatible with Apple ATS;
Nationwide multi-node CDN deployment;
Fast responses, with API load balancing across multiple servers.

Annual Subscription
$49$99
API Document
HTTP Protocol: HTTPS
HTTP Method: POST
HTTP Endpoint: https://api.gugudata.io/v1/article/extract
Response Type: application/json; charset=utf-8
DEMO Endpoint: https://api.gugudata.io/v1/article/extract/demo
API Request Parameters
| Name | Type | Is Required | Default Value | Remark |
|---|---|---|---|---|
| appkey | string | true | YOUR_APPKEY | API key obtained after payment; can be passed as a query parameter or in the request body |
| url | string | true | N/A | The URL of the webpage to extract article content from |
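
The request described above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_extract_request` and `extract_article` are hypothetical, and sending `url` as a JSON body is an assumption, since the table does not state the body encoding. Passing `appkey` in the query string follows the Remark column.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.gugudata.io/v1/article/extract"


def build_extract_request(appkey: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the extract endpoint."""
    # The appkey may be passed as a query parameter, per the parameter table.
    query = urllib.parse.urlencode({"appkey": appkey})
    # Assumption: the body is JSON-encoded. The page does not document the
    # body format, so switch to form encoding if the API rejects this.
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_article(appkey: str, page_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response into a dict."""
    request = build_extract_request(appkey, page_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `extract_article("YOUR_APPKEY", "https://example.com/some-article")`, where the article URL is a placeholder.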
API Response Parameters
| Name | Type | Remark |
|---|---|---|
| DataStatus.StatusCode | int | API response status code |
| DataStatus.StatusDescription | string | API response status description |
| DataStatus.ResponseDateTime | string | API response timestamp |
| DataStatus.DataTotalCount | int | Total data count under this condition, usually used for pagination calculation |
| Data.url | string | Source URL of the article |
| Data.title | string | Extracted article title |
| Data.description | string | Article description/summary |
| Data.links | array | Array of links contained in the article |
| Data.image | string | Main article image URL |
| Data.content | string | Extracted article content (HTML format, with ads and navigation removed) |
| Data.author | string | Article author (if available, may be empty string) |
| Data.favicon | string | Website favicon URL |
| Data.source | string | Source website domain (e.g., sohu.com) |
| Data.published | string | Article publication date/time (format: YYYY-MM-DD HH:MM) |
| Data.ttr | int | Estimated reading time (Time to Read, in minutes) |
| Data.type | string | Article type (e.g., news, article, etc.) |
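
The response fields listed above could be mapped onto a small typed structure as follows. This is a sketch under assumptions: the `Article` class and `parse_article` function are hypothetical names, only a subset of the fields is kept, and the payload is assumed to nest the fields under `Data` exactly as the table's dotted names suggest.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    url: str
    title: str
    content_html: str  # Data.content is HTML, not plain text
    author: str  # may be an empty string
    published: Optional[datetime]
    reading_minutes: int


def parse_article(payload: dict) -> Article:
    """Pick selected fields out of a decoded response payload."""
    data = payload["Data"]
    raw_published = data.get("published") or ""
    try:
        # Documented format: YYYY-MM-DD HH:MM
        published = datetime.strptime(raw_published, "%Y-%m-%d %H:%M")
    except ValueError:
        # Missing or differently formatted dates are kept as None.
        published = None
    return Article(
        url=data.get("url", ""),
        title=data.get("title", ""),
        content_html=data.get("content", ""),
        author=data.get("author", ""),
        published=published,
        reading_minutes=int(data.get("ttr") or 0),
    )
```

Because `Data.content` is returned as HTML, callers that need plain text would still have to strip the markup themselves.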
API Response Status Codes
| Status Code | Explanation of Status Code | Remarks |
|---|---|---|
| 200 | API responding normally | For business status codes, see below under API Custom Status Codes. |
| 400 | Parameter error | |
| 402 | APPKEY error | Please check if the APPKEY provided is the one obtained from the developer center. |
| 403 | Account overdue | Please pay attention to the E-mail reminders regarding order expiration. |
| 429 | Request rate limited | Requests are limited to 5 per second. The CDN layer applies the limit based on the request frequency of each IP address. Ordinary high-frequency usage does not normally trigger this status code. |
| 500 | API response error | |
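
One way to act on the status codes above is to retry only the rate-limited case and raise on everything else. The sketch below is an illustration, not documented client behavior: `ApiError` and `call_with_retry` are hypothetical names, the exponential backoff starting at 0.2 seconds is an assumption derived from the 5 requests per second limit, and the `send` callable is assumed to return the status code together with the decoded payload.

```python
import time
from typing import Callable, Tuple

# Descriptions taken from the status code table.
STATUS_MESSAGES = {
    200: "API responding normally",
    400: "Parameter error",
    402: "APPKEY error",
    403: "Account overdue",
    429: "Request rate limited",
    500: "API response error",
}

# Assumption: only rate limiting is worth retrying automatically.
RETRYABLE = {429}


class ApiError(Exception):
    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_retry(
    send: Callable[[], Tuple[int, dict]],
    retries: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() until it returns 200, backing off on retryable codes."""
    for attempt in range(retries + 1):
        code, payload = send()
        if code == 200:
            return payload
        if code in RETRYABLE and attempt < retries:
            # Wait 0.2 s, 0.4 s, 0.8 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
            continue
        raise ApiError(code, STATUS_MESSAGES.get(code, "Unknown status"))
    raise ApiError(0, "No attempts were made")
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests or in an async-friendly wrapper.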



