@@ -16,7 +16,7 @@ x-crawl is a Nodejs multifunctional crawler library.
 
 ## Relationship with puppeteer
 
-The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+The fetchPage API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
 
 The following can be done:
 
@@ -34,7 +34,7 @@ The following can be done:
     + [Example](#Example-1)
     + [Mode](#Mode)
     + [IntervalTime](#IntervalTime)
-  * [fetchHTML](#fetchHTML)
+  * [fetchPage](#fetchPage)
     + [Type](#Type-2)
     + [Example](#Example-2)
     + [About page](#About-page)
@@ -50,19 +50,19 @@ The following can be done:
 - [Types](#Types)
   * [AnyObject](#AnyObject)
   * [Method](#Method)
+  * [RequestBaseConfig](#RequestBaseConfig)
   * [RequestConfig](#RequestConfig)
   * [IntervalTime](#IntervalTime)
   * [XCrawlBaseConfig](#XCrawlBaseConfig)
   * [FetchBaseConfigV1](#FetchBaseConfigV1)
-  * [FetchBaseConfigV2](#FetchBaseConfigV2)
-  * [FetchHTMLConfig](#FetchHTMLConfig)
+  * [FetchPageConfig](#FetchPageConfig)
   * [FetchDataConfig](#FetchDataConfig)
   * [FetchFileConfig](#FetchFileConfig)
   * [StartPollingConfig](#StartPollingConfig)
   * [FetchResCommonV1](#FetchResCommonV1)
   * [FetchResCommonArrV1](#FetchResCommonArrV1)
   * [FileInfo](#FileInfo)
-  * [FetchHTML](#FetchHTML)
+  * [FetchPage](#FetchPage)
 - [More](#More)
 
 ## Install
@@ -90,9 +90,9 @@ const myXCrawl = xCrawl({
 // 3.Set the crawling task
 // Call the startPolling API to start the polling function, and the callback function will be called every other day
 myXCrawl.startPolling({ d: 1 }, () => {
-  // Call fetchHTML API to crawl HTML
-  myXCrawl.fetchHTML('https://www.youtube.com/').then((res) => {
-    const { jsdom } = res.data // By default, the JSDOM library is used to parse HTML
+  // Call fetchPage API to crawl Page
+  myXCrawl.fetchPage('https://www.youtube.com/').then((res) => {
+    const { jsdom } = res.data // By default, the JSDOM library is used to parse Page
 
     // Get the cover image element of the Promoted Video
     const imgEls = jsdom.window.document.querySelectorAll(
@@ -124,7 +124,7 @@ running result:
   <img src="https://raw.githubusercontent.com/coder-hxl/x-crawl/main/assets/en/crawler-result.png" />
 </div>
 
-**Note:** Do not crawl randomly, here is just to demonstrate how to use XCrawl, and control the request frequency within 3000ms to 2000ms.
+**Note:** Do not crawl randomly, here is just to demonstrate how to use x-crawl, and control the request frequency within 3000ms to 2000ms.
 
 ## Core concepts
 
@@ -154,9 +154,9 @@ const myXCrawl = xCrawl({
 })
 ```
 
-Passing **baseConfig** is for **fetchHTML/fetchData/fetchFile** to use these values by default.
+Passing **baseConfig** is for **fetchPage/fetchData/fetchFile** to use these values by default.
 
-**Note:** To avoid repeated creation of instances in subsequent examples, **myXCrawl** here will be the crawler instance in the **fetchHTML/fetchData/fetchFile** example.
+**Note:** To avoid repeated creation of instances in subsequent examples, **myXCrawl** here will be the crawler instance in the **fetchPage/fetchData/fetchFile** example.
 
 #### Mode
 
@@ -176,26 +176,26 @@ The intervalTime option defaults to undefined. If there is a setting value, it
 
 The first request is not to trigger the interval.
 
-### fetchHTML
+### fetchPage
 
-fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl page.
+fetchPage is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl page.
 
 #### Type
 
-- Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
-- Look at the [FetchHTML](#FetchHTML-2) type
+- Look at the [FetchPageConfig](#FetchPageConfig) type
+- Look at the [FetchPage](#FetchPage-2) type
 
 ```ts
-function fetchHTML: (
-  config: FetchHTMLConfig,
-  callback?: (res: FetchHTML) => void
-) => Promise<FetchHTML>
+function fetchPage: (
+  config: FetchPageConfig,
+  callback?: (res: FetchPage) => void
+) => Promise<FetchPage>
 ```
 
 #### Example
 
 ```js
-myXCrawl.fetchHTML('/xxx').then((res) => {
+myXCrawl.fetchPage('/xxx').then((res) => {
   const { jsdom } = res.data
   console.log(jsdom.window.document.querySelector('title')?.textContent)
 })
@@ -296,7 +296,7 @@ function startPolling(
 ```js
 myXCrawl.startPolling({ h: 1, m: 30 }, () => {
   // will be executed every one and a half hours
-  // fetchHTML/fetchData/fetchFile
+  // fetchPage/fetchData/fetchFile
 })
 ```
 
@@ -316,17 +316,24 @@ interface AnyObject extends Object {
 type Method = 'get' | 'GET' | 'delete' | 'DELETE' | 'head' | 'HEAD' | 'options' | 'OPTONS' | 'post' | 'POST' | 'put' | 'PUT' | 'patch' | 'PATCH' | 'purge' | 'PURGE' | 'link' | 'LINK' | 'unlink' | 'UNLINK'
 ```
 
+### RequestBaseConfig
+
+```ts
+interface RequestBaseConfig {
+  url: string
+  timeout?: number
+  proxy?: string
+}
+```
+
 ### RequestConfig
 
 ```ts
-interface RequestConfig {
-  url: string
+interface RequestConfig extends RequestBaseConfig {
   method?: Method
   headers?: AnyObject
   params?: AnyObject
   data?: any
-  timeout?: number
-  proxy?: string
 }
 ```
 
@@ -360,20 +367,10 @@ interface FetchBaseConfigV1 {
 }
 ```
 
-### FetchBaseConfigV2
-
-```ts
-interface FetchBaseConfigV2 {
-  url: string
-  timeout?: number
-  proxy?: string
-}
-```
-
-### FetchHTMLConfig
+### FetchPageConfig
 
 ```ts
-type FetchHTMLConfig = string | FetchBaseConfigV2
+type FetchPageConfig = string | RequestBaseConfig
 ```
 
 ### FetchDataConfig
@@ -432,10 +429,10 @@ interface FileInfo {
 }
 ```
 
-### FetchHTML
+### FetchPage
 
 ```ts
-interface FetchHTML {
+interface FetchPage {
   httpResponse: HTTPResponse | null // The type of HTTPResponse in the puppeteer library
   data: {
     page: Page // The type of Page in the puppeteer library