@@ -4,17 +4,21 @@ English | [简体中文](https://github.com/coder-hxl/x-crawl/blob/main/docs/cn.

 x-crawl is a Nodejs multifunctional crawler library.

-## Feature
+## Features

-- Crawl HTML, JSON, file resources, etc. with simple configuration.
-- Built-in puppeteer crawls HTML and uses JSDOM library to parse HTML.
+- Crawl pages, JSON, file resources, etc. with simple configuration.
+- The built-in puppeteer crawls pages, and the jsdom library parses them.
 - Support asynchronous/synchronous way to crawl data.
-- Support Promise/Callback way to get the result.
-- Polling function.
+- Support Promise/Callback method to get the result.
+- Polling function, fixed-point crawling.
 - Anthropomorphic request interval.
-- Written in TypeScript, provides generics.
+- Written in TypeScript, providing generics.

-## Benefits provided by using puppeter
+## Relationship with puppeteer
+
+The fetchHTML API internally uses the [puppeteer](https://github.com/puppeteer/puppeteer) library to crawl pages.
+
+The following can be done:

 - Generate screenshots and PDFs of pages.
 - Crawl a SPA (Single-Page Application) and generate pre-rendered content (i.e. "SSR" (Server-Side Rendering)).
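The screenshot and PDF bullet can be sketched against puppeteer's `Page` API. This is a minimal sketch, not x-crawl code: `PageLike` and `savePageSnapshots` are hypothetical names, and the only assumption about the page is that it exposes puppeteer's `screenshot({ path })` and `pdf({ path })` methods. A real puppeteer page should satisfy the interface structurally, and a stub can stand in for testing.

```typescript
// Hypothetical structural subset of puppeteer's Page: only the two methods used below.
interface PageLike {
  screenshot(options: { path: string }): Promise<unknown>
  pdf(options: { path: string }): Promise<unknown>
}

// Hypothetical helper: save a screenshot and a PDF of the same page.
// Returns the two paths that were requested, screenshot first.
async function savePageSnapshots(page: PageLike, baseName: string): Promise<string[]> {
  const imagePath = `${baseName}.png` // puppeteer infers the image type from this extension
  const pdfPath = `${baseName}.pdf`
  await page.screenshot({ path: imagePath })
  await page.pdf({ path: pdfPath })
  return [imagePath, pdfPath]
}
```

Typing the parameter structurally keeps the helper independent of how the page was obtained, whether from puppeteer directly or from a crawler that wraps it.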
@@ -33,6 +37,7 @@ x-crawl is a Nodejs multifunctional crawler library.
   * [fetchHTML](#fetchHTML)
     + [Type](#Type-2)
     + [Example](#Example-2)
+    + [About page](#About-page)
   * [fetchData](#fetchData)
     + [Type](#Type-3)
     + [Example](#Example-3)
@@ -173,12 +178,12 @@ The first request is not to trigger the interval.

 ### fetchHTML

-fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl HTML.
+fetchHTML is the method of the above [myXCrawl](https://github.com/coder-hxl/x-crawl#Example-1) instance, usually used to crawl pages.

 #### Type

 - Look at the [FetchHTMLConfig](#FetchHTMLConfig) type
-- Look at the [FetchHTML](#FetchHTML) type
+- Look at the [FetchHTML](#FetchHTML-2) type

 ```ts
 function fetchHTML: (
@@ -196,6 +201,10 @@ myXCrawl.fetchHTML('/xxx').then((res) => {
 })
 ```

+#### About page
+
+Get the page instance from res.data.page. It supports interactive operations such as triggering events; for specific usage, refer to [page](https://pptr.dev/api/puppeteer.page).
+
 ### fetchData

 fetchData is the method of the above [myXCrawl](#Example-1) instance, which is usually used to crawl APIs to obtain JSON data and so on.
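As a sketch of what the new "About page" section in this hunk describes, the helper below drives a page through puppeteer's `type`, `click`, `waitForNavigation` and `title` methods. The `InteractivePage` interface, the `searchAndReadTitle` helper and the `#search-input` / `#search-button` selectors are hypothetical; the only assumption taken from the README is that `res.data.page` is a puppeteer page instance.

```typescript
// Hypothetical structural subset of puppeteer's Page used by this sketch.
interface InteractivePage {
  type(selector: string, text: string): Promise<void>
  click(selector: string): Promise<void>
  waitForNavigation(): Promise<unknown>
  title(): Promise<string>
}

// Hypothetical helper: fill in a search box, submit it, and return the resulting page title.
// The selectors are placeholders for whatever the target page actually uses.
async function searchAndReadTitle(page: InteractivePage, query: string): Promise<string> {
  await page.type('#search-input', query)
  // Start waiting for the navigation before clicking so the event is not missed.
  await Promise.all([page.waitForNavigation(), page.click('#search-button')])
  return page.title()
}
```

Inside a fetchHTML callback this might be called as `myXCrawl.fetchHTML('/xxx').then((res) => searchAndReadTitle(res.data.page, 'x-crawl'))`, provided the crawled page really has those elements.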
@@ -224,7 +233,7 @@ const requestConfig = [

 myXCrawl.fetchData({
   requestConfig, // Request configuration, can be RequestConfig | RequestConfig[]
-  intervalTime: { max: 5000, min: 1000 } // The intervalTime passed in when not using myXCrawl
+  intervalTime: { max: 5000, min: 1000 } // Overrides the intervalTime passed in when creating myXCrawl
 }).then(res => {
   console.log(res)
 })
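The per-call `intervalTime` and the "anthropomorphic request interval" feature can be pictured as follows: a value passed to `fetchData` is used instead of the one given when `myXCrawl` was created, and every request after the first waits a random time between `min` and `max` milliseconds. This is a hypothetical sketch of that behaviour, not x-crawl's source; `IntervalTime`, `resolveIntervalTime` and `planDelays` are illustrative names, and the random source is injectable so the result can be checked.

```typescript
// Hypothetical shape, mirroring the `{ max: 5000, min: 1000 }` object in the example above.
interface IntervalTime {
  max: number // upper bound in milliseconds
  min: number // lower bound in milliseconds
}

// A per-call value wins; otherwise fall back to the instance-level value (if any).
function resolveIntervalTime(
  callValue: IntervalTime | undefined,
  instanceValue: IntervalTime | undefined
): IntervalTime | undefined {
  return callValue ?? instanceValue
}

// Plan the wait before each of `count` requests: 0 for the first one,
// then a random integer in [min, max] for every later one.
function planDelays(
  count: number,
  interval: IntervalTime | undefined,
  random: () => number = Math.random
): number[] {
  const delays: number[] = []
  for (let i = 0; i < count; i++) {
    if (i === 0 || interval === undefined) {
      delays.push(0)
    } else {
      const { min, max } = interval
      delays.push(min + Math.floor(random() * (max - min + 1)))
    }
  }
  return delays
}
```

Skipping the delay for the first request matches the README's note that the first request does not trigger the interval.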
@@ -380,7 +389,7 @@ interface FetchDataConfig extends FetchBaseConfigV1 {
 interface FetchFileConfig extends FetchBaseConfigV1 {
   fileConfig: {
     storeDir: string // Store folder
-    extension?: string // filename extension
+    extension?: string // Filename extension
   }
 }
 ```
@@ -409,7 +418,7 @@ interface FetchCommon<T> {
 ### FetchResCommonArrV1

 ```ts
-type FetchCommonArr<T> = FetchCommon<T>[]
+type FetchResCommonArrV1<T> = FetchResCommonV1<T>[]
 ```

 ### FileInfo
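To make the `fileConfig` fields of `FetchFileConfig` concrete, here is a hypothetical helper that decides where a downloaded file would be written from `storeDir` and the optional `extension`. It is not part of x-crawl; the `resolveStorePath` name, the assumption that `extension` includes its leading dot, and the choice to keep the original name when no extension is configured are all illustrative.

```typescript
// Same shape as `fileConfig` in FetchFileConfig.
interface FileConfig {
  storeDir: string // Store folder
  extension?: string // Filename extension (assumed here to include the dot, e.g. '.jpg')
}

// Hypothetical helper: where a file named `fileName` would be written.
// A configured `extension` replaces the file's own extension; otherwise the name is kept.
function resolveStorePath(fileConfig: FileConfig, fileName: string): string {
  const { storeDir, extension } = fileConfig
  const dot = fileName.lastIndexOf('.')
  // A leading dot (e.g. '.env') marks a hidden file, not an extension.
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName
  const name = extension === undefined ? fileName : stem + extension
  // Forward slashes only: join folder and name with exactly one '/'.
  return storeDir.replace(/\/+$/, '') + '/' + name
}
```

Plain string handling keeps the sketch free of dependencies; real code running on Node.js would more likely use `path.join` so platform separators are handled.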