Commit 0f91b1c

fix(doc): move warnings above
1 parent a62f7b4 commit 0f91b1c

1 file changed: +6 -4 lines changed
README.md

Lines changed: 6 additions & 4 deletions
@@ -790,16 +790,18 @@ The program then writes that one record into a local Parquet file, does a second
 
 ### Bonus: download a full crawl index and query with DuckDB
 
-If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run:
+In case you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly.
+
+> [!IMPORTANT]
+> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
+
+To download the crawl index, there are two options: if you have access to the CCF AWS buckets, run:
 
 ```shell
 mkdir -p 'crawl=CC-MAIN-2024-22/subset=warc'
 aws s3 sync s3://commoncrawl/cc-index/table/cc-main/warc/crawl=CC-MAIN-2024-22/subset=warc/ 'crawl=CC-MAIN-2024-22/subset=warc'
 ```
 
-> [!IMPORTANT]
-> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
-
 If, by any other chance, you don't have access through the AWS CLI:
 
 ```shell