Commit 8791a48

fix: revert line ending to follow the WARC format
Signed-off-by: Luca Foppiano <luca@foppiano.org>
1 parent 307f6b2 commit 8791a48

4 files changed

+162
-159
lines changed

.gitattributes

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+*.warc binary
+*.warc.wet binary
+*.warc.wat binary
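
The `binary` attribute above stops Git from normalizing line endings in these files, which matters because the WARC format terminates header lines with CRLF. A minimal sketch of how one might check that a checked-out file still has a CRLF-terminated header block; the function name `warc_header_uses_crlf` is hypothetical and not part of this repository:

```python
def warc_header_uses_crlf(data: bytes) -> bool:
    """Return True if the first WARC header block in `data` is CRLF-terminated.

    Hypothetical helper for illustration; it inspects only the first record.
    """
    # The header block ends at the first empty line, i.e. CRLF CRLF.
    end = data.find(b"\r\n\r\n")
    if end == -1:
        return False  # no CRLF-delimited header block found
    header = data[:end]
    # After removing every CRLF pair, no stray CR or LF may remain.
    stripped = header.replace(b"\r\n", b"")
    return b"\n" not in stripped and b"\r" not in stripped


if __name__ == "__main__":
    # Path taken from this commit; adjust to the local checkout.
    with open("data/whirlwind.warc", "rb") as fh:
        print(warc_header_uses_crlf(fh.read()))
```

Only the header block is checked, because a record's payload (for example an HTML page) may legitimately contain bare LF characters.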

data/whirlwind.warc

Lines changed: 100 additions & 100 deletions
@@ -1,85 +1,85 @@
1-
WARC/1.0
2-
WARC-Type: warcinfo
3-
WARC-Date: 2024-05-17T23:31:22Z
4-
WARC-Record-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
5-
Content-Length: 486
6-
Content-Type: application/warc-fields
7-
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.gz
8-
9-
isPartOf: CC-MAIN-2024-22
10-
publisher: Common Crawl
11-
description: Wide crawl of the web for May 2024
12-
operator: Common Crawl Admin (info@commoncrawl.org)
13-
hostname: ip-10-67-67-211
14-
software: Apache Nutch 1.19 (modified, https://github.com/commoncrawl/nutch/)
15-
robots: checked via crawler-commons 1.5-SNAPSHOT (https://github.com/crawler-commons/crawler-commons)
16-
format: WARC File Format 1.1
17-
conformsTo: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
18-
19-
20-
WARC/1.0
21-
WARC-Type: request
22-
WARC-Date: 2024-05-18T01:58:10Z
23-
WARC-Record-ID: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
24-
Content-Length: 265
25-
Content-Type: application/http; msgtype=request
26-
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
27-
WARC-IP-Address: 208.80.154.224
28-
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
29-
30-
GET /wiki/Escopete HTTP/1.1
31-
User-Agent: CCBot/2.0 (https://commoncrawl.org/faq/)
32-
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
33-
Accept-Language: en-US,en;q=0.5
34-
Accept-Encoding: br,gzip
35-
Host: an.wikipedia.org
36-
Connection: Keep-Alive
37-
38-
39-
40-
WARC/1.0
41-
WARC-Type: response
42-
WARC-Date: 2024-05-18T01:58:10Z
43-
WARC-Record-ID: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
44-
Content-Length: 74581
45-
Content-Type: application/http; msgtype=response
46-
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
47-
WARC-Concurrent-To: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
48-
WARC-IP-Address: 208.80.154.224
49-
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
50-
WARC-Payload-Digest: sha1:RY7PLBUFQNI2FFV5FTUQK72W6SNPXLQU
51-
WARC-Block-Digest: sha1:35FTUGFVNWRVTZQGCWIX2MQA3LMYC7X7
52-
WARC-Identified-Payload-Type: text/html
53-
54-
HTTP/1.1 200 OK
55-
date: Sat, 18 May 2024 01:58:10 GMT
56-
server: mw-web.eqiad.canary-bb67b76b8-jtwdb
57-
x-content-type-options: nosniff
58-
content-language: an
59-
origin-trial: AonOP4SwCrqpb0nhZbg554z9iJimP3DxUDB8V4yu9fyyepauGKD0NXqTknWi4gnuDfMG6hNb7TDUDTsl0mDw9gIAAABmeyJvcmlnaW4iOiJodHRwczovL3dpa2lwZWRpYS5vcmc6NDQzIiwiZmVhdHVyZSI6IlRvcExldmVsVHBjZCIsImV4cGlyeSI6MTczNTM0Mzk5OSwiaXNTdWJkb21haW4iOnRydWV9
60-
accept-ch:
61-
vary: Accept-Encoding,Cookie,Authorization
62-
last-modified: Sat, 04 May 2024 01:58:10 GMT
63-
content-type: text/html; charset=UTF-8
64-
X-Crawler-content-encoding: gzip
65-
age: 0
66-
x-cache: cp1106 miss, cp1106 miss
67-
x-cache-status: miss
68-
server-timing: cache;desc="miss", host;desc="cp1106"
69-
strict-transport-security: max-age=106384710; includeSubDomains; preload
70-
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
71-
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
72-
set-cookie: WMF-Last-Access=18-May-2024;Path=/;HttpOnly;secure;Expires=Wed, 19 Jun 2024 00:00:00 GMT
73-
set-cookie: WMF-Last-Access-Global=18-May-2024;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Wed, 19 Jun 2024 00:00:00 GMT
74-
set-cookie: WMF-DP=1a6;Path=/;HttpOnly;secure;Expires=Sat, 18 May 2024 00:00:00 GMT
75-
x-client-ip: 34.239.158.223
76-
cache-control: private, s-maxage=0, max-age=0, must-revalidate
77-
set-cookie: GeoIP=US:VA:Ashburn:39.05:-77.49:v4; Path=/; secure; Domain=.wikipedia.org
78-
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
79-
accept-ranges: bytes
80-
X-Crawler-transfer-encoding: chunked
81-
Content-Length: 72848
82-
1+
WARC/1.0
2+
WARC-Type: warcinfo
3+
WARC-Date: 2024-05-17T23:31:22Z
4+
WARC-Record-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
5+
Content-Length: 486
6+
Content-Type: application/warc-fields
7+
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.gz
8+
9+
isPartOf: CC-MAIN-2024-22
10+
publisher: Common Crawl
11+
description: Wide crawl of the web for May 2024
12+
operator: Common Crawl Admin (info@commoncrawl.org)
13+
hostname: ip-10-67-67-211
14+
software: Apache Nutch 1.19 (modified, https://github.com/commoncrawl/nutch/)
15+
robots: checked via crawler-commons 1.5-SNAPSHOT (https://github.com/crawler-commons/crawler-commons)
16+
format: WARC File Format 1.1
17+
conformsTo: https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/
18+
19+
20+
WARC/1.0
21+
WARC-Type: request
22+
WARC-Date: 2024-05-18T01:58:10Z
23+
WARC-Record-ID: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
24+
Content-Length: 265
25+
Content-Type: application/http; msgtype=request
26+
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
27+
WARC-IP-Address: 208.80.154.224
28+
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
29+
30+
GET /wiki/Escopete HTTP/1.1
31+
User-Agent: CCBot/2.0 (https://commoncrawl.org/faq/)
32+
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
33+
Accept-Language: en-US,en;q=0.5
34+
Accept-Encoding: br,gzip
35+
Host: an.wikipedia.org
36+
Connection: Keep-Alive
37+
38+
39+
40+
WARC/1.0
41+
WARC-Type: response
42+
WARC-Date: 2024-05-18T01:58:10Z
43+
WARC-Record-ID: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
44+
Content-Length: 74581
45+
Content-Type: application/http; msgtype=response
46+
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
47+
WARC-Concurrent-To: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
48+
WARC-IP-Address: 208.80.154.224
49+
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
50+
WARC-Payload-Digest: sha1:RY7PLBUFQNI2FFV5FTUQK72W6SNPXLQU
51+
WARC-Block-Digest: sha1:35FTUGFVNWRVTZQGCWIX2MQA3LMYC7X7
52+
WARC-Identified-Payload-Type: text/html
53+
54+
HTTP/1.1 200 OK
55+
date: Sat, 18 May 2024 01:58:10 GMT
56+
server: mw-web.eqiad.canary-bb67b76b8-jtwdb
57+
x-content-type-options: nosniff
58+
content-language: an
59+
origin-trial: AonOP4SwCrqpb0nhZbg554z9iJimP3DxUDB8V4yu9fyyepauGKD0NXqTknWi4gnuDfMG6hNb7TDUDTsl0mDw9gIAAABmeyJvcmlnaW4iOiJodHRwczovL3dpa2lwZWRpYS5vcmc6NDQzIiwiZmVhdHVyZSI6IlRvcExldmVsVHBjZCIsImV4cGlyeSI6MTczNTM0Mzk5OSwiaXNTdWJkb21haW4iOnRydWV9
60+
accept-ch:
61+
vary: Accept-Encoding,Cookie,Authorization
62+
last-modified: Sat, 04 May 2024 01:58:10 GMT
63+
content-type: text/html; charset=UTF-8
64+
X-Crawler-content-encoding: gzip
65+
age: 0
66+
x-cache: cp1106 miss, cp1106 miss
67+
x-cache-status: miss
68+
server-timing: cache;desc="miss", host;desc="cp1106"
69+
strict-transport-security: max-age=106384710; includeSubDomains; preload
70+
report-to: { "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
71+
nel: { "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction": 0.0}
72+
set-cookie: WMF-Last-Access=18-May-2024;Path=/;HttpOnly;secure;Expires=Wed, 19 Jun 2024 00:00:00 GMT
73+
set-cookie: WMF-Last-Access-Global=18-May-2024;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Wed, 19 Jun 2024 00:00:00 GMT
74+
set-cookie: WMF-DP=1a6;Path=/;HttpOnly;secure;Expires=Sat, 18 May 2024 00:00:00 GMT
75+
x-client-ip: 34.239.158.223
76+
cache-control: private, s-maxage=0, max-age=0, must-revalidate
77+
set-cookie: GeoIP=US:VA:Ashburn:39.05:-77.49:v4; Path=/; secure; Domain=.wikipedia.org
78+
set-cookie: NetworkProbeLimit=0.001;Path=/;Secure;Max-Age=3600
79+
accept-ranges: bytes
80+
X-Crawler-transfer-encoding: chunked
81+
Content-Length: 72848
82+
8383
<!DOCTYPE html>
8484
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-sticky-header-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-clientpref-1 vector-feature-main-menu-pinned-disabled vector-feature-limited-width-clientpref-1 vector-feature-limited-width-content-enabled vector-feature-custom-font-size-clientpref-0 vector-feature-appearance-disabled vector-feature-appearance-pinned-clientpref-0 vector-feature-night-mode-disabled skin-theme-clientpref-day vector-toc-available" lang="an" dir="ltr">
8585
<head>
@@ -932,21 +932,21 @@ Mire-se <a href="https://foundation.wikimedia.org/wiki/Special:MyLanguage/Policy
932932
<script>(RLQ=window.RLQ||[]).push(function(){mw.config.set({"wgHostname":"mw-web.eqiad.canary-bb67b76b8-jtwdb","wgBackendResponseTime":224,"wgPageParseReport":{"limitreport":{"cputime":"0.195","walltime":"0.526","ppvisitednodes":{"value":2544,"limit":1000000},"postexpandincludesize":{"value":21268,"limit":2097152},"templateargumentsize":{"value":5362,"limit":2097152},"expansiondepth":{"value":25,"limit":100},"expensivefunctioncount":{"value":1,"limit":500},"unstrip-depth":{"value":0,"limit":20},"unstrip-size":{"value":659,"limit":5000000},"entityaccesscount":{"value":1,"limit":400},"timingprofile":["100.00% 218.110 1 -total"," 96.15% 209.720 1 Plantilla:Ficha_de_localidat_d'Espanya"," 94.89% 206.968 1 Plantilla:LocCodigo"," 53.30% 116.256 6 Plantilla:Propiedat"," 16.67% 36.361 1 Plantilla:Coor_dd"," 6.26% 13.643 1 Plantilla:Mapa_de_localización"," 4.76% 10.382 2 Plantilla:DMS"," 2.94% 6.421 12 Plantilla:Mod"," 2.22% 4.834 18 Plantilla:Floor"," 1.68% 3.661 6 Plantilla:Supa10"]},"scribunto":{"limitreport-timeusage":{"value":"0.088","limit":"10.000"},"limitreport-memusage":{"value":2345110,"limit":52428800},"limitreport-logs":"No detectaus args\nNombre Castiella-La Mancha\n\nNo existe x\nNo existe y\n"},"cachereport":{"origin":"mw2363","timestamp":"20240429141658","ttl":2592000,"transientcontent":false}}});});</script>
933933
<script type="application/ld+json">{"@context":"https:\/\/schema.org","@type":"Article","name":"Escopete","url":"https:\/\/an.wikipedia.org\/wiki\/Escopete","sameAs":"http:\/\/www.wikidata.org\/entity\/Q1653851","mainEntity":"http:\/\/www.wikidata.org\/entity\/Q1653851","author":{"@type":"Organization","name":"Colaboradores de los proyectos Wikimedia"},"publisher":{"@type":"Organization","name":"Wikimedia Foundation, Inc.","logo":{"@type":"ImageObject","url":"https:\/\/www.wikimedia.org\/static\/images\/wmf-hor-googpub.png"}},"datePublished":"2008-07-04T09:51:01Z","dateModified":"2023-08-17T21:26:39Z","image":"https:\/\/upload.wikimedia.org\/wikipedia\/commons\/a\/aa\/Iglesia_de_Nuestra_Se%C3%B1ora_de_la_Asunci%C3%B3n._Escopete_%28Guadalajara%29.jpg"}</script>
934934
</body>
935-
</html>
936-
937-
WARC/1.0
938-
WARC-Type: metadata
939-
WARC-Date: 2024-05-18T01:58:10Z
940-
WARC-Record-ID: <urn:uuid:c9ede96e-7ed2-4d17-8b6b-fb3d240f4442>
941-
Content-Length: 201
942-
Content-Type: application/warc-fields
943-
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
944-
WARC-Concurrent-To: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
945-
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
946-
947-
fetchTimeMs: 258
948-
charset-detected: UTF-8
949-
languages-cld2: {"reliable":false,"text-bytes":3080,"languages":[{"code":"es","code-iso-639-3":"spa","text-covered":0.69,"score":335.0,"name":"SPANISH"}]}
950-
951-
952-
935+
</html>
936+
937+
WARC/1.0
938+
WARC-Type: metadata
939+
WARC-Date: 2024-05-18T01:58:10Z
940+
WARC-Record-ID: <urn:uuid:c9ede96e-7ed2-4d17-8b6b-fb3d240f4442>
941+
Content-Length: 201
942+
Content-Type: application/warc-fields
943+
WARC-Warcinfo-ID: <urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>
944+
WARC-Concurrent-To: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
945+
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
946+
947+
fetchTimeMs: 258
948+
charset-detected: UTF-8
949+
languages-cld2: {"reliable":false,"text-bytes":3080,"languages":[{"code":"es","code-iso-639-3":"spa","text-covered":0.69,"score":335.0,"name":"SPANISH"}]}
950+
951+
952+
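
The removed and added lines in this hunk differ only in their line endings. The following sketch illustrates why that matters: a reader locates each content block by counting `Content-Length` bytes after the CRLF CRLF that ends the header, so converting CRLF to LF changes the byte layout. The function `split_first_record` is a hypothetical illustration, not code from this repository, and it ignores details such as case-insensitive field names and folded header lines:

```python
def split_first_record(data: bytes) -> tuple[dict[str, str], bytes, bytes]:
    """Split `data` into (header fields, content block, remaining bytes).

    Hypothetical sketch: assumes exact-case field names and no folded lines.
    """
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no CRLF-terminated WARC header block found")
    lines = head.decode("utf-8").split("\r\n")
    fields: dict[str, str] = {}
    for line in lines[1:]:  # lines[0] is the version line, e.g. WARC/1.0
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    length = int(fields["Content-Length"])
    block, tail = rest[:length], rest[length:]
    # Each record's content block is followed by two CRLF pairs.
    if not tail.startswith(b"\r\n\r\n"):
        raise ValueError("content block is not followed by CRLF CRLF")
    return fields, block, tail[4:]


record = (
    b"WARC/1.0\r\n"
    b"WARC-Type: warcinfo\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
fields, block, remaining = split_first_record(record)
```

With the synthetic `record` above the call succeeds, whereas passing `record.replace(b"\r\n", b"\n")` raises `ValueError`, since the header terminator can no longer be found.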

data/whirlwind.warc.wat

Lines changed: 28 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1,28 +1,28 @@
1-
WARC/1.0
2-
WARC-Type: warcinfo
3-
WARC-Date: 2024-05-31T01:16:45Z
4-
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.wat.gz
5-
WARC-Record-ID: <urn:uuid:b4fe2291-40c2-4492-b4ab-7a7776be6c73>
6-
Content-Type: application/warc-fields
7-
Content-Length: 278
8-
9-
Software-Info: ia-web-commons.1.1.10-SNAPSHOT-20240513074037
10-
Extracted-Date: Fri, 31 May 2024 01:16:45 GMT
11-
ip: 10.67.67.159
12-
hostname: ip-10-67-67-159.ec2.internal
13-
format: WARC File Format 1.0
14-
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
15-
16-
17-
18-
WARC/1.0
19-
WARC-Type: metadata
20-
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
21-
WARC-Date: 2024-05-31T01:17:49Z
22-
WARC-Record-ID: <urn:uuid:3d3f4785-c69a-4f8c-968c-d23302c55ef4>
23-
WARC-Refers-To: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
24-
Content-Type: application/json
25-
Content-Length: 1386
26-
27-
{"Container":{"Filename":"CC-MAIN-20240517233122-20240518023122-00000.warc.gz","Compressed":true,"Offset":"80610308","Gzip-Metadata":{"Deflate-Length":"423","Header-Length":"10","Footer-Length":"8","Inflated-CRC":"1106529533","Inflated-Length":"626"}},"Envelope":{"Payload-Metadata":{"Actual-Content-Type":"application/http; msgtype=request","HTTP-Request-Metadata":{"Request-Message":{"Method":"GET","Path":"/wiki/Escopete","Version":"HTTP/1.1"},"Headers-Length":"263","Headers":{"User-Agent":"CCBot/2.0 (https://commoncrawl.org/faq/)","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language":"en-US,en;q=0.5","Accept-Encoding":"br,gzip","Host":"an.wikipedia.org","Connection":"Keep-Alive"},"Entity-Length":"0","Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ","Entity-Trailing-Slop-Length":"0"},"Actual-Content-Length":"265","Trailing-Slop-Length":"4","Block-Digest":"sha1:IE7NEN3QEJHUCYRRGVMHDDW3BEHFRQ6V"},"Format":"WARC/1.0","WARC-Header-Length":"357","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2024-05-18T01:58:10Z","WARC-Record-ID":"<urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>","Content-Length":"265","Content-Type":"application/http; msgtype=request","WARC-Warcinfo-ID":"<urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>","WARC-IP-Address":"208.80.154.224","WARC-Target-URI":"https://an.wikipedia.org/wiki/Escopete"}}}
28-
1+
WARC/1.0
2+
WARC-Type: warcinfo
3+
WARC-Date: 2024-05-31T01:16:45Z
4+
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.wat.gz
5+
WARC-Record-ID: <urn:uuid:b4fe2291-40c2-4492-b4ab-7a7776be6c73>
6+
Content-Type: application/warc-fields
7+
Content-Length: 278
8+
9+
Software-Info: ia-web-commons.1.1.10-SNAPSHOT-20240513074037
10+
Extracted-Date: Fri, 31 May 2024 01:16:45 GMT
11+
ip: 10.67.67.159
12+
hostname: ip-10-67-67-159.ec2.internal
13+
format: WARC File Format 1.0
14+
conformsTo: http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf
15+
16+
17+
18+
WARC/1.0
19+
WARC-Type: metadata
20+
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
21+
WARC-Date: 2024-05-31T01:17:49Z
22+
WARC-Record-ID: <urn:uuid:3d3f4785-c69a-4f8c-968c-d23302c55ef4>
23+
WARC-Refers-To: <urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>
24+
Content-Type: application/json
25+
Content-Length: 1386
26+
27+
{"Container":{"Filename":"CC-MAIN-20240517233122-20240518023122-00000.warc.gz","Compressed":true,"Offset":"80610308","Gzip-Metadata":{"Deflate-Length":"423","Header-Length":"10","Footer-Length":"8","Inflated-CRC":"1106529533","Inflated-Length":"626"}},"Envelope":{"Payload-Metadata":{"Actual-Content-Type":"application/http; msgtype=request","HTTP-Request-Metadata":{"Request-Message":{"Method":"GET","Path":"/wiki/Escopete","Version":"HTTP/1.1"},"Headers-Length":"263","Headers":{"User-Agent":"CCBot/2.0 (https://commoncrawl.org/faq/)","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language":"en-US,en;q=0.5","Accept-Encoding":"br,gzip","Host":"an.wikipedia.org","Connection":"Keep-Alive"},"Entity-Length":"0","Entity-Digest":"sha1:3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ","Entity-Trailing-Slop-Length":"0"},"Actual-Content-Length":"265","Trailing-Slop-Length":"4","Block-Digest":"sha1:IE7NEN3QEJHUCYRRGVMHDDW3BEHFRQ6V"},"Format":"WARC/1.0","WARC-Header-Length":"357","WARC-Header-Metadata":{"WARC-Type":"request","WARC-Date":"2024-05-18T01:58:10Z","WARC-Record-ID":"<urn:uuid:292f457d-203c-42f2-a1b5-69a4dabefd4f>","Content-Length":"265","Content-Type":"application/http; msgtype=request","WARC-Warcinfo-ID":"<urn:uuid:668d88fc-4208-41fc-b327-1aa6cb783331>","WARC-IP-Address":"208.80.154.224","WARC-Target-URI":"https://an.wikipedia.org/wiki/Escopete"}}}
28+

data/whirlwind.warc.wet

Lines changed: 31 additions & 31 deletions
Original file line numberDiff line numberDiff line change
@@ -1,32 +1,32 @@
1-
WARC/1.0
2-
WARC-Type: warcinfo
3-
WARC-Date: 2024-05-31T01:16:46Z
4-
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.wet.gz
5-
WARC-Record-ID: <urn:uuid:5327826b-b3d8-426e-938d-cf3318afe8e9>
6-
Content-Type: application/warc-fields
7-
Content-Length: 368
8-
9-
Software-Info: ia-web-commons.1.1.10-SNAPSHOT-20240513074037
10-
Extracted-Date: Fri, 31 May 2024 01:16:46 GMT
11-
robots: checked via crawler-commons 1.5-SNAPSHOT (https://github.com/crawler-commons/crawler-commons)
12-
isPartOf: CC-MAIN-2024-22
13-
operator: Common Crawl Admin (info@commoncrawl.org)
14-
description: Wide crawl of the web for May 2024
15-
publisher: Common Crawl
16-
17-
18-
19-
WARC/1.0
20-
WARC-Type: conversion
21-
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
22-
WARC-Date: 2024-05-18T01:58:10Z
23-
WARC-Record-ID: <urn:uuid:ba729a40-ff84-4085-8d48-0a5b2ee0c42d>
24-
WARC-Refers-To: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
25-
WARC-Block-Digest: sha1:RDTSR52RUHWDA7QK4BK7OUHU3EXTXYUL
26-
WARC-Identified-Content-Language: spa
27-
Content-Type: text/plain
28-
Content-Length: 4456
29-
1+
WARC/1.0
2+
WARC-Type: warcinfo
3+
WARC-Date: 2024-05-31T01:16:46Z
4+
WARC-Filename: CC-MAIN-20240517233122-20240518023122-00000.warc.wet.gz
5+
WARC-Record-ID: <urn:uuid:5327826b-b3d8-426e-938d-cf3318afe8e9>
6+
Content-Type: application/warc-fields
7+
Content-Length: 368
8+
9+
Software-Info: ia-web-commons.1.1.10-SNAPSHOT-20240513074037
10+
Extracted-Date: Fri, 31 May 2024 01:16:46 GMT
11+
robots: checked via crawler-commons 1.5-SNAPSHOT (https://github.com/crawler-commons/crawler-commons)
12+
isPartOf: CC-MAIN-2024-22
13+
operator: Common Crawl Admin (info@commoncrawl.org)
14+
description: Wide crawl of the web for May 2024
15+
publisher: Common Crawl
16+
17+
18+
19+
WARC/1.0
20+
WARC-Type: conversion
21+
WARC-Target-URI: https://an.wikipedia.org/wiki/Escopete
22+
WARC-Date: 2024-05-18T01:58:10Z
23+
WARC-Record-ID: <urn:uuid:ba729a40-ff84-4085-8d48-0a5b2ee0c42d>
24+
WARC-Refers-To: <urn:uuid:2aabeff2-67f5-4608-8466-e87c6296e2b6>
25+
WARC-Block-Digest: sha1:RDTSR52RUHWDA7QK4BK7OUHU3EXTXYUL
26+
WARC-Identified-Content-Language: spa
27+
Content-Type: text/plain
28+
Content-Length: 4456
29+
3030
Escopete - Biquipedia, a enciclopedia libre
3131
Ir al contenido
3232
Menú principal
@@ -209,5 +209,5 @@ Estatisticas
209209
Declaración de cookies
210210
Versión ta mobils
211211
Activar o desactivar el límite de anchura del contenido
212-
213-
212+
213+