
Commit 6ddb6a2

Initial sample data for geospatial data ingest blog (#337)
* Initial sample data for geospatial data ingest blog
* Added sample data for drone flights
* Restructured to sub-directories
* Fix links
1 parent 53801c3 commit 6ddb6a2

7 files changed, +4471 -0 lines changed

Lines changed: 18 additions & 0 deletions

## Geospatial data ingest for ES|QL

This folder contains sample data used in various blogs about querying
geospatial data using ES|QL. In particular:

* [Geospatial data ingest for ES|QL](https://www.elastic.co/search-labs/blog/geospatial-data-ingest)
* [Geospatial search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-one)
* [Geospatial distance search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-two)

Most of the data we used for the examples in these blogs is based on data we use internally for integration tests.
However, some data was generated by the authors:

* [airports](airports/README.md)
* [drone](drone/README.md)

Each of these datasets has different licensing requirements depending on
the original data sources. Please read the included README in each
directory for specific attribution and other licensing requirements.
Lines changed: 74 additions & 0 deletions

## Geospatial data ingest for ES|QL

This folder contains sample data used in the blogs:

* [Geospatial data ingest for ES|QL](https://www.elastic.co/search-labs/blog/geospatial-data-ingest)
* [Geospatial search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-one)
* [Geospatial distance search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-two)

The data we used for the examples in these blogs was based on data we use internally for integration tests.
This, in turn, was created by merging data from a few different sources with
the goal of creating datasets appropriate for testing a number of specific
ES|QL features as they were developed.

* [airports.csv](https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/supporting-blog-content/geospatial-data-ingest/airports/airports.csv)
  * This contains a merger of three datasets:
    * Airports (names, locations and related data) from [Natural Earth](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/airports/)
    * City locations from [SimpleMaps](https://simplemaps.com/data/world-cities)
    * Airport elevations from [The global airport database](https://www.partow.net/miscellaneous/airportdatabase/)
* [airport_city_boundaries.csv](https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/supporting-blog-content/geospatial-data-ingest/airports/airport_city_boundaries.csv)
  * This contains a merger of airport and city names from above with one new source:
    * City boundaries from [OpenStreetMap](https://www.openstreetmap.org/)

### Licensing and attribution

Since this data is derived from various sources, we list the license
conditions for each, some of which also require attribution:

* Natural Earth (original dataset with airport names, locations and related
  information)
  * Released into the public domain with no requirement for attribution. See
    https://www.naturalearthdata.com/about/terms-of-use/.
* SimpleMaps (city locations)
  * License: Creative Commons Attribution 4.0 license as described at: https://creativecommons.org/licenses/by/4.0/
  * Requires attribution and link to https://simplemaps.com/data/world-cities
* Global Airport Database (airport elevations)
  * Licensed under the MIT license: http://www.opensource.org/licenses/MIT
  * Requires attribution: https://www.partow.net/miscellaneous/airportdatabase/
* OpenStreetMap (city boundary polygons)
  * Licensed under the Open Database License (ODbL) v1.0: https://opendatacommons.org/licenses/odbl/1-0/
  * Requires attribution
  * Requires that derived data be shared under a compatible license

The most restrictive license here is the [Open Database License (ODbL)
v1.0](https://opendatacommons.org/licenses/odbl/1-0/) because it has a
'ShareAlike' clause. For this reason we are sharing this combined dataset
under the same license.

### Changes made to the data

Several of the above licenses also require that any changes made from the
original source data are clearly indicated. This can be done by explaining
the process used to create the final results:

* The original airport names, IATA codes, locations, types and scalerank values were
  obtained from the Natural Earth dataset. The only change made was to
  reformat the data as CSV, with the locations expressed in WKT, since the
  original format was ESRI Shapefile.
* The city locations from SimpleMaps were downloaded, and then only the subset
  of cities reasonably identifiable as the closest major city to one of the
  existing airport locations was selected and merged into the airports.csv
  file. The edit in question is therefore simply a subset selection.
* The airport elevations were similarly a subset selection of airports from
  the 'global airport database' where IATA codes matched the existing set of
  airports in our dataset.
* The city boundaries were taken from OpenStreetMap. The process to
  determine the boundaries was complex for many reasons, including the fact
  that OSM data is massive. We downloaded OSM PBF files for various regions of
  the world from the Geofabrik download server at
  https://download.geofabrik.de/. For each region we imported the PBF files
  into PostGIS, and then searched for appropriate polygons surrounding the cities
  serving the airports of interest. This was achieved using several complex
  SQL queries. The selected polygons were further simplified using
  `ST_SimplifyPreserveTopology` calls with various tolerances to generate
  city polygons of appropriate levels of detail.
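
The first change listed above, re-expressing the Natural Earth Shapefile records as CSV rows with WKT locations, can be sketched with only the Python standard library. This is a minimal illustration, not the authors' actual conversion script: reading the Shapefile itself is omitted, and the column names (`abbrev`, `name`, `type`, `scalerank`, `location`) and the sample record are assumptions chosen for the example. The key detail is that WKT writes points in `x y` order, so longitude comes before latitude.

```python
import csv
import io

# Hypothetical airport record standing in for a feature read from the
# Natural Earth Shapefile (the Shapefile reading step is not shown).
airports = [
    {"abbrev": "LHR", "name": "London Heathrow", "type": "major",
     "scalerank": 2, "lon": -0.4531, "lat": 51.4706},
]


def point_to_wkt(lon: float, lat: float) -> str:
    """Format a point as WKT. WKT uses (x y) order: longitude first."""
    return f"POINT ({lon} {lat})"


def airports_to_csv(rows) -> str:
    """Write the rows as CSV text, replacing lon/lat with a WKT location."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["abbrev", "name", "type", "scalerank", "location"])
    for r in rows:
        writer.writerow([r["abbrev"], r["name"], r["type"], r["scalerank"],
                         point_to_wkt(r["lon"], r["lat"])])
    return buf.getvalue()


print(airports_to_csv(airports))
```

With the default quoting rules, `csv.writer` leaves a WKT point unquoted because it contains no commas; a WKT value that does contain commas (a `POLYGON`, for example) is quoted automatically.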
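
The SimpleMaps subset described above was chosen by judging which major city is reasonably identifiable as the closest to each airport. One way to automate a first pass at that judgement is a nearest-neighbour search using great-circle (haversine) distance. The sketch below is hypothetical: the `max_km` cutoff, the function names and the sample coordinates are illustrative assumptions, and a brute-force search like this is only practical for small inputs.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_city_subset(airports, cities, max_km=100.0):
    """Map each airport code to its closest city, if that city lies within
    max_km. Cities never chosen are left out, i.e. a subset selection."""
    selected = {}
    for code, (a_lon, a_lat) in airports.items():
        # Brute-force search over all cities: fine for a sketch only.
        name, (c_lon, c_lat) = min(
            cities.items(),
            key=lambda kv: haversine_km(a_lon, a_lat, *kv[1]),
        )
        if haversine_km(a_lon, a_lat, c_lon, c_lat) <= max_km:
            selected[code] = name
    return selected


# Hypothetical sample data: code or name -> (lon, lat)
airports = {"LHR": (-0.4543, 51.4700), "CDG": (2.5479, 49.0097)}
cities = {
    "London": (-0.1276, 51.5072),
    "Paris": (2.3522, 48.8566),
    "Berlin": (13.4050, 52.5200),
}

print(nearest_city_subset(airports, cities))  # {'LHR': 'London', 'CDG': 'Paris'}
```

Berlin is never the closest city to either sample airport, so it is dropped, which mirrors the "simply a subset selection" nature of this edit.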
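
The elevation merge described above is essentially a filter-and-join on IATA code. A minimal sketch, assuming the Global Airport Database rows have already been parsed into dictionaries with hypothetical `iata` and `elevation_m` keys (the real file's layout is not described here); the sample values are illustrative only.

```python
def select_elevations(gad_rows, our_codes):
    """Keep only elevation rows whose IATA code is in our airport set."""
    return {
        row["iata"]: row["elevation_m"]
        for row in gad_rows
        if row["iata"] in our_codes
    }


def attach_elevations(airports, elevations):
    """Return copies of the airport rows with an 'elevation' field added.
    Airports with no matching IATA code get None."""
    return [{**a, "elevation": elevations.get(a["iata"])} for a in airports]


# Hypothetical sample data
airports = [{"iata": "LHR", "name": "London Heathrow"},
            {"iata": "XYZ", "name": "Hypothetical Field"}]
gad_rows = [{"iata": "LHR", "elevation_m": 25},
            {"iata": "JFK", "elevation_m": 4}]

elevations = select_elevations(gad_rows, {a["iata"] for a in airports})
merged = attach_elevations(airports, elevations)
print(merged)
```

Rows for airports outside our dataset (here `JFK`) are discarded, which is the subset selection the README describes; the original airport rows are not modified.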
