| 1 | +## Geospatial data ingest for ES|QL |
| 2 | + |
| 3 | +This folder contains sample data used in the blogs: |
| 4 | + |
| 5 | +* [Geospatial data ingest for ES|QL](https://www.elastic.co/search-labs/blog/geospatial-data-ingest) |
| 6 | +* [Geospatial search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-one) |
| 7 | +* [Geospatial distance search with ES|QL](https://www.elastic.co/search-labs/blog/esql-geospatial-search-part-two) |
| 8 | + |
| 9 | +The data used for the examples in these blogs is based on data we use internally for integration tests. |
| 10 | +This in turn was created by merging data from a few different sources with |
| 11 | +the goal of creating datasets appropriate for testing a number of specific |
| 12 | +ES|QL features as they were developed. |
| 13 | + |
| 14 | +* [airports.csv](https://raw.githubusercontent.com/elastic/elasticsearch-labs/tree/main/supporting-blog-content/geospatial-data-ingest/airports/airports.csv) |
| 15 | + * This contains a merger of three datasets: |
| 16 | + * Airports (names, locations and related data) from [Natural Earth](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/airports/) |
| 17 | + * City locations from [SimpleMaps](https://simplemaps.com/data/world-cities) |
| 18 | + * Airport elevations from [The global airport database](https://www.partow.net/miscellaneous/airportdatabase/) |
| 19 | +* [airport_city_boundaries.csv](https://raw.githubusercontent.com/elastic/elasticsearch-labs/tree/main/supporting-blog-content/geospatial-data-ingest/airports/airport_city_boundaries.csv) |
| 20 | + * This contains a merger of airport and city names from above with one new source: |
| 21 | + * City boundaries from [OpenStreetMap](https://www.openstreetmap.org/) |
| 22 | + |
| 23 | +### Licensing and attribution |
| 24 | + |
| 25 | +Since this data is derived from various sources, we list the license |
| 26 | +conditions for each, some of which also require attribution: |
| 27 | + |
| 28 | +* Natural Earth (original dataset with airport names, locations and related |
| 29 | + information) |
| 30 | + * Released in the public domain with no requirement for attribution. See |
| 31 | + https://www.naturalearthdata.com/about/terms-of-use/. |
| 32 | +* SimpleMaps (city locations) |
| 33 | + * License: Creative Commons Attribution 4.0 license as described at: https://creativecommons.org/licenses/by/4.0/ |
| 34 | + * Requires attribution and link to https://simplemaps.com/data/world-cities |
| 35 | +* Global Airport Database (airport elevations) |
| 36 | + * Licensed under MIT license - http://www.opensource.org/licenses/MIT |
| 37 | + * Requires attribution: https://www.partow.net/miscellaneous/airportdatabase/ |
| 38 | +* OpenStreetMap (city boundary polygons) |
| 39 | + * Licensed with Open Database License (ODbL) v1.0 - https://opendatacommons.org/licenses/odbl/1-0/ |
| 40 | + * Requires attribution |
| 41 | + * Requires that derived data be shared under a compatible license |
| 42 | + |
| 43 | +The most restrictive license here is the [Open Database License (ODbL) |
| 44 | +v1.0](https://opendatacommons.org/licenses/odbl/1-0/) because it has a |
| 45 | +'ShareAlike' clause. For this reason we are sharing this combined dataset |
| 46 | +using the same license. |
| 47 | + |
| 48 | +### Changes made to the data |
| 49 | + |
| 50 | +Several of the above licenses also require that any changes made from the |
| 51 | +original source data are clearly indicated. This can be done by explaining |
| 52 | +the process used to create the final results: |
| 53 | + |
| 54 | +* The original airport name, IATA code, location, type and scalerank were |
| 55 | + obtained from the Natural Earth dataset. The only change made was to |
| 56 | + reformat the data as CSV, with the locations expressed in WKT, since the |
| 57 | + original format was ESRI Shapefile. |
| 58 | +* The city locations from SimpleMaps were downloaded and then only the subset |
| 59 | + of cities reasonably identifiable as the closest major city to one of the |
| 60 | + existing airport locations was selected and merged into the airports.csv |
| 61 | + file. So the edit in question is simply a subset selection. |
| 62 | +* The airport elevations were similarly a subset selection of airports from |
| 63 | + the 'global airport database' where IATA codes matched the existing set of |
| 64 | + airports in our dataset. |
| 65 | +* The city boundaries were found from OpenStreetMap. The process to |
| 66 | + determine the boundaries was complex, for many reasons including the fact |
| 67 | + that OSM data is massive. We downloaded `.osm.pbf` files for various regions of |
| 68 | + the world from the Geofabrik download server at |
| 69 | + https://download.geofabrik.de/. For each region we imported the PBF files |
| 70 | + into PostGIS, and then searched for appropriate polygons surrounding cities |
| 71 | + serving the airports of interest. This was achieved using several complex |
| 72 | + SQL queries. The selected polygons were further simplified using |
| 73 | + `ST_SimplifyPreserveTopology` calls with various resolutions to generate |
| 74 | + city polygons of appropriate levels of detail. |
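
The subset selection and WKT reformatting described above can be illustrated with a small Python sketch. This is not the tooling actually used to build the dataset. The column names (`iata`, `name`, `location`, `elevation`), the helper functions and the sample rows are all hypothetical and exist only for illustration.

```python
import csv
import io


def to_wkt_point(lon: float, lat: float) -> str:
    """Format a longitude/latitude pair as a WKT POINT (x = lon, y = lat)."""
    return f"POINT ({lon} {lat})"


def merge_elevations(airports, elevations):
    """Attach an elevation to each airport whose IATA code has a match.

    Elevation records with no matching airport are dropped, which is the
    'subset selection' described in the list above.
    """
    by_iata = {row["iata"]: row["elevation"] for row in elevations}
    merged = []
    for airport in airports:
        row = dict(airport)
        row["elevation"] = by_iata.get(airport["iata"])  # None when unmatched
        merged.append(row)
    return merged


# Made-up sample rows (hypothetical values, not taken from the real sources).
airports = [
    {"iata": "AAA", "name": "Example One", "lon": 10.5, "lat": 20.25},
    {"iata": "BBB", "name": "Example Two", "lon": -3.0, "lat": 40.0},
]
elevations = [
    {"iata": "AAA", "elevation": 120},
    {"iata": "ZZZ", "elevation": 5},  # no matching airport, so it is dropped
]

merged = merge_elevations(airports, elevations)

# Write the merged rows as CSV with the location expressed in WKT.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["iata", "name", "location", "elevation"])
writer.writeheader()
for row in merged:
    writer.writerow(
        {
            "iata": row["iata"],
            "name": row["name"],
            "location": to_wkt_point(row["lon"], row["lat"]),
            "elevation": row["elevation"],
        }
    )
csv_text = out.getvalue()
print(csv_text)
```

Matching on IATA code keeps the merge a plain dictionary lookup. The real merge may have needed extra handling for missing or duplicate codes, which this sketch does not attempt.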