Releases: Kaszanas/DatasetPreparator
2.1.0 DatasetPreparator Release
What's Changed
- [PR] Moving Maps to SC2 Installation Directory by @Kaszanas in #75
- [PR] Automatic SC2 Cache Seeding by @Kaszanas in #73
- [PR] Creating directories if they do not exist by @Kaszanas in #77
- [PR] Type Hints Without Imports by @Kaszanas in #79
- [PR] Preparing 2.1.0 Release by @Kaszanas in #80
Full Changelog: 2.0.0...2.1.0
DatasetPreparator
This project contains various scripts that can assist in the process of preparing datasets. For a broad overview of the tools, please refer to the Detailed Tools Description.
Tools in this repository were used to create the SC2ReSet: StarCraft II Esport Replaypack Set, and finally SC2EGSet: StarCraft II Esport Game State Dataset. For citation information, see Cite Us!.
Installation
Note
To run this project there are some prerequisites that you need to have installed on your system:
- Docker
- Optional make (if you do not wish to use make, please refer to the commands defined in the makefile and run them manually)
Our preferred way of distributing the toolset is through DockerHub. We use the Docker Image to provide a fully reproducible environment for our scripts.
To pull the image from DockerHub, run the following command:
docker pull kaszanas/datasetpreparator:latest
If you wish to clone the repository and build the Docker image yourself, run the following command:
make docker_build
After building the image, please refer to the Command Line Arguments Usage section for the usage of the scripts. For a full description of each script, refer to Detailed Tools Description.
Command Line Arguments Usage
When using Docker, you will have to pass the arguments through the docker run command and mount the input/output directory. Below is an example of how to run the directory_flattener script using Docker. For ease of use we have prepared an example directory structure in the processing directory. The command below uses it to flatten the directory structure:
docker run --rm \
  -v ".\processing:/app/processing" \
  datasetpreparator:latest \
  python3 directory_flattener.py \
  --input_path ./processing/input/directory_flattener \
  --output_path ./processing/output/directory_flattener \
  --n_processes 8 \
  --force_overwrite True
SC2EGSet Dataset Reproduction Steps
Note
Instructions below are for reproducing the result of the SC2EGSet dataset. If you wish to use the tools in this repository separately for your own dataset, please refer to the Detailed Tools Description.
Using Docker
We provide a release image containing all of the scripts. To see the usage of these scripts please refer to their respective README.md files as described in Detailed Tools Description.
The following steps were used to prepare the SC2ReSet and SC2EGSet datasets:
- Build the Docker image for the DatasetPreparator using the provided makefile target: make docker_build. This will load all of the dependencies such as the SC2InfoExtractorGo.
- Place the input replaypacks into the ./processing/input/directory_flattener directory.
- Run the command make sc2reset_sc2egset_pipeline to process the replaypacks and create the datasets. The output will be placed in the ./processing/output/SC2ReSet and ./processing/output/SC2EGSet directories.
Detailed Tools Description
Each script has its usage described in its respective README.md file. The available tools are listed below.
CLI Usage; Generic scripts
- Directory Packager (dir_packager): README
- Directory Flattener (directory_flattener): README
- File Renamer (file_renamer): README
- JSON Merger (json_merger): README
- Processed Mapping Copier (processed_mapping_copier): README
CLI Usage; StarCraft 2 Specific Scripts
- SC2 Map Downloader (sc2_map_downloader): README
- SC2EGSet Pipeline (sc2egset_pipeline): README
- SC2EGSet Replaypack Processor (sc2egset_replaypack_processor): README
- SC2ReSet Replaypack Downloader (sc2reset_replaypack_downloader): README
Contributing and Reporting Issues
If you want to report a bug, request a feature, or open any other issue, please do so in the issue tracker.
Please see CONTRIBUTING.md for detailed development instructions and contribution guidelines.
Cite Us!
This Repository
SC2EGSet: Dataset Description
@article{Bialecki2023_SC2EGSet,
author = {Bia{\l}ecki, Andrzej
and Jakubowska, Natalia
and Dobrowolski, Pawe{\l}
and Bia{\l}ecki, Piotr
and Krupi{\'{n}}ski, Leszek
and Szczap, Andrzej
and Bia{\l}ecki, Robert
and Gajewski, Jan},
title = {SC2EGSet: StarCraft II Esport Replay and Game-state Dataset},
journal = {Scientific Data},
year = {2023},
month = {Sep},
day = {08},
volume = {10},
number = {1},
pages = {600},
issn = {2052-4463},
doi = {10.1038/s41597-023-02510-7},
url = {https://doi.org/10.1038/s41597-023-02510-7}
}
@software{Białecki_2022_6366039,
author = {Białecki, Andrzej and
Białecki, Piotr and
Krupiński, Leszek},
title = {{Kaszanas/SC2DatasetPreparator: 1.2.0
SC2DatasetPreparator Release}},
month = {jun},
year = {2022},
publisher = {Zenodo},
version = {2.0.1},
doi = {10.5281/zenodo.5296664},
url = {https://doi.org/10.5281/zenodo.5296664}
}
2.0.0 DatasetPreparator Release
What's Changed
- Attempt to fix the CLA bot by @Ostrzyciel in #23
- CI and Makefile consistency fixes by @Ostrzyciel in #27
- [PR] Breaking Changes to DEV, Change CLA Configuration on main by @Kaszanas in #30
- Make the CI pass by @Ostrzyciel in #32
- Improve the README a bit by @Ostrzyciel in #35
- [PR] Refactoring Replaypack Processor, Quality Changes by @Kaszanas in #36
- [PR] Draft of Docker Docs by @Kaszanas in #41
- [PR] Bumped Dependency Versions by @Kaszanas in #45
- [PR] Renamed dir_packager to directory_packager by @Kaszanas in #47
- [PR] Mounting curdir as a dot by @Kaszanas in #49
- [PR] Default Flag Values for Golang by @Kaszanas in #52
- [PR] Ran Tests, Fixing Commands by @Kaszanas in #54
- [PR] Copying Scripts to Top in Docker Images by @Kaszanas in #56
- [PR] Added Docker Releases On main,dev Push and Release by @Kaszanas in #58
- [PR] Added Maps Needed for SC2InfoExtractorGo by @Kaszanas in #60
- [PR] 63 prompt user possible overwrite by @Kaszanas in #64
- 65 parallel directory flattener by @Kaszanas in #67
- [PR] Multithreading in Directory Packager by @Kaszanas in #68
- [PR] Attempting to Fix Tests by @Kaszanas in #69
- [PR] 37 sc2egset processing pipeline by @Kaszanas in #70
- [PR] Refined Documentation by @Kaszanas in #71
- [PR] Preparing 2.0 Release by @Kaszanas in #39
New Contributors
- @Ostrzyciel made their first contribution in #23
Full Changelog: 1.2.0...2.0.0
DatasetPreparator
This project contains various scripts that can assist in the process of preparing datasets. For a broad overview of the tools, please refer to the Detailed Tools Description.
Tools in this repository were used to create the SC2ReSet: StarCraft II Esport Replaypack Set, and finally SC2EGSet: StarCraft II Esport Game State Dataset. For citation information, see Cite Us!.
Installation
Note
To run this project there are some prerequisites that you need to have installed on your system:
- Docker
- Optional make (if you do not wish to use make, please refer to the commands defined in the makefile and run them manually)
Our preferred way of distributing the toolset is through DockerHub. We use the Docker Image to provide a fully reproducible environment for our scripts.
To pull the image from DockerHub, run the following command:
docker pull kaszanas/datasetpreparator:latest
If you wish to clone the repository and build the Docker image yourself, run the following command:
make docker_build
After building the image, please refer to the Command Line Arguments Usage section for the usage of the scripts. For a full description of each script, refer to Detailed Tools Description.
Command Line Arguments Usage
When using Docker, you will have to pass the arguments through the docker run command and mount the input/output directory. Below is an example of how to run the directory_flattener script using Docker. For ease of use we have prepared an example directory structure in the processing directory. The command below uses it to flatten the directory structure:
docker run --rm \
  -v ".\processing:/app/processing" \
  datasetpreparator:latest \
  python3 directory_flattener.py \
  --input_path ./processing/input/directory_flattener \
  --output_path ./processing/output/directory_flattener \
  --n_processes 8 \
  --force_overwrite True
SC2EGSet Dataset Reproduction Steps
Note
Instructions below are for reproducing the result of the SC2EGSet dataset. If you wish to use the tools in this repository separately for your own dataset, please refer to the Detailed Tools Description.
Using Docker
We provide a release image containing all of the scripts. To see the usage of these scripts please refer to their respective README.md files as described in Detailed Tools Description.
The following steps were used to prepare the SC2ReSet and SC2EGSet datasets:
- Build the Docker image for the DatasetPreparator using the provided makefile target: make docker_build. This will load all of the dependencies such as the SC2InfoExtractorGo.
- Place the input replaypacks into the ./processing/input/directory_flattener directory.
- Run the command make sc2reset_sc2egset_pipeline to process the replaypacks and create the datasets. The output will be placed in the ./processing/output/SC2ReSet and ./processing/output/SC2EGSet directories.
Detailed Tools Description
Each script has its usage described in its respective README.md file. The available tools are listed below.
CLI Usage; Generic scripts
- Directory Packager (dir_packager): README
- Directory Flattener (directory_flattener): README
- File Renamer (file_renamer): README
- JSON Merger (json_merger): README
- Processed Mapping Copier (processed_mapping_copier): README
CLI Usage; StarCraft 2 Specific Scripts
- SC2 Map Downloader (sc2_map_downloader): README
- SC2EGSet Pipeline (sc2egset_pipeline): README
- SC2EGSet Replaypack Processor (sc2egset_replaypack_processor): README
- SC2ReSet Replaypack Downloader (sc2reset_replaypack_downloader): README
Contributing and Reporting Issues
If you want to report a bug, request a feature, or open any other issue, please do so in the issue tracker.
Please see CONTRIBUTING.md for detailed development instructions and contribution guidelines.
Cite Us!
This Repository
[SC2EGSet: Dataset Description](https://www.researc...
1.2.0 SC2DatasetPreparator Release
SC2DatasetPreparator
This repository contains tools which can be used to create a StarCraft II dataset. The following steps are suggested:
- Obtain a number of replays to process. This can be a replaypack or your own replay folder.
- Download the latest version of SC2InfoExtractorGo, or build it from source.
- Optional: Using src/directory_flattener.py, flatten the directory structure and save the old directory tree to a mapping: {"replayUniqueHash": "whereItWasInOldStructure"}. This is required in order to properly use the SC2InfoExtractorGo.
- Optional: Use the map downloader src/sc2_map_downloader.py to download maps that were used in the replays that you obtained.
- Optional: Use the SC2MapLocaleExtractor to obtain the mapping of {"foreign_map_name": "english_map_name"}, which is required for the SC2InfoExtractorGo to translate the map names.
- Perform replaypack processing using src/sc2_replaypack_processor.py with the SC2InfoExtractorGo in PATH, or next to the script.
- Optional: Using src/file_renamer.py, rename the files that were generated in the previous step.
- Using src/file_packager.py, create .zip archives containing the datasets and the supplementary files.
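The flattening step in the list above can be sketched roughly as follows. This is only an illustration and not the code of src/directory_flattener.py: the function name flatten_directory, the use of a SHA-256 digest of the relative path as the "replayUniqueHash", and writing the mapping to a file called processed_mapping.json in the output directory are all assumptions made for the example.

```python
import hashlib
import json
import shutil
from pathlib import Path


def flatten_directory(
    input_path: Path,
    output_path: Path,
    file_extension: str = ".SC2Replay",
) -> dict[str, str]:
    """Copy matching files into one flat directory and record their old location.

    Hypothetical sketch: output_path should live outside input_path,
    otherwise already copied files would be picked up by the search.
    """
    output_path.mkdir(parents=True, exist_ok=True)
    mapping: dict[str, str] = {}

    for source in sorted(input_path.rglob(f"*{file_extension}")):
        if not source.is_file():
            continue
        relative = source.relative_to(input_path).as_posix()
        # Assumption: a digest of the old relative path serves as the unique name.
        unique_hash = hashlib.sha256(relative.encode("utf-8")).hexdigest()
        shutil.copy2(source, output_path / f"{unique_hash}{file_extension}")
        mapping[unique_hash] = relative

    # Assumption: the mapping is stored next to the flattened files.
    mapping_file = output_path / "processed_mapping.json"
    mapping_file.write_text(json.dumps(mapping, indent=2), encoding="utf-8")
    return mapping
```

Calling flatten_directory(Path("replays"), Path("flat")) would leave every .SC2Replay file directly under flat, with the mapping file making it possible to recover where each replay originally sat.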
Customization
In order to specify different processing flags for https://github.com/Kaszanas/SC2InfoExtractorGo, please modify the src/sc2_replaypack_processor file directly.
Usage
Before using this software please install Python >= 3.10 and requirements.txt.
Please keep in mind that src/directory_flattener.py contains default flag values and can be customized with the following command line flags:
usage: directory_flattener.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
[--file_extension FILE_EXTENSION]
Directory restructuring tool used in order to flatten the structure, map the old structure to a separate
file, and for later processing with other tools. Created primarily to define StarCraft 2 (SC2) datasets.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/input)
Please provide input path to the dataset that is going to be processed.
--output_path OUTPUT_PATH (default = ../processing/directory_flattener/output)
Please provide output path where sc2 map files will be downloaded.
--file_extension FILE_EXTENSION (default = .SC2Replay)
Please provide a file extension for files that will be moved and renamed.
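For readers who want to script against these flags, the help output above corresponds to a parser along the following lines. This is a minimal sketch using Python's standard argparse module; the function name build_parser is hypothetical, the defaults are copied from the help text, and the project's own parser may be declared differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="directory_flattener.py",
        description="Flatten a directory structure and map the old structure to a separate file.",
    )
    parser.add_argument(
        "--input_path",
        type=Path,
        default=Path("../processing/directory_flattener/input"),
        help="Input path to the dataset that is going to be processed.",
    )
    parser.add_argument(
        "--output_path",
        type=Path,
        default=Path("../processing/directory_flattener/output"),
        help="Output path for the flattened files.",
    )
    parser.add_argument(
        "--file_extension",
        type=str,
        default=".SC2Replay",
        help="File extension for files that will be moved and renamed.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_path, args.output_path, args.file_extension)
```

Running the sketch without arguments falls back to the defaults, and any flag passed on the command line overrides only its own value.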
Please keep in mind that the src/sc2_map_downloader.py contains default flag values and can be customized with the following command line flags:
usage: sc2_map_downloader.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
Tool for downloading StarCraft 2 (SC2) maps based on the data that is available within .SC2Replay file.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/output)
Please provide input path to the dataset that is going to be processed.
--output_path OUTPUT_PATH (default = ../processing/sc2_map_downloader/output)
Please provide output path where sc2 map files will be downloaded.
Please keep in mind that the src/sc2_replaypack_processor.py contains default flag values and can be customized with the following command line flags:
usage: sc2_replaypack_processor.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]
[--n_processes N_PROCESSES]
Tool used for processing StarCraft 2 (SC2) datasets. with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/directory_flattener/output)
Please provide input path to the directory containing the dataset that is going to be processed.
--output_dir OUTPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide an output directory for the resulting files.
--n_processes N_PROCESSES (default = 4)
Please provide the number of processes to be spawned for the dataset processing.
Please keep in mind that the src/file_renamer.py contains default flag values and can be customized with the following command line flags:
usage: file_renamer.py [-h] [--input_dir INPUT_DIR]
Tool used for processing StarCraft 2 (SC2) datasets with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide input path to the directory containing the dataset that is going to be processed.
Please keep in mind that the src/file_packager.py contains default flag values and can be customized with the following command line flags:
usage: file_packager.py [-h] [--input_dir INPUT_DIR]
Tool used for processing StarCraft 2 (SC2) datasets. with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide input path to the directory containing the dataset that is going to be processed by packaging into .zip archives.
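The packaging step can be pictured with a short sketch based on the standard zipfile module. It is not the code of src/file_packager.py; the function name package_directories and the assumption that each immediate subdirectory of the input directory becomes one .zip archive placed next to it are made up for illustration.

```python
import zipfile
from pathlib import Path


def package_directories(input_dir: Path) -> list[Path]:
    """Create one .zip archive per immediate subdirectory of input_dir.

    Hypothetical layout: <input_dir>/<name>/ becomes <input_dir>/<name>.zip.
    """
    archives: list[Path] = []
    # The list of directories is collected before any archive is written.
    directories = sorted(p for p in input_dir.iterdir() if p.is_dir())

    for directory in directories:
        archive_path = input_dir / f"{directory.name}.zip"
        with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
            for file in sorted(directory.rglob("*")):
                if file.is_file():
                    # Store paths relative to the packaged directory.
                    archive.write(file, arcname=file.relative_to(directory).as_posix())
        archives.append(archive_path)

    return archives
```

Building the archive name from directory.name rather than swapping the suffix keeps directory names that contain dots intact.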
Please keep in mind that the src/json_merger.py contains default flag values and can be customized with the following command line flags:
usage: json_merger.py [-h] [--json_one JSON_ONE] [--json_two JSON_TWO] [--output_filepath OUTPUT_FILEPATH]
Tool used for merging two .json files. Created in order to merge two mappings created by
https://github.com/Kaszanas/SC2MapLocaleExtractor
options:
-h, --help show this help message and exit
--json_one JSON_ONE (default = ../processing/json_merger/json1.json)
Please provide the path to the first .json file that is going to be merged.
--json_two JSON_TWO (default = ../processing/json_merger/json2.json)
Please provide the path to the second .json file that is going to be merged.
--output_filepath OUTPUT_FILEPATH (default = ../processing/json_merger/merged.json)
Please provide output path where sc2 map files will be downloaded.
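Merging two mapping files, as the json_merger flags above suggest, can be sketched in a few lines. The function name merge_json_files is hypothetical, and the conflict rule (a key present in both files takes its value from the second file) is an assumption of this example, not documented behavior of src/json_merger.py.

```python
import json
from pathlib import Path


def merge_json_files(json_one: Path, json_two: Path, output_filepath: Path) -> dict:
    """Merge two top-level JSON objects and write the result to output_filepath."""
    first = json.loads(json_one.read_text(encoding="utf-8"))
    second = json.loads(json_two.read_text(encoding="utf-8"))

    # Assumption: on duplicate keys the value from the second file wins.
    merged = {**first, **second}

    output_filepath.parent.mkdir(parents=True, exist_ok=True)
    output_filepath.write_text(
        json.dumps(merged, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return merged
```

Writing with ensure_ascii=False keeps non-English map names readable in the merged file, which matters for a {"foreign_map_name": "english_map_name"} mapping.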
Please keep in mind that the src/processed_mapping_copier.py contains default flag values and can be customized with the following command line flags:
usage: processed_mapping_copier.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
Tool for copying the processed_mapping.json files that are required to define the StarCraft 2 (SC2) dataset.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/output)
Please provide input path to the flattened replaypacks that contain
procesed_mapping.json files.
--output_path OUTPUT_PATH (default = ../processing/sc2_replaypack_processor/output)
Please provide output path where processed_mapping.json will be copied.
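As an illustration of what the copy step involves, the sketch below copies each mapping file into a matching directory under the output path. It assumes a layout in which every flattened replaypack is an immediate subdirectory of the input path and the processed output uses the same subdirectory names; both the layout and the function name copy_processed_mappings are assumptions, not a description of src/processed_mapping_copier.py.

```python
import shutil
from pathlib import Path


def copy_processed_mappings(
    input_path: Path,
    output_path: Path,
    filename: str = "processed_mapping.json",
) -> list[Path]:
    """Copy <input_path>/<pack>/<filename> to <output_path>/<pack>/<filename>."""
    copied: list[Path] = []

    for mapping_file in sorted(input_path.glob(f"*/{filename}")):
        # Assumption: output directories share the replaypack directory name.
        target_dir = output_path / mapping_file.parent.name
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / filename
        shutil.copy2(mapping_file, target)
        copied.append(target)

    return copied
```

Replaypack directories without a mapping file are simply skipped, so the function returns only the files it actually copied.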
1.1.0 SC2DatasetPreparator Release
SC2DatasetPreparator
This repository contains tools which can be used to create a StarCraft II dataset. The following steps are suggested:
- Obtain replays to process. This can be a replaypack or your own replay folder.
- Download latest version of SC2InfoExtractorGo, or build it from source.
- Optional: If the replays that you have are held in nested directories, it is best to use src/directory_flattener.py. This will copy the directory and place all of the files in the top directory, where they can be further processed. In order to preserve the old directory structure, a .json file is created. The file maps the old directory tree as {"replayUniqueHash": "whereItWasInOldStructure"}. This step is required in order to properly use SC2InfoExtractorGo, as it only lists the files immediately available on the top level of the input directory.
- Optional: Use the map downloader src/sc2_map_downloader.py to download maps that were used in the replays that you obtained. This is required for the next step.
- Optional: Use the SC2MapLocaleExtractor to obtain the mapping of {"foreign_map_name": "english_map_name"}, which is required for the SC2InfoExtractorGo to translate the map names in the output .json files.
- Perform replaypack processing using src/sc2_replaypack_processor.py with the SC2InfoExtractorGo placed in PATH, or next to the script.
- Optional: Using src/file_renamer.py, rename the files that were generated in the previous step. This is not required and is done to increase the readability of the directory structure for the output.
- Using src/file_packager.py, create .zip archives containing the datasets and the supplementary files. By finishing this stage, your dataset should be ready to upload.
Customization
In order to specify different processing flags for https://github.com/Kaszanas/SC2InfoExtractorGo, please modify the src/sc2_replaypack_processor file directly.
Usage
Before using this software please install Python >= 3.10 and requirements.txt.
Please keep in mind that src/directory_flattener.py contains default flag values and can be customized with the following command line flags:
usage: directory_flattener.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
[--file_extension FILE_EXTENSION]
Directory restructuring tool used in order to flatten the structure, map the old structure to a separate
file, and for later processing with other tools. Created primarily to define StarCraft 2 (SC2) datasets.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/input)
Please provide input path to the dataset that is going to be processed.
--output_path OUTPUT_PATH (default = ../processing/directory_flattener/output)
Please provide output path where sc2 map files will be downloaded.
--file_extension FILE_EXTENSION (default = .SC2Replay)
Please provide a file extension for files that will be moved and renamed.
Please keep in mind that the src/sc2_map_downloader.py contains default flag values and can be customized with the following command line flags:
usage: sc2_map_downloader.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
Tool for downloading StarCraft 2 (SC2) maps based on the data that is available within .SC2Replay file.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/output)
Please provide input path to the dataset that is going to be processed.
--output_path OUTPUT_PATH (default = ../processing/sc2_map_downloader/output)
Please provide output path where sc2 map files will be downloaded.
Please keep in mind that the src/sc2_replaypack_processor.py contains default flag values and can be customized with the following command line flags:
usage: sc2_replaypack_processor.py [-h] [--input_dir INPUT_DIR] [--output_dir OUTPUT_DIR]
[--n_processes N_PROCESSES]
Tool used for processing StarCraft 2 (SC2) datasets. with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/directory_flattener/output)
Please provide input path to the directory containing the dataset that is going to be processed.
--output_dir OUTPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide an output directory for the resulting files.
--n_processes N_PROCESSES (default = 4)
Please provide the number of processes to be spawned for the dataset processing.
Please keep in mind that the src/file_renamer.py contains default flag values and can be customized with the following command line flags:
usage: file_renamer.py [-h] [--input_dir INPUT_DIR]
Tool used for processing StarCraft 2 (SC2) datasets with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide input path to the directory containing the dataset that is going to be processed.
Please keep in mind that the src/file_packager.py contains default flag values and can be customized with the following command line flags:
usage: file_packager.py [-h] [--input_dir INPUT_DIR]
Tool used for processing StarCraft 2 (SC2) datasets. with https://github.com/Kaszanas/SC2InfoExtractorGo
options:
-h, --help show this help message and exit
--input_dir INPUT_DIR (default = ../processing/sc2_replaypack_processor/output)
Please provide input path to the directory containing the dataset that is going to be processed by packaging into .zip archives.
Please keep in mind that the src/json_merger.py contains default flag values and can be customized with the following command line flags:
usage: json_merger.py [-h] [--json_one JSON_ONE] [--json_two JSON_TWO] [--output_filepath OUTPUT_FILEPATH]
Tool used for merging two .json files. Created in order to merge two mappings created by
https://github.com/Kaszanas/SC2MapLocaleExtractor
options:
-h, --help show this help message and exit
--json_one JSON_ONE (default = ../processing/json_merger/json1.json)
Please provide the path to the first .json file that is going to be merged.
--json_two JSON_TWO (default = ../processing/json_merger/json2.json)
Please provide the path to the second .json file that is going to be merged.
--output_filepath OUTPUT_FILEPATH (default = ../processing/json_merger/merged.json)
Please provide output path where sc2 map files will be downloaded.
Please keep in mind that the src/processed_mapping_copier.py contains default flag values and can be customized with the following command line flags:
usage: processed_mapping_copier.py [-h] [--input_path INPUT_PATH] [--output_path OUTPUT_PATH]
Tool for copying the processed_mapping.json files that are required to define the StarCraft 2 (SC2) dataset.
options:
-h, --help show this help message and exit
--input_path INPUT_PATH (default = ../processing/directory_flattener/output)
Please provide input path to the flattened replaypacks that contain
procesed_mapping.json files.
--output_path OUTPUT_PATH (default = ../processing/sc2_replaypack_processor/output)
Please provide output path where processed_mapping.json will be copied.
1.0.0 SC2DatasetPreparator Release
SC2DatasetPreparator
This repository contains tools which can be used in order to perform the following steps:
- Using src/directory_flattener.py, flatten the directory structure and save the old directory tree to a mapping of {"replayUniqueHash": "whereItWasInOldStructure"}.
- Using src/sc2_replaypack_processor, perform replaypack processing with https://github.com/Kaszanas/SC2InfoExtractorGo.
Customization
In order to specify different processing flags for https://github.com/Kaszanas/SC2InfoExtractorGo, please modify the src/sc2_replaypack_processor file directly.
Usage
Before using this software please install Python >= 3.7 and requirements.txt.
Please keep in mind that src/directory_flattener.py does not contain default flag values and can be customized with the following command line flags:
usage: directory_flattener.py [-h] [--input_path INPUT_PATH]
[--file_extension FILE_EXTENSION]
Directory restructuring tool used in order to flatten the structure, map the
old structure to a separate file, and for later processing with other tools.
optional arguments:
-h, --help show this help message and exit
--input_path INPUT_PATH
Please provide input path to the dataset that is going
to be processed.
--file_extension FILE_EXTENSION
Please provide a file extension for files that will be
moved and renamed.
Please keep in mind that the src/sc2_replaypack_processor.py does not contain default flag values and can be customized with the following command line flags:
Tool used for processing SC2 datasets. with
https://github.com/Kaszanas/SC2InfoExtractorGo
optional arguments:
-h, --help show this help message and exit
--input_dir INPUT_DIR
Please provide input path to the directory containing
the dataset that is going to be processed.
--output_dir OUTPUT_DIR
Please provide an output directory for the resulting
files.
--number_of_processes NUMBER_OF_PROCESSES
Please provide the number of processes to be spawn for
the dataset processing.