100 changes: 100 additions & 0 deletions DATA.md
@@ -0,0 +1,100 @@
### Data Storage

Paths are abbreviated: `LOCAL_DIR` is your local machine; `SERVER_DIR` is the Polygon server.

## Part 1 — Move raw videos and process Lookit JSON

Level 1 · Raw video upload and conversion:

```
LOCAL_DIR/raw/raw_videos/*
│ upload
SERVER_DIR/data/raw/original_videos/webm/*
│ convert
SERVER_DIR/data/raw/original_videos/mp4_converted/*
```
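The convert step above can be sketched as a thin Python wrapper that builds an `ffmpeg` command per video. This is an illustrative sketch, not the repo's actual script; the paths and the minimal flag set are assumptions.

```python
from pathlib import Path

def ffmpeg_convert_command(webm_path: str, mp4_dir: str) -> list[str]:
    """Build an ffmpeg command converting one webm file to mp4.

    Paths and flags are illustrative; the real conversion lives in the
    preprocessing scripts.
    """
    out = Path(mp4_dir) / (Path(webm_path).stem + ".mp4")
    return ["ffmpeg", "-y", "-i", str(webm_path), str(out)]

# Each command can then be executed with subprocess.run(cmd, check=True).
cmd = ffmpeg_convert_command("original_videos/webm/child01.webm",
                             "original_videos/mp4_converted")
```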

Level 2 · Lookit JSON cleaning and formatting:

```
LOCAL_DIR/data/raw/lookit/sample#/input_lookit_study.json
│ clean & format
SERVER_DIR/data/main/data_to_analyze/lookit_study.json
```
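The clean & format step might look roughly like the following sketch, which keeps only the trial-level data from the raw export. The `exp_data` field name is an assumption about the Lookit export format, not taken from this repo's cleaning code.

```python
import json

def clean_lookit_json(raw_path: str, out_path: str) -> dict:
    """Strip a raw Lookit export down to per-session trial data.

    The `exp_data` key is an assumed field name; check the actual
    export before relying on it.
    """
    with open(raw_path) as f:
        raw = json.load(f)
    cleaned = {
        session_id: response.get("exp_data", {})
        for session_id, response in raw.items()
    }
    with open(out_path, "w") as f:
        json.dump(cleaned, f, indent=2)
    return cleaned
```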

## Part 2 — Run iCatcher+ over converted videos

```
SERVER_DIR/data/raw/original_videos/mp4_converted/*
│ iCatcher+
├──────► SERVER_DIR/data/raw/icatcher_videos/*
├──────► SERVER_DIR/data/raw/icatcher_annotations/*
└──────► SERVER_DIR/data/main/data_to_analyze/level-looks_source-lookit_data.csv
```
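Batch-running iCatcher+ over the converted videos amounts to one invocation per `.mp4`, roughly as below. The `icatcher` CLI flag names here are assumptions; see `run_icatcher_local.py` and `preprocessing/2_run_icatcher/README.md` for the repo's actual invocation.

```python
from pathlib import Path

def icatcher_commands(mp4_dir: str, video_out: str, annot_out: str) -> list[list[str]]:
    """Build one iCatcher+ command per converted video.

    Flag names are assumptions, not taken from this repo's scripts.
    """
    return [
        ["icatcher", str(video),
         "--output_video_path", video_out,
         "--output_annotation", annot_out]
        for video in sorted(Path(mp4_dir).glob("*.mp4"))
    ]
```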

## Part 3 — Process iCatcher output into looks CSV

```
SERVER_DIR/data/raw/icatcher_annotations/*
│ process (via Jupyter notebook)
SERVER_DIR/data/main/data_to_analyze/level-looks_source-icatcher_data.csv
```
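The core of this step is collapsing frame-by-frame gaze labels into contiguous "looks" with onsets and offsets. A minimal sketch, assuming one classification label per frame (e.g. "left", "right", "away") — the real annotation format may carry more columns:

```python
def frames_to_looks(labels: list[str], fps: float = 30.0) -> list[dict]:
    """Collapse per-frame gaze labels into contiguous looks.

    Assumes one label per frame at a fixed frame rate; the actual
    iCatcher+ annotation format may differ.
    """
    looks, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close out a look when the label changes or the video ends.
        if i == len(labels) or labels[i] != labels[start]:
            looks.append({
                "label": labels[start],
                "onset_s": start / fps,
                "offset_s": i / fps,
            })
            start = i
    return looks
```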

## Part 4 — WIP


### Local Repo Structure
```
visual-precision/
├── analysis/ # Part 4 analysis and model similarities
├── data/
│ ├── embeddings/ # embeddings for current sample
│ ├── main/ # local copies of processed iCatcher and Lookit data
│ ├── metadata/
│ ├── pilot/ # pilot data
│ └── raw/ # videos placed in part 1
├── experiment/ # image pairs used
├── figures/ # final-stage graphs for publication
├── models/ # model information
├── preprocessing/ # primary preprocessing scripts
├── stimuli/
├── writing/
├── .env_template
├── .gitignore
├── preprocess.py # Part 1
├── README.md
└── requirements.txt
```

### Server Repo Structure
```
visual-precision/
├── analysis/ # R scripts and results
├── data/
│ ├── embeddings/ # model embedding results
│ ├── main/ # processed iCatcher and Lookit data (CSVs)
│ ├── metadata/
│ ├── pilot/ # pilot data and analysis for comparison
│ └── raw/
│ ├── icatcher_annotations/ # frame-by-frame gaze data
│ ├── icatcher_videos/ # videos with gaze overlay
│ ├── lookit/ # Lookit data from Part 2, giftcard scripts
│ └── original_videos/ # webm and mp4 videos
├── frames/ # image pairs generated
├── models/ # model information
├── preprocessing/ # backup copy of preprocessing scripts
├── stimuli/ # images used for testing
├── writing/ # drafts
├── config.py
├── dataset_description.json
├── preprocess.py
├── README.md
└── requirements.txt
```
4 changes: 2 additions & 2 deletions README.md
@@ -31,7 +31,8 @@ Since the videos we collected are inherently identifiable (and large) we cannot

  1. Downloading the video ZIP and trial JSON files from Children Helping Science.
  - unzip the videos and store them in `data/raw/raw_videos` locally
- - Place the trial JSON file as `data/lookit/<sample>/input_lookit_study_data.json` on the server, where sample is either 'sample1' or 'sample2' depending on which sample you are processing.
+ - Install `ffmpeg` to be able to convert webm to mp4
+ - Place the trial JSON file as `data/lookit/<sample>/input_lookit_study.json` **locally**, where sample is either 'sample1' or 'sample2' depending on which sample you are processing.
  - Connect to VPN and Polygon
  - Copy over the `.env_template` file into a `.env` file, filling out the rows as required.
  - Run `preprocess.py` (which calls `preprocessing/utils/move_to_polygon.py` and `preprocessing/1_preprocess_raw_data.py`) to move the videos to the server and then format the raw videos and clean the Lookit JSON file.
@@ -42,7 +43,6 @@ Since the videos we collected are inherently identifiable (and large) we cannot
  - Navigate to `preprocessing/2_run_icatcher`
  - Activate the conda environment `conda activate visualprecision`
  - Install the requirements `pip install -r requirements.txt`
- - Install `ffmpeg` to be able to convert webm to mp4
  - Run `python run_icatcher_local.py --gpu_id 0` on a server with a GPU like Tversky.
  - See `preprocessing/2_run_icatcher/README.md` for a more detailed setup instruction and troubleshooting if needed.
