# Fourier Image Transformer

Tim-Oliver Buchholz<sup>1</sup> and Florian Jug<sup>2</sup><br>
<sup>1</sup>tibuch@mpi-cbg.de, <sup>2</sup>florian.jug@fht.org

Transformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as
image completion or image classification. Here we propose to use a sequential image representation, where each prefix of
the complete sequence describes the whole image at reduced resolution. Using such Fourier Domain Encodings (FDEs), an
auto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution
input. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a
set of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed
tomography (CT) image reconstruction. In summary, we show that Fourier Image Transformer (FIT) can be used to solve
relevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.
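
For intuition, the core FDE idea, a coefficient sequence in which every prefix encodes the full image at reduced resolution, can be sketched in a few lines of numpy. This is an illustrative ordering by frequency radius, not the exact encoding from Section 3.1 of the paper:

```python
import numpy as np

def fde_sequence(img):
    """Unroll the centered 2D Fourier spectrum of an image into a sequence
    ordered by frequency radius, lowest frequencies first. Each prefix of
    the sequence then encodes a low-pass (reduced-resolution) version of
    the whole image. Illustrative sketch, not the paper's exact encoding."""
    f = np.fft.fftshift(np.fft.fft2(img))        # DC term at the center
    h, w = f.shape
    ys, xs = np.indices((h, w))
    radius = np.hypot(ys - h // 2, xs - w // 2)  # distance from DC term
    order = np.argsort(radius.ravel(), kind="stable")
    return f.ravel()[order], order

def image_from_prefix(seq, order, n, shape):
    """Invert a length-n prefix by zeroing all unseen (higher) frequencies."""
    f = np.zeros(np.prod(shape), dtype=complex)
    f[order[:n]] = seq[:n]
    return np.fft.ifft2(np.fft.ifftshift(f.reshape(shape))).real
```

Reconstructing from a short prefix yields a blurry, low-resolution version of the image, while the full sequence reproduces it exactly; extending the prefix auto-regressively therefore amounts to increasing resolution.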

Preprint: [arXiv](arXiv)

## FIT for Super-Resolution

__FIT for super-resolution.__ Low-resolution input images are first transformed into Fourier space and then unrolled
into an FDE sequence, as described in Section 3.1 of the paper. This FDE sequence can now be fed to a FIT that,
conditioned on this input, extends the FDE sequence to represent a higher resolution image. This setup is trained using
an FC-Loss that enforces consistency between predicted and ground truth Fourier coefficients. During inference, the FIT
is conditioned on the first 39 entries of the FDE, corresponding to (a,d) 3x Fourier binned input images. Panels (b,e)
show the inverse Fourier transform of the predicted output, and panels (c,f) depict the corresponding ground truth.
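
The "3x Fourier binned" inputs can be thought of as keeping only the central third of the centered spectrum, i.e. downsampling without spatial-domain aliasing. A minimal numpy sketch, assuming image sizes divisible by the binning factor and not the repository's exact preprocessing:

```python
import numpy as np

def fourier_bin(img, factor=3):
    """Downsample an image by keeping only the central (1/factor)-sized
    block of its centered Fourier spectrum ('Fourier binning').
    Assumes both image dimensions are divisible by `factor`."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    nh, nw = h // factor, w // factor
    top, left = h // 2 - nh // 2, w // 2 - nw // 2
    crop = f[top:top + nh, left:left + nw]
    # rescale so the inverse transform preserves intensity levels
    return np.fft.ifft2(np.fft.ifftshift(crop)).real / factor**2
```

Conditioning on the first FDE entries of such a binned image and predicting the remaining entries is then exactly the super-resolution task described above.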

## FIT for Tomography

__FIT for computed tomography.__ We propose an encoder-decoder based Fourier Image Transformer setup for tomographic
reconstruction. In 2D computed tomography, 1D projections of an imaged sample (i.e. the columns of a sinogram) are
back-transformed into a 2D image. A common method for this transformation is the filtered backprojection (FBP). Since
each projection maps to a line of coefficients in 2D Fourier space, a limited number of projections in a sinogram leads
to visible streaking artefacts due to missing/unobserved Fourier coefficients. The idea of our FIT setup is to encode
all information of a given sinogram and use the decoder to predict missing Fourier coefficients. The reconstructed image
is then computed via an inverse Fourier transform (iFFT) of these predictions. In order to reduce high frequency
fluctuations in this result, we introduce a shallow conv-block after the iFFT (shown in black). We train this setup
combining the FC-Loss, see Section 3.2 in the paper, and a conventional MSE-loss between prediction and ground truth.
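
The streaking problem follows from the Fourier slice theorem: each 1D projection only provides the coefficients along one line through the origin of Fourier space. A small numpy sketch with naive nearest-pixel rasterization makes the sparse coverage visible:

```python
import numpy as np

def fourier_coverage(size, n_angles):
    """Mark which coefficients of a size x size Fourier plane are observed
    when each of n_angles projections contributes one line through the
    origin (Fourier slice theorem). Sparse angular sampling leaves most
    of the plane unobserved, which causes streaking in FBP results."""
    observed = np.zeros((size, size), dtype=bool)
    c = size // 2                       # center (DC term) of the plane
    radii = np.arange(-c, size - c)
    for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
        ys = np.round(c + radii * np.sin(theta)).astype(int)
        xs = np.round(c + radii * np.cos(theta)).astype(int)
        valid = (ys >= 0) & (ys < size) & (xs >= 0) & (xs < size)
        observed[ys[valid], xs[valid]] = True
    return observed
```

With few projection angles, most coefficients remain unobserved; this is exactly the gap the FIT decoder is asked to fill before the iFFT.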

## Installation

We use [fast-transformers](https://github.com/idiap/fast-transformers) as the underlying transformer implementation. In our super-resolution experiments we use their
`causal-linear` implementation, which relies on custom CUDA code (prediction works without this custom code). This code is
compiled during the installation of fast-transformers, so the CUDA and NVIDIA driver versions must match.
For our experiments we used CUDA 10.2 and NVIDIA driver 440.118.02.

We recommend installing Fourier Image Transformer into a new [conda](https://docs.conda.io/en/latest/miniconda.html)
environment:

`conda create -n fit python=3.7`

Next, activate the new environment:

`conda activate fit`

Then we install PyTorch for CUDA 10.2:

`conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch`

Followed by installing fast-transformers:

`pip install --user pytorch-fast-transformers`

Now we have to install the `astra-toolbox`:

`conda install -c astra-toolbox/label/dev astra-toolbox`

And finally we install Fourier Image Transformer:

`pip install fourier-image-transformer`

Start the jupyter server:

`jupyter notebook`

## Cite
```
@{}
```