Prefer scipy.fft to numpy.fft #211

Open
mreineck opened this issue Jul 22, 2020 · 9 comments
Labels: enhancement (New feature or request)

Comments

@mreineck

I just noticed that the code uses numpy.fft in many places.

If you have the choice, I strongly suggest using scipy.fft instead, because it is much faster for multi-dimensional transforms. Also, the interface is practically identical (in contrast to the older scipy.fftpack module).
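For typical calls the change is a drop-in replacement; a minimal sketch (array shape arbitrary), with `workers` being the main scipy-specific addition:

```python
import numpy as np
import scipy.fft

x = np.random.standard_normal((512, 512)) + 1j * np.random.standard_normal((512, 512))

X_np = np.fft.fftn(x)                 # single-threaded numpy.fft
X_sp = scipy.fft.fftn(x, workers=4)   # pypocketfft backend, 4 threads

# both backends agree to numerical precision
np.testing.assert_allclose(X_np, X_sp, atol=1e-8)
```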

@sjperkins
Member

Thanks for raising this @mreineck.

As I understand it, scipy.fft is pypocketfft? In that case, we'd certainly want to leverage the performance enhancements.

@landmanbester do you think this is something you could address in #204, or do you think that would be outside the scope of the PR? Of course, since you're introducing a dependency on ducc0, you could use pypocketfft within the gridding functionality.

It seems that scipy.fft could be used in the dask wrappers:

https://github.com/ska-sa/codex-africanus/search?q=fft&unscoped_q=fft

scipy is an optional codex dependency. I wonder whether we should force the user to install scipy in order to use the nifty gridding functionality, or silently fall back to numpy if scipy is not installed. Anyone have strong opinions here?
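A minimal sketch of the silent-fallback option, assuming the functions are re-exported from a small shim module (names here are illustrative, not the actual codex layout):

```python
# hypothetical africanus-internal shim: prefer scipy.fft, fall back to numpy.fft
try:
    from scipy.fft import fftn, ifftn, fftshift, ifftshift
except ImportError:
    from numpy.fft import fftn, ifftn, fftshift, ifftshift
```

The gridding code would then import from the shim rather than from numpy.fft directly.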

@mreineck
Author

As I understand it, scipy.fft is pypocketfft?

Yes and no :) It's pypocketfft, but since scipy wheels are built without CPU-specific optimizations, it will be somewhat slower than a self-compiled pypocketfft or ducc0.fft package.

To make things even more confusing, numpy.fft is also pocketfft, but based on an old C version. As far as I understand, numpy can't upgrade (at least not for the moment), since they don't accept extensions written in C++.

It seems that scipy fft could be used in the dask wrappers:

In fact I opened another issue earlier today for dask, suggesting that they switch from numpy.fft to scipy.fft internally as well ;-)

@mreineck
Author

In the case that ducc0 becomes a mandatory dependency (e.g. when the w-gridder is included), the best option is probably to use ducc0.fft throughout the code.
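Roughly, a numpy.fft.fftn / ifftn pair would then become something like the following (ducc0.fft.c2c calling convention, with inorm assumed to follow the usual pocketfft normalisation flags):

```python
import numpy as np
import ducc0

x = np.random.standard_normal((1024, 1024)) + 0j

# forward multi-D transform, multi-threaded; no normalisation (like np.fft.fftn)
X = ducc0.fft.c2c(x, axes=(0, 1), forward=True, nthreads=4)

# inverse transform; inorm=2 divides by the product of the transformed axis
# lengths (like np.fft.ifftn)
x_back = ducc0.fft.c2c(X, axes=(0, 1), forward=False, inorm=2, nthreads=4)
```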

@landmanbester
Collaborator

Yep, I could do this in that PR. Maybe we should consider dask wrappers for the FFTs in ducc0? This would be the first step towards recognising that, eventually, we are going to need a distributed FFT. @mreineck do you already have something like that in NIFTy?
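For the unchunked axis this could be as simple as mapping the multi-threaded transform over the blocks; a hedged sketch assuming ducc0.fft.c2c's calling convention (transforming the chunked axis would additionally need the rechunk/transpose discussed further down):

```python
import dask.array as da
import numpy as np
import ducc0

# chunked along axis 0 only, so each block spans the full extent of axis 1
x = (da.random.random((8192, 8192), chunks=(1024, 8192))
     + 1j * da.random.random((8192, 8192), chunks=(1024, 8192)))

# a per-block transform along the unchunked axis equals the global transform
X = x.map_blocks(ducc0.fft.c2c, axes=(1,), forward=True, nthreads=4,
                 dtype=np.complex128)
```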

@mreineck
Author

@mreineck do you already have something like that in NIFTy?

You mean an FFT for arrays that are distributed over several tasks along one axis?
I have Python code for doing the necessary redistribution; this was originally in NIFTy, but we scrapped this mode of parallelization. I should be able to find it easily in the old commits, though.

Sooner or later I will need MPI array transposition code in ducc as well, but that will probably not happen very soon.

@landmanbester
Collaborator

I meant any out-of-memory 2D FFT. I guess splitting up tasks along one of the spatial axes is one way to go. We saw a talk by someone at Cambridge who was trying to do this, and he made it look pretty difficult. Philipp told me that it was in principle in NIFTy already, just not the IO part of it. I am mainly thinking about this because of the promised 60k x 60k SKA images, but I know some wide-field VLBI people who would already find this very useful.

@mreineck
Author

I guess splitting up tasks along one of the spatial axes is one way to go.

It's what FFTW does, and it seems to work reasonably well. Of course, a full FFT requires two full array transpositions, which is not cheap compared to the pure FFT computation cost.

As long as the array fits into a single node's memory, I'd definitely prefer a multi-threaded transform over MPI (and perhaps do simultaneous independent FFTs on multiple nodes, if that's possible) ... but 60kx60k is getting close to the limit.

Assuming I had to work with a huge set of visibilities and such a large image, I'd try the following approach with ducc.wgridder:

  • determine the extent of the w range
  • assign equal chunks of the w range to my MPI tasks (1 MPI task per compute node)
  • put every visibility onto all nodes it can affect (this can be more than one node because it is gridded onto several w planes)
  • run ms2dirty independently on every task
  • allreduce_sum the result

This won't scale perfectly, but on the other hand it's really simple to set up while still being quite efficient.
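In skeleton form it might look like this (grid_chunk and w_support are stand-ins for the actual ducc0.wgridder.ms2dirty call and the w-overlap bookkeeping; uvw and vis are assumed to be readable per task):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, ntasks = comm.Get_rank(), comm.Get_size()

# 1. global extent of the w range
w = uvw[:, 2]
w_lo = comm.allreduce(w.min(), op=MPI.MIN)
w_hi = comm.allreduce(w.max(), op=MPI.MAX)

# 2. equal w chunks, one per MPI task (1 task per compute node)
edges = np.linspace(w_lo, w_hi, ntasks + 1)

# 3. every visibility that can touch this task's w planes (the kernel has
#    finite support in w, hence the +/- w_support padding)
mine = (w >= edges[rank] - w_support) & (w <= edges[rank + 1] + w_support)

# 4. grid the local subset independently on every task
dirty_local = grid_chunk(uvw[mine], vis[mine])

# 5. allreduce_sum of the partial dirty images
dirty = np.empty_like(dirty_local)
comm.Allreduce(dirty_local, dirty, op=MPI.SUM)
```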

In any case, the only missing ingredient for the kind of MPI-distributed FFT you are looking for is Python functionality that can transpose an MPI-distributed 2D array. I'm pretty sure this must exist already...

@sjperkins
Member

It's what FFTW does, and it seems to work reasonably well. Of course, a full FFT requires two full array transpositions, which is not cheap compared to the pure FFT computation cost.

As long as the array fits into a single node's memory, I'd definitely prefer a multi-threaded transform over MPI (and perhaps do simultaneous independent FFTs on multiple nodes, if that's possible) ... but 60kx60k is getting close to the limit.

As I understand the problem, the challenge of a distributed FFT is handling the communication pattern inherent in the butterfly diagram.

At some point one needs to broadcast all images to all nodes and it just becomes really expensive from a data transfer POV?

@mreineck
Author

As long as your 1D FFTs fit into memory, you shouldn't need to worry about the butterfly.

The normal strategy is:

  • START: 2D array is distributed over n tasks along axis 0
  • do the FFT along axis 1 (can be done completely without communication) for the partial array on every task
  • transpose the array (this requires a fairly nasty all2allv communication in the general case)
  • do the FFT along axis 1 (which was axis 0 originally) for the partial array on every task
  • the FFT is now complete
  • (optional) do another transpose to re-establish the original storage scheme

It's not terribly complicated, but it is indeed expensive.
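A minimal, purely illustrative mpi4py + scipy.fft sketch of exactly that strategy (square N x N array, ntasks dividing N):

```python
from mpi4py import MPI
import numpy as np
import scipy.fft

comm = MPI.COMM_WORLD
rank, ntasks = comm.Get_rank(), comm.Get_size()

N = 4096                 # global array is N x N
m = N // ntasks          # rows owned by each task (assumed to divide evenly)

# START: each task owns a contiguous slab of m rows (distributed along axis 0)
local = (np.random.standard_normal((m, N))
         + 1j * np.random.standard_normal((m, N)))

# FFT along axis 1: purely local, no communication
local = scipy.fft.fft(local, axis=1)

# transpose: cut the slab into ntasks (m x m) tiles along axis 1, send tile s
# to task s, then glue the received tiles back together transposed
send = np.ascontiguousarray(local.reshape(m, ntasks, m).transpose(1, 0, 2))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
local = np.concatenate([tile.T for tile in recv], axis=1)   # shape (m, N)

# FFT along axis 1 again (this is axis 0 of the original array)
local = scipy.fft.fft(local, axis=1)

# the 2D FFT is now complete, but stored in transposed order; an optional
# second Alltoall of the same form re-establishes the original storage scheme
```

With an uneven distribution the plain Alltoall becomes the all2allv case mentioned above.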
