AMD Naples seismic benchmark

Oil and natural gas exploration -- geology and geophysics
Tesseract
Silver Member
Posts: 19
Joined: Thu Jun 09, 2016 8:17 am
Location: cyberspace

AMD Naples seismic benchmark

Post by Tesseract » Sat Apr 15, 2017 4:57 am

I came across this:
https://www.reddit.com/r/Amd/comments/5 ... e52699_v4/
Not sure of the details, other than AMD used a "seismic analysis workload, which involved multiple iterations of 3D wave equations."
Interesting that the Xeon couldn't swallow the dataset with 384 GiB of RAM, and AMD had 512 GiB.
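
Just to get a feel for the scale, here's the kind of back-of-envelope arithmetic that makes a 3D wave-equation dataset outgrow 384 GiB. The grid size and number of array copies below are made up purely for illustration - the post doesn't say what AMD actually ran:

    nx, ny, nz = 2000, 2000, 1500       # grid points per axis (assumed, not from the test)
    bytes_per_sample = 4                # single-precision float
    copies = 20                         # velocity model, wavefields at several time
                                        # levels, boundary arrays, etc. (also assumed)

    gib = nx * ny * nz * bytes_per_sample * copies / 2**30
    print(f"approx. {gib:.0f} GiB")     # ~447 GiB with these made-up numbers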

GuyM
VIP Member
Posts: 504
Joined: Sat Mar 24, 2012 11:35 pm

Re: AMD Naples seismic benchmark

Post by GuyM » Mon Apr 17, 2017 8:23 pm

Been watching this with some interest.

Turns out that high-core-count chips (i.e. >16 cores per CPU) hit different bottlenecks when running parallel processing, so the underlying scheme you use to split up a problem becomes pretty important.

It used to be mostly about disk I/O, clock speed, the backplane (across the CPUs) and how much RAM you had; how well you can shuffle data in and out of RAM is now in that mix too.

While trace-by-trace stuff is "embarrassingly parallel", the N-dimensional problems (i.e. a whole block of traces input in X, Y and offset to get one output) are not, and you need to run *at scale* to find any choke points. All those little linear components add up to a non-linear system that's data- and parameter-dependent (number of samples, geometry, imaging parameters).
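
To make the contrast concrete, a throwaway Python sketch (purely illustrative - our production code looks nothing like this). The first function is the trace-by-trace case; the second is the gather-in/one-sample-out case where the chunking scheme starts to matter:

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def filter_trace(trace):
        # trace-by-trace: each output depends on exactly one input trace,
        # so traces can go to workers in any order ("embarrassingly parallel")
        return trace - trace.mean()

    def image_point(gather, weights):
        # N-dimensional: one output sample needs a whole block of input traces
        # (X, Y, offset), so how the gather is chunked across workers - and how
        # often those chunks move through RAM - dominates the runtime
        return float((weights * gather).sum())

    if __name__ == "__main__":
        traces = [np.random.rand(2000) for _ in range(1000)]
        with ProcessPoolExecutor() as pool:
            filtered = list(pool.map(filter_trace, traces, chunksize=50))
        print(len(filtered))
        print(image_point(np.random.rand(50, 50, 24), np.ones((50, 50, 24))))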

We've been hammering a Lenovo P910 with the Intel E5-2600 v4 series processors (40 cores total!), mapping some of that out and adjusting; it's a bit different to running on a parallel cluster of, say, 5 CPUs x 8 cores because of the memory bus and cache.

So - we now have some self-loading/self-optimizing "smarts" in there so the system can optimize for RAM or core count dynamically while it's going, plus a smarter setup around managing the RAM when it's all associated with a single (or paired) CPU "blade" - looking forward to seeing what these AMD chips can do though!
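
Roughly the flavour of it, as a toy sketch (names and numbers invented for illustration, it's not our actual code): pick a chunk size from the free RAM, cap the worker count at what the chunks allow, and pin each worker to one socket so its memory stays local:

    import os

    def plan_run(free_ram_bytes, n_cores, bytes_per_chunk):
        # optimise for RAM first: how many chunks fit in memory at once?
        chunks_in_ram = max(1, free_ram_bytes // bytes_per_chunk)
        # then for cores: no point launching more workers than chunks (or cores)
        return min(n_cores, chunks_in_ram)

    def pin_worker(worker_id, cores_per_socket=8):
        # keep each worker on one socket so its RAM stays local to that socket
        # (os.sched_setaffinity is Linux-only)
        first = (worker_id * cores_per_socket) % os.cpu_count()
        cpus = {(first + i) % os.cpu_count() for i in range(cores_per_socket)}
        os.sched_setaffinity(0, cpus)

    print(plan_run(free_ram_bytes=256 * 2**30, n_cores=40,
                   bytes_per_chunk=16 * 2**30))   # -> 16 workers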

Bottom line might be a 500 sq km 3D preSTM over a weekend or less, on a US$15K desktop box. Golly.

Tesseract
Silver Member
Posts: 19
Joined: Thu Jun 09, 2016 8:17 am
Location: cyberspace

Re: AMD Naples seismic benchmark

Post by Tesseract » Mon Jul 31, 2017 6:10 am

Looking back at this, I noticed that they used the Open64 compiler - not updated since 2013! It only supports AMD CPUs up to Piledriver. On the other hand, it would not generate optimal code for Intel Broadwell either. AFAIK, the AMD CPUs can only manage 16 FLOPS/cycle (single precision) because of their half-arsed AVX implementation. A Broadwell Xeon can do 32 FLOPS/cycle, although it has to clock down about 18% under AVX2.
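
The peak-rate arithmetic behind those numbers looks like this (core counts and clocks below are assumed round figures, not necessarily the exact benchmark parts):

    def peak_gflops(cores, ghz, flops_per_cycle):
        # theoretical peak = cores x clock x FLOPS per core per cycle
        return cores * ghz * flops_per_cycle

    # Zen-style core: 16 single-precision FLOPS/cycle (256-bit AVX done in 128-bit halves)
    print(peak_gflops(cores=32, ghz=2.0, flops_per_cycle=16))           # 1024 GFLOPS
    # Broadwell core: 32 single-precision FLOPS/cycle, clocked ~18% lower under heavy AVX2
    print(peak_gflops(cores=22, ghz=2.2 * 0.82, flops_per_cycle=32))    # ~1270 GFLOPS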

Tesseract
Silver Member
Posts: 19
Joined: Thu Jun 09, 2016 8:17 am
Location: cyberspace

Re: AMD Naples seismic benchmark

Post by Tesseract » Sat Aug 12, 2017 7:47 am

I just came across a review of AMD Threadripper in Tom's Hardware (August 10), and one of the benchmarks listed was Kirchhoff Depth Migration!
It comfortably beat Intel's i9. Elsewhere the article revealed that Threadripper has rather poor core-to-core latency, which means it will be slower for some things.

GuyM
VIP Member
Posts: 504
Joined: Sat Mar 24, 2012 11:35 pm

Re: AMD Naples seismic benchmark

Post by GuyM » Sat Aug 12, 2017 4:50 pm

Core-to-core latency sounds like the *opposite* issue.

Or rather, the general case for (large-scale) parallel processing has been a lot of "blades", each with a relatively small number of cores; you might have multiple CPUs per blade, but in general there were "cores in the CPU", "cores on this blade" and "cores being used in the system".

Blade-to-blade latency was at network speed; there were fancy setups (e.g. SGI) with ultrafast interconnects so that all the cores could access all of the RAM.

So with all of these things, performance comes down in part to your code architecture - and, if you want to run generically, to how much control the user has to vary how the algorithm works to suit their specific hardware.
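
For instance, a trivially simple sketch of what that hierarchy means for splitting a job (pure illustration, not anyone's actual scheduler): coarse split across blades first, because blade-to-blade traffic runs at network speed, then a fine split across the cores on each blade:

    def split_job(n_traces, n_blades, cores_per_blade):
        # coarse split across blades first (network-speed boundary),
        # then fine split across cores on each blade (fast shared memory)
        per_blade = -(-n_traces // n_blades)          # ceiling division
        per_core = -(-per_blade // cores_per_blade)
        return per_blade, per_core

    # e.g. a million traces on 5 blades x 8 cores vs one 64-core box
    print(split_job(1_000_000, n_blades=5, cores_per_blade=8))    # (200000, 25000)
    print(split_job(1_000_000, n_blades=1, cores_per_blade=64))   # (1000000, 15625)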

Be interesting to see how this performs and whether it means we have to adapt our software...
