CSCI 8970 – Colloquium Series – Fall 2010 – Twelfth Event
The Exaflop/s: Why and How
Monday, November 29, 2010
Presenter: David Keyes, KAUST and Columbia University
Dr. Keyes' lecture focused on exaflop/s computing and the "fourth paradigm" of data-intensive science, a movement toward richer metadata and improved classification of data. He began by showing what the growing accumulation of data has meant for the oil industry: companies can now better solve inverse problems, perform data assimilation, couple multiple complex models, and quantify uncertainty. Through the GigaPOWERS program, reservoir simulations grew from mega-cell to giga-cell models, which helped locate additional oil deposits. Oil companies are vast, with dozens of reservoirs "upstream" and many refineries and transportation systems "downstream"; yet even though the reservoir simulator is only a small piece of the puzzle, it already heavily taxes their computers. How, then, should such computing resources be managed? The data are increasingly too complex for any self-contained, self-consistent theory, and while simulation remains useful, data mining appears to be even more useful.
Through the IESP (International Exascale Software Project) roadmapping effort, 19 candidate exascale applications were identified, including magnetically confined fusion, molecular dynamics, climate, combustion, and aerodynamics, among other areas. Improvement is needed on several fronts: finding the dominant resources, improving memory bandwidth and flop/s per byte of storage, I/O versus computation (the big bottleneck), communication versus computation, and synchronization; the sketch below gives a feel for the flop/s-per-byte constraint. When the core algorithms were run at scale, they behaved as expected.
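To make the flop/s-per-byte constraint concrete, here is a minimal roofline-style back-of-the-envelope sketch in Python. The peak rate, memory bandwidth, and kernel intensities are illustrative assumptions, not numbers from the talk; the point is that a low-intensity kernel is limited by bandwidth, not by peak flop/s.

```python
# Roofline-style estimate: is a kernel limited by peak flop/s or by
# memory bandwidth?  All hardware numbers are illustrative assumptions.
peak_flops = 1.0e12        # assumed per-node peak: 1 Tflop/s
mem_bandwidth = 50.0e9     # assumed memory bandwidth: 50 GB/s

def attainable_flops(intensity):
    """Attainable rate = min(peak, bandwidth * flops-per-byte)."""
    return min(peak_flops, mem_bandwidth * intensity)

# SpMV in CSR format: ~2 flops per nonzero, ~12 bytes per nonzero
# (8-byte value + 4-byte column index), ignoring vector traffic.
kernels = [("SpMV", 2.0 / 12.0), ("well-blocked dense matmul", 30.0)]
for name, intensity in kernels:
    frac = attainable_flops(intensity) / peak_flops
    print(f"{name}: {intensity:.2f} flop/byte -> at most {frac:.1%} of peak")
```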
After this review of his research, Dr. Keyes examined the progress of high-end scientific computing; the principal exascale considerations are applications, architectures, and algorithms. For the purposes of the presentation, he focused on algorithms. From an algorithmicist's point of view, he argued, applications are a given (as a function of time) and architectures are likewise a given (as a function of time); algorithms and software must therefore be adapted or created to bridge complex applications to hostile architectures. This matters all the more as Moore's law shifts from delivering speed to delivering concurrency, driven by power considerations.
Tracking the progress of the third paradigm (simulation) through previous Gordon Bell Prize winners for peak performance, he showed how the prize-winning rate grew from about 1 Gflop/s in 1988 to roughly 1,020 Gflop/s (a teraflop/s) in 1998, and on to the petascale today. In a short span there has been a roughly trillion-fold improvement, and many recent reports ride this "Bell curve" for simulation. How, then, are such problems solved at the petascale today? The current workhorse solver of computational science is Newton (nonlinear solver) - Krylov (accelerator) - Schur (preconditioner) - Schwarz (preconditioner); the innards of the workhorse are the Krylov-Schwarz iterations, and a minimal sketch of this pattern appears below.
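As an illustration of the solver pattern named above, the following is a minimal Jacobian-free Newton-Krylov sketch in Python/SciPy applied to a small 1-D nonlinear problem. It is not the production software Keyes described: the model problem, tolerances, and the crude diagonal preconditioner (standing in here for the Schwarz domain-decomposition preconditioner) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import newton_krylov
from scipy.sparse.linalg import LinearOperator

# Minimal Jacobian-free Newton-Krylov sketch for a 1-D nonlinear
# reaction-diffusion residual F(u) = 0 on the unit interval with zero
# Dirichlet boundaries.  A real Newton-Krylov-Schwarz solver would use
# an overlapping Schwarz preconditioner distributed across many nodes.
n = 200
h = 1.0 / (n + 1)

def residual(u):
    """Discrete -u'' + u**3 - 1 = 0 (second-order finite differences)."""
    uL = np.concatenate(([0.0], u[:-1]))   # left neighbors (boundary = 0)
    uR = np.concatenate((u[1:], [0.0]))    # right neighbors (boundary = 0)
    return -(uL - 2.0 * u + uR) / h**2 + u**3 - 1.0

# Crude preconditioner: invert only the dominant diagonal term 2/h^2 of
# the linearized operator.  It plays the structural role that the
# Schwarz preconditioner plays inside the Krylov loop of the real solver.
M = LinearOperator((n, n), matvec=lambda v: v * (h**2 / 2.0))

u0 = np.zeros(n)
u = newton_krylov(residual, u0, method="lgmres", inner_M=M, f_tol=1e-8)
print("max |F(u)| at solution:", np.abs(residual(u)).max())
```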
What, then, will the first "general purpose" exaflop/s machines look like? His lab is currently thinking about 2018. There are many paths beyond today's CMOS silicon-based logic; the earliest and most significant post-CMOS device improvement may be carbon nanotube memory, but not within 10 years. Dr. Keyes sees two paths from peta- to exascale: a successor to IBM's BlueGene, and a "GigaHz × KiloCore × MegaNode" system (10^9 Hz clocks × 10^3 cores per node × 10^6 nodes). Improvement is clearly necessary: today one of these machines would consume the electrical power of a town of 14,000 people in an OECD country. There are also several hard hurdles. Hurdle #1: memory bandwidth eats up the entire power budget (the sketch below gives a feel for why). Hurdle #2: memory capacity eats up the entire fiscal budget. Hurdle #3: the power constraint requires slower clocks and greater concurrency. The draconian reduction required in power per flop and per byte will make computing and copying data less reliable; despite this drawback, it is a necessary engineering tradeoff. The present consensus path to exascale is thousand-fold many-core: processors are cheap.
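A rough back-of-the-envelope calculation gives a feel for Hurdle #1. The energy-per-flop and energy-per-byte figures below are illustrative assumptions rather than numbers quoted in the talk; the point is only that, at an exaflop/s, moving data can cost more power than computing on it.

```python
# Back-of-the-envelope power estimate for a hypothetical exaflop/s
# machine.  Energy costs per operation are illustrative assumptions.
target_flops = 1.0e18      # sustained 1 exaflop/s
pj_per_flop = 10.0         # assumed energy per floating-point op (picojoules)
pj_per_byte = 50.0         # assumed energy per byte moved to/from DRAM
bytes_per_flop = 0.5       # assumed memory traffic per flop

compute_mw = target_flops * pj_per_flop * 1e-12 / 1e6
traffic_mw = target_flops * bytes_per_flop * pj_per_byte * 1e-12 / 1e6
print(f"compute power:        {compute_mw:.0f} MW")
print(f"memory-traffic power: {traffic_mw:.0f} MW")
# With these assumptions, data movement (~25 MW) costs more than the
# arithmetic itself (~10 MW), which is why bytes per flop must shrink.
```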
He then turned to the evolution of parallel programming models: strong scaling within a node through shared memory, and across nodes through distributed memory. A hybrid programming model is needed for a full application, such as petascale bio-electro-magnetics, and current hybrid programming models are not enough. Parallel programming models must evolve further by breaking the synchrony stronghold. Asynchronous methods have a long history but have not been prevalent; we do, however, know how to make different components additive without destroying the properties of the algorithms. Hybrid and asynchronous programming models are once again a hot topic, and they will be discussed at the ICERM 2011 conference, which will focus on asynchronous programming.
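As a small illustration of the less synchronous style discussed above, here is a hedged sketch that overlaps a halo exchange with local computation using non-blocking MPI via mpi4py. The pattern, problem size, and message tags are illustrative choices, not code from the talk.

```python
# Overlapping communication with computation using non-blocking MPI:
# post the halo exchange, work on the interior, and synchronize only
# when the boundary values are actually needed.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.random.rand(1_000_000)                    # this rank's piece of a 1-D field
left, right = (rank - 1) % size, (rank + 1) % size   # periodic neighbors
halo_left, halo_right = np.empty(1), np.empty(1)

# 1. Post non-blocking sends/receives of the boundary ("halo") values.
reqs = [comm.Isend(local[:1],  dest=left,  tag=0),
        comm.Isend(local[-1:], dest=right, tag=1),
        comm.Irecv(halo_left,  source=left,  tag=1),
        comm.Irecv(halo_right, source=right, tag=0)]

# 2. Do interior work that does not need the halos while messages fly.
interior_sum = local[1:-1].sum()

# 3. Synchronize only now, then finish the boundary-dependent part.
MPI.Request.Waitall(reqs)
boundary_term = 0.5 * (halo_left[0] + halo_right[0])
print(f"rank {rank}/{size}: interior {interior_sum:.2f}, boundary {boundary_term:.2f}")
```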
For algorithms, as we move from peta- to exascale, the things we should do for exascale will also help us at petascale and terascale: reducing memory requirements and memory traffic, exploiting hybrid and less synchronous parallel programming models, and co-designing hardware and software (e.g., for power management). Some major challenges stay the same from peta- to exascale, such as the poor scaling of collectives due to internode latency and the poor performance of sparse matrix-vector products (SpMV) due to shared intranode memory bandwidth.
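The collectives challenge can be illustrated with a tiny cost model: a tree-based allreduce pays roughly log2(p) message latencies, so the fixed cost of every global reduction (for example, the dot products inside a Krylov iteration) grows with machine size. The latency figure below is an illustrative assumption.

```python
# Tiny latency model for a tree-based allreduce: about ceil(log2(p))
# message latencies per reduction.  The latency value is an assumption.
import math

latency_us = 1.5   # assumed internode message latency in microseconds

def allreduce_latency_us(p):
    return latency_us * math.ceil(math.log2(p))

for p in (1_000, 100_000, 10_000_000):   # toward exascale concurrency
    print(f"p = {p:>10,}: ~{allreduce_latency_us(p):.1f} us of pure latency")
```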
When discussing the path for scaling up applications, he argued that applications should be weak-scaled up to the limits of distributed memory and strong-scaled beyond that. The entire workflow must be scaled, and a co-design process should be staged; currently, few high-end computers come with the required software enabling technologies. Having surveyed the current state of affairs, Dr. Keyes argued that the situation is dire, but no more so than after Sputnik was launched: when Kennedy challenged the nation in 1962, US scientists had a deadline of roughly seven years, and they once again face such a deadline. Will we be up to the challenge? I certainly hope so!