Supercomputing Frontiers 2017 in Singapore

China’s Exascale ambitions and planned ascendance in HPC were front and center at SCF in Singapore the week of March 13th.

A few years ago, I was part of a research group that was asked, “How far can a nation state get in computing if they simply ignore the rules that we adhere to in the US computing industry?”  China’s #1 place on the Top500 in 2013 has been seen as a “stunt” by some.  The argument goes that the US supercomputer industry doesn’t have much to prove by “winning” the Top500, but China does: a win demonstrates that they possess a computing capability and industry, even if the machine isn’t well architected.  After being in Singapore, I’m convinced that assessment is wrong.  In our study, we found that ignoring some of the constraints we have to adhere to in the USA (cooling, power, commercial viability, etc.) could produce a huge strategic advantage.  I believe that this is precisely what China is doing, and that because they can bring the resources of a nation state to the endeavor, they will be successful.  I additionally believe that they’ve got potentially disruptive commercial viability by “breaking the rules”.

The machines are well architected and unconstrained by the commercial requirements that US industry would use to provide a lower-risk but incremental product.  China can “throw out” the unnecessary, build big chips, be aggressive on cooling, and optimize for the problem of building a supercomputer.  This could enable higher performance on a less aggressive process node, and address the fundamental energy problem that constrains all supercomputers.  They can optimize for memory (which is critically important for performance and energy), and they can do so without worrying about cost the same way we would.

China’s architectures can potentially address the energy of data movement, which is the real problem in future systems.  US approaches appear to focus intensely on backwards compatibility and FLOPS.  While maintaining the software base is important, the danger lies in falling behind on capability.  Information Technology is strategically critical for both national security and economic competitiveness.  In fact, it’s been a fundamental differentiating factor since World War II.

China’s Potential Ascendancy

China’s currently #1 on the Top500 with TaihuLight, and while people argue about the relevance of the Top500’s HPL benchmark, it demonstrates a significant capability.  The same machine is #2 on the Graph500, with RIKEN’s K computer occupying the top spot.  In addition to NUDT’s TianheLight, China’s got additional Top500 entries and four potential paths to Exascale with mature software stacks.

Wuxi NSCC’s TaihuLight, built on Shenwei’s SW26010, is targeted at 125 PF/s and is claimed to be a true “#1” system, with better performance per watt and a highly integrated design.

The four domestic programs include:

  • Shenwei’s Alpha-inspired architecture, which seems to be in the lead.  It clearly has an architectural advantage if they’ve done what’s claimed and reduced overheads significantly.  The prototype began development 4 months ago and is scheduled to deliver in early 2020.  This is moving to a highly-optimized SIMD design with direct core-to-core communication, possibly abandoning unnecessary requirements for cache coherency.
  • Phytium’s ARM-based architecture, likely to deliver at the end of 2020.  This includes a 64-core heavyweight ARM architecture with external memory controllers that can be either FPGA or ASIC based.  This would enable significant memory innovation compared to the approach in the US.  They’ll move towards 256 cores and 2 TF/s/socket, with SIMD extensions pushing 20 TF/s.  With an innovative memory system to feed it and a standardized ARM software stack, this could be a very powerful machine.
  • Thatic-Sugon, which is based on AMD Zen under the x86 license and targeted at the end of 2021.  While this is likely the last machine to emerge, it includes aggressive technologies like a 6D torus and phase-change cooling.  Our study group highlighted the disruptive nature of cooling.
  • Finally, a Power 8/9 architecture based on the IBM license, which appears to be the furthest in the future.  This could create a significant opportunity to differentiate.
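
To make the appeal of the 6D torus mentioned above a little more concrete, here is a minimal sketch of how dimensionality trades link count against hop count.  It models a generic wraparound torus only; the function names and the example sizes are my own illustrative choices, not details of the Thatic-Sugon design.

```python
import math

def torus_neighbors(coord, dims):
    """Direct neighbors of `coord` in a torus whose per-dimension sizes are `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edge
            neighbors.add(tuple(n))
    neighbors.discard(tuple(coord))  # a size-1 dimension wraps onto the node itself
    return sorted(neighbors)

def torus_diameter(dims):
    """Worst-case shortest-path hop count between any two nodes."""
    return sum(size // 2 for size in dims)

# Two hypothetical layouts with the same node count (4096 nodes each).
six_d = (4, 4, 4, 4, 4, 4)
three_d = (16, 16, 16)

for dims in (six_d, three_d):
    origin = (0,) * len(dims)
    print(len(dims), "D:",
          math.prod(dims), "nodes,",
          len(torus_neighbors(origin, dims)), "links per node,",
          torus_diameter(dims), "hops worst case")
```

For the same 4096 nodes, the 6D layout has 12 links per node and a worst case of 12 hops, against 6 links and 24 hops in 3D.  Fewer hops means less data movement per message, at the price of more links to build and power.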

There’s significant innovation in the network, including global memory approaches similar to the innovative Extoll network.  The focus on the memory system and global interconnect addresses the real problem of data movement energy.  A system that is better at moving data is generally a more capable computer, except for a small number of niche problems.
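
A rough sketch of why data movement dominates the energy budget for memory-bound work.  The per-operation energy costs below are placeholder assumptions chosen only to show the shape of the calculation; they are not measurements of any machine discussed here, and the function name is hypothetical.

```python
def energy_split(flops, bytes_moved, pj_per_flop, pj_per_byte):
    """Return (compute_joules, movement_joules, movement_fraction)."""
    compute_j = flops * pj_per_flop * 1e-12
    movement_j = bytes_moved * pj_per_byte * 1e-12
    total_j = compute_j + movement_j
    fraction = movement_j / total_j if total_j else 0.0
    return compute_j, movement_j, fraction

# A daxpy-style update, y[i] = a * x[i] + y[i], on 8-byte values:
# 2 floating-point operations and 24 bytes of memory traffic per element
# (read x, read y, write y).
n = 1_000_000
compute_j, movement_j, fraction = energy_split(
    flops=2 * n,
    bytes_moved=24 * n,
    pj_per_flop=10.0,    # ASSUMED placeholder cost, not a measured value
    pj_per_byte=100.0,   # ASSUMED placeholder cost, not a measured value
)
print(f"compute {compute_j:.2e} J, movement {movement_j:.2e} J, "
      f"movement share {fraction:.1%}")
```

Under these assumed costs, almost all of the energy goes to moving operands rather than computing on them, so making the FLOPs cheaper barely changes the total.  Plugging in different per-byte costs shows how much a better memory system and interconnect can buy.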

China’s approach has already borne fruit.  Given that current designs are “no better than 28nm”, we can already see that improved architecture matters.  From a technology perspective, this is not surprising.  The end of Dennard Scaling (around 2003) impacts raw chip performance significantly: you can’t switch smaller devices faster and get the heat out without exotic cooling.  Changes to Moore’s Law will begin to impact the way the semiconductor industry achieves higher density (more functional units or more memory per unit area).  Ultimately, the opportunity is in building better computers, not relying on “free” performance gains from improved devices.  That train left the station long ago.  China may be the first country to truly adapt their computing strategy to the fundamental changes in technology we are all living through.
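
The Dennard Scaling point can be worked through with the standard dynamic power relation.  The symbols here are my own notation rather than anything presented at the conference: C is switched capacitance, V is supply voltage, f is clock frequency, A is device area, and κ > 1 is the linear shrink factor.

```latex
% Dynamic power of a switching device
P = C V^{2} f

% Classic Dennard scaling by a factor \kappa:
%   C \to C/\kappa, \quad V \to V/\kappa, \quad f \to \kappa f, \quad A \to A/\kappa^{2}
P' = \frac{C}{\kappa}\cdot\frac{V^{2}}{\kappa^{2}}\cdot\kappa f = \frac{P}{\kappa^{2}},
\qquad
\frac{P'}{A'} = \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A}

% Once the voltage can no longer be reduced (V \to V):
P' = \frac{C}{\kappa}\cdot V^{2}\cdot\kappa f = P,
\qquad
\frac{P'}{A'} = \frac{P}{A/\kappa^{2}} = \kappa^{2}\,\frac{P}{A}
```

While voltage scaled with feature size, power density stayed constant from one generation to the next.  With voltage stuck, each shrink multiplies power density by κ² if the clock keeps rising, and still by κ if the clock is held flat, which is why the heat cannot be removed without exotic cooling.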

There were several additional examples of architectural innovation, including:

  • Simplifying the Virtual Memory system and eliminating “unneeded” features to save energy.  Being unconstrained by backwards compatibility for legacy binaries helps significantly.
  • Creative cache architectures, with indications that the last level cache is using long cache lines to reduce the energy of column access (e.g., CAS) to DRAM.

In addition to the efforts described above, China has access to MIPS (via Loongson) and SPARC (via Feiteng/Phytium).  Huawei, Shenzhen, and QUALCOMM provide access to multiple additional ARM options.

Overall, China’s effort is likely 2 years ahead of the US and Japan in reaching Exascale.  More important than timing, they may build a more “capable” Exascale platform rather than a stunt machine.

Conclusions

The supercomputing community in Singapore is also vibrant, innovative, and capable.  Given China’s efforts, I’m left to wonder if Singapore will focus on a differentiated platform, one more capable in something like analytics rather than traditional “exascale”, which seems to mean “exaflops”.

It’s hard to draw strong conclusions other than that the US should focus on capability rather than FLOPS.  The technology and applications have changed, and there’s a real risk that a combination of hubris and the constraints of decades-old computing strategies will leave us not only behind on the Top500 list but behind in fundamental computing capabilities.

A disruptive approach would be both cheaper and higher payoff.  The incremental will happen on its own.