The next generation of supercomputing platforms will deliver unprecedented performance for the benefit of all scientific disciplines that involve huge amounts of computation. However, increasing performance has traditionally been accompanied by a corresponding increase in power consumption. We target automatic hardware acceleration of C code, with an explicit focus on the power/performance metric, for the next generation of supercomputing platforms.
Scaling traditional supercomputing platforms is no longer feasible due to power consumption concerns - forecasts predict that a dedicated nuclear plant might be required to operate next-generation HPC clusters. In the race toward power efficiency, we focus on Field Programmable Gate Arrays (FPGAs) to accelerate the "hot" portions of an application, exploiting the power efficiency inherent to FPGAs.
Most HPC applications involve large amounts of computation with very regular (or regularizable) memory access patterns. This implies that there is plenty of exploitable data-level parallelism, which allows such applications to scale to thousands of nodes. Our approach explicitly extracts this parallelism from the source code and distributes it across a complex FPGA-based HPC infrastructure.
We believe that most computational kernels can be programmed without exotic language features. Moreover, we understand that many programmers - and in particular those in the HPC field - are not very familiar with hardware description languages (HDLs), nor with the explicit description of computer architectures. That's why we designed exaFPGA to support an expressive subset of C, the language of choice of many HPC programmers.
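To illustrate the kind of kernel described above, here is a hypothetical example (not taken from exaFPGA itself): a 1-D Jacobi-style stencil written in plain C, with no exotic language features. Every iteration reads a fixed, affine neighborhood of the input array, so the memory access pattern is fully regular and each output element can be computed independently of the others.

```c
#include <stddef.h>

/* Hypothetical example kernel: a 1-D three-point stencil.
 * Each out[i] depends only on in[i-1], in[i], in[i+1], so
 * iterations carry no dependences on each other: all of them
 * can in principle run in parallel on a network of accelerators. */
void stencil_1d(const float *in, float *out, size_t n)
{
    for (size_t i = 1; i + 1 < n; i++) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}
```

Kernels of this shape are exactly what a tool can analyze automatically: the loop bounds and array subscripts are affine functions of the loop counter, so the available data-level parallelism can be extracted from the source code without any annotations.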
Are you a fan of modern compilation suites like LLVM? Would you like to explore the field of automatic parallelization, a hot topic in an ever more multi-core and heterogeneous world? Would you like to explore new and powerful models of computation, like Polyhedral Process Networks? Please have a look at the Programming Languages and Compilers section!
How do we connect various hardware components on an FPGA? Where are we introducing potential bottlenecks? How do we manage memory hierarchies, if any? How do we efficiently link together multiple nodes? How much memory bandwidth do we need? Well, if these are the kinds of problems you like, have a look at the Computer Architectures section!
Now that you have your complex infrastructure, how do you tie it together? How do you control data movement across the network of accelerators? How do you know when a new node is available? Can resources be shared among different applications? Intrigued by these problems ;)? Have a look at the Operating System and Runtime Management section!