Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0 O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... Geoscientific Model Development 11 (4), 1665-1681, 2018 | 108 | 2018 |

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication G Kwasniewski, M Kabić, M Besta, J VandeVondele, R Solcà, T Hoefler Proceedings of the International Conference for High Performance Computing …, 2019 | 69 | 2019 |

SeBS: A serverless benchmark suite for function-as-a-service computing M Copik, G Kwasniewski, M Besta, M Podstawski, T Hoefler Proceedings of the 22nd International Middleware Conference, 64-78, 2021 | 58 | 2021 |

SISA: Set-centric instruction set architecture for graph mining on processing-in-memory systems M Besta, R Kanakagiri, G Kwasniewski, R Ausavarungnirun, J Beránek, ... MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021 | 51 | 2021 |

Flexible communication avoiding matrix multiplication on FPGA with high-level synthesis J de Fine Licht, G Kwasniewski, T Hoefler Proceedings of the 2020 ACM/SIGDA International Symposium on Field …, 2020 | 42 | 2020 |

Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0, Geosci. Model Dev., 11, 1665–1681 O Fuhrer, T Chadha, T Hoefler, G Kwasniewski, X Lapillonne, D Leutwyler, ... gmd-11-1665-2018, 2018 | 32 | 2018 |

Using compiler techniques to improve automatic performance modeling A Bhattacharyya, G Kwasniewski, T Hoefler 2015 International Conference on Parallel Architecture and Compilation (PACT …, 2015 | 32 | 2015 |

A PCIe congestion-aware performance model for densely populated accelerator servers M Martinasso, G Kwasniewski, SR Alam, TC Schulthess, T Hoefler SC'16: Proceedings of the International Conference for High Performance …, 2016 | 26 | 2016 |

Extreme scale plasma turbulence simulations on top supercomputers worldwide W Tang, B Wang, S Ethier, G Kwasniewski, T Hoefler, KZ Ibrahim, ... SC'16: Proceedings of the International Conference for High Performance …, 2016 | 15 | 2016 |

On the parallel I/O optimality of linear algebra kernels: Near-optimal matrix factorizations G Kwasniewski, M Kabić, T Ben-Nun, AN Ziogas, JE Saethre, A Gaillard, ... Proceedings of the International Conference for High Performance Computing …, 2021 | 14 | 2021 |

GraphMineSuite: Enabling high-performance and programmable graph mining algorithms with set algebra M Besta, Z Vonarburg-Shmaria, Y Schaffner, L Schwarz, G Kwasniewski, ... arXiv preprint arXiv:2103.03653, 2021 | 14 | 2021 |

Automatic complexity analysis of explicitly parallel programs T Hoefler, G Kwasniewski Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and …, 2014 | 10 | 2014 |

Motif prediction with graph neural networks M Besta, R Grob, C Miglioli, N Bernold, G Kwasniewski, G Gjini, ... Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022 | 9 | 2022 |

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization G Kwasniewski, T Ben-Nun, AN Ziogas, T Schneider, M Besta, T Hoefler Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 7 | 2021 |

Automatic performance modeling of HPC applications F Wolf, C Bischof, A Calotoiu, T Hoefler, C Iwainsky, G Kwasniewski, ... Software for Exascale Computing-SPPEXA 2013-2015, 445-465, 2016 | 6 | 2016 |

Pebbles, graphs, and a pinch of combinatorics: Towards tight I/O lower bounds for statically analyzable programs G Kwasniewski, T Ben-Nun, L Gianinazzi, A Calotoiu, T Schneider, ... Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and …, 2021 | 5 | 2021 |

A scalable weakly-synchronous algorithm for solving partial differential equations K Aditya, T Gysi, G Kwasniewski, T Hoefler, DA Donzis, JH Chen arXiv preprint arXiv:1911.05769, 2019 | 3 | 2019 |

ProbGraph: high-performance and high-accuracy graph mining with probabilistic set representations M Besta, C Miglioli, PS Labini, J Tětek, P Iff, R Kanakagiri, S Ashkboos, ... arXiv preprint arXiv:2208.11469, 2022 | 2 | 2022 |

Lifting C semantics for dataflow optimization A Calotoiu, T Ben-Nun, G Kwasniewski, J de Fine Licht, T Schneider, ... Proceedings of the 36th ACM International Conference on Supercomputing, 1-13, 2022 | 1 | 2022 |

Deinsum: Practically I/O optimal multilinear algebra AN Ziogas, G Kwasniewski, T Ben-Nun, T Schneider, T Hoefler arXiv preprint arXiv:2206.08301, 2022 | 1 | 2022 |