eScholarship
Open Access Publications from the University of California

Helping Faculty Teach Software Performance Engineering

(2024)

Over the academic year 2022–23, we discussed the teaching of software performance engineering with more than a dozen faculty across North America and beyond. Our outreach was centered on research-focused faculty with an existing interest in this course material. These discussions revealed an enthusiasm for making software performance engineering a more prominent part of a curriculum for computer scientists and engineers. Here, we discuss how MIT’s longstanding efforts in this area may serve as a launching point for community development of a software performance engineering curriculum, challenges in and solutions for providing the necessary infrastructure to universities, and future directions.

GASNet-EX Specification Collection, Revision 2024.5.0

(2024)

GASNet-EX is a portable, open-source, high-performance communication library designed to efficiently support the networking requirements of PGAS runtime systems and other alternative models in emerging exascale systems. It provides network-independent, high-performance communication primitives including Remote Memory Access (RMA) and Active Messages (AM). GASNet-EX is an evolution of the popular GASNet communication system, building upon over 20 years of lessons learned, and the primary goals are high performance, interface portability, and expressiveness. The library has been used to implement parallel programming models and libraries such as UPC, UPC++, Fortran coarrays, Legion, Chapel, and many others. This anthology collects together the four separate volumes that currently comprise the GASNet-EX specification, as of the 2024.5.0 release of GASNet-EX.

Sparse-Stochastic Fragmented Exchange for Large-Scale Hybrid Time-Dependent Density Functional Theory Calculations

(2024)

We extend our recently developed sparse-stochastic fragmented exchange formalism for ground-state near-gap hybrid DFT to calculate absorption spectra within linear-response time-dependent generalized Kohn-Sham DFT (LR-GKS-TDDFT) for systems consisting of thousands of valence electrons within a grid-based/plane-wave representation. A mixed deterministic/fragmented-stochastic compression of the exchange kernel, here using long-range explicit exchange functionals, provides an efficient method for accurate optical spectra. Both real-time propagation and frequency-resolved Casida-equation-type approaches for spectra are presented, and the method is applied to large molecular dyes.

Parallel Runtime Interface for Fortran (PRIF) Specification, Revision 0.3

(2024)

This document specifies an interface to support the parallel features of Fortran, named the Parallel Runtime Interface for Fortran (PRIF). PRIF is a proposed solution in which the runtime library is responsible for coarray allocation, deallocation and accesses, image synchronization, atomic operations, events, and teams. In this interface, the compiler is responsible for transforming the invocation of Fortran-level parallel features into procedure calls to the necessary PRIF procedures. The interface is designed for portability across shared- and distributed-memory machines, different operating systems, and multiple architectures. Implementations of this interface are intended as an augmentation for the compiler's own runtime library. With an implementation-agnostic interface, alternative parallel runtime libraries may be developed that support the same interface. One benefit of this approach is the ability to vary the communication substrate. A central aim of this document is to define a parallel runtime interface in standard Fortran syntax, which enables us to leverage Fortran to succinctly express various properties of the procedure interfaces, including argument attributes.

AutoCT: Automated CT registration, segmentation, and quantification

(2024)

The processing and analysis of computed tomography (CT) imaging is important for both basic scientific development and clinical applications. In AutoCT, we provide a comprehensive pipeline that integrates end-to-end automatic preprocessing, registration, segmentation, and quantitative analysis of 3D CT scans. The engineered pipeline enables atlas-based CT segmentation and quantification leveraging diffeomorphic transformations through efficient forward and inverse mappings. The extracted localized features from the deformation field allow for downstream statistical learning that may facilitate medical diagnostics. On a lightweight and portable software platform, AutoCT provides a new toolkit for the CT imaging community to underpin the deployment of artificial intelligence-driven applications.

A substitutional quantum defect in WS2 discovered by high-throughput computational screening and fabricated by site-selective STM manipulation

(2024)

Point defects in two-dimensional materials are of key interest for quantum information science. However, the parameter space of possible defects is immense, making the identification of high-performance quantum defects very challenging. Here, we perform high-throughput (HT) first-principles computational screening to search for promising quantum defects within WS2, which present localized levels in the band gap that can lead to bright optical transitions in the visible or telecom regime. Our computed database spans more than 700 charged defects formed through substitution on the tungsten or sulfur site. We found that sulfur substitutions enable the most promising quantum defects. We computationally identify the neutral cobalt substitution to sulfur ($\mathrm{Co}_{\mathrm{S}}^{0}$) and fabricate it with scanning tunneling microscopy (STM). The $\mathrm{Co}_{\mathrm{S}}^{0}$ electronic structure measured by STM agrees with first principles and showcases an attractive quantum defect. Our work shows how HT computational screening and nanoscale synthesis routes can be combined to design promising quantum defects.

A unifying perspective on non-stationary kernels for deeper Gaussian processes

(2024)

The Gaussian process (GP) is a popular statistical technique for stochastic function approximation and uncertainty quantification from data. GPs have been adopted into the realm of machine learning (ML) in the last two decades because of their superior prediction abilities, especially in data-sparse scenarios, and their inherent ability to provide robust uncertainty estimates. Even so, their performance highly depends on intricate customizations of the core methodology, which often leads to dissatisfaction among practitioners when standard setups and off-the-shelf software tools are being deployed. Arguably, the most important building block of a GP is the kernel function, which assumes the role of a covariance operator. Stationary kernels of the Matérn class are used in the vast majority of applied studies; poor prediction performance and unrealistic uncertainty quantification are often the consequences. Non-stationary kernels show improved performance but are rarely used due to their more complicated functional form and the associated effort and expertise needed to define and tune them optimally. In this perspective, we want to help ML practitioners make sense of some of the most common forms of non-stationarity for Gaussian processes. We show a variety of kernels in action using representative datasets, carefully study their properties, and compare their performances. Based on our findings, we propose a new kernel that combines some of the identified advantages of existing kernels.
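The distinction between stationary and non-stationary kernels drawn in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the `amplitude` function, its linear form, and all parameter values are hypothetical and chosen only for illustration. The non-stationary kernel here uses one common construction, scaling a stationary Matérn-3/2 kernel by an input-dependent amplitude, g(x) k(x, x') g(x'), which stays a valid covariance function for any real-valued g.

```python
import math


def matern32(x1, x2, length_scale=1.0, variance=1.0):
    """Stationary Matern-3/2 kernel: depends only on the distance |x1 - x2|."""
    r = abs(x1 - x2) / length_scale
    return variance * (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)


def amplitude(x):
    """Hypothetical input-dependent signal amplitude g(x) (assumed linear form)."""
    return 1.0 + 0.5 * x


def nonstationary(x1, x2, length_scale=1.0):
    """Non-stationary kernel g(x1) * k_stat(x1, x2) * g(x2)."""
    return amplitude(x1) * matern32(x1, x2, length_scale) * amplitude(x2)


# Two pairs of points with the same separation (1.0) at different locations.
k_stat_near = matern32(0.0, 1.0)
k_stat_far = matern32(4.0, 5.0)
k_non_near = nonstationary(0.0, 1.0)
k_non_far = nonstationary(4.0, 5.0)

# The stationary kernel cannot tell the two pairs apart; the non-stationary one can.
print("stationary values equal:", k_stat_near == k_stat_far)
print("non-stationary values differ:", k_non_near != k_non_far)
```

With a stationary kernel the modeled covariance is the same everywhere in the input space, whereas the scaled kernel lets the signal variance grow with x. That location dependence is the kind of flexibility the abstract attributes to non-stationary kernels, at the cost of extra functions and hyperparameters to define and tune.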

ExaWind: Open‐source CFD for hybrid‐RANS/LES geometry‐resolved wind turbine simulations in atmospheric flows

(2024)

Predictive high-fidelity modeling of wind turbines with computational fluid dynamics, wherein turbine geometry is resolved in an atmospheric boundary layer, is important for understanding complex flows and for accounting for design strategies and operational phenomena such as blade erosion, pitch-control, stall/vortex-induced vibrations, and aftermarket add-ons. The biggest challenge with high-fidelity modeling is the realization of numerical algorithms that can capture the relevant physics in detail through effective use of high-performance computing. For modern supercomputers, that means relying on GPUs for acceleration. In this paper, we present ExaWind, a GPU-enabled open-source incompressible-flow hybrid-computational fluid dynamics framework, comprising the near-body unstructured grid solver Nalu-Wind, and the off-body block-structured-grid solver AMR-Wind, which are coupled using the Topology Independent Overset Grid Assembler. Turbine simulations employ either a pure Reynolds-averaged Navier–Stokes turbulence model or hybrid turbulence modeling wherein Reynolds-averaged Navier–Stokes is used for near-body flow and large eddy simulation is used for off-body flow. Being two-way coupled through overset grids, the two solvers enable simulation of flows across a huge range of length scales, for example, 10 orders of magnitude going from O(μm) boundary layers along the blades to O(10 km) across a wind farm. In this paper, we describe the numerical algorithms for geometry-resolved turbine simulations in atmospheric boundary layers using ExaWind. We present verification studies using canonical flow problems. Validation studies are presented using megawatt-scale turbines established in the literature. Additionally presented are demonstration simulations of a small wind farm under atmospheric inflow with different stability states.