This working group meeting was held as part of the 2021 ECP Annual Meeting. Today, DOE computing centers track a wealth of information on the health, usage, and efficiency of our machines, workflows, and programming environments. This information is needed by system operations, application developers, and user groups, and it is critical both to understanding observed performance and to diagnosing application and system issues early. Would-be users of this data face significant challenges, and efforts are currently fragmented across centers. Without a clear plan and an identifiable outcome, the cost of the infrastructure to collect, store, and analyze the available volumes of data is prohibitive. With many new potential sources of data in both the A21 and CORAL2 exascale systems, we must identify which solutions provided by the various ECP centers can be used and which gaps still need to be filled. Only then can we collaboratively address the existing challenges and emerging needs, and build solutions that efficiently and effectively support our upcoming exascale environments. We have assembled a working group to review current efforts across centers and discuss focus areas that show potential.