## README for MHD_ML Dataset

MHD_2d.h5 contains "data", which is 256 x 256 x 6912, and "labels", which is 6 x 6912.

The data are 256 x 256 slices of periodic isothermal MHD simulations. Each slice is a single pixel thick and is taken from a 256^3 simulation cube. 6912 = 3 x 32 x 8 x 9: there are 3 axes that can be sliced along, 32 slices per axis (spaced every 8 pixels along the slicing direction), 8 different classes described by the Ms and Ma values in the labels, and 9 statistically independent time steps.

The labels are the Ms (sonic Mach number), Ma (Alfvénic Mach number), time step, axis of slice, position along slicing axis, and class label. Ms and Ma are the two parameters that are varied in the MHD simulations. For the convenience of non-domain experts, we also assign a single class label, an integer from 1 to 8, to make classification easier. Note that the dataset was created in Julia, where indices start from 1.

We recommend that train-test splits be done on the time step index: each simulation box is statistically independent of the boxes at other time steps, but slices within a box may be correlated.

MHD_2dcs.h5 contains "data", which is 256 x 256 x 6912, and "labels", which is 6 x 6912.

This dataset is derived from the MHD cubes above, but slices are taken from the cumulative sum of the density field along a given axis. This leads to far more correlation between slices and a far less uniform distribution of images, which can make classification much harder.

Any questions or difficulties should be addressed to andrew.saydjari@cfa.harvard.edu. The simulations sliced here were run by Blakesley Burkhart, and their use should follow the reference guidelines at https://www.mhdturbulence.com/
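
The recommended split on the time step index could be sketched in Python as follows. This is a minimal sketch, not part of the dataset tooling. It assumes the files are read with `h5py`, that the six label fields are stored in the order listed above, and that time steps are numbered 1 to 9. The helper names `load_mhd` and `split_by_time_step` are hypothetical. Because the files were written from Julia (column-major), a Python reader may report the axes reversed, so the loader moves the sample axis to the front whichever way it arrives. The demo at the bottom uses synthetic labels, not real data.

```python
import numpy as np

# Assumed order of the six label fields, following the list in this README.
LABEL_FIELDS = ("Ms", "Ma", "time_step", "axis", "position", "class_label")


def load_mhd(path):
    """Read "data" and "labels" from MHD_2d.h5 or MHD_2dcs.h5.

    Returns data with shape (n_samples, 256, 256) and labels with shape
    (n_samples, 6). The full data array holds 256 * 256 * 6912 values,
    so it may need several GB of memory.
    """
    import h5py  # imported here so the split helper works without h5py

    with h5py.File(path, "r") as f:
        data = f["data"][...]
        labels = f["labels"][...]

    # Assumption: the axis order seen from Python may be reversed relative
    # to the Julia shapes quoted in the README. Put samples on axis 0.
    if labels.shape[0] == len(LABEL_FIELDS):
        labels = labels.T
    if data.shape[0] != labels.shape[0]:
        data = np.moveaxis(data, -1, 0)
    return data, labels


def split_by_time_step(labels, test_steps):
    """Return boolean (train_mask, test_mask) for an (n_samples, 6) array.

    Every slice from a held-out time step goes to the test set, so
    correlated slices from one simulation box never straddle the split.
    """
    time_step = labels[:, LABEL_FIELDS.index("time_step")]
    test_mask = np.isin(time_step, list(test_steps))
    return ~test_mask, test_mask


# Synthetic stand-in labels: 9 time steps with 4 slices each (not real data).
demo_labels = np.zeros((36, len(LABEL_FIELDS)))
demo_labels[:, 2] = np.repeat(np.arange(1, 10), 4)

train_mask, test_mask = split_by_time_step(demo_labels, test_steps=(8, 9))
print(train_mask.sum(), test_mask.sum())  # 28 8
```

With the real files, the same masks would index the first axis of the loaded arrays, for example `data[train_mask]` and `labels[train_mask]`. Which time steps to hold out is a free choice; steps 8 and 9 are used here only for illustration.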