Two weeks ago I posted an article about improving fwrite's performance. fwrite is normally used to store binary data in some custom pre-defined format. But we often don't need or want to use such low-level functions. Matlab's built-in save function is an easy and very convenient way to store data in both binary and text formats. This data can later be loaded back into Matlab using the load function. Today's article will show little-known tricks for improving save's performance.

MAT is Matlab's default data format for the save function. This format is publicly available, and adaptors are available for other programming languages (C, C#, Java). Matlab 6 and earlier did not employ automatic data compression; Matlab versions 7.0 (R14) through 7.2 (R2006a) use GZIP compression; Matlab 7.3 (R2006b) and newer can use an HDF5-variant format, which apparently also uses GZIP (level-3) compression, although MathWorks might have done better to pay the license cost of employing SZIP (thanks to Malcolm Lidierth for the clarification). Note that Matlab's 7.3 format is not a pure HDF5 file, but rather an HDF5 variant that uses an undocumented internal format.

The following table summarizes the available options for saving data using the save function:

[Table: save options]

HDF5 uses a generic format to store data of any conceivable type, and has a non-negligible storage overhead in order to describe the file's contents. Moreover, Matlab's HDF5 implementation does not by default compress non-numeric data (struct and cell arrays). For this reason, HDF5 files are typically larger and slower than non-HDF5 MAT files, especially if the data contains cell arrays or structs. This holds true for both pure-HDF files (saved via the hdf and hdf5 sets of functions, for the HDF4 and HDF5 formats respectively) and v7.3-format MAT files.

Perhaps for this reason the default preference is for save to use -v7, even on new releases that support -v7.3. This preference can be changed in Matlab's Preferences/General window (or we could always specify the -v7/-v7.3 switch directly when using save):

[Figure: Matlab's preferences for saving binary data]

Over the years, MathWorks has fixed several inefficiencies when reading HDF5 files (ref1, ref2). Some of these fixes include patches for older releases, and readers who do not use the latest Matlab release (currently R2013a) are advised to download and install the appropriate patches. There are still a couple of open bugs regarding HDF5 performance and compression that may affect save.
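The size and speed trade-off between the -v7 and -v7.3 formats is easy to check on your own data. The snippet below is only a minimal sketch: the variable names (numericData, cellData, results) and the data sizes are illustrative assumptions, not taken from any MathWorks example. It saves a numeric matrix and a cell array in both formats and reports the elapsed time and resulting file size for each.

```matlab
% Sketch: compare save time and file size for the -v7 and -v7.3 formats.
% Variable names and data sizes are illustrative assumptions.
numericData = rand(1000);            % 1000x1000 matrix of doubles
cellData    = num2cell(rand(200));   % 200x200 cell array of scalar doubles

formats = {'-v7', '-v7.3'};
results = struct('format', {}, 'variable', {}, 'seconds', {}, 'bytes', {});

for f = 1:numel(formats)
    for name = {'numericData', 'cellData'}
        varName  = name{1};
        fileName = [tempname '.mat'];        % temporary file, deleted below
        t = tic;
        save(fileName, varName, formats{f}); % explicit format switch
        elapsed = toc(t);
        info = dir(fileName);                % file size on disk
        results(end+1) = struct('format', formats{f}, 'variable', varName, ...
                                'seconds', elapsed, 'bytes', info.bytes); %#ok<AGROW>
        delete(fileName);
    end
end

% Print one row per format/variable combination
for r = results
    fprintf('%-6s %-12s %8.3f s %12d bytes\n', ...
            r.format, r.variable, r.seconds, r.bytes);
end
```

Actual numbers depend on the Matlab release, the disk and the data, so treat any single run as indicative only. Random data also compresses poorly, so it is worth repeating the test with data that resembles what you actually save.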