JLD2 dataset format

JLD2 is a structured Julia data format comprising a subset of HDF5, without any dependency on the HDF5 C library.

JLD2 files have the extension .jld2. They are binary files that Julia can read and write once package JLD2.jl is loaded in the Julia environment.

Package JchemoData stores its datasets in the JLD2 format. The same format is used in the examples of the help pages of package Jchemo and in the project environment JchemoDemo.

This note illustrates how to

  1. Load an existing JLD2 dataset

  2. Build a JLD2 dataset from CSV files

  3. Save a JLD2 dataset to CSV files

For details on using dataframes and CSV files, see the documentation of packages DataFrames.jl and CSV.jl.

Packages required for the examples

using Jchemo, JchemoData    
using JLD2, CSV, DataFrames

1. Loading an existing JLD2 dataset

The example below loads dataset tecator.jld2 stored in package JchemoData.

path_jdat = dirname(dirname(pathof(JchemoData)))   # automatically detect the path where Julia has installed package JchemoData
db = joinpath(path_jdat, "data/tecator.jld2")      # full path to the jld2 file; can be changed to any other existing .jld2 file 
## On Windows, same as:   db = string(path_jdat, "\\data\\tecator.jld2")
"C:\\Users\\lesnoff\\.julia\\packages\\JchemoData\\5mVcR\\data/tecator.jld2"
  • Situation 1

The name of the object contained in the .jld2 file is known. In this example, the name is dat.

@load db dat   # same as:    dat = load(db, "dat") 
@names dat     # @names is a Jchemo function
(:X, :Y)
@head dat.X
... (178, 100)
3×100 DataFrame
 Row │ 850      852      854      856      858      ⋯
     │ Float64  Float64  Float64  Float64  Float64  ⋯
─────┼────────────────────────────────────────────────
   1 │ 2.61776  2.61814  2.61859  2.61912  2.61981  ⋯
   2 │ 2.83454  2.83871  2.84283  2.84705  2.85138  ⋯
   3 │ 2.58284  2.58458  2.58629  2.58808  2.58996  ⋯
                                    95 columns omitted
@head dat.Y
... (178, 4)
3×4 DataFrame
 Row │ water    fat      protein  typ
     │ Float64  Float64  Float64  String
─────┼───────────────────────────────────
   1 │    60.5     22.5     16.7  train
   2 │    46.0     40.1     13.5  train
   3 │    71.0      8.4     20.5  train
  • Situation 2

The name of the object contained in the .jld2 file is unknown.

res = load(db)  
keys(res)
KeySet for a Dict{String, Any} with 1 entry. Keys:
  "dat"
dat = res["dat"] 
@names dat
(:X, :Y)
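
When a file is large, it can be useful to list the stored names without loading the objects themselves. The sketch below is one possible way to do this with jldopen, which opens the file and lets keys return the stored names. The file example.jld2 and the toy object dat are hypothetical and are created in a temporary directory only for illustration.

```julia
using JLD2

# Hypothetical toy object, saved to a temporary file for illustration only
db_tmp = joinpath(mktempdir(), "example.jld2")
dat = (X = rand(5, 3), Y = collect(1:5))
jldsave(db_tmp; dat)

# Open the file read-only and list the names of the stored objects
nam = jldopen(db_tmp, "r") do f
    keys(f)
end
println(nam)

# Load only the object of interest, by name
dat2 = load(db_tmp, "dat")
println(keys(dat2))
```

Once the name is known, the loading step is the same as in Situation 1.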

2. Building a JLD2 dataset from CSV files

In this example, two CSV files are imported as dataframes and saved together in a single JLD2 file (.jld2).

db = joinpath(path_jdat, "data/cassav_X.csv")       # full path to the CSV file; can be changed to any other existing .CSV file 
X = CSV.read(db, DataFrame; header = 1, decimal = '.', delim = ';')  # same as below:
#X = CSV.File(db; header = 1, delim = ';') |> DataFrame 
#X = DataFrame(CSV.File(db, header = 1, delim = ';'))
@head X
... (280, 1050)
3×1050 DataFrame
 Row │ 400       402       404       406       408       ⋯
     │ Float64   Float64   Float64   Float64   Float64   ⋯
─────┼─────────────────────────────────────────────────────
   1 │ 0.399996  0.406522  0.413008  0.41958   0.426073  ⋯
   2 │ 0.460896  0.46706   0.475677  0.483438  0.490809  ⋯
   3 │ 0.464731  0.471416  0.47828   0.48733   0.497117  ⋯
                                      1045 columns omitted
db = joinpath(path_jdat, "data/cassav_Y.csv")      # full path to the CSV file; can be changed to any other existing .CSV file 
Y = CSV.read(db, DataFrame; header = 1, decimal = '.', delim = ';')
@head Y
... (280, 2)
3×2 DataFrame
 Row │ year   tbc
     │ Int64  Float64
─────┼────────────────
   1 │  2009  1.58068
   2 │  2009  7.85516
   3 │  2009  1.77595
dat = (X = X, Y = Y)  # create a named tuple containing the two dataframes
@names dat
(:X, :Y)

Saving dataset my_cassav

path_out = tempdir()   # path receiving the result file; can be changed to any other existing path
db_out = joinpath(path_out, "my_cassav.jld2")  
@save db_out dat       # same as below:
#jldsave(db_out; dat)  
#jldsave(db_out, true; dat)  # true ==> compression
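
A quick way to check a saved file is to reload it and compare the result with the objects in memory. The sketch below assumes small hypothetical dataframes X and Y and a hypothetical file name my_example.jld2 in a temporary directory. It is only an illustration of the round trip, not part of the cassav example.

```julia
using JLD2, DataFrames

# Hypothetical toy dataframes, for illustration only
X = DataFrame(a = [1.0, 2.0], b = [3.0, 4.0])
Y = DataFrame(y = [10, 20])
dat = (X = X, Y = Y)

# Save the named tuple under the name "dat"
db_chk = joinpath(mktempdir(), "my_example.jld2")
jldsave(db_chk; dat)

# Reload the object and compare it with the original dataframes
dat_chk = load(db_chk, "dat")
same = dat_chk.X == X && dat_chk.Y == Y
println(same)
```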

3. Building CSV files from a JLD2 dataset

db = joinpath(path_jdat, "data/tecator.jld2")  # full path to the jld2 file; can be changed to any other existing .jld2 file 
@load db dat
@names dat
(:X, :Y)

Saving datasets X and Y

path_out = tempdir()   # path that will receive the result file; can be changed to any other existing path
db_out = joinpath(path_out, "X.csv")
CSV.write(db_out, dat.X; delim = ";") # same as:  dat.X |> CSV.write(db_out; delim = ";")
"C:\\Users\\lesnoff\\AppData\\Local\\Temp\\X.csv"
db_out = joinpath(path_out, "Y.csv")
CSV.write(db_out, dat.Y; delim = ";")
"C:\\Users\\lesnoff\\AppData\\Local\\Temp\\Y.csv"
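
The written files can be checked by reading them back with the same delimiter. The sketch below uses a small hypothetical dataframe and a hypothetical file Y_example.csv in a temporary directory, and it assumes that the values survive the text round trip unchanged, which holds for the simple values used here.

```julia
using CSV, DataFrames

# Hypothetical toy dataframe, for illustration only
Y = DataFrame(water = [60.5, 46.0], typ = ["train", "test"])

# Write with ';' as delimiter, then read back with the same delimiter
db_csv = joinpath(mktempdir(), "Y_example.csv")
CSV.write(db_csv, Y; delim = ";")
Y_chk = CSV.read(db_csv, DataFrame; header = 1, delim = ';')

# Compare the reloaded dataframe with the original
println(Y_chk == Y)
```

If the delimiter given to CSV.read differs from the one used by CSV.write, each row is read as a single column, so a comparison such as this one reveals the mismatch at once.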