# R, HDF5 Data and Lightning

I used to spend an inordinate amount of time digging through lightning data. These data came from a number of sources, the World Wide Lightning Location Network (WWLLN) and LIS/OTD being the most common. I recently needed to work with some Hierarchical Data Format (HDF) data. HDF is something of a niche format and, since it was the format used for the LIS/OTD data, I went back to review my old scripts. It was very pleasant rediscovering work I did some time ago.

## LIS and OTD

The Optical Transient Detector (OTD) and Lightning Imaging Sensor (LIS) were instruments for detecting lightning discharges from low Earth orbit. OTD was launched in 1995 on the MicroLab-1 satellite into a near-polar orbit with an inclination of 70°. OTD achieved global (spatial) coverage for the period May 1995 to April 2000 with roughly 60% uptime. LIS was an instrument on the TRMM satellite, launched into a 35° inclination orbit during 1997. Data from LIS were thus confined to more tropical latitudes. The TRMM mission ended only in April 2015.

The seminal work using data from OTD, *Global frequency and distribution of lightning as observed from space by the Optical Transient Detector*, was published by Hugh Christian and his collaborators in 2003. It’s open access and well worth a read if you are interested in where and when lightning happens across the Earth’s surface.

## Preprocessing HDF4 to HDF5

The LIS/OTD data are available as HDF4 files from https://lightning.nsstc.nasa.gov/data/. To load them into R I first converted to HDF5 using a tool from the h5utils suite:

h5fromh4 -d lrfc LISOTD_LRFC_V2.3.2014.hdf
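
If there are several HDF4 files to convert, the same command can be driven from R. The following is only a minimal sketch: `h5fromh4_args()` and `convert_hdf4()` are hypothetical helper names, and the sketch assumes that `h5fromh4` is on the `PATH` and is called exactly as in the command above.

```r
# Build the argument vector for one h5fromh4 call.
# "-d <dataset>" selects a single data set, as in the command above.
h5fromh4_args <- function(path, dataset) {
  c("-d", dataset, path)
}

# Hypothetical helper: convert each file in turn, but only if the
# h5fromh4 executable can actually be found on the PATH.
convert_hdf4 <- function(paths, dataset) {
  if (!nzchar(Sys.which("h5fromh4"))) {
    warning("h5fromh4 not found on PATH; nothing converted")
    return(invisible(FALSE))
  }
  for (path in paths) {
    # shQuote() protects file names that contain spaces.
    system2("h5fromh4", args = shQuote(h5fromh4_args(path, dataset)))
  }
  invisible(TRUE)
}

# The arguments that would be passed for the LRFC file.
args <- h5fromh4_args("LISOTD_LRFC_V2.3.2014.hdf", "lrfc")
```

Calling `convert_hdf4("LISOTD_LRFC_V2.3.2014.hdf", "lrfc")` would then run the conversion, or warn and do nothing if the tool is missing.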


Support for HDF5 in R appears to have evolved appreciably in recent years. I originally used the hdf5 package, then some time later moved to the h5r package. Neither of these appears on CRAN at present. Current support for HDF5 is via the h5 package. This package depends on the HDF5 C++ library (h5c++), which I needed to install first.

sudo apt-get install libhdf5-dev


Then, back in R, I installed and loaded the h5 package.

> install.packages("h5")
> library(h5)


## Low Resolution Full Climatology

The next step was to interrogate the contents of one of the HDF files. A given file may contain multiple data sets (this is part of the “hierarchical” nature of HDF), so we’ll check on what data sets are packed into one of those files. Let’s look at the Low Resolution Full Climatology (LRFC) data.

> file = h5file(name = "data/LISOTD_LRFC_V2.3.2014.h5", mode = "r")
> dataset = list.datasets(file)
> cat("datasets:", dataset, "\n")
datasets: /lrfc


Just a single data set, but that’s by design: we only extracted one using h5fromh4 above. What are the characteristics of that data set?

> print(file[dataset])
DataSet 'lrfc' (72 x 144)
type: numeric
chunksize: NA
maxdim: 72 x 144


It contains numerical data and has dimensions 72 by 144, which means that it has been projected onto a latitude/longitude grid with 2.5° resolution. We’ll just go ahead and read in those data.

> lrfc = readDataSet(file[dataset])
> class(lrfc)
[1] "matrix"
> dim(lrfc)
[1] 72 144


That wasn’t too hard. And it’s not much more complicated if there are multiple data sets per file.
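
The 2.5° resolution quoted earlier follows directly from the matrix dimensions. Here is a minimal sketch of that arithmetic. It assumes that rows index latitude and columns index longitude, that the grid spans the whole globe, and that each value belongs to the centre of its cell. None of these assumptions is confirmed by the data set summary, so check them against the file before relying on them.

```r
# Dimensions reported for the lrfc data set.
n_lat <- 72
n_lon <- 144

# Cell size in degrees if the grid spans the full globe.
res_lat <- 180 / n_lat
res_lon <- 360 / n_lon

# Assumed cell-centre coordinates (half a cell in from each edge).
lat <- seq(-90 + res_lat / 2, 90 - res_lat / 2, by = res_lat)
lon <- seq(-180 + res_lon / 2, 180 - res_lon / 2, by = res_lon)

range(lat)
range(lon)
```

Under these assumptions the centres run from -88.75° to 88.75° in latitude and from -178.75° to 178.75° in longitude.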

Below is a ggplot showing the annualised distribution of lightning across the Earth’s surface. It’s apparent that most lightning occurs over land in the tropics, with the highest concentration in Central Africa. The units of the colour scale are flashes per square km per year. Higher resolution data can be found in the HRFC file (High Resolution Full Climatology), but the LRFC is quite sufficient to get a flavour of the data.
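
A gridded matrix like this has to be reshaped into a long data frame before ggplot can use it. The sketch below shows one way to do that in base R and is not necessarily how the plot above was produced. `lrfc_demo` is a synthetic stand-in for the real matrix, the column name `flash_rate` is made up for illustration, and the coordinates assume rows are latitude and columns are longitude, with values at cell centres.

```r
# Synthetic stand-in for the 72 x 144 matrix returned by readDataSet().
lrfc_demo <- matrix(seq_len(72 * 144), nrow = 72, ncol = 144)

# Assumed cell-centre coordinates for a global 2.5 degree grid.
lat <- seq(-88.75, 88.75, by = 2.5)
lon <- seq(-178.75, 178.75, by = 2.5)

# expand.grid() varies its first argument fastest, which matches the
# column-major order in which as.vector() unrolls the matrix
# (row index, here latitude, changes fastest).
grid <- expand.grid(lat = lat, lon = lon)
grid$flash_rate <- as.vector(lrfc_demo)

# With ggplot2 loaded, a raster plot could then be drawn along these lines:
# ggplot(grid, aes(x = lon, y = lat, fill = flash_rate)) + geom_raster()
```

Each row of `grid` now holds one cell, so the value at `lrfc_demo[i, j]` sits alongside `lat[i]` and `lon[j]`.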

## Low Resolution Annual Climatology

The Low Resolution Annual Climatology (LRAC) data have the same spatial resolution as LRFC but the data are broken down by day of year. This allows us to see how lightning activity varies at a particular location during the course of a year.

> file = h5file(name = "data/LISOTD_LRAC_V2.3.2014.h5", mode = "r")
> dataset = list.datasets(file)