Topographic Maps for South Africa

Using MrSID with GDAL

I’ve been adding topological maps of South Africa to the saffer-data-map repository. The maps are originally in MrSID format (.sid files), which is a proprietary file format developed by LizardTech (a company which has been consumed by Extensis).

Although MrSID is efficient from a storage perspective, it’s challenging to work with. Especially on a Linux machine (the support has always been focused on Windows).

I wanted to translate the .sid files into GeoTIFF. And I wanted to do this on Linux.

Docker to the Rescue 

I spent some time researching the problem and learned that there was a MrSID plugin for GDAL. But installation seemed to be tricky. Fortunately I stumbled across a GDAL Docker image with integrated support for MrSID developed by Klokan Technologies.

Pull the image.

docker pull klokantech/gdal

Check that MrSID plugin is available.

docker run klokantech/gdal gdalinfo --formats | grep MrSID

The expected output looks like this:

MrSID -raster- (rov): Multi-resolution Seamless Image Database (MrSID)
MG4Lidar -raster- (ro): MrSID Generation 4 / Lidar (.sid)

Yes! MrSID is supported. :)

Using GDAL to Extract Information from Raster Data Files 

The gdalinfo utility can be used to extract general informtion from spatial raster data files. Let’s see how to do that with the Docker image.

docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo WGS2929BC.sid
Driver: MrSID/Multi-resolution Seamless Image Database (MrSID)
Files: WGS2929BC.sid
WGS2929BC.sid.aux.xml
Size is 6922, 6921
Coordinate System is `'
Origin = (26.749632892002410,-26.750512179722136)
Pixel Size = (0.000036126395571,-0.000036126395571)
Metadata:
IMAGE__COMPRESSION_BLOCK_SIZE=512
IMAGE__COMPRESSION_GAMMA=2.000000
IMAGE__COMPRESSION_NLEV=8
IMAGE__COMPRESSION_VERSION=1,6,1
IMAGE__COMPRESSION_WEIGHT=4.000000
IMAGE__CREATION_DATE=Thu Jul 24 08:20:08 2003

IMAGE__INPUT_FILE_SIZE=12033322.000000
IMAGE__INPUT_NAME=D:\Free State\5\WGS2929BC.TIF
IMAGE__TARGET_COMPRESSION_RATIO=29.999962
IMAGE__Z_ORIGIN=0.000000
IMAGE__Z_RESOLUTION=0.000000
VERSION=MG2
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (  26.7496329, -26.7505122) 
Lower Left  (  26.7496329, -27.0005430) 
Upper Right (  26.9996998, -26.7505122) 
Lower Right (  26.9996998, -27.0005430) 
Center      (  26.8746663, -26.8755276) 
Band 1 Block=1024x128 Type=Byte, ColorInterp=Red
Min=145.000 Max=255.000 
Minimum=145.000, Maximum=255.000, Mean=235.534, StdDev=13.500
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=235.53421487603
STATISTICS_MINIMUM=145
STATISTICS_STDDEV=13.499711766257
STATISTICS_VALID_PERCENT=100
Band 2 Block=1024x128 Type=Byte, ColorInterp=Green
Min=149.000 Max=255.000 
Minimum=149.000, Maximum=255.000, Mean=238.647, StdDev=12.531
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=238.64727272727
STATISTICS_MINIMUM=149
STATISTICS_STDDEV=12.531463080668
STATISTICS_VALID_PERCENT=100
Band 3 Block=1024x128 Type=Byte, ColorInterp=Blue
Min=152.000 Max=255.000 
Minimum=152.000, Maximum=255.000, Mean=237.746, StdDev=12.490
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=237.74611570248
STATISTICS_MINIMUM=152
STATISTICS_STDDEV=12.490247658278
STATISTICS_VALID_PERCENT=100

Cool. Very informative. However, that lengthy command with all of the Docker options is a little clunky. To make it more efficient I’ll create a BASH alias.

alias gdalinfo="docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo"

Now I just need to do this:

gdalinfo WGS2929BC.sid

Something to note from the gdalinfo output above is that the .sid file does not specify a coordinate system. It appears that these files are simply projected onto a longitude/latitude. We’ll want to be specific about that when we translate them.

Translating MrSID with GDAL 

To transtale .sid files to GeoTIFF I’ll use the gdal_translate utility. An alias will also make this easier.

alias gdal_translate="docker run -ti -v $(pwd):/data klokantech/gdal gdal_translate"

Now translation is as simple as this:

gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif

The -of GTiff flag just makes it explicit that the output format is GeoTIFF.

We can take this up a notch and specify the correct coordinate system, EPSG:4326 (which is equivalent to WGS84).

gdal_translate -of GTiff WGS2929BC.sid -a_srs EPSG:4326 WGS2929BC.tif

The size of the resulting .tif file is pretty big (around 140 Mb), whereas the .sid file is relatively miniscule (only around 4 Mb). It’d be great to make the .tif a little smaller.

Let’s explore a few compression options.

Lossless Compression 

First let’s try out some lossless compression options. We use the -co (creation options) flag to specify the compression algorithm.

# Compression: LZW
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzw
# Compression: DEFLATE
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=deflate
# Compression: PACKBITS
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=packbits
# Compression: LZMA
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzma

Here are the results:

  • LZW — 86 Mb
  • DEFLATE — 78 Mb
  • PACKBITS — 118 Mb and
  • LZMA — 71 Mb (compression is seriously slow!).

Looks like DEFLATE might be the best option for lossless compression (factoring in compression ratio and speed).

Lossy Compression 

What about some lossy compression options?

# Compression: JPEG
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=jpeg
# Compression: LERC
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lerc

Here are the results:

  • JPEG — 22 Mb and
  • LERC — 78 Mb.

The compression ratio with JPEG is by far the winner. We’ll stick with that and see what the image quality is like.

Colourspace 

We also have some freedom in choosing the colourspace.

# Colourspace: YCbCr (with JPEG compression)
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co photometric=ycbcr -co compress=jpeg

Here are the results:

  • YCbCr (with JPEG compression) — 8 Mb.

Wow! Okay, that seriously shrinks the size of the file!

Tiling 

Tiling can also be used to generate a more efficient .tif file.

# Tiling
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co tiled=yes

Tiling does produce a slightly larger file, but this file is, in principle, much more efficient to query.

Using the Results in R 

My ultimate goal is to have data which I can easily import into R. The translated GeoTIFF files are perfect for this.

Setup location of the GeoTIFF file.

FILEPATH <- "WGS2929BC.tif"

Using {rgdal} 

Now load up the {rgdal} package.

library(rgdal)

Use GDALinfo() to extract metadata from the .tif file (compare this with the metadata extracted from the corresponding .sid file above).

GDALinfo(FILEPATH)
rows        6903 
columns     6904 
bands       3 
lower left origin.x        29.49971 
lower left origin.y        -29.50041 
res.x       3.622124e-05 
res.y       3.622124e-05 
ysign       -1 
oblique.x   0 
oblique.y   0 
driver      GTiff 
projection  +proj=longlat +datum=WGS84 +no_defs 
file        WGS2929BC.tif 
apparent band summary:
  GDType hasNoDataValue NoDataValue blockSize1 blockSize2
1   Byte          FALSE           0        256        256
2   Byte          FALSE           0        256        256
3   Byte          FALSE           0        256        256
apparent band statistics:
  Bmin Bmax Bmean Bsd
1    0  255    NA  NA
2    0  255    NA  NA
3    0  255    NA  NA
Metadata:
AREA_OR_POINT=Area 
IMAGE__COMPRESSION_BLOCK_SIZE=512 
IMAGE__COMPRESSION_GAMMA=2.000000 
IMAGE__COMPRESSION_NLEV=8 
IMAGE__COMPRESSION_VERSION=1,6,1 
IMAGE__COMPRESSION_WEIGHT=4.000000 
IMAGE__CREATION_DATE=Thu Jul 24 15:01:46 2003
 
IMAGE__INPUT_FILE_SIZE=17737854.000000 
IMAGE__INPUT_NAME=D:\KZN\3\WGS2929BC.TIF 
IMAGE__TARGET_COMPRESSION_RATIO=29.999962 
IMAGE__Z_ORIGIN=0.000000 
IMAGE__Z_RESOLUTION=0.000000 
VERSION=MG2 

Note that

  • the projection is WGS84 as specified when we translated from .sid format and
  • there are three colour bands.

Using {raster} 

We can also use the {raster} package to read the file.

library(raster)

The raster() function will read in just a single colour band.

grey <- raster(FILEPATH)

What’s the Coordinate Reference System (CRS)?

crs(grey)
CRS arguments: +proj=longlat +datum=WGS84 +no_defs 

Check whether the coordinate system is longitude/latitude.

isLonLat(grey)
[1] TRUE

Confirmed.

What is the extent of the data?

grey@extent
class      : Extent 
xmin       : 29.49971 
xmax       : 29.74978 
ymin       : -29.50041 
ymax       : -29.25037 

This is the range of the data in longitude and latitude.

How many layers are there?

nlayers(grey)
[1] 1

Well, that makes sense since we only loaded a single colour band. Let’s now use the brick() function to load all of the colour bands.

rgb <- brick(FILEPATH)

How many layers?

nlayers(rgb)
[1] 3

Excellent.

Take a look at the results.

plotRGB(rgb)

Looks good. But let’s zoom in to see the details.

plotRGB(
  rgb,
  interpolate = TRUE,
  ext = extent(29.65, 29.70, -29.40, -29.35)
)

Very nice indeed. One of my favourite parts of the Drakensberg. And the image quality does not seem to be compromised by the aggressive compression (these images are a little “fuzzy” to start with!).

Conclusion 

You can access the original MrSID and processed GeoTIFF files in the saffer-data-map repository. Have fun.