I’ve been adding topological maps of South Africa to the saffer-data-map
repository. The maps are originally in MrSID format (.sid
files), which is a proprietary file format developed by LizardTech (a company which has been consumed by Extensis).
Although MrSID is efficient from a storage perspective, it’s challenging to work with. Especially on a Linux machine (the support has always been focused on Windows).
I wanted to translate the .sid
files into GeoTIFF. And I wanted to do this on Linux.
Docker to the Rescue
I spent some time researching the problem and learned that there was a MrSID plugin for GDAL. But installation seemed to be tricky. Fortunately I stumbled across a GDAL Docker image with integrated support for MrSID developed by Klokan Technologies.
Pull the image.
docker pull klokantech/gdal
Check that MrSID plugin is available.
docker run klokantech/gdal gdalinfo --formats | grep MrSID
The expected output looks like this:
MrSID -raster- (rov): Multi-resolution Seamless Image Database (MrSID)
MG4Lidar -raster- (ro): MrSID Generation 4 / Lidar (.sid)
Yes! MrSID is supported. :)
Using GDAL to Extract Information from Raster Data Files
The gdalinfo
utility can be used to extract general information from spatial raster data files. Let’s see how to do that with the Docker image.
docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo WGS2929BC.sid
Driver: MrSID/Multi-resolution Seamless Image Database (MrSID)
Files: WGS2929BC.sid
WGS2929BC.sid.aux.xml
Size is 6922, 6921
Coordinate System is `'
Origin = (26.749632892002410,-26.750512179722136)
Pixel Size = (0.000036126395571,-0.000036126395571)
Metadata:
IMAGE__COMPRESSION_BLOCK_SIZE=512
IMAGE__COMPRESSION_GAMMA=2.000000
IMAGE__COMPRESSION_NLEV=8
IMAGE__COMPRESSION_VERSION=1,6,1
IMAGE__COMPRESSION_WEIGHT=4.000000
IMAGE__CREATION_DATE=Thu Jul 24 08:20:08 2003
IMAGE__INPUT_FILE_SIZE=12033322.000000
IMAGE__INPUT_NAME=D:\Free State\5\WGS2929BC.TIF
IMAGE__TARGET_COMPRESSION_RATIO=29.999962
IMAGE__Z_ORIGIN=0.000000
IMAGE__Z_RESOLUTION=0.000000
VERSION=MG2
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 26.7496329, -26.7505122)
Lower Left ( 26.7496329, -27.0005430)
Upper Right ( 26.9996998, -26.7505122)
Lower Right ( 26.9996998, -27.0005430)
Center ( 26.8746663, -26.8755276)
Band 1 Block=1024x128 Type=Byte, ColorInterp=Red
Min=145.000 Max=255.000
Minimum=145.000, Maximum=255.000, Mean=235.534, StdDev=13.500
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=235.53421487603
STATISTICS_MINIMUM=145
STATISTICS_STDDEV=13.499711766257
STATISTICS_VALID_PERCENT=100
Band 2 Block=1024x128 Type=Byte, ColorInterp=Green
Min=149.000 Max=255.000
Minimum=149.000, Maximum=255.000, Mean=238.647, StdDev=12.531
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=238.64727272727
STATISTICS_MINIMUM=149
STATISTICS_STDDEV=12.531463080668
STATISTICS_VALID_PERCENT=100
Band 3 Block=1024x128 Type=Byte, ColorInterp=Blue
Min=152.000 Max=255.000
Minimum=152.000, Maximum=255.000, Mean=237.746, StdDev=12.490
Overviews: 3461x3461, 1731x1731, 866x866, 433x433, 217x217, 109x109, 55x55, 28x28
Metadata:
STATISTICS_APPROXIMATE=YES
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=237.74611570248
STATISTICS_MINIMUM=152
STATISTICS_STDDEV=12.490247658278
STATISTICS_VALID_PERCENT=100
Cool. Very informative. However, that lengthy command with all of the Docker options is a little clunky. To make it more efficient I’ll create a BASH alias.
alias gdalinfo="docker run -ti -v $(pwd):/data klokantech/gdal gdalinfo"
Now I just need to do this:
gdalinfo WGS2929BC.sid
Something to note from the gdalinfo
output above is that the .sid
file does not specify a coordinate system. It appears that these files are simply projected onto a longitude/latitude. We’ll want to be specific about that when we translate them.
Translating MrSID with GDAL
To translate .sid
files to GeoTIFF I’ll use the gdal_translate
utility. An alias will also make this easier.
alias gdal_translate="docker run -ti -v $(pwd):/data klokantech/gdal gdal_translate"
Now translation is as simple as this:
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif
The -of GTiff
flag just makes it explicit that the output format is GeoTIFF.
We can take this up a notch and specify the correct coordinate system, EPSG:4326 (which is equivalent to WGS84).
gdal_translate -of GTiff WGS2929BC.sid -a_srs EPSG:4326 WGS2929BC.tif
The size of the resulting .tif
file is pretty big (around 140 Mb), whereas the .sid
file is relatively minuscule (only around 4 Mb). It’d be great to make the .tif
a little smaller.
Let’s explore a few compression options.
Lossless Compression
First let’s try out some lossless compression options. We use the -co
(creation options) flag to specify the compression algorithm.
# Compression: LZW
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzw
# Compression: DEFLATE
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=deflate
# Compression: PACKBITS
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=packbits
# Compression: LZMA
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lzma
Here are the results:
- LZW — 86 Mb
- DEFLATE — 78 Mb
- PACKBITS — 118 Mb and
- LZMA — 71 Mb (compression is seriously slow!).
Looks like DEFLATE might be the best option for lossless compression (factoring in compression ratio and speed).
Lossy Compression
What about some lossy compression options?
# Compression: JPEG
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=jpeg
# Compression: LERC
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co compress=lerc
Here are the results:
- JPEG — 22 Mb and
- LERC — 78 Mb.
The compression ratio with JPEG is by far the winner. We’ll stick with that and see what the image quality is like.
Colour Space
We also have some freedom in choosing the colour space.
# Colourspace: YCbCr (with JPEG compression)
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co photometric=ycbcr -co compress=jpeg
Here are the results:
- YCbCr (with JPEG compression) — 8 Mb.
Wow! Okay, that seriously shrinks the size of the file!
Tiling
Tiling can also be used to generate a more efficient .tif
file.
# Tiling
gdal_translate -of GTiff WGS2929BC.sid WGS2929BC.tif -co tiled=yes
Tiling does produce a slightly larger file, but this file is, in principle, much more efficient to query.
Using the Results in R
My ultimate goal is to have data which I can easily import into R. The translated GeoTIFF files are perfect for this.
Setup location of the GeoTIFF file.
FILEPATH <- "WGS2929BC.tif"
Using {rgdal}
Now load up the {rgdal}
package.
library(rgdal)
Use GDALinfo()
to extract metadata from the .tif
file (compare this with the metadata extracted from the corresponding .sid
file above).
GDALinfo(FILEPATH)
rows 6903
columns 6904
bands 3
lower left origin.x 29.49971
lower left origin.y -29.50041
res.x 3.622124e-05
res.y 3.622124e-05
ysign -1
oblique.x 0
oblique.y 0
driver GTiff
projection +proj=longlat +datum=WGS84 +no_defs
file WGS2929BC.tif
apparent band summary:
GDType hasNoDataValue NoDataValue blockSize1 blockSize2
1 Byte FALSE 0 256 256
2 Byte FALSE 0 256 256
3 Byte FALSE 0 256 256
apparent band statistics:
Bmin Bmax Bmean Bsd
1 0 255 NA NA
2 0 255 NA NA
3 0 255 NA NA
Metadata:
AREA_OR_POINT=Area
IMAGE__COMPRESSION_BLOCK_SIZE=512
IMAGE__COMPRESSION_GAMMA=2.000000
IMAGE__COMPRESSION_NLEV=8
IMAGE__COMPRESSION_VERSION=1,6,1
IMAGE__COMPRESSION_WEIGHT=4.000000
IMAGE__CREATION_DATE=Thu Jul 24 15:01:46 2003
IMAGE__INPUT_FILE_SIZE=17737854.000000
IMAGE__INPUT_NAME=D:\KZN\3\WGS2929BC.TIF
IMAGE__TARGET_COMPRESSION_RATIO=29.999962
IMAGE__Z_ORIGIN=0.000000
IMAGE__Z_RESOLUTION=0.000000
VERSION=MG2
Note that
- the projection is WGS84 as specified when we translated from
.sid
format and - there are three colour bands.
Using {raster}
We can also use the {raster}
package to read the file.
library(raster)
The raster()
function will read in just a single colour band.
grey <- raster(FILEPATH)
What’s the Coordinate Reference System (CRS)?
crs(grey)
Coordinate Reference System:
Deprecated Proj.4 representation: +proj=longlat +datum=WGS84 +no_defs
WKT2 2019 representation:
GEOGCRS["WGS 84 (with axis order normalized for visualization)",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic longitude (Lon)",east,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
AXIS["geodetic latitude (Lat)",north,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]]
Check whether the coordinate system is longitude/latitude.
isLonLat(grey)
[1] TRUE
Confirmed.
What is the extent of the data?
grey@extent
class : Extent
xmin : 29.49971
xmax : 29.74978
ymin : -29.50041
ymax : -29.25037
This is the range of the data in longitude and latitude.
How many layers are there?
nlayers(grey)
[1] 1
Well, that makes sense since we only loaded a single colour band. Let’s now use the brick()
function to load all of the colour bands.
rgb <- brick(FILEPATH)
How many layers?
nlayers(rgb)
[1] 3
Excellent. Let’s zoom in to see the details.
plotRGB(
rgb,
interpolate = TRUE,
ext = extent(29.65, 29.70, -29.40, -29.35)
)
Very nice indeed. One of my favourite parts of the Drakensberg. And the image quality does not seem to be compromised by the aggressive compression (these images are a little “fuzzy” to start with!).
Conclusion
You can access the original MrSID and processed GeoTIFF files in the saffer-data-map repository. Have fun.