Yesterday we looked at how time series data can be sucked into Julia from Quandl. What happens if your data are sitting in a database? No problem, Julia can handle that too using one of a number of available database packages. I’ll be focusing on SQLite
and ODBC
, but it might be worthwhile checking out JDBC
, LevelDB
and LMDB
too.
SQLite
SQLite is a lightweight transactional SQL database engine that does not require a server or any major configuration. Installation is straightforward on most platforms.
The first step towards using SQLite from Julia is to load the package.
using SQLite
Next, for illustrative purposes, we’ll create a database (which exists as a single file in the working directory) and add a table which we’ll populate directly from a delimited file.
db = SQLiteDB("passwd.sqlite")
SQLiteDB{UTF8String}("passwd.sqlite",Ptr{Void} @0x00000000059cde38,0)
create(
db,
"passwd",
readdlm("/etc/passwd", ':'),
["username", "password", "UID", "GID", "comment", "homedir", "shell"]
)
1x1 ResultSet
| Row | "Rows Affected" |
|-----|-----------------|
| 1 | 0 |
Then the interesting bit: we execute a simple query.
query(db, "SELECT username, homedir FROM passwd LIMIT 10;")
10x2 ResultSet
| Row | "username" | "homedir" |
|-----|------------|-------------------|
| 1 | "root" | "/root" |
| 2 | "daemon" | "/usr/sbin" |
| 3 | "bin" | "/bin" |
| 4 | "sys" | "/dev" |
| 5 | "sync" | "/bin" |
| 6 | "games" | "/usr/games" |
| 7 | "man" | "/var/cache/man" |
| 8 | "lp" | "/var/spool/lpd" |
| 9 | "mail" | "/var/mail" |
| 10 | "news" | "/var/spool/news" |
Most of the expected SQL operations are supported by SQLite (check the documentation) and hence also by the Julia interface. When we’re done we close the database connection.
close(db)
Of course, the database we created in Julia is now available through the shell too.
ls -l passwd.sqlite
-rw-r-r- 1 colliera colliera 6144 Sep 18 07:21 passwd.sqlite
sqlite3 passwd.sqlite
SQLite version 3.8.7.4 2014-12-09 01:34:36
Enter ".help" for usage hints.
sqlite> pragma table_info(passwd);
0|username|TEXT|0||0
1|password|TEXT|0||0
2|UID|REAL|0||0
3|GID|REAL|0||0
4|comment|TEXT|0||0
5|homedir|TEXT|0||0
6|shell|TEXT|0||0
sqlite>
ODBC
If you need to access an enterprise DB (for example, Oracle, PostgreSQL, MySQL, Microsoft SQL Server or DB2) then the ODBC interface will be the way to go. To avoid the overhead of using one of these fancy DBs, I will demonstrate Julia’s ODBC functionality using the SQLite database we created above. Before we do that though, you’ll need to setup ODBC for SQLite. It’s not an onerous procedure at all. Then we fire up the ODBC
package and we’re ready to roll.
using ODBC
First we’ll check which drivers are available for ODBC (just SQLite in my case) and what data source names (DSNs) are registered.
listdrivers()
(String["SQLite","SQLite3"],String["Description=SQLite ODBC Driver\0Driver=libsqliteodbc.so\0Setup=libsqliteodbc.so\0UsageCount=1\0","Description=SQLite3 ODBC Driver\0Driver=libsqlite3odbc.so\0Setup=libsqlite3odbc.so\0UsageCount=1\0"])
listdsns()
(String["passwd"],String["SQLite3"])
We see that there is a DSN available for the passwd
database. First we create a connection:
db = ODBC.connect("passwd")
ODBC Connection Object
----------------------
Connection Data Source: passwd
passwd Connection Number: 1
Contains resultset(s)? No
At this point I’d like to execute a query. However, somewhat disappointingly, this doesn’t work. No error message but also no results. I’ve logged an issue with the package maintainer, so hopefully this will be resolved soon.
query("SELECT * FROM passwd LIMIT 5;", db)
0x0 DataFrame
What’s promising though is that I can still retrieve the metadata for that query.
querymeta("SELECT * FROM passwd LIMIT 5;", db)
Resultset metadata for executed query
-------------------------------------
Query: SELECT * FROM passwd LIMIT 5
Columns: 7
Rows: 0
7x5 DataFrame
| Row | Names | Types | Sizes | Digits | Nullable |
|-----|------------|------------------------|-------|--------|----------|
| 1 | "username" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 2 | "password" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 3 | "UID" | ("SQL_DOUBLE",8) | 54 | 0 | 1 |
| 4 | "GID" | ("SQL_DOUBLE",8) | 54 | 0 | 1 |
| 5 | "comment" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 6 | "homedir" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
| 7 | "shell" | ("SQL_LONGVARCHAR",-1) | 65536 | 0 | 1 |
Again, when we’re done, we close the database connection.
disconnect(db)
We’re now covered a number of means for getting data into Julia. Over the next few days we’ll be looking at Julia’s capabilities for analysing data. Stay tuned. In the meantime you can check out the code for today (and previous days) on GitHub. Take a look at the talk below. Also, there’s a great tutorial on working with SQLite, which is well worth looking at.