Day 4: Functions

Julia performs Just-in-Time (JIT) compilation using a Low Level Virtual Machine (LLVM) to create machine-specific assembly code. The first time a function is called, Julia compiles the function’s source code and the results are cached and used for any subsequent calls to the same function. However, there are some additional wrinkles to this story.

Julia caters for generic functions which will operate on variables with a variety of data types. But it gains a lot of speed by compiling specialised versions of a function which are optimised to work on particular data types. As a result the code generated by Julia achieves performance which is comparable to that of lower level languages like C and Fortran, as illustrated by the benchmark results below.

Declaring Functions

In Julia a function can be defined in various ways, ranging from verbose to succinct. The last statement in a function definition becomes the return value, so it is not necessary to have an explicit return statement (although for clarity it is often a good idea to be explicit!).

# Verbose form.
#
function square(x)
    return x^2 # The return keyword is redundant, but still a good idea.
end

# One-line form.
#
square(x) = x^2
hypotenuse(x, y) = sqrt(x^2 + y^2)
 = x -> x*x # To get x² type x^2 followed by Tab in the REPL.

# Anonymous (lambda) form.
#
x -> x^2
#
anonsquare = function (x) # Creates an anonymous function and assigns it to anonsquare.
    x^2
end

The functions defined above are generic in the sense that they do not pertain to arguments of any particular data type: they should work for all (reasonable) arguments.

Functions can return multiple values as a tuple.

julia> function divide(n, m)
           div(n,m), n%m
       end
divide (generic function with 1 method)
julia> divide(7,3)
(2,1)

Julia has an interesting convention for functions with side effects: function names which end in a ! modify their first argument. For example, sort() will return a sorted version of its input, while sort!() will reorder the elements of the input in place. Maintaining this syntax is important since all function arguments are passed by reference and it is thus perfectly permissible that the values of arguments be modified within a function.

Introspection

We can get a glimpse of at what happens under the hood during the compilation process. code_llvm() returns the bytecode generated by the LLVM for a generic function when its applied to an argument of a given type. For example, below we see the bytecode for the function sqrt() applied to a 64 bit floating point argument.

julia> code_llvm(sqrt, (Float64,))

define double @julia_sqrt_20446(double) {
top:
  %1 = fcmp uge double %0, 0.000000e+00, !dbg !1150
  br i1 %1, label %pass, label %fail, !dbg !1150

fail: ; preds = %top
  %2 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !1150, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %2, i32 131), !dbg !1150
  unreachable, !dbg !1150

pass: ; preds = %top
  %3 = call double @llvm.sqrt.f64(double %0), !dbg !1150
  ret double %3, !dbg !1150
}

Digging deeper we can scrutinise the native assembly instructions using code_native().

julia> code_native(sqrt, (Float64,))
        .text
Filename: math.jl
Source line: 131
        push RBP
        mov RBP, RSP
        xorpd XMM1, XMM1
        ucomisd XMM1, XMM0
Source line: 131
        ja 6
        sqrtsd XMM0, XMM0
        pop RBP
        ret
        movabs RAX, 140208263704872
        mov RDI, QWORD PTR [RAX]
        movabs RAX, 140208248879392
        mov ESI, 131
        call RAX

Not being a Computer Scientist, this information is of limited use to me. But it will serve to illustrate the next point. As one might expect, the assembly code generated for a particular function depends on type of the arguments being fed into that function. If, for example, we look at the assembly code for sqrt() applied to a 64 bit integer argument then we see that there are a number of differences.

julia> code_native(sqrt, (Int64,))
        .text
Filename: math.jl
Source line: 133
        push RBP
        mov RBP, RSP
Source line: 133
        cvtsi2sd XMM0, RDI
        xorpd XMM1, XMM1
        ucomisd XMM1, XMM0
        ja 6
        sqrtsd XMM0, XMM0
        pop RBP
        ret
        movabs RAX, 140208263704872
        mov RDI, QWORD PTR [RAX]
        movabs RAX, 140208248879392
        mov ESI, 133
        call RAX

The JIT compiler is effectively generating versions of the generic function sqrt() which are specialised for different argument types. This is good for performance because the code being executed is always optimised for the appropriate argument types.

Type Specification and Multiple Dispatch

Julia implements multiple dispatch, which means that the code executed for a specific function depends dynamically on the data type of the arguments. In the case of generic functions, each time that a function is called with a new data type specialised code is generated. But it’s also possible to define different versions of a function which are selected on the basis of the argument data type. These would then be applied in lieu of the corresponding generic function. This is similar to function overloading.

The Julia documentation articulates these ideas better than I can:

To facilitate using many different implementations of the same concept smoothly, functions need not be defined all at once, but can rather be defined piecewise by providing specific behaviors for certain combinations of argument types and counts. A definition of one possible behavior for a function is called a method. Thus far, we have presented only examples of functions defined with a single method, applicable to all types of arguments. However, the signatures of method definitions can be annotated to indicate the types of arguments in addition to their number, and more than a single method definition may be provided. When a function is applied to a particular tuple of arguments, the most specific method applicable to those arguments is applied. Thus, the overall behavior of a function is a patchwork of the behaviors of its various method definitions. If the patchwork is well designed, even though the implementations of the methods may be quite different, the outward behavior of the function will appear seamless and consistent.

Specific data types are indicated by the :: type annotation operator. For example, we can create a version of square() which does something unique (and inane) for integer arguments.

julia> function square(x::Int64)
           println("Squaring an integer.")
           x^2
       end
square (generic function with 2 methods)
julia> square(1.5) # Using the generic function
2.25
julia> square(2) # Using the function specialised for Int64 arguments
Squaring an integer.
4

You can get a list of the methods associated with a function using methods(). Below we have both the specialised (Int64) version as well as the fully generic version of square().

julia> methods(square)
# 2 methods for generic function "square":
square(x::Int64) at none:2
square(x) at none:1

Type annotations can also be used within the local scope of a function body (as well as for, while, try, let, and type blocks).

function foo(x)
    y::Float64 = 3
    return(x + y)
end

I’ve really only scratched the surface here. There’s a lot more to say about default, positional and keyword arguments, operators, parametric types and a host of other topics. You can find further details of today’s Julia ramblings on GitHub and there’s even more in the documentation for functions and methods. Finally, check out some thoughts on writing good Julia functions from Chris von Csefalvay.