JuliaStats is a meta-project which consolidates various packages related to statistics and machine learning in Julia. Well worth taking a look if you plan on working in this domain.

```
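using Statistics # Needed on Julia 0.7 and later, where mean() and std() live in the standard library.
x = rand(10);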
mean(x)
```

```
0.5287191472784906
```

```
std(x)
```

```
0.2885446536178459
```

Julia already has some built-in support for statistical operations, as the calls to `mean()` and `std()` above show, so additional packages are not strictly necessary. However, they do increase the scope and ease of possible operations (as we’ll see below). Let’s kick off by loading all the packages that we’ll be looking at today.

```
using StatsBase, StatsFuns, StreamStats
```

## StatsBase

The documentation for StatsBase can be found here. As the package name implies, it provides support for basic statistical operations in Julia.

High-level summary statistics are generated by `summarystats()`.

```
summarystats(x)
```

```
Summary Stats:
Mean: 0.528719
Minimum: 0.064803
1st Quartile: 0.317819
Median: 0.529662
3rd Quartile: 0.649787
Maximum: 0.974760
```

Weighted versions of the mean, variance and standard deviation are implemented, along with weighted skewness and kurtosis. There are also geometric and harmonic means.

```
w = weights(rand(1:10, 10)); # A weight vector (a WeightVec in older StatsBase releases).
mean(x, w) # Weighted mean.
```

```
0.48819933297961043
```

```
var(x, w) # Weighted variance.
```

```
0.08303843715334995
```

```
std(x, w) # Weighted standard deviation.
```

```
0.2881639067498738
```
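To make it clearer what those weighted calls compute, here is a minimal hand-rolled sketch in plain Julia 1.x. The helper names (`weighted_mean`, `weighted_var`, `weighted_std`) are made up for illustration and are not StatsBase functions. The sketch also assumes the uncorrected definition of the variance, where the weighted squared deviations are divided by the sum of the weights.

```julia
# Hand-rolled weighted statistics (a sketch, not the StatsBase implementation).
# Assumption: uncorrected form, i.e. squared deviations are divided by sum(w).
weighted_mean(x, w) = sum(w .* x) / sum(w)

function weighted_var(x, w)
    μ = weighted_mean(x, w)
    return sum(w .* (x .- μ) .^ 2) / sum(w)
end

weighted_std(x, w) = sqrt(weighted_var(x, w))

vals = [1.0, 2.0, 4.0]
wts = [1, 1, 2]             # the last value counts twice
weighted_mean(vals, wts)    # (1 + 2 + 2*4) / 4 = 2.75
weighted_var(vals, wts)     # (1.75^2 + 0.75^2 + 2*1.25^2) / 4 = 1.6875
```

With integer weights like these, the result is the same as repeating each value as many times as its weight and taking the ordinary (uncorrected) statistics.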

```
skewness(x, w)
```

```
0.11688162715805048
```

```
kurtosis(x, w)
```

```
-0.9210456851144664
```

```
mean_and_std(x, w)
```

```
(0.48819933297961043,0.2881639067498738)
```
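The geometric and harmonic means mentioned earlier are not shown above, so here is a small sketch of their textbook definitions using only the standard library. The names `geometric_mean` and `harmonic_mean` are hypothetical helpers written for this post, not the StatsBase functions themselves.

```julia
using Statistics  # standard library: provides mean()

# Geometric mean: exponentiate the mean of the logs (inputs must be positive).
geometric_mean(v) = exp(mean(log.(v)))

# Harmonic mean: reciprocal of the mean of the reciprocals.
harmonic_mean(v) = 1 / mean(1 ./ v)

rates = [1.0, 2.0, 4.0]
geometric_mean(rates)   # cube root of 1*2*4 = 8, so about 2.0
harmonic_mean(rates)    # 3 / (1 + 1/2 + 1/4) = 12/7
```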

There’s a weighted median as well as functions for calculating quantiles.

```
median(x) # Median.
```

```
0.5296622773635412
```

```
median(x, w) # Weighted median.
```

```
0.5729104703595038
```

```
quantile(x)
```

```
5-element Array{Float64,1}:
0.0648032
0.317819
0.529662
0.649787
0.97476
```

```
nquantile(x, 8)
```

```
9-element Array{Float64,1}:
0.0648032
0.256172
0.317819
0.465001
0.529662
0.60472
0.649787
0.893513
0.97476
```

```
iqr(x) # Inter-quartile range.
```

```
0.3319677541313941
```
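For the curious, the quantiles above can be reproduced by linear interpolation between sorted values. The sketch below assumes the interpolation rule that Julia's own `quantile()` uses by default in Julia 1.x; `interp_quantile` is a hypothetical helper, not a StatsBase function.

```julia
using Statistics  # standard library: provides quantile() for comparison

# Quantile by linear interpolation between order statistics.
function interp_quantile(v, p)
    s = sort(v)
    h = (length(s) - 1) * p + 1      # fractional 1-based position in the sorted data
    lo = floor(Int, h)
    hi = min(lo + 1, length(s))
    return s[lo] + (h - lo) * (s[hi] - s[lo])
end

data = [3.0, 1.0, 4.0, 1.5, 9.0]
quartiles = [interp_quantile(data, p) for p in 0:0.25:1]   # what quantile(data) returned above
octiles = [interp_quantile(data, k / 8) for k in 0:8]      # same idea as nquantile(data, 8)
spread = quartiles[4] - quartiles[2]                       # inter-quartile range
```

So `nquantile(x, n)` is just the quantiles at `0, 1/n, ..., 1`, and `iqr(x)` is the gap between the third and first quartiles.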

Sampling from a population is also catered for, with a range of algorithms which can be applied to the sampling procedure.

```
sample(collect('a':'z'), 5) # Sampling (with replacement).
```

```
5-element Array{Char,1}:
'w'
'x'
'e'
'e'
'o'
```

```
wsample(['T', 'F'], [5, 1], 10) # Weighted sampling (with replacement).
```

```
10-element Array{Char,1}:
'F'
'T'
'T'
'T'
'F'
'T'
'T'
'T'
'T'
'T'
```
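One simple way to think about weighted sampling is to walk along the cumulative weights until a uniform random number is passed. The sketch below illustrates that idea only; `weighted_draw` is a made-up helper, and StatsBase may well use a different (and faster) algorithm internally.

```julia
using Random  # standard library: provides MersenneTwister

# Draw one item with probability proportional to its weight:
# pick a point in [0, total weight) and walk the cumulative weights.
function weighted_draw(rng, items, wts)
    r = rand(rng) * sum(wts)
    acc = 0.0
    for (item, w) in zip(items, wts)
        acc += w
        r < acc && return item
    end
    return last(items)   # guard against floating-point round-off
end

rng = MersenneTwister(42)   # fixed seed so the run is repeatable
draws = [weighted_draw(rng, ['T', 'F'], [5, 1]) for _ in 1:10]
```

With weights of 5 and 1 you'd expect roughly five `'T'` for every `'F'` over a long run, which is in line with the output above.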

There’s also functionality for empirical estimation of distributions from histograms and a range of other interesting and useful goodies.
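To give a flavour of what a histogram-based estimate involves, here is a rough sketch that counts observations in equal-width bins and rescales the counts into a density. The function `bin_counts` and its arguments are hypothetical and written for illustration; this is not the StatsBase interface.

```julia
# Count how many observations fall in each of `nbins` equal-width bins on [lo, hi).
function bin_counts(v, lo, hi, nbins)
    counts = zeros(Int, nbins)
    width = (hi - lo) / nbins
    for val in v
        lo <= val < hi || continue                         # ignore out-of-range values
        idx = min(floor(Int, (val - lo) / width) + 1, nbins)
        counts[idx] += 1
    end
    return counts
end

obs = [0.05, 0.15, 0.35, 0.40, 0.90]
counts = bin_counts(obs, 0.0, 1.0, 4)       # bins: [0,0.25), [0.25,0.5), [0.5,0.75), [0.75,1)
density = counts ./ (length(obs) * 0.25)    # rescale so the bar areas sum to 1
```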

## StatsFuns

The StatsFuns package provides constants and functions for statistical computing. The constants are by no means essential but certainly very handy. Take, for example, `twoπ` and `sqrt2`.

There are some mildly exotic mathematical functions available like logistic, logit and softmax.

```
logistic(-5)
```

```
0.0066928509242848554
```

```
logistic(5)
```

```
0.9933071490757153
```

```
logit(0.25)
```

```
-1.0986122886681098
```

```
logit(0.75)
```

```
1.0986122886681096
```

```
softmax([1, 3, 2, 5, 3])
```

```
5-element Array{Float64,1}:
0.0136809
0.101089
0.0371886
0.746952
0.101089
```
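In case these functions are unfamiliar, their textbook definitions fit in a few lines. This is only a sketch: the `my_` prefixed names are made up to avoid clashing with StatsFuns, and the package's own versions may handle numerical edge cases differently.

```julia
# Textbook definitions (a sketch, not the StatsFuns source).
my_logistic(t) = 1 / (1 + exp(-t))     # maps the real line onto (0, 1)
my_logit(p) = log(p / (1 - p))         # inverse of the logistic function

function my_softmax(v)
    e = exp.(v .- maximum(v))          # subtracting the max avoids overflow in exp
    return e ./ sum(e)
end

my_logistic(my_logit(0.25))            # round-trips to about 0.25
my_softmax([1, 3, 2, 5, 3])            # non-negative entries that sum to 1
```

This also explains the symmetry in the output above: `logit(0.25)` and `logit(0.75)` differ only in sign, because `logit(1 - p) == -logit(p)`.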

Finally, there is a suite of functions relating to various statistical distributions. The functions for the Normal distribution are illustrated below, but there are equivalents for the Beta, Binomial, Gamma and Hypergeometric distributions and many others. The function naming convention is consistent across all distributions.

```
normpdf(0); # PDF
normlogpdf(0); # log PDF
normcdf(0); # CDF
normccdf(0); # Complementary CDF
normlogcdf(0); # log CDF
normlogccdf(0); # log Complementary CDF
norminvcdf(0.5); # inverse-CDF
norminvccdf(0.99); # inverse-Complementary CDF
norminvlogcdf(-0.693147180559945); # inverse-log CDF
norminvlogccdf(-0.693147180559945); # inverse-log Complementary CDF
```
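Two notes on the block above. First, the magic number `-0.693147180559945` is just `log(0.5)`. Second, it may not be obvious why log versions of the PDF and CDF are worth having, so here is a small sketch using the standard Normal formulas written out by hand (the `std_norm_` names are hypothetical, not StatsFuns functions). Far out in the tail the density underflows to zero, whereas the log-density stays perfectly usable.

```julia
# Standard Normal density and log-density from their formulas (a sketch).
std_norm_pdf(z) = exp(-z^2 / 2) / sqrt(2π)
std_norm_logpdf(z) = -(z^2 + log(2π)) / 2

std_norm_pdf(0.0)           # 1/sqrt(2π), about 0.3989
log(std_norm_pdf(40.0))     # -Inf: the density underflows to zero before the log is taken
std_norm_logpdf(40.0)       # about -800.92, still a finite number
```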

## StreamStats

Finally, the StreamStats package supports calculating online statistics for a stream of data which is being continuously updated.

```
average = StreamStats.Mean()
```

```
Online Mean
* Mean: 0.000000
* N: 0
```

```
variance = StreamStats.Var()
```

```
Online Variance
* Variance: NaN
* N: 0
```

```
for x in rand(10)
    update!(average, x)
    update!(variance, x)
    @printf("x = %.6f: mean = %.3f | variance = %.3f\n", x, state(average), state(variance))
end
```

```
x = 0.928564: mean = 0.929 | variance = NaN
x = 0.087779: mean = 0.508 | variance = 0.353
x = 0.253300: mean = 0.423 | variance = 0.198
x = 0.778306: mean = 0.512 | variance = 0.164
x = 0.566764: mean = 0.523 | variance = 0.123
x = 0.812629: mean = 0.571 | variance = 0.113
x = 0.760074: mean = 0.598 | variance = 0.099
x = 0.328495: mean = 0.564 | variance = 0.094
x = 0.303542: mean = 0.535 | variance = 0.090
x = 0.492716: mean = 0.531 | variance = 0.080
```
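A running mean and variance like this can be maintained in a single pass without storing the data. The sketch below uses Welford's algorithm, a common choice for the job; I haven't checked whether StreamStats uses exactly this method, and `OnlineMoments`, `observe!` and `current_var` are hypothetical names invented for the illustration.

```julia
# Running mean and variance in one pass (Welford's algorithm).
# OnlineMoments is a hypothetical type for illustration, not part of StreamStats.
mutable struct OnlineMoments
    n::Int
    mean::Float64
    m2::Float64      # running sum of squared deviations from the current mean
end
OnlineMoments() = OnlineMoments(0, 0.0, 0.0)

function observe!(s::OnlineMoments, x)
    s.n += 1
    δ = x - s.mean
    s.mean += δ / s.n
    s.m2 += δ * (x - s.mean)
    return s
end

# Sample variance; undefined (NaN) until there are at least two observations.
current_var(s::OnlineMoments) = s.n > 1 ? s.m2 / (s.n - 1) : NaN

s = OnlineMoments()
for x in [0.928564, 0.087779, 0.253300]   # first three values from the run above
    observe!(s, x)
end
s.mean, current_var(s)                    # roughly (0.423, 0.198), matching the third output row
```

The `NaN` on the first row of the output above makes sense in this light: a sample variance needs at least two observations.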

In addition to the mean and variance illustrated above, the package also supports online versions of min() and max(), and can be used to generate incremental confidence intervals for Bernoulli and Poisson processes.

That’s it for today. Check out the full code on GitHub and watch the video below.