Retail Data: R Package

How (not) to get screwed on Valentine’s Day

Have you ever noticed how things seem to get really expensive at specific times of the year? Like Mother’s Day and Valentine’s Day? Have you ever felt a bit ripped off when buying an over-priced bouquet of flowers or box of chocolates? Have you ever wondered just how much those prices have been inflated?

Of course you have!

But it’s always been a niggling suspicion, never a fact. Where’s the evidence?

I set out to gather that evidence using the [Retail Pricing Data API](/blog/2020/03/retail-data-scraping-api/).

I could have gathered the required data by hitting the API with a series of curl requests from the command line. Or using the {httr} package from R. However, interacting with the API directly is somewhat laborious. It’d be a lot easier if the API was wrapped up in an R package.

The R Package 

So I made a hex logo and after that the R package basically wrote itself.

The repository is at and there’s a homepage too.

Install the package from GitHub.

> remotes::install_github("datawookie/trundler")

Then load it up.

> library(trundler)

List of Retailers 

The retailer() function gives access to a table of retailers, their website URLs and operating currencies.

> retailer()
# A tibble: 67 x 4
   retailer_id name             url                              currency
         <int> <chr>            <chr>                            <chr>   
 1           1 EEM Technologies USD     
 2           2 Clicks             ZAR     
 3           3 Dischem       ZAR     
 4           4 Game             ZAR     
 5           5 Woolworths    ZAR     
 6           6 Fortnum & Mason GBP     
 7           7 John Lewis       GBP     
 8           8 Marks & Spencer GBP     
 9           9 Pick 'n Pay           ZAR     
10          10 Makro           ZAR     
# … with 57 more rows

List of Products 

To dig any deeper than that you’ll need an API key (ping me if you want one!). Use set_api_key() to specify the key. You only need to do this once. The key will then be used for all subsequent transactions.

# NOTE: This key will not work. Please ping me for a valid key!
> API_KEY = "5bed3ac9-6dc9-4926-aed8-8c97a7cb8057"
> set_api_key(API_KEY)

The retailer_products() function yields a product table for a specific retailer, where each product is assigned a name, brand, model, SKU and barcode (if available).

> retailer_products(5) %>% select(product_id, name, brand, sku)
# A tibble: 52,025 x 4
   product_id name                                                     brand                         sku          
        <int> <chr>                                                    <chr>                         <chr>        
 1    1291054 Stay New Stretch Cotton Leggings                         Woolworths Classic Collection 6009207524412
 2    1291055 Zip Pocket Straight Leg Pants                            W Collection                  6009211116443
 3    1291056 Knit Indigo Dungaree Set                                 NA                            6009217050864
 4    1291057 Fluffy Collar Boots (Size 4-13) Younger Girl             Walkmates                     6009214818078
 5    1291058 T-Bar Leather School Shoes (Size 2 - 8) Older Girl       Walkmates                     6009211506220
 6    1291060 Kitty Ankle Strap Leather Pumps (Size 4-13) Younger Girl Walkmates                     6009214250779
 7    1291062 Leather T-Bar Shoes (Size 2 - 8) Older Girl              Walkmates                     6009211506589
 8    1291063 Sparkly Boots (Size 4-13) Younger Girl                   Walkmates                     6009214822778
 9    1291065 Lace Up Leather School Shoes (Size 8 - 1) Younger Boy    Walkmates                     6009214505084
10    1291067 Lace Up Leather School Shoes (Size 2 - 10) Older Boy     Walkmates                     6009214800431
# … with 52,015 more rows

The data are paginated and this function can take a little while to return.

Product Details 

You can get more granular, looking at a specific product using the product() and product_prices() functions.

Product: Wine 

Let’s take a look at a bottle of wine.

> nederburg_lyric <- product(531589)
> names(nederburg_lyric)
[1] "product_id"  "retailer_id" "url"         "name"        "brand"       "sku"         "barcodes"
> nederburg_lyric$name
[1] "Nederburg Lyric 750ml"
> nederburg_lyric$sku
[1] "000000000000230428_EA"
> nederburg_lyric$barcodes
[1] "6001452314503"

The price history data includes both regular and promotion prices, as well as availability. Availability data are not currently being gathered for this product.

> product_prices(531589)
  product_id                      time price price_promotion available
1     531589 2020-03-07T01:04:08+00:00 55.00              45        NA
2     531589 2020-02-22T01:00:32+00:00 55.00              NA        NA
3     531589 2020-02-15T01:00:02+00:00 55.00              45        NA
4     531589 2020-02-08T00:43:46+00:00 51.99              45        NA
5     531589 2020-02-01T00:57:02+00:00 51.99              45        NA

Product: Clothing 

Let’s take a look at an item of clothing from another retailer. In this case we are gathering availability data.

> product_prices(788165)
   product_id                      time  price price_promotion available
1      788165 2020-03-12T01:24:47+00:00  79.99              NA     FALSE
2      788165 2020-03-11T01:14:35+00:00  79.99              NA     FALSE
3      788165 2020-03-10T01:16:10+00:00  79.99              NA     FALSE
4      788165 2020-03-09T01:20:16+00:00  79.99              NA     FALSE
5      788165 2020-03-08T01:13:43+00:00  79.99              NA     FALSE
6      788165 2020-03-07T01:18:16+00:00  79.99              NA     FALSE
7      788165 2020-03-06T01:07:52+00:00  79.99              NA     FALSE
8      788165 2020-03-05T01:18:41+00:00  79.99              NA     FALSE
9      788165 2020-03-04T01:10:30+00:00  79.99              NA     FALSE
10     788165 2020-03-03T01:12:49+00:00  79.99              NA     FALSE
11     788165 2020-03-02T01:17:34+00:00  79.99              NA     FALSE
12     788165 2020-03-01T04:51:23+00:00  79.99              NA     FALSE
13     788165 2020-02-27T01:18:01+00:00  79.99              NA      TRUE
14     788165 2020-02-26T01:23:26+00:00  79.99              NA      TRUE
15     788165 2020-02-25T01:10:53+00:00  79.99              NA      TRUE
16     788165 2020-02-24T01:21:07+00:00  79.99              NA      TRUE
17     788165 2020-02-23T01:21:20+00:00  79.99              NA      TRUE
18     788165 2020-02-22T01:22:24+00:00  79.99              NA      TRUE
19     788165 2020-02-21T01:24:15+00:00  79.99              NA      TRUE
20     788165 2020-02-20T01:23:19+00:00  79.99              NA      TRUE
21     788165 2020-02-19T01:34:00+00:00  99.99              NA      TRUE
22     788165 2020-02-18T01:13:51+00:00  99.99              NA      TRUE
23     788165 2020-02-17T01:40:17+00:00  99.99              NA      TRUE
24     788165 2020-02-16T01:30:09+00:00  99.99              NA      TRUE
25     788165 2020-02-15T01:37:56+00:00  99.99              NA      TRUE
26     788165 2020-02-14T01:21:11+00:00 119.99              NA      TRUE
27     788165 2020-02-13T01:30:18+00:00 119.99              NA      TRUE

This product was initially selling at R 119.99. The price dropped to R 99.99 on 15 February 2020 and then R 79.99 on 20 February 2020. On 1 March 2020 it sold out (no longer available).

Let’s take a look at the product in question.

Yeah, I can see why that sold out!

Valentine’s Day 

Let’s return to our original question: to what degree are prices inflated around “special” days like Valentine’s Day?

Getting Screwed 

If you want to get screwed on Valentines day, buy roses.

From the middle of January the price of this bunch of roses increased repeatedly, so that by Valentine’s Day, you were paying at least 50% more per bunch. That’s really getting screwed. And not in a good way.

Here’s another example where the price of a bunch of 100 roses varied wildly before Valentine’s Day, the final price being roughly 60% higher than it was a month earlier.

The price of a small vase of pink roses would set you back by less than R300 at the beginning of January, but this shot up to just less than R500 in the latter part of January and remained around that level until after Valentine’s Day.

Clearly flowers in general (and roses in particular!) are not an economical gift on Valentine’s Day.

Not Getting Screwed 

What if you don’t want to get screwed? Well, then the obvious solution is… buy beer.

Why? Because the price of beer, from the same retailer, remains completely unchanged.

The price of a bunch of roses is included in the above plot (grey dashed line) for reference.


This is just a fun illustration of what’s possible with retail price historical data. There are many serious and useful things that you can do with it too! If you’re interested in these data, then check out the project page and get in touch for an API key.

Finally, a note on sampling frequency: we are currently sampling most retailers only once a week. If there is interest in these data then we’ll be able to scale out our infrastructure and start to sample more retailers daily. Ultimate goal is to sample all retailers and products once per day.

Below is the talk that I gave with Emma Collier about these data at satRday Johannesburg (2020).