HCRIS Field Labels

Fathom Data has been doing a lot of work with the HCRIS (Healthcare Cost Report Information System) data. The underlying reports are submitted as a spreadsheet with multiple sheets. The data are then extracted and recorded in a simple tabular format, with each field linked to a worksheet code (wksht_cd), column number (clmn_num) and line number (clmn_num). These three keys are then mapped to a single compound key. The resulting data look something like this:

       key      value
   a_c1_30	  8960498
   a_c2_30	  2559809
   a_c6_30	 -1581948
  a_c1_200	 45448537
  a_c2_200	100473288
  a_c6_200	-23947062
 a7_1_c1_1	  2696198
 a7_1_c1_6	 87655935
 a7_1_c2_6	  4190056
 a7_1_c1_8	168428817
 a7_1_c2_8	  4391574
a7_1_c1_10	168428817
a7_1_c2_10	  4391574

Of course, for these data to make any sense at all it’s necessary to map backwards from the compound key to the location within the original spreadsheet. And once you know where the data belong, you’ll also want to know what the data mean.

Fortunately this relationship is documented and available as a PDF file.

Extract from SAS file record layout.

This is useful, but for analytical purposes a PDF is not ideal. We used the {pdftools} package to scrape the text from the PDF and, after a significant bit of wrangling ended up with a nice clean CSV file.

# A tibble: 6 × 6
  key        type    wksht_cd clmn_num line_num label                 
  <chr>      <chr>   <chr>    <chr>    <chr>    <chr>                 
1 a7_1_c1_1  NUMERIC A700001  00100    00100    Land                  
2 a7_1_c1_10 NUMERIC A700001  00100    01000    Total                 
3 a7_1_c1_2  NUMERIC A700001  00100    00200    Land Improvements     
4 a7_1_c1_3  NUMERIC A700001  00100    00300    Buildings and Fixtures
5 a7_1_c1_4  NUMERIC A700001  00100    00400    Building Improvements 
6 a7_1_c1_5  NUMERIC A700001  00100    00500    Fixed Equipment       


Here are the artefacts of this analysis: