Config file/objects

Introduction

To make pre-processing the data for further analysis more user-friendly and easier to automate, the annex package provides a configuration feature.

Technically, the configuration is just a regular data.frame used as a lookup table to bring the data into the expected format. It can come from any source (e.g., read from a text file or an XLSX file, or created via script).

Configuration content and purpose

The following output shows the first six rows (n = 6) of the configuration object used in the article Prepare data from textfiles. The object is a data.frame with six variables (columns).

head(config, n = 6)

##     column variable     study  unit        home room
## 1        X datetime      <NA>  <NA>        <NA> <NA>
## 2      co2      CO2 DEMO_STUD   ppm Casa_Blanca Bed1
## 3 humidity       rH DEMO_STUD     % Casa_Blanca Bed1
## 4 pressure Pressure DEMO_STUD   hPa Casa_Blanca Bed1
## 5     temp        T DEMO_STUD     C Casa_Blanca Bed1
## 6      voc      VOC DEMO_STUD ug/m3 Casa_Blanca Bed1

This object tells the annex package which information needs to be processed, where to find it in the imported data set, and how to translate (rename) the variables to meet the annex standard format.
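As a minimal sketch of the "created via script" route, the same object could be built by hand with base R. The values are copied from the output above; building it with data.frame() is only one possible way and is not prescribed by annex.

```r
# Build the example configuration by hand (values taken from the output above).
config <- data.frame(
    column   = c("X", "co2", "humidity", "pressure", "temp", "voc"),
    variable = c("datetime", "CO2", "rH", "Pressure", "T", "VOC"),
    study    = c(NA, rep("DEMO_STUD", 5)),
    unit     = c(NA, "ppm", "%", "hPa", "C", "ug/m3"),
    home     = c(NA, rep("Casa_Blanca", 5)),
    room     = c(NA, rep("Bed1", 5)),
    stringsAsFactors = FALSE
)

# Show the first six rows, as in the output above.
head(config, n = 6)
```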

The six required columns

column: The name of the column in the imported data set (e.g., from an XLSX file or text file).
variable: Defines the new variable name; one of the different variables (parameters) expected by the annex package (see Variable definition).
study: Name of the study the data belong to.
home: Name of the home (building) the data belong to.
room: Abbreviation of the type of room the data belong to (see Room definition).
unit: Units of the input data (raw data); annex_prepare() will automatically convert to annex standard units (see Variable definition).

While column, study and home can be chosen freely by the user, only a set of pre-defined values is allowed for variable (plus unit) and room (not case sensitive). See Lookup functions for a full list of the defined values.

The required rows

One row is required: the row that defines where the date and time information can be found. This row defines the variable "datetime" and the column that holds it in the imported data set (in this case a column called "X").
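The two requirements described so far (six columns, one "datetime" row) can be checked with a few lines of base R before any further processing. The following is a sketch only: check_config_sketch() is a hypothetical helper and not part of annex, and matching "datetime" exactly (case sensitive) is an assumption made here for simplicity.

```r
# Hypothetical helper (not part of annex): basic sanity checks on a configuration.
check_config_sketch <- function(config) {
    required <- c("column", "variable", "study", "home", "room", "unit")
    missing_cols <- setdiff(required, names(config))
    if (length(missing_cols) > 0) {
        stop("missing config column(s): ", paste(missing_cols, collapse = ", "))
    }
    # Assumption: the datetime row is identified by the exact string "datetime".
    if (!"datetime" %in% config$variable) {
        stop("no row defining the variable \"datetime\" found")
    }
    invisible(TRUE)
}

# Small two-row configuration used to try out the helper.
config <- data.frame(
    column   = c("X", "co2"),
    variable = c("datetime", "CO2"),
    study    = c(NA, "DEMO_STUD"),
    unit     = c(NA, "ppm"),
    home     = c(NA, "Casa_Blanca"),
    room     = c(NA, "Bed1")
)
check_config_sketch(config)
```

Dropping the first row of this configuration (config[-1, ]) makes the helper stop with an error, because the "datetime" row is then missing.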

All other rows specify which column in the imported data set contains which information, and thereby which information should be processed. Variables not listed in the configuration (but available in the imported data set) are ignored in the processing steps (namely when calling annex_prepare()).
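The lookup idea can be mimicked with base R to see which columns would be picked up and which would be ignored. This sketch does not call annex_prepare() and does not reproduce what it does internally (such as unit conversion); the raw data values and the extra column "battery" are made up for illustration, and the configuration is trimmed to two columns for brevity.

```r
# Imported raw data (made-up values); "battery" is not listed in the configuration.
raw <- data.frame(
    X       = c("2023-01-01 00:00:00", "2023-01-01 00:05:00"),
    co2     = c(612, 640),
    temp    = c(21.3, 21.4),
    battery = c(98, 98)
)

# Configuration trimmed to the two columns needed for the renaming step.
config <- data.frame(
    column   = c("X", "co2", "temp"),
    variable = c("datetime", "CO2", "T")
)

# Columns listed in the configuration are used, all others are ignored.
used    <- intersect(names(raw), config$column)
ignored <- setdiff(names(raw), config$column)

# Use the configuration as a lookup table to rename the used columns.
renamed <- raw[, used, drop = FALSE]
names(renamed) <- config$variable[match(used, config$column)]
names(renamed)
```

Here used contains "X", "co2" and "temp", while "battery" ends up in ignored because no row of the configuration refers to it.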

Purpose

The purpose of this object is to convert an imported data set into the object expected by the annex package for the analysis, before the data are saved for later use.

It defines where the required information is stored and serves as a lookup table (translation table) to rename the variables and to format the data according to the annex standard format used in the further processing steps.

Read/import configuration

The two articles Prepare data from XLSX and Prepare data from textfiles show examples of how to import (read) such a configuration from an XLSX file or a text file. Any file which can be imported into a data.frame via R can be used.
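For a comma-separated text file, a minimal sketch with base R could look as follows. The file is written to a temporary location first so that the example is self-contained; the file content and the choice of read.csv() with these arguments are assumptions for illustration, and the linked articles remain the reference for the actual workflow.

```r
# Write a small example configuration to a temporary CSV file
# (stand-in for a real configuration file).
cfg_file <- tempfile(fileext = ".csv")
writeLines(c(
    "column,variable,study,unit,home,room",
    "X,datetime,,,,",
    "co2,CO2,DEMO_STUD,ppm,Casa_Blanca,Bed1",
    "temp,T,DEMO_STUD,C,Casa_Blanca,Bed1"
), cfg_file)

# Read everything as character; empty fields become NA.
config <- read.csv(cfg_file, na.strings = "", colClasses = "character")
str(config)
```

Reading all columns as character avoids surprises from automatic type conversion, for example a variable named "T" being interpreted as a logical value.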

Technical note

Note that the configuration is a convenience feature. Technically, it can be bypassed by preparing the data set outside annex before starting the analysis.