Introduction
To make pre-processing the data for further analysis more
user-friendly and easier to automate, the annex package
provides a configuration feature.
Technically, the configuration is just a regular data.frame
used as a lookup table to bring the data into the expected format. The
configuration itself can come from any source (e.g., a text file, an
XLSX file, created via script, …).
Configuration content and purpose
The following output shows the first six rows of the configuration
object used in the article Prepare data from textfiles, a data.frame
with six variables (columns).
head(config, n = 6)
## column variable study unit home room
## 1 X datetime <NA> <NA> <NA> <NA>
## 2 co2 CO2 DEMO_STUD ppm Casa_Blanca Bed1
## 3 humidity rH DEMO_STUD % Casa_Blanca Bed1
## 4 pressure Pressure DEMO_STUD hPa Casa_Blanca Bed1
## 5 temp T DEMO_STUD C Casa_Blanca Bed1
## 6 voc VOC DEMO_STUD ug/m3 Casa_Blanca Bed1
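Because the configuration is an ordinary data.frame, it can also be written directly in an R script. The following minimal sketch rebuilds the six rows printed above, with values copied from that output and `NA` used for the fields left empty in the datetime row.

```r
# Rebuild the configuration shown above as a plain data.frame.
# NA marks fields that are not needed for the datetime row.
config <- data.frame(
  column   = c("X", "co2", "humidity", "pressure", "temp", "voc"),
  variable = c("datetime", "CO2", "rH", "Pressure", "T", "VOC"),
  study    = c(NA, rep("DEMO_STUD", 5)),
  unit     = c(NA, "ppm", "%", "hPa", "C", "ug/m3"),
  home     = c(NA, rep("Casa_Blanca", 5)),
  room     = c(NA, rep("Bed1", 5)),
  stringsAsFactors = FALSE
)

head(config, n = 6)
```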
The configuration tells the annex package which information needs to
be processed, where to find it, and how to translate (rename) the
variables to match the annex standard format.
The six required columns
- column: The name of the column in the imported data set (e.g., from an XLSX file or text file).
- variable: Defines the new variable name; one of the variables (parameters) expected by the annex package (see Variable definition).
- study: Name of the study the data belong to.
- home: Name of the home (building) the data belong to.
- room: Abbreviation of the type of room the data belong to (see Room definition).
- unit: Units of the input data (raw data); annex_prepare() will automatically convert them to annex standard units (see Variable definition).
While column, study and home can be chosen freely by the user, only
predefined values are allowed for variable (and the corresponding unit)
and room (not case sensitive). See Lookup functions for a full list of
the defined values.
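Before passing a configuration on, it can help to check that all six required columns are present. The sketch below uses a hypothetical helper, `check_config_columns()`, which is not part of the annex package; it only checks column names, not the predefined values listed by the Lookup functions.

```r
# Hypothetical helper (not part of annex): stop with an informative
# message if any of the six required columns is missing.
check_config_columns <- function(cfg) {
  required <- c("column", "variable", "study", "home", "room", "unit")
  missing  <- setdiff(required, names(cfg))
  if (length(missing) > 0) {
    stop("Configuration is missing column(s): ",
         paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Example: a configuration without the 'room' column fails the check.
cfg <- data.frame(column = "X", variable = "datetime",
                  study = NA, home = NA, unit = NA)
res <- tryCatch(check_config_columns(cfg), error = conditionMessage)
print(res)
```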
The required rows
There is one required row: the one defining where the date and time
information can be found. This row defines the variable "datetime"
and names the column in the imported data set that holds it (in this
case a column called "X").
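The required row can be located with plain base R. This is a sketch on a small hypothetical configuration, not code from the annex package; the comparison uses `tolower()` on the assumption that predefined variable names are matched case-insensitively.

```r
# Small hypothetical configuration: the datetime row plus one variable.
config <- data.frame(
  column   = c("X", "co2"),
  variable = c("datetime", "CO2"),
  study    = c(NA, "DEMO_STUD"),
  unit     = c(NA, "ppm"),
  home     = c(NA, "Casa_Blanca"),
  room     = c(NA, "Bed1")
)

# Exactly one row must define the "datetime" variable.
is_dt <- tolower(config$variable) == "datetime"
stopifnot(sum(is_dt) == 1)

# Name of the column in the imported data that holds date and time.
dt_column <- config$column[is_dt]
print(dt_column)
# [1] "X"
```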
All other rows specify which columns in the imported data set contain
information to be processed, and what that information is. Columns not
listed in the configuration (but present in the imported data set) are
ignored in the processing steps (namely when calling annex_prepare()).
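To see in advance which columns of an imported data set will be ignored, compare its column names with the `column` entries of the configuration. Both objects below are hypothetical, trimmed to the columns needed for the illustration; `battery` stands for any column not listed in the configuration.

```r
# Hypothetical imported data; 'battery' is not listed in the configuration.
raw <- data.frame(
  X       = c("2023-01-01 00:00:00", "2023-01-01 00:10:00"),
  co2     = c(412, 420),
  battery = c(98, 97)
)
# Only the two columns needed here are included in this toy configuration.
config <- data.frame(
  column   = c("X", "co2"),
  variable = c("datetime", "CO2")
)

# Columns present in the data but absent from the configuration.
ignored <- setdiff(names(raw), config$column)
print(ignored)
# [1] "battery"
```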
Purpose
The purpose of this object is to turn an imported data set into the
object expected by the annex package for the analysis, before saving
the data for later use.
It defines where the required information is stored and serves as a
lookup table (translation table) to rename the variables and format the
data according to the annex standard format used in the subsequent
processing steps.
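The translation-table idea can be illustrated in a few lines of base R. This is a simplified sketch on hypothetical data, not the implementation of annex_prepare(), which additionally converts units and attaches the study, home and room information.

```r
# Hypothetical imported data and a reduced configuration.
raw <- data.frame(
  X       = c("2023-01-01 00:00:00", "2023-01-01 00:10:00"),
  co2     = c(412, 420),
  temp    = c(21.3, 21.4),
  battery = c(98, 97)
)
config <- data.frame(
  column   = c("X", "co2", "temp"),
  variable = c("datetime", "CO2", "T")
)

# Keep only the configured columns that exist in the imported data ...
keep <- intersect(config$column, names(raw))
prepared <- raw[, keep, drop = FALSE]
# ... and rename them using the configuration as a lookup table.
names(prepared) <- config$variable[match(keep, config$column)]
print(names(prepared))
# [1] "datetime" "CO2"      "T"
```

Using `match()` keeps the renaming independent of the column order in the imported file.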
Read/import configuration
The two articles Prepare data from XLSX and Prepare data from textfiles show examples of how to import (read) such a configuration from an XLSX file or a text file. Any file that can be imported into a data.frame in R can be used.
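As a self-contained sketch, a configuration stored as a CSV text file can be read with base R's `read.csv()`. Here the file is written to a temporary location first; `na.strings = ""` turns the empty fields of the datetime row into `NA`. For XLSX files, a reader such as `readxl::read_excel()` can be used instead; it returns a tibble, which `as.data.frame()` converts to a plain data.frame.

```r
# Write a small configuration to a temporary CSV file ...
path <- tempfile(fileext = ".csv")
writeLines(c(
  "column,variable,study,unit,home,room",
  "X,datetime,,,,",
  "co2,CO2,DEMO_STUD,ppm,Casa_Blanca,Bed1"
), path)

# ... and read it back as a plain data.frame (empty fields become NA).
config <- read.csv(path, na.strings = "", stringsAsFactors = FALSE)
str(config)
```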