This article explains how to import and process data with the annex
package when the required data is available as tabular
text files (CSV).
Two files are used for demonstration: demo_Bedroom.txt (contains the measurement data) and demo_Bedroom_config.TXT (contains the configuration; see article Config file).
Both files can easily be read using base R functions, namely
utils::read.table() and its interfacing functions like
utils::read.csv(), utils::read.delim()
etc. (see ?read.table for more details).
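As a minimal sketch of how these readers relate, the following writes a small, made-up CSV file to a temporary location and reads it once with read.csv() and once with read.table(), setting the header and separator explicitly. The file content and the object names (tmp, via_csv, via_table) are assumptions for illustration only and are not part of the demo data.

```r
# Write a tiny, made-up CSV file to a temporary location
tmp <- tempfile(fileext = ".txt")
writeLines(c("X,temp,humidity",
             "2011-01-01 00:01:26,18.8,51",
             "2011-01-01 00:06:25,18.8,51"), tmp)

# read.csv() is read.table() with CSV-friendly defaults
via_csv   <- read.csv(tmp)
via_table <- read.table(tmp, header = TRUE, sep = ",")

# For a simple file like this one, both calls should agree
all.equal(via_csv, via_table)

# Clean up the temporary file
unlink(tmp)
```

For files with quoted fields, comment characters, or ragged rows the defaults of the two functions differ, so the explicit arguments matter more than in this small example.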
Reading the data
The first step is to import both (i) the measurement data (stored in
raw_df) and (ii) the configuration (stored in config):
raw_df <- read.csv("demo_Bedroom.txt")
config <- read.table("demo_Bedroom_config.TXT",
                     comment.char = "#", sep = "",
                     header = TRUE, na.strings = c("NA", "empty"))
# see ?read.table for details
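# Class and dimension of the objects
c("raw_df" = class(raw_df), "config" = class(config))
##       raw_df       config
## "data.frame" "data.frame"
cbind("raw_df" = dim(raw_df), "config" = dim(config))
##      raw_df config
## [1,]  51890      7
## [2,]      8      6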
Both objects are of class data.frame
with a dimension of \(51890 \times 8\) (raw_df)
and \(7 \times 6\) (config).
The first few observations (rows) of the two objects look as follows:
head(raw_df[, 1:4], n = 3) # First four columns only
## X radonShortTermAvg temp humidity
## 1 2011-01-01 00:01:26 151 18.8 51
## 2 2011-01-01 00:06:25 151 18.8 51
## 3 2011-01-01 00:11:25 151 18.8 51
head(config, n = 3)
## column variable study unit home room
## 1 X datetime <NA> <NA> <NA> <NA>
## 2 co2 CO2 DEMO_STUD ppm Casa_Blanca Bed1
## 3 humidity rH DEMO_STUD % Casa_Blanca Bed1
The object raw_df contains variables (columns) named
“X”, “radonShortTermAvg”, “temp”, “humidity” etc., which are the original
names from the text file. The config object defines what the columns
in raw_df contain and where they belong.
For more details read the article about the Config file.
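To illustrate how the read.table() arguments used above act on a config file, here is a small sketch that parses a made-up, whitespace-separated config from a string. The content of txt is hypothetical and much shorter than the real demo_Bedroom_config.TXT; only the arguments mirror the call shown earlier.

```r
# Made-up config text; "empty" marks fields without a value
txt <- "# demo config, hypothetical content
column   variable study     unit
X        datetime empty     empty
co2      CO2      DEMO_STUD ppm"

# Same arguments as used for the real config file
cfg <- read.table(text = txt, comment.char = "#", sep = "",
                  header = TRUE, na.strings = c("NA", "empty"))

# The comment line is skipped and "empty" is read as NA
print(cfg)
is.na(cfg$study)
```

Using sep = "" splits on any amount of whitespace, which is why the columns in the text can be aligned freely.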
Checking the config object
To check whether the config object is as expected
by the annex package, the function annex_check_config()
can be used. If problems are detected, an error is thrown (see Config
file). Otherwise, the function is silent, as in this example:
annex_check_config(config)
… no errors, the config object meets the
requirements. Note that this step is not strictly
necessary as it is performed automatically when calling
annex_prepare(), but it can be handy during development.
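The "silent on success, error on failure" pattern can be sketched in base R as follows. This is a hypothetical stand-in and not the annex implementation: the function name check_config_sketch() and the assumption that the six columns shown in the config output above are all required are made up for illustration.

```r
# Hypothetical checker, NOT the annex implementation
check_config_sketch <- function(config,
                                required = c("column", "variable", "study",
                                             "unit", "home", "room")) {
    stopifnot(is.data.frame(config))
    # Assumed rule: all columns listed in `required` must be present
    missing_cols <- setdiff(required, names(config))
    if (length(missing_cols) > 0) {
        stop("config is missing column(s): ",
             paste(missing_cols, collapse = ", "))
    }
    invisible(TRUE)  # silent on success
}

# A small, made-up config which passes the check
ok <- data.frame(column   = c("X", "co2"),
                 variable = c("datetime", "CO2"),
                 study    = c(NA, "DEMO_STUD"),
                 unit     = c(NA, "ppm"),
                 home     = c(NA, "Casa_Blanca"),
                 room     = c(NA, "Bed1"))
check_config_sketch(ok)

# Dropping a required column triggers the error
bad <- ok[, setdiff(names(ok), "unit")]
res <- try(check_config_sketch(bad), silent = TRUE)
inherits(res, "try-error")
```

Wrapping the call in try() as above is one way to inspect such errors interactively during development without stopping a script.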
Preparing data
While raw_df contains the raw data set, the config
object contains the information on how to rename the
columns and where the observations belong.
annex_prepare() is a helper function to prepare the data
set for further steps.
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
## [1] "datetime" "study" "home" "room" "CO2" "Pressure"
## [7] "Radon" "RH" "T" "VOC"
## Error in annex_prepare(raw_df, config, quiet = TRUE): variable `datetime` (originally column `X`) must be of class POSIXt
At this point we get an error because the variable containing the date
and time information is not a proper datetime object (an object of class
POSIXt) but a character vector. As the information comes in a
proper ISO format, we can simply convert the column (column X
in raw_df) and call annex_prepare() again:
# see ?as.POSIXct for details and options
raw_df <- transform(raw_df, X = as.POSIXct(X, tz = "UTC"))
class(raw_df$X)
## [1] "POSIXct" "POSIXt"
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
head(prepared_df)
## datetime study home room CO2 Pressure Radon RH T VOC
## 1 2011-01-01 00:01:26 DEMO_STUD Casa_Blanca BED1 470 1026.5 151 51 18.8 136
## 2 2011-01-01 00:06:25 DEMO_STUD Casa_Blanca BED1 477 1026.5 151 51 18.8 142
## 3 2011-01-01 00:11:25 DEMO_STUD Casa_Blanca BED1 483 1026.5 151 51 18.8 131
## 4 2011-01-01 00:16:25 DEMO_STUD Casa_Blanca BED1 477 1026.5 151 51 18.8 140
## 5 2011-01-01 00:21:25 DEMO_STUD Casa_Blanca BED1 481 1026.4 151 51 18.8 135
## 6 2011-01-01 00:26:25 DEMO_STUD Casa_Blanca BED1 483 1026.4 168 51 18.7 131
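The conversion step can be tried in isolation. The sketch below converts two ISO-formatted strings as done above and, as an additional assumption not taken from the demo data, shows how a differently formatted timestamp could be handled with an explicit format string (see ?strptime). The values and object names are made up for illustration.

```r
# ISO-formatted strings can be converted without a format string
x_iso <- c("2011-01-01 00:01:26", "2011-01-01 00:06:25")
t_iso <- as.POSIXct(x_iso, tz = "UTC")
inherits(t_iso, "POSIXt")

# Other layouts need an explicit format (made-up example value)
x_other <- "01.01.2011 00:01:26"
t_other <- as.POSIXct(x_other, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Both refer to the same point in time
t_other == t_iso[1]
```

Setting tz explicitly avoids that the result depends on the time zone of the machine the code runs on.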
annex_prepare() performs a series of tasks:
- Checks the config object (calls annex_check_config() internally). If the config object is valid,
- the variables (columns) in raw_df are renamed and checked to be of the correct class,
- the user is informed if there are any columns in raw_df not included in config (just a note) or additional columns defined in config which do not occur in raw_df,
- it is ensured that datetime is a proper datetime object (POSIXt), and
- the modified (possibly subsetted) object is returned.
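The renaming step in the list above can be sketched with base R. This is only an illustration of the idea under stated assumptions, not the code used by annex_prepare(): the objects raw and cfg are small, made-up stand-ins for raw_df and config, and cfg only carries the two columns needed here.

```r
# Hypothetical stand-ins for raw_df and config
raw <- data.frame(X = c("a", "b"), co2 = c(470, 477), humidity = c(51, 51))
cfg <- data.frame(column   = c("X", "co2", "humidity"),
                  variable = c("datetime", "CO2", "RH"))

# Look up each raw column name in the config
idx <- match(names(raw), cfg$column)

# Keep only columns defined in the config and assign the new names
renamed <- raw[, !is.na(idx), drop = FALSE]
names(renamed) <- cfg$variable[idx[!is.na(idx)]]
names(renamed)
```

Using match() keeps the original column order of the data and makes undefined columns easy to spot, as they show up as NA in idx.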
The checks of missing/additional definitions in config
are intended to inform the user about possible misspecifications and
will not result in an error.
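A comparison of this kind can be sketched with setdiff(). The column names below are made up, and the messages are hypothetical wording, not the notes printed by the annex package.

```r
raw_names <- c("X", "co2", "humidity", "battery")   # columns in the data
cfg_names <- c("X", "co2", "humidity", "pressure")  # `column` entries in config

# Names present on one side only
not_in_config <- setdiff(raw_names, cfg_names)
not_in_data   <- setdiff(cfg_names, raw_names)

# Inform the user, but do not stop
if (length(not_in_config) > 0) {
    message("Columns not defined in config: ",
            paste(not_in_config, collapse = ", "))
}
if (length(not_in_data) > 0) {
    message("Config entries without a data column: ",
            paste(not_in_data, collapse = ", "))
}
```

Using message() rather than stop() mirrors the behaviour described above: the user is informed, and processing continues.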