Introduction
Once the data is prepared, the analysis can be performed. This article continues the example from the article Prepare data from XLSX, but the workflow is the same for the other examples included in the documentation, e.g., Prepare data from textfiles.
Please have a look at that article for more information on what the following lines of code do and/or to get the "demo_UIBK.xlsx" file needed to follow this article.
library("readxl")
# Importing data and config object (meta information)
raw_df <- read_excel("demo_UIBK.xlsx", sheet = "measurements")
config <- read_excel("demo_UIBK.xlsx", sheet = "annex_configuration")
config <- subset(config, process == TRUE) # Custom subsetting
# Prepare data for annex()
library("annex")
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
head(prepared_df, n = 3)
## datetime study home room RH SolRad T CO2 Other
## 1 2011-10-01 00:05:00 demo General AMB 88.4 3 13.35 NA NA
## 2 2011-10-01 00:10:00 demo General AMB 88.9 3 13.25 NA NA
## 3 2011-10-01 00:15:00 demo General AMB 89.2 3 13.17 NA NA
Performing the analysis
Once the data set is prepared properly (note that annex_prepare() is a convenience function; the preparation can also be done manually), the annex object can be created.
Prepare annex object
annex() is the creator function which creates an object of class annex (S3), providing a series of methods and functions to conduct the final analysis. More information on the S3 object-oriented system can be found, e.g., in the R Language Definition manual.
The function expects a formula as input which describes how to process the data. The three parts of the formula are:
<measurements to be processed> ~ <datetime> | <grouping variables>
- The first part defines which variables (measurements) should be processed
- Part two is always ~ datetime; the date and time information for the statistics
- Part three defines the grouping, typically study + home + room
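To make the three parts more tangible, the following base R sketch splits such a formula into its components. It only illustrates the structure of the formula; how annex() parses the formula internally is not shown here and may differ.

```r
# The formula used in this article
f <- RH + T + CO2 ~ datetime | study + home + room

lhs <- f[[2]]   # left of "~":  RH + T + CO2
rhs <- f[[3]]   # right of "~": datetime | study + home + room

measurements <- all.vars(lhs)        # part one: variables to be processed
datetime     <- all.vars(rhs[[2]])   # part two: before the "|"
grouping     <- all.vars(rhs[[3]])   # part three: after the "|"

print(measurements)
print(datetime)
print(grouping)
```

Because "|" binds more tightly than "~" in R, the right-hand side is a single call to "|" whose two arguments are the datetime part and the grouping part.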
annex_df <- annex(RH + T + CO2 ~ datetime | study + home + room,
data = prepared_df, tz = "Europe/Berlin")
head(annex_df)
## datetime study home room year month tod RH T CO2
## 1 2011-10-01 02:05:00 demo General AMB 2011 10 23-07 88.4 13.35 NA
## 2 2011-10-01 02:10:00 demo General AMB 2011 10 23-07 88.9 13.25 NA
## 3 2011-10-01 02:15:00 demo General AMB 2011 10 23-07 89.2 13.17 NA
## 4 2011-10-01 02:20:00 demo General AMB 2011 10 23-07 90.2 13.00 NA
## 5 2011-10-01 02:25:00 demo General AMB 2011 10 23-07 90.9 12.86 NA
## 6 2011-10-01 02:30:00 demo General AMB 2011 10 23-07 90.7 12.80 NA
class(annex_df)
## [1] "annex" "data.frame"
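Note that the datetime column above is shifted by two hours compared to the prepared data (00:05 became 02:05). One reading consistent with this output is that the input is interpreted as UTC and displayed in the time zone given by tz; this is an assumption, not a statement about the internals of annex(). A minimal base R sketch of such a conversion:

```r
# Assumption: the raw time stamp is given in UTC
x <- as.POSIXct("2011-10-01 00:05:00", tz = "UTC")

# Same instant in time, displayed in local time for Europe/Berlin
local <- format(x, "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")
print(local)
```

On 1 October 2011 Central European Summer Time (UTC+2) was still in effect, which matches the two-hour offset.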
A series of S3 methods exists for annex objects; further methods may be added in the future.
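For readers unfamiliar with S3, the sketch below shows how a method is dispatched on an object whose class vector extends "data.frame", similar to the class vector shown above. The class "toy_annex" and its summary method are hypothetical and not part of the annex package.

```r
# Hypothetical toy class, used only to illustrate S3 dispatch
toy <- structure(
    data.frame(RH = c(88.4, 88.9), T = c(13.35, 13.25)),
    class = c("toy_annex", "data.frame")
)

# summary() method which is selected for objects of class "toy_annex"
summary.toy_annex <- function(object, ...) {
    cat("toy_annex object with", nrow(object), "rows\n")
    invisible(object)
}

print(class(toy))
print(inherits(toy, "data.frame"))  # data.frame methods remain available
res <- summary(toy)                 # dispatches to summary.toy_annex()
```

With the package loaded, methods(class = "annex") lists the methods that are actually available for annex objects.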
Calculating statistics
Based on the object returned by annex() the analysis can be performed by calling annex_stats(). The function aggregates the data based on the formula provided above, calculates a series of statistical properties, and returns an object of class annex_stats.
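The idea of aggregating by grouping variables can be sketched in base R with aggregate(). This is a simplified illustration on made-up toy values that only computes the mean; it is not the implementation of annex_stats(), which calculates many more statistics.

```r
# Made-up toy data (not taken from demo_UIBK.xlsx)
toy <- data.frame(
    room  = c("AMB", "AMB", "BED", "BED"),
    month = c(10, 10, 10, 10),
    T     = c(13.35, 13.25, 21.0, 22.0)
)

# Mean temperature for each combination of room and month
agg <- aggregate(T ~ room + month, data = toy, FUN = mean)
print(agg)
```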
head(annex_df)
## datetime study home room year month tod RH T CO2
## 1 2011-10-01 02:05:00 demo General AMB 2011 10 23-07 88.4 13.35 NA
## 2 2011-10-01 02:10:00 demo General AMB 2011 10 23-07 88.9 13.25 NA
## 3 2011-10-01 02:15:00 demo General AMB 2011 10 23-07 89.2 13.17 NA
## 4 2011-10-01 02:20:00 demo General AMB 2011 10 23-07 90.2 13.00 NA
## 5 2011-10-01 02:25:00 demo General AMB 2011 10 23-07 90.9 12.86 NA
## 6 2011-10-01 02:30:00 demo General AMB 2011 10 23-07 90.7 12.80 NA
stats <- annex_stats(annex_df, format = "long")
head(stats)
## study home room year month tod variable stats value
## 1 demo General AMB 2011 10 all RH quality_lower 0
## 2 demo General AMB 2011 10 all RH quality_upper 0
## 3 demo General AMB 2011 10 all RH quality_start 15248
## 4 demo General AMB 2011 10 all RH quality_end 15253
## 5 demo General AMB 2011 10 all RH interval_Min 300
## 6 demo General AMB 2011 10 all RH interval_Q1 300
By default, the argument format is set to "wide", which returns the statistics in a wide format, i.e., all calculated values as columns, while format = "long" creates one row for every calculated value. The chosen format (long or wide) does not matter for the further analysis, but a particular format may be more convenient when the statistics are processed manually.
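The difference between the two layouts can be illustrated with base R's reshape() on a small made-up data set. The column names follow the long-format output above, but the values are invented and the resulting column names (value.Min, value.Max) are those produced by reshape(), not necessarily those used by annex_stats().

```r
# Made-up long-format toy data: one row per calculated value
long <- data.frame(
    variable = c("RH", "RH", "T", "T"),
    stats    = c("Min", "Max", "Min", "Max"),
    value    = c(80.1, 95.3, 10.2, 15.8)
)

# Wide format: one row per variable, one column per statistic
wide <- reshape(long, idvar = "variable", timevar = "stats",
                direction = "wide")
print(wide)
```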