Calculate statistics

Introduction

Once the data is prepared, the analysis can be performed. This article continues the example from the article "Prepare data from XLSX", but the same steps apply to the other examples included in the documentation, e.g., "Prepare data from textfiles".

See the article "Prepare data from XLSX" for more information on what the following lines of code do and for the "demo_UIBK.xlsx" file needed to follow this article.

library("readxl")
# Importing data and config object (meta information)
raw_df <- read_excel("demo_UIBK.xlsx", sheet = "measurements")
config <- read_excel("demo_UIBK.xlsx", sheet = "annex_configuration")
config <- subset(config, process == TRUE) # Custom subsetting

# Prepare data for annex()
library("annex")
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
head(prepared_df, n = 3)

##              datetime study    home room   RH SolRad     T CO2 Other
## 1 2011-10-01 00:05:00  demo General  AMB 88.4      3 13.35  NA    NA
## 2 2011-10-01 00:10:00  demo General  AMB 88.9      3 13.25  NA    NA
## 3 2011-10-01 00:15:00  demo General  AMB 89.2      3 13.17  NA    NA

Performing the analysis

Once the data set is prepared properly (annex_prepare() is a convenience function; the same preparation can also be done manually), the annex object used for the analysis can be created.

Prepare `annex` object

annex() is the constructor function. It creates an object of class annex (S3) which provides a series of methods and functions to conduct the final analysis. More information on the S3 object-oriented system can be found, e.g., in the R Language Definition (chapter on object-oriented programming).

The function expects a formula as input which describes how to process the data. The formula has three parts:

<measurements to be processed> ~ <datetime> | <grouping variables>

Part one (left of the ~) defines which variables (measurements) should be processed.
Part two (right of the ~) is always datetime, the date and time information used for the statistics.
Part three (right of the |) defines the grouping, typically study + home + room.
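To make the three parts more tangible, the following sketch takes such a formula apart using base R only. It merely illustrates how the formula is structured; it is not a description of how annex() parses the formula internally.

```r
# Illustration only (base R): decompose a three-part formula.
# This is NOT the parser used by annex(); it just shows the structure.
f <- RH + T + CO2 ~ datetime | study + home + room

lhs <- f[[2]]  # left of '~':  RH + T + CO2
rhs <- f[[3]]  # right of '~': datetime | study + home + room

measurements <- all.vars(lhs)       # part one: variables to be processed
time_var     <- all.vars(rhs[[2]])  # part two: left of '|'
groups       <- all.vars(rhs[[3]])  # part three: right of '|'

print(measurements)
print(time_var)
print(groups)
```

Because + binds more tightly than | in R, the right-hand side splits cleanly into the datetime variable and the grouping variables.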

annex_df <- annex(RH + T + CO2 ~ datetime | study + home + room,
                  data = prepared_df, tz = "Europe/Berlin")
head(annex_df)

##              datetime study    home room year month   tod   RH     T CO2
## 1 2011-10-01 02:05:00  demo General  AMB 2011    10 23-07 88.4 13.35  NA
## 2 2011-10-01 02:10:00  demo General  AMB 2011    10 23-07 88.9 13.25  NA
## 3 2011-10-01 02:15:00  demo General  AMB 2011    10 23-07 89.2 13.17  NA
## 4 2011-10-01 02:20:00  demo General  AMB 2011    10 23-07 90.2 13.00  NA
## 5 2011-10-01 02:25:00  demo General  AMB 2011    10 23-07 90.9 12.86  NA
## 6 2011-10-01 02:30:00  demo General  AMB 2011    10 23-07 90.7 12.80  NA
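
Note that the timestamps shown here are two hours later than those in prepared_df (02:05 instead of 00:05). This is consistent with the original timestamps being stored in UTC and displayed in the time zone passed via tz; that the raw data is in UTC is an assumption inferred from the two outputs, not something stated explicitly. A base R sketch of such a conversion:

```r
# Assumption: the raw timestamp is given in UTC.
x <- as.POSIXct("2011-10-01 00:05:00", tz = "UTC")

# Same instant in time, displayed in Central European (Summer) Time.
local <- format(x, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Berlin")
print(local)
```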

class(annex_df)

## [1] "annex"      "data.frame"

A series of S3 methods exists for annex objects; the list might be extended in the future:

summary()
head()/tail()
is.regular()
plot()
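
For readers not familiar with S3, the following minimal sketch shows how such methods are dispatched on the class of an object. The class name toy and both functions are hypothetical and only serve as illustration; they are not part of the annex package.

```r
# Hypothetical constructor: puts the class "toy" in front of "data.frame",
# similar in spirit to the class vector c("annex", "data.frame") shown above.
toy <- function(df) {
  stopifnot(is.data.frame(df))
  class(df) <- c("toy", class(df))
  df
}

# summary() method for class "toy"; picked up automatically by summary(x).
summary.toy <- function(object, ...) {
  cat("toy object with", nrow(object), "rows\n")
  invisible(object)
}

x <- toy(data.frame(T = c(13.35, 13.25, 13.17)))
print(class(x))
summary(x)                  # dispatches to summary.toy()
n <- nrow(head(x, n = 2))   # no head.toy() defined: falls back to the data.frame method
```

Where no method is defined for the first class, R falls back to the next class in the vector, which is why generic data.frame functionality remains available.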

Calculating statistics

Based on the object returned by annex() the analysis can be performed by calling annex_stats(). The function aggregates the data based on the formula provided above, calculates a series of statistical properties, and returns an object of class annex_stats.
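
The general idea of grouping the data and calculating statistics per group can be sketched in base R as follows. The small data set, the second room label "BED", and the three statistics chosen here are made up for illustration; annex_stats() calculates its own, larger set of statistics, and this is not its implementation.

```r
# Made-up example data: two rooms, one month, three RH observations each.
df <- data.frame(
  room  = rep(c("AMB", "BED"), each = 3),
  month = 10,
  RH    = c(88.4, 88.9, 89.2, 45.0, 47.0, 49.0)
)

# Aggregate RH per room and month, calculating three statistics per group.
stats_demo <- aggregate(
  RH ~ room + month, data = df,
  FUN = function(x) c(Min = min(x), Mean = mean(x), Max = max(x))
)

# aggregate() stores the statistics as a matrix column; flatten to columns.
stats_demo <- do.call(data.frame, stats_demo)
print(stats_demo)
```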

head(annex_df)

##              datetime study    home room year month   tod   RH     T CO2
## 1 2011-10-01 02:05:00  demo General  AMB 2011    10 23-07 88.4 13.35  NA
## 2 2011-10-01 02:10:00  demo General  AMB 2011    10 23-07 88.9 13.25  NA
## 3 2011-10-01 02:15:00  demo General  AMB 2011    10 23-07 89.2 13.17  NA
## 4 2011-10-01 02:20:00  demo General  AMB 2011    10 23-07 90.2 13.00  NA
## 5 2011-10-01 02:25:00  demo General  AMB 2011    10 23-07 90.9 12.86  NA
## 6 2011-10-01 02:30:00  demo General  AMB 2011    10 23-07 90.7 12.80  NA

stats <- annex_stats(annex_df, format = "long")
head(stats)

##   study    home room year month tod variable         stats value
## 1  demo General  AMB 2011    10 all       RH quality_lower     0
## 2  demo General  AMB 2011    10 all       RH quality_upper     0
## 3  demo General  AMB 2011    10 all       RH quality_start 15248
## 4  demo General  AMB 2011    10 all       RH   quality_end 15253
## 5  demo General  AMB 2011    10 all       RH  interval_Min   300
## 6  demo General  AMB 2011    10 all       RH   interval_Q1   300

By default, the argument format is set to "wide", which returns the statistics in a wide format, i.e., each calculated value in its own column, whereas format = "long" creates one row for every calculated value. The chosen format (long or wide) does not matter for the further analysis, but one of the two may be more convenient when the statistics are processed manually.
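
The difference between the two layouts can be illustrated with a small, made-up data frame and base R's reshape(). The values and the column names in the result are illustrative only; the exact column layout of the wide format returned by annex_stats() may differ.

```r
# Made-up statistics in long format: one row per calculated value.
long <- data.frame(
  room     = "AMB",
  variable = rep(c("RH", "T"), each = 2),
  stats    = rep(c("Min", "Max"), times = 2),
  value    = c(40, 90, 5, 25)
)

# Wide format: one row per room/variable, one column per statistic.
wide <- reshape(long, idvar = c("room", "variable"),
                timevar = "stats", direction = "wide")
print(wide)
```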

Next steps

After calculating the statistics, the following steps can be performed:

Reto Stauffer
