Calculate Statistics on Annex object
Arguments
- object
an object of class annex.
- format
character, either "wide" (default) or "long".
- ...
currently unused.
- probs
NULL (default; see Details) or a numeric vector of probabilities with values in [0, 1] (values will be rounded to 3 digits).
Details
The function can return the statistics in wide or long format.
Both can be used when calling annex_write_stats(), but the long/wide
format can be handy for custom applications (e.g., plotting).
Argument probs will be forwarded to the stats::quantile() function.
If probs = NULL (default), the empirical quantiles will be calculated
from 0 (the minimum) up to 1 (the maximum) in steps of 0.01 (one percent steps),
plus the quantiles 0.005, 0.025, 0.975 and 0.995. The user can specify
different probabilities if needed; however, this no longer yields the standard statistics
and the validation will report a problem.
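
The default set of probabilities described above can be reproduced in a few lines. This is only a sketch of the described behaviour, not the package's internal code; the vector x is made-up data and the variable names are chosen for illustration.

```r
# Default grid as described: 0 to 1 in steps of 0.01, plus four tail quantiles.
# Values are rounded to 3 digits, mirroring the note on the probs argument.
probs <- sort(unique(round(c(seq(0, 1, by = 0.01), 0.005, 0.025, 0.975, 0.995), 3)))

set.seed(1)
x <- c(rnorm(500, mean = 21, sd = 2), NA)   # made-up observations with one NA

# Empirical quantiles on the non-missing values (type = 7 is the R default)
q <- stats::quantile(x, probs = probs, na.rm = TRUE, type = 7)

length(probs)   # 105 probabilities
```

The first and last elements of q correspond to the minimum and maximum of the non-missing values.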
Statistics
Grouping: Statistics are calculated on different subsets (or groups),
typically study, home, room, year, month, and tod (time of day).
However, this set can vary depending on the user's
function call to annex (see argument formula).
annex_stats calculates a series of data/quality flags as well as statistical
measures.
Quality: quality_lower and quality_upper contain the fraction of
observations (in percent) falling below the lower and above the upper defined threshold, respectively
(see annex_variable_definition).
quality_start and quality_end contain the day (date only)
of the first and last non-missing observation for the current group; used to
estimate Nestim (see below).
Interval: Time increments of all non-missing observations are calculated in seconds.
The interval_ columns show the five-number summary plus the arithmetic mean of these
intervals. interval_Median is used to estimate Nestim (see below).
Nestim: Number of estimated observations (see section below)
N: Number of non-missing observations
NAs: Number of missing observations (NA in the data set)
Mean: $$\bar{x} = \frac{1}{N} \sum_{i = 1}^N x_i$$ (arithmetic mean)
Sd: $$\text{sd}(x) = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^N \big( (x_i - \bar{x})^2\big)}$$
p: Probabilities for different quantiles. p00 represents the overall minimum,
p50 the median, p100 the overall maximum of all non-missing values. Uses
the empirical quantile function with type = 7 (default; see quantile).
Note: If N - NAs is lower than 30, both Mean and Sd will be set to NA!
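
To make the interval and summary measures above concrete, the following sketch computes them by hand for one small group. It is an illustration under assumptions, not the code used by annex_stats: the timestamps and values are made up, the variable names are hypothetical, and the rule that sets Mean and Sd to NA for small samples is deliberately left out.

```r
# Made-up timestamps at a 5-minute cadence with one 10-minute gap
time <- as.POSIXct("2022-01-01 07:00:00", tz = "UTC") +
        c(0, 300, 600, 1200, 1500, 1800)
x    <- c(20.1, 20.3, 20.2, 20.6, 20.4, 20.5)

# Time increments between consecutive observations, in seconds
dt <- as.numeric(diff(time), units = "secs")

# Five-number summary plus arithmetic mean of the increments
interval <- summary(dt)
interval[["Median"]]   # 300

# Arithmetic mean and standard deviation (N - 1 in the denominator)
Mean <- mean(x)
Sd   <- sd(x)

# Minimum, median and maximum via type-7 empirical quantiles
p <- stats::quantile(x, probs = c(0, 0.5, 1), type = 7, names = FALSE)
names(p) <- c("p00", "p50", "p100")
```

Here the median increment is 300 seconds even though one gap is twice as long, which is why the median rather than the mean is a robust basis for estimating the expected measurement interval.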
Estimated number of observations
The value Nestim contains an estimate of the number of possible observations
for a specific group. This estimate is based on the first/last date an observation
was available (non-missing), the year, month, and tod of the group, and
the interval_Median.
As an example: Imagine the statistics for temperature observations for one specific
year and month (monthly level aggregation) with tod = "07-23". The first non-missing
value has been reported on the first day of the month, the last one on day 10.
Given that tod = "07-23" covers 16 hours, this indicates that observations could
be available 16 hours per day over 10 days = 160 hours in total. Based on the best guess
given by interval_Median, Nestim can then be calculated. E.g., if the median interval
is 300 (300 seconds = 5 minutes), this would lead to a possible number of observations of
Nestim = 10 days * 16 hours per day * 3600 seconds per hour / 300 seconds = 1920.
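
The arithmetic of this example can be checked directly. The variable names below are hypothetical and only mirror the quantities in the text; the package may derive them differently internally.

```r
days            <- 10    # day 1 to day 10 of the month, inclusive
hours_per_day   <- 16    # tod = "07-23"
interval_median <- 300   # median time increment in seconds (5 minutes)

# Possible observations = covered seconds / seconds per observation
Nestim <- days * hours_per_day * 3600 / interval_median
Nestim   # 1920
```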
Keep in mind that this is only an estimate or best guess!