@@ -76,7 +76,7 @@ Data were collected at the Dairy Campus research facilities of Wageningen Univer
This study aimed to develop algorithms that derive lying behavior from UWB positioning time series. Because continuous visual observation of the animals is too labor intensive over longer periods, the lying bouts returned by the IceQube accelerometers (\textbf{IQB}) were used as the `reference' gold standard for lying behavior. Although this sensor-based measure is not 100\% accurate, it allows multiple cows to be monitored simultaneously and it has been shown to detect actual lying behavior with sufficient accuracy {\color{b}[refs]}. For each cow, the times at which lying-down and getting-up events were registered were retrieved from the IceQube software. The data were visually assessed to verify time synchronization and cow identity across the different sensor systems. Only periods in which UWB and IceQube data overlapped were retained. More specifically, for each cow, data were kept from the start of the first registered lying bout until the end of the last registered lying bout. This ensured that the analysis covered only periods in which the accelerometers were certainly attached to the animals, so that an absence of lying bout registrations could not be caused by cows not wearing a sensor.
\subsection{Ultra-wide band data editing}
Raw binary data were extracted from daily Tracklab back-up files (.tlp) (Noldus, Wageningen, the Netherlands) and converted with Python 3.0 into \textit{(x,y,z)}-position time series containing one measurement per second per cow. All further data processing was done using Matlab 2018b (The MathWorks, Inc., Natick, Massachusetts, USA). The \textit{(x,y,z)}-position was expressed relative to a pre-specified origin \textit{(x,y,z)=(0,0,0)}. In the barns at Dairy Campus, the \textit{(x)} coordinate gives the position in the direction of the feeding racks (range 0 to 23m in the and 23 to 46m), whereas the \textit{(y)} coordinate represents the position perpendicular to the feeding alley (range 0 to 14m). The \textit{(z)} position can be considered the height of the tag on the neck collar. To interpret the raw position time series and derive cow behavior from them, multiple data editing steps were needed to deal with noise and missing data. First, outliers indicating a position outside the barn edges were replaced with the edge value when they were single measurements attributable to normal measurement inaccuracy. When multiple successive measurements were registered outside the barn edges, they probably resulted from a lost tag and were replaced by missing values. Second, based on a data exploration step not further detailed in this paper, a methodology to deal with missing data was developed, in which the treatment depended on (1) the gap size and (2) the amount of non-missing data in a predefined window preceding the gap. Missing data always occurred at cow/measurement level, i.e., if a measurement was missing, both the \textit{(x,y)} and the \textit{z} positions were missing. For gaps smaller than 60 seconds, we assumed that the cow's behavior remained constant, or that the error made when this assumption did not hold was negligible, and a Bayesian data imputation was applied.
To this end, the missing data were simulated, in each dimension, by sampling from a normal distribution with mean and standard deviation calculated from the data preceding the gap in a window of twice the gap size. For gaps between 60 and 180 seconds, assuming consistent behavior was less straightforward, but these gaps could still be due to failure of the sensor system. For these gaps, we used simple linear interpolation. Gaps longer than 180 seconds were left as missing, as these often resulted from the animals not being in the barn, e.g., during milking. No assumptions could be made about these gaps, and they were not of interest for this study. A third data editing step consisted of smoothing the \textit{x}, \textit{y} and \textit{z} data with a moving median filter in a window of 45 seconds. To allow sensible assumptions for the settings of the changepoint analysis, the data of each cow-day were analyzed separately.
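
The three data editing steps above can be illustrated with the following minimal Python sketch for a single coordinate sampled at 1 Hz. It is not the Matlab code used in this study. The function names, the per-coordinate treatment of out-of-barn values, the minimum of two observations required before imputing, and the centred, edge-truncated median window are all assumptions made for illustration.

```python
import math
import random
import statistics

NAN = float("nan")


def clip_to_barn(series, low, high):
    """Single out-of-range samples are set to the edge; longer runs become NaN."""
    out = list(series)
    i = 0
    while i < len(series):
        if math.isnan(series[i]) or low <= series[i] <= high:
            i += 1
            continue
        j = i  # find the end of this run of out-of-range samples
        while (j < len(series) and not math.isnan(series[j])
               and not low <= series[j] <= high):
            j += 1
        if j - i == 1:
            out[i] = min(max(series[i], low), high)
        else:
            out[i:j] = [NAN] * (j - i)
        i = j
    return out


def find_gaps(series):
    """Return (start, stop) index pairs, stop exclusive, for each run of NaN."""
    gaps, start = [], None
    for i, value in enumerate(series):
        if math.isnan(value):
            if start is None:
                start = i
        elif start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(series)))
    return gaps


def fill_gaps(series, short=60, long=180, rng=None):
    """Fill gaps by length: < short s sampled, short..long s interpolated,
    longer gaps left missing."""
    rng = rng or random.Random(0)
    out = list(series)
    for start, stop in find_gaps(series):
        size = stop - start
        if size < short:
            # Observed values in a window of twice the gap size before the gap.
            window = [v for v in series[max(0, start - 2 * size):start]
                      if not math.isnan(v)]
            if len(window) < 2:  # assumption: too little data, leave the gap
                continue
            mu, sd = statistics.mean(window), statistics.stdev(window)
            for i in range(start, stop):
                out[i] = rng.gauss(mu, sd)
        elif size <= long and start > 0 and stop < len(series):
            left, right = series[start - 1], series[stop]
            for i in range(start, stop):
                frac = (i - start + 1) / (size + 1)
                out[i] = left + frac * (right - left)
    return out


def moving_median(series, window=45):
    """Centred moving median, truncated at the edges; missing samples stay NaN."""
    half = window // 2
    out = []
    for i, value in enumerate(series):
        values = [v for v in series[max(0, i - half):i + half + 1]
                  if not math.isnan(v)]
        out.append(NAN if math.isnan(value) else statistics.median(values))
    return out
```

Under these assumptions, a cow-day would be processed per coordinate as \verb|moving_median(fill_gaps(clip_to_barn(y, 0.0, 14.0)))|, with the barn limits taken from the coordinate ranges given above.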