Commit 7a423bb9 authored by Dainius

Add example of how to use a custom function to summarise columns

parent d64fcbec
%% Cell type:code id: tags:
``` R
# Define sample data (in practice you would import it with read.delim() instead)
(my.data = data.frame(Plot=c("FF1", "FF1", "DB1", "DB1"), Type=c("FF", "FF", "DB", "DB"), Trial=c("1","2","1","2"), D1=1:4, D2=seq(10, by=5, length.out=4), D3=30:33))
```
%% Output
| Plot | Type | Trial | D1 | D2 | D3 |
|---|---|---|---|---|---|
| FF1 | FF | 1 | 1 | 10 | 30 |
| FF1 | FF | 2 | 2 | 15 | 31 |
| DB1 | DB | 1 | 3 | 20 | 32 |
| DB1 | DB | 2 | 4 | 25 | 33 |
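%% Cell type:markdown id: tags:
The comment in the first cell mentions importing the data with `read.delim()` instead of typing it in. A minimal sketch of that step, assuming a tab-separated file with the same column names. The temporary file and the names `tsv.path` and `imported` are hypothetical and exist only to keep the example self-contained. `Trial` is read as character so that a label such as "Mean" can be added to it later.
%% Cell type:code id: tags:
``` R
# Hypothetical input file, written to a temporary path so the sketch runs on its own
tsv.path <- tempfile(fileext = ".tsv")
writeLines(c("Plot\tType\tTrial\tD1\tD2\tD3",
             "FF1\tFF\t1\t1\t10\t30",
             "FF1\tFF\t2\t2\t15\t31"),
           tsv.path)

# read.delim() expects a header line and tab separators by default;
# the named colClasses entry forces Trial to character and leaves the rest to be guessed
imported <- read.delim(tsv.path, colClasses = c(Trial = "character"))
str(imported)
```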
%% Cell type:code id: tags:
``` R
# Custom column means; the input is the chunk of the data.frame for one Plot value (same columns, fewer rows)
CustomColMeans = function(df)
{
    # Run colMeans() over that chunk, but only on the columns from D1 to D3 (the others are not numeric)
    Means = colMeans(df[, which(names(df)=="D1"):which(names(df)=="D3")])
    # For the remaining variables, take the first occurrence
    OtherCols = df[1, c("Plot", "Type")]
    # Merge the means with the other columns so we don't lose them.
    # Means has to be transposed with t() so that it is treated as one row rather than one column.
    # Trial is set to "Mean" to indicate that the row is the mean of the trials.
    Result = cbind(OtherCols, Trial="Mean", t(Means))
    # Return the one-row data.frame explicitly
    return(Result)
}
# Run the custom function once per Plot with by(), then stack the resulting one-row data.frames into a single data.frame using Reduce(rbind, ...)
(my.data.means = Reduce(rbind, by(my.data, my.data$Plot, CustomColMeans)))
```
%% Output
| <!--/--> | Plot | Type | Trial | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 3 | DB1 | DB | Mean | 3.5 | 22.5 | 32.5 |
| 1 | FF1 | FF | Mean | 1.5 | 12.5 | 30.5 |
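%% Cell type:markdown id: tags:
The call to `t()` in `CustomColMeans` matters because `colMeans()` returns a named vector, which becomes a single column when turned into a data.frame. A small sketch of the difference, using toy values; the names `chunk`, `tall`, `wide` and `summary.row` are made up for this illustration.
%% Cell type:code id: tags:
``` R
# Toy chunk standing in for the rows of one Plot (hypothetical values)
chunk <- data.frame(Plot = c("FF1", "FF1"), D1 = 1:2, D2 = c(10, 15))
means <- colMeans(chunk[, c("D1", "D2")])   # named numeric vector with elements D1 and D2

tall <- as.data.frame(means)      # the vector as is: one column, one row per depth
wide <- as.data.frame(t(means))   # transposed first: one row, one column per depth
dim(tall)
dim(wide)

# Binding the transposed means to the label column gives a single summary row
summary.row <- cbind(data.frame(Plot = "FF1"), t(means))
summary.row
```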
%% Cell type:code id: tags:
``` R
# Merge our means with the original data
(my.data.combined = rbind(my.data, my.data.means))
```
%% Output
| <!--/--> | Plot | Type | Trial | D1 | D2 | D3 |
|---|---|---|---|---|---|---|
| 1 | FF1 | FF | 1 | 1.0 | 10.0 | 30.0 |
| 2 | FF1 | FF | 2 | 2.0 | 15.0 | 31.0 |
| 3 | DB1 | DB | 1 | 3.0 | 20.0 | 32.0 |
| 4 | DB1 | DB | 2 | 4.0 | 25.0 | 33.0 |
| 31 | DB1 | DB | Mean | 3.5 | 22.5 | 32.5 |
| 11 | FF1 | FF | Mean | 1.5 | 12.5 | 30.5 |
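%% Cell type:markdown id: tags:
The row names 31 and 11 in the combined table come from `rbind()`: the mean rows arrive with the row names 3 and 1 of the rows they were taken from, and `rbind()` alters clashing names to keep them unique. If those labels are distracting, they can be reset. A minimal sketch on toy data; `original`, `summary.rows` and `labels.after.rbind` are hypothetical names.
%% Cell type:code id: tags:
``` R
original <- data.frame(Plot = c("FF1", "DB1"), D1 = c(1, 3))
summary.rows <- original[2:1, ]            # carries the row names "2" and "1" along

combined <- rbind(original, summary.rows)
labels.after.rbind <- rownames(combined)   # the clashing names have been made unique
labels.after.rbind

rownames(combined) <- NULL                 # back to plain 1..n numbering
rownames(combined)
```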
%% Cell type:code id: tags:
``` R
# Make it into a long format
my.data.long = reshape2::melt(my.data.combined, variable.name="Depth", value.name="Resistance")
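# Make depth numeric: substr() strips the leading "D" but returns text, so convert it with as.numeric()
my.data.long$Depth = as.numeric(substr(as.character(my.data.long$Depth), 2, 3))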
# Check it
my.data.long
```
%% Output
Using Plot, Type, Trial as id variables
| Plot | Type | Trial | Depth | Resistance |
|---|---|---|---|---|
| FF1 | FF | 1 | 1 | 1.0 |
| FF1 | FF | 2 | 1 | 2.0 |
| DB1 | DB | 1 | 1 | 3.0 |
| DB1 | DB | 2 | 1 | 4.0 |
| DB1 | DB | Mean | 1 | 3.5 |
| FF1 | FF | Mean | 1 | 1.5 |
| FF1 | FF | 1 | 2 | 10.0 |
| FF1 | FF | 2 | 2 | 15.0 |
| DB1 | DB | 1 | 2 | 20.0 |
| DB1 | DB | 2 | 2 | 25.0 |
| DB1 | DB | Mean | 2 | 22.5 |
| FF1 | FF | Mean | 2 | 12.5 |
| FF1 | FF | 1 | 3 | 30.0 |
| FF1 | FF | 2 | 3 | 31.0 |
| DB1 | DB | 1 | 3 | 32.0 |
| DB1 | DB | 2 | 3 | 33.0 |
| DB1 | DB | Mean | 3 | 32.5 |
| FF1 | FF | Mean | 3 | 30.5 |
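%% Cell type:markdown id: tags:
`substr()` returns text, so a depth column built from it stays character until it is passed through `as.numeric()`. The printed table looks the same either way, which makes the difference easy to miss. A short sketch with hypothetical labels (`depth.labels` and the other names are invented here); the `sub()` variant is an alternative that does not depend on the number of digits.
%% Cell type:code id: tags:
``` R
# Hypothetical depth labels, stored as a factor like melt()'s variable column
depth.labels <- factor(c("D1", "D2", "D3", "D10"))

depth.text <- substr(as.character(depth.labels), 2, 3)   # drops the "D", still character
depth.number <- as.numeric(depth.text)                   # now a real number
class(depth.text)
class(depth.number)

# Alternative: remove the leading "D" whatever the length of the number
depth.general <- as.numeric(sub("^D", "", as.character(depth.labels)))
depth.general
```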
%% Cell type:code id: tags:
``` R
# Plot
library(ggplot2)
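# group=Trial is set explicitly: by default ggplot2 groups by the interaction of all discrete variables in the plot,
# so a discrete Depth would leave every point in its own group and geom_line() would draw no lines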
ggplot(my.data.long, aes(Resistance, Depth, colour=Trial, group=Trial)) + geom_line() + facet_wrap(vars(Type))
```
%% Output