Title: | Parametric Time Warping |
---|---|
Description: | Parametric Time Warping aligns patterns, i.e. it aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: RMS (Root Mean Square error) and WCC (Weighted Cross Correlation). Both warping of peak profiles and of peak lists are supported. A vignette for the latter is contained in the inst/doc directory of the source package - the vignette source can be found on the package github site. |
Authors: | Jan Gerretzen [ctb], Paul Eilers [aut], Hans Wouters [ctb], Tom Bloemberg [aut], Ron Wehrens [aut, cre] |
Maintainer: | Ron Wehrens <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.9-16 |
Built: | 2024-11-07 03:56:07 UTC |
Source: | https://github.com/rwehrens/ptw |
Parametric Time Warping aligns patterns, i.e. it aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: RMS (Root Mean Square error) and WCC (Weighted Cross Correlation). Both warping of peak profiles and of peak lists are supported. A vignette for the latter is contained in the inst/doc directory of the source package - the vignette source can be found on the package github site.
The DESCRIPTION file:
Package: | ptw |
Type: | Package |
Title: | Parametric Time Warping |
Version: | 1.9-16 |
Authors@R: | c(person("Jan", "Gerretzen", role = "ctb"), person("Paul", "Eilers", role = "aut"), person("Hans", "Wouters", role = "ctb"), person("Tom", "Bloemberg", role = "aut", email = "[email protected]"), person("Ron", "Wehrens", role = c("aut", "cre"), email = "[email protected]")) |
Description: | Parametric Time Warping aligns patterns, i.e. it aims to put corresponding features at the same locations. The algorithm searches for an optimal polynomial describing the warping. It is possible to align one sample to a reference, several samples to the same reference, or several samples to several references. One can choose between calculating individual warpings, or one global warping for a set of samples and one reference. Two optimization criteria are implemented: RMS (Root Mean Square error) and WCC (Weighted Cross Correlation). Both warping of peak profiles and of peak lists are supported. A vignette for the latter is contained in the inst/doc directory of the source package - the vignette source can be found on the package github site. |
URL: | https://github.com/rwehrens/ptw |
License: | GPL (>= 2) |
Imports: | RcppDE, graphics, grDevices, stats |
Repository: | https://rwehrens.r-universe.dev |
RemoteUrl: | https://github.com/rwehrens/ptw |
RemoteRef: | HEAD |
RemoteSha: | fd5e4776a1be6f61056decd37fde0ef8c88cdcca |
Author: | Jan Gerretzen [ctb], Paul Eilers [aut], Hans Wouters [ctb], Tom Bloemberg [aut], Ron Wehrens [aut, cre] |
Maintainer: | Ron Wehrens <[email protected]> |
Jan Gerretzen [ctb], Paul Eilers [aut], Hans Wouters [ctb], Tom Bloemberg [aut], Ron Wehrens [aut, cre]
Maintainer: Ron Wehrens <[email protected]>
@ArticleBloemberg2010, title = Improved Parametric Time Warping for Proteomics, author = Tom G. Bloemberg and Jan Gerretzen and Hans J. P. Wouters and Jolein Gloerich and Maurice van Dael and Hans J. C. T. Wessels and Lambert P. van den Heuvel and Paul H. C. Eilers and Lutgarde M. C. Buydens and Ron Wehrens, year = 2010, journal = Chemometrics and Intelligent Laboratory Systems, volume = 104, number = 1, pages = 65-74,
ArticleWehrens2015, title = Fast parametric warping of peak lists, author = Ron Wehrens and Tom G. Bloemberg and Paul H. C. Eilers, year = 2015, volume = 31, pages = 3063–3065, journal = Bioinformatics, doi = 10.1093/bioinformatics/btv299,
Estimates a trend based on asymmetric least squares. In this case used to estimate the baseline of a given spectrum.
asysm(y, lambda = 1e+07, p = 0.001, eps = 1e-8, maxit = 25)
asysm(y, lambda = 1e+07, p = 0.001, eps = 1e-8, maxit = 25)
y |
data: either a vector or a data matrix containing spectra as rows |
lambda |
smoothing parameter (generally 1e5 - 1e8) |
p |
asymmetry parameter |
eps |
numerical precision for convergence |
maxit |
max number of iterations. If no convergence is reached, a warning is issued. |
Asymmetric least squares (not to be confused with alternating least squares) assigns different weights to the data points that are above and below an iteratively estimated trendline, respectively. In this case, the asymmetry parameter p (0 <= p <= 1) is the weight for points above the trendline, whereas 1-p is the weight for points below it. Naturally, p should be small for estimating baselines. The parameter lambda controls the amount of smoothing: the larger it is, the smoother the trendline will be.
An estimated baseline
Paul Eilers, Jan Gerretzen
Eilers, P.H.C. Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
Boelens, H.F.M., Eilers, P.H.C., Hankemeier, T. (2005) "Sign constraints improve the detection of differences between complex spectral data sets: LC-IR as an example", Analytical Chemistry, 77, 7998 – 8007.
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(asysm(gaschrom[1,]), col = 2)
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(asysm(gaschrom[1,]), col = 2)
This function estimates a baseline using asymmetric least squares and subtracts it from the data.
baseline.corr(y, ...)
baseline.corr(y, ...)
y |
signal(s) to correct. This can be a vector (containing one signal) or a matrix of signals(one signal per row) |
... |
other arguments to the |
ycorr |
baseline corrected signal(s): a vector or a matrix of the same dimension as the input signal(s) |
Paul Eilers, Jan Gerretzen
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(baseline.corr(gaschrom[1,]), col = 2)
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(baseline.corr(gaschrom[1,]), col = 2)
This function calculates a similarity matrix and returns the sample number that is most similar to all other samples. This is possibly preferable as a reference sample since warping then may be kept to a minimum. Either RMS or WCC may be used as similarity functions.
bestref(x, optim.crit = c("WCC", "RMS"), trwdth=20, wghts = NULL, smooth.param = 0)
bestref(x, optim.crit = c("WCC", "RMS"), trwdth=20, wghts = NULL, smooth.param = 0)
x |
data matrix or array of signals, specified row-wise. In case of an array, the third dimension should differentiate between the different samples |
optim.crit |
either |
trwdth |
the width of the triangle in the WCC criterion, given as a number of data points. Default: 20 |
wghts |
Optional weights vector in the WCC criterion; will be calculated from the triangle width if necessary. Sometimes it is more efficient to pre-calculate it and give it as an argument |
smooth.param |
smoothing parameter for smoothing the signal when
|
A list containing two elements:
best.ref |
the index of the best reference(s) |
crit.values |
the qualities as measured by either RMS or WCC |
Jan Gerretzen, Ron Wehrens
data(gaschrom) bestref(gaschrom) bestref(gaschrom, optim.crit = "WCC", trwdth = 50) bestref(gaschrom, optim.crit = "RMS") bestref(gaschrom, optim.crit = "RMS", smooth.param = 1e5)
data(gaschrom) bestref(gaschrom) bestref(gaschrom, optim.crit = "WCC", trwdth = 50) bestref(gaschrom, optim.crit = "RMS") bestref(gaschrom, optim.crit = "RMS", smooth.param = 1e5)
Applying two (or more) warping function after each other can be described with one warping function of a higher warping degree. This function provides the coefficients of this higher degree warping function.
calc.multicoef(coef1, coef2)
calc.multicoef(coef1, coef2)
coef1 |
vector containing the warping coefficients of the first applied warping function |
coef2 |
vector containing the warping coefficients of the second applied warping function |
This function uses Pascal's simplex to calculate the new warping coefficients.
When applying three warping functions successively (first a, then b and
finally c - here a, b and c are vectors of warping coefficients), first
calculate the new coefficients for b and c, and afterwards the
coefficients for a with these new coefficients. So the coefficients for
the total warping function can be calculated via calc.multicoef(a,
calc.multicoef(b, c))
.
a vector containing the corrected warping coefficients
Jan Gerretzen
Bloemberg, T.G., et al. (2010) "Improved parametric time warping for Proteomics", Chemometrics and Intelligent Laboratory Systems, 104 (1), 65 – 74.
data(gaschrom) ref <- gaschrom[1,] samp <- gaschrom[16,] coef1 <- c(100,1.1, 1e-5) coef2 <- c(25, 0.95, 3.2e-5) gaschrom.ptw <- ptw(ref, samp, init.coef = coef1, try = TRUE) ref.w <- gaschrom.ptw$reference samp.w <- gaschrom.ptw$warped.sample samp.w[is.na(samp.w)] <- 0 gaschrom.ptw2 <- ptw(ref.w, samp.w, init.coef = coef2, try = TRUE) plot(c(gaschrom.ptw2$warped.sample), type = "l") corr.coef <- calc.multicoef(coef1, coef2) gaschrom.ptw3 <- ptw(ref, samp, init.coef = corr.coef, try = TRUE) lines(c(gaschrom.ptw3$warped.sample), col = 2, lty = 2)
data(gaschrom) ref <- gaschrom[1,] samp <- gaschrom[16,] coef1 <- c(100,1.1, 1e-5) coef2 <- c(25, 0.95, 3.2e-5) gaschrom.ptw <- ptw(ref, samp, init.coef = coef1, try = TRUE) ref.w <- gaschrom.ptw$reference samp.w <- gaschrom.ptw$warped.sample samp.w[is.na(samp.w)] <- 0 gaschrom.ptw2 <- ptw(ref.w, samp.w, init.coef = coef2, try = TRUE) plot(c(gaschrom.ptw2$warped.sample), type = "l") corr.coef <- calc.multicoef(coef1, coef2) gaschrom.ptw3 <- ptw(ref, samp, init.coef = corr.coef, try = TRUE) lines(c(gaschrom.ptw3$warped.sample), col = 2, lty = 2)
This function calculates the warping coefficients for the original range of the data, based on the warping of zero-filled data. Only needed when zeros are added in the beginning of the signal.
calc.zerocoef(coef, zeros)
calc.zerocoef(coef, zeros)
coef |
vector of warping coefficients of a PTW-calculation on a set of signals with zeros added to the beginning of the signal |
zeros |
the number of zeros added |
a vector containing the corrected warping coefficients
Jan Gerretzen
Bloemberg, T.G., et al. (2010) "Improved parametric time warping for Proteomics", Chemometrics and Intelligent Laboratory Systems, 104 (1), 65 – 74.
data(gaschrom) gaschrom.zf <- padzeros(gaschrom, 250) ref <- gaschrom[1,] samp <- gaschrom[16,] ref.zf <- gaschrom.zf[1,] samp.zf <- gaschrom.zf[16,] gaschrom.ptw <- ptw(ref.zf, samp.zf) layout(matrix(1:2,2,1, byrow=TRUE)) plot(gaschrom.ptw) corr.coef <- calc.zerocoef(gaschrom.ptw$warp.coef, 250) gaschrom.ptw2 <- ptw(ref, samp, init.coef = corr.coef, try = TRUE) plot(gaschrom.ptw2)
data(gaschrom) gaschrom.zf <- padzeros(gaschrom, 250) ref <- gaschrom[1,] samp <- gaschrom[16,] ref.zf <- gaschrom.zf[1,] samp.zf <- gaschrom.zf[16,] gaschrom.ptw <- ptw(ref.zf, samp.zf) layout(matrix(1:2,2,1, byrow=TRUE)) plot(gaschrom.ptw) corr.coef <- calc.zerocoef(gaschrom.ptw$warp.coef, 250) gaschrom.ptw2 <- ptw(ref, samp, init.coef = corr.coef, try = TRUE) plot(gaschrom.ptw2)
The CODA algorithm calculates a so-called MCQ (Mass Chromatogram Quality) value for every row of the input. High MCQ values correspond with those chromatograms not containing spikes and/or a baseline.
coda(x, window = 5, smoothing = c("median", "mean"))
coda(x, window = 5, smoothing = c("median", "mean"))
x |
data matrix containing chromatograms in the rows |
window |
width of the smoothing window |
smoothing |
type of smoothing: whether to use running means or running medians |
The MCQ value of a spectrum is the inner product between the standardized, smoothed chromatogram, and the length-scaled chromatogram. In literature, a cut-off of 0.85 has been reported to work well in selecting useful chromatograms, although this is strongly data-set dependent.
Windig, W., Phalp, J., Payna, A. (1996) "A noise and background reduction method for component detection in liquid chromatography/mass spectrometry", Analytical Chemistry, 68, 3602 – 3606.
data(gaschrom) coda(gaschrom)
data(gaschrom) coda(gaschrom)
This function smoothes signals with a finite difference penalty of order 2.
difsm(y, lambda)
difsm(y, lambda)
y |
signal to be smoothed: a vector |
lambda |
smoothing parameter: larger values lead to smoothing |
smoothed signal: a vector
Paul Eilers, Jan Gerretzen
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
Eilers, P.H.C. (2003) "A perfect smoother", Analytical Chemistry, 75, 3631 – 3636.
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(difsm(gaschrom[1,], lambda = 1e5), col = 2) lines(difsm(gaschrom[1,], lambda = 1e6), col = 3) lines(difsm(gaschrom[1,], lambda = 1e7), col = 4)
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(difsm(gaschrom[1,], lambda = 1e5), col = 2) lines(difsm(gaschrom[1,], lambda = 1e6), col = 3) lines(difsm(gaschrom[1,], lambda = 1e7), col = 4)
The object gaschrom
contains 16 calibration GC traces, measured
at 5,000 time points. A peak-picked version is available as object
gaschrom.st
(see example).
data(gaschrom)
data(gaschrom)
Claire Boucon, Unilever
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
data(gaschrom) ## the gaschrom.st object has been generated in the following way: ## Not run: pick.peaks <- function(x, span) { span.width <- span * 2 + 1 loc.max <- span.width + 1 - apply(embed(x, span.width), 1, which.max) loc.max[loc.max == 1 | loc.max == span.width] <- NA pks <- loc.max + 0:(length(loc.max)-1) pks <- pks[!is.na(pks)] pks.tab <- table(pks) pks.id <- as.numeric(names(pks.tab)[pks.tab > span]) cbind(rt = pks.id, I = x[pks.id]) } gaschrom <- t(apply(gaschrom, 1, baseline.corr)) gaschrom.st <- lapply(1:nrow(gaschrom), function(ii) pick.peaks(gaschrom[ii,], span = 11)) ## remove peaks with an intensity below 10 gaschrom.st <- lapply(gaschrom.st, function(pk) pk[pk[,2] >= 10,]) ## End(Not run) plot(gaschrom[1,], type = "l", xlim = c(3000, 3500), ylim = c(0, 200)) abline(h = 10, lty = 2, col = 2) abline(v = gaschrom.st[[1]], col = 4)
data(gaschrom) ## the gaschrom.st object has been generated in the following way: ## Not run: pick.peaks <- function(x, span) { span.width <- span * 2 + 1 loc.max <- span.width + 1 - apply(embed(x, span.width), 1, which.max) loc.max[loc.max == 1 | loc.max == span.width] <- NA pks <- loc.max + 0:(length(loc.max)-1) pks <- pks[!is.na(pks)] pks.tab <- table(pks) pks.id <- as.numeric(names(pks.tab)[pks.tab > span]) cbind(rt = pks.id, I = x[pks.id]) } gaschrom <- t(apply(gaschrom, 1, baseline.corr)) gaschrom.st <- lapply(1:nrow(gaschrom), function(ii) pick.peaks(gaschrom[ii,], span = 11)) ## remove peaks with an intensity below 10 gaschrom.st <- lapply(gaschrom.st, function(pk) pk[pk[,2] >= 10,]) ## End(Not run) plot(gaschrom[1,], type = "l", xlim = c(3000, 3500), ylim = c(0, 200)) abline(h = 10, lty = 2, col = 2) abline(v = gaschrom.st[[1]], col = 4)
The lcms
data consists of a 100 x 2000 x 3 array lcms
, a
vector time
of length 2000 and a vector mz
of length 100. The
LC-MS data in the array are a subset (samples 1, 2 and 5) of a larger
set measured on a tryptic digest of E. coli proteins (see
source
section). Separate objects mz
and rt
give
the values for the first two axis.
Peak picking leads to the object ldms.pks (see example section).
data(lcms)
data(lcms)
Nijmegen Proteomics Facility, Department of Laboratory Medicine, Radboud University Nijmegen Medical Centre. Data available (in different formats) at http://www.cac.science.ru.nl/research/data/ecoli/
Bloemberg, T.G., et al. (2010) "Improved parametric time warping for Proteomics", Chemometrics and Intelligent Laboratory Systems, 104 (1), 65 – 74.
## the lcms.pks object is generated in the following way: ## Not run: data(lcms) pick.peaks <- function(x, span) { span.width <- span * 2 + 1 loc.max <- span.width + 1 - apply(embed(x, span.width), 1, which.max) loc.max[loc.max == 1 | loc.max == span.width] <- NA pks <- loc.max + 0:(length(loc.max)-1) pks <- pks[!is.na(pks)] pks.tab <- table(pks) pks.id <- as.numeric(names(pks.tab)[pks.tab > span]) cbind(rt = pks.id, I = x[pks.id]) } ## bring all samples to the same scale, copied from ptw man page lcms.scaled <- aperm(apply(lcms, c(1,3), function(x) x/mean(x) ), c(2,1,3)) lcms.s.z <- aperm(apply(lcms.scaled, c(1,3), function(x) padzeros(x, 250) ), c(2,1,3)) lcms.pks <- lapply(1:3, function(ii) { lapply(1:nrow(lcms.s.z[,,ii]), function(jj) cbind("mz" = jj, pick.peaks(lcms.s.z[jj,,ii], 5))) }) ## End(Not run)
## the lcms.pks object is generated in the following way: ## Not run: data(lcms) pick.peaks <- function(x, span) { span.width <- span * 2 + 1 loc.max <- span.width + 1 - apply(embed(x, span.width), 1, which.max) loc.max[loc.max == 1 | loc.max == span.width] <- NA pks <- loc.max + 0:(length(loc.max)-1) pks <- pks[!is.na(pks)] pks.tab <- table(pks) pks.id <- as.numeric(names(pks.tab)[pks.tab > span]) cbind(rt = pks.id, I = x[pks.id]) } ## bring all samples to the same scale, copied from ptw man page lcms.scaled <- aperm(apply(lcms, c(1,3), function(x) x/mean(x) ), c(2,1,3)) lcms.s.z <- aperm(apply(lcms.scaled, c(1,3), function(x) padzeros(x, 250) ), c(2,1,3)) lcms.pks <- lapply(1:3, function(ii) { lapply(1:nrow(lcms.s.z[,,ii]), function(jj) cbind("mz" = jj, pick.peaks(lcms.s.z[jj,,ii], 5))) }) ## End(Not run)
stptw
.
Function pktab2mzchannel allows to split a list of peaks
into several sublists, for instance on the basis of m/z values. The
result can be aligned with stptw
. The peak list can be obtained
from packages like XCMS
. The reverse function,
mzchannel2pktab, simply gathers all peak positions in one matrix.
pktab2mzchannel(pktab, Ivalue = "maxo", masses = NULL, nMasses = 0, massDigits = 2) mzchannel2pktab(mzchannels)
pktab2mzchannel(pktab, Ivalue = "maxo", masses = NULL, nMasses = 0, massDigits = 2) mzchannel2pktab(mzchannels)
pktab |
a peak table as generated for example by XCMS. Necessary information: m/z value (column name "mz"), retention time (column name "rt") and intensity. |
Ivalue |
the name of the intensity value to be used. Default is "maxo", one of the columns generated by the XCMS package. |
masses |
a vector of specific masses |
nMasses |
an optional number limiting the number of mass
channels. When both |
massDigits |
number of significant mass digits - if no
|
mzchannels |
a list of peak matrices, e.g. the output of
|
Function pktab2mzchannel
returns a list of peak matrices; list
elements have the name of the mz value that they represent. Function
mzchannel2pktab
returns one peak matrix where all masses are in a
specific column.
Ron Wehrens
data(lcms) ## first couple of peaks in the first three channels (smallset <- lapply(lcms.pks[[1]][1:3], head)) ## all in one data matrix allpeaks <- mzchannel2pktab(smallset) ## and back again pktab2mzchannel(allpeaks, Ivalue = "I")
data(lcms) ## first couple of peaks in the first three channels (smallset <- lapply(lcms.pks[[1]][1:3], head)) ## all in one data matrix allpeaks <- mzchannel2pktab(smallset) ## and back again pktab2mzchannel(allpeaks, Ivalue = "I")
Adds zeros to the left side of a matrix or vector, to its right side, or to both sides.
padzeros(data, nzeros, side="both")
padzeros(data, nzeros, side="both")
data |
the original matrix or vector |
nzeros |
number of columns to add on one side |
side |
to which side to add the zeros - choose between " |
When data
is a numeric vector, it is converted to a matrix of a single row.
A matrix with the same number of rows as the original matrix, and extra columns containing zeros on the specified side or sides
Jan Gerretzen
Bloemberg, T.G., et al. (2010) "Improved parametric time warping for Proteomics", Chemometrics and Intelligent Laboratory Systems, 104 (1), 65 – 74.
data(lcms) lcms.z1 <- padzeros(lcms[75,,1], 250, side="left") lcms.z2 <- padzeros(lcms[75,,1], 250, side="right") lcms.z3 <- padzeros(lcms[75,,1], 250, side="both") zeros <- rep(0, 250) layout(matrix(1:4,2,2, byrow=TRUE)) plot(lcms[75,,1], type="l", main="Original signal") plot(as.vector(lcms.z1), type="l", main="Padzeros left side") points(1:250, zeros, col=2, lwd=0.08) plot(as.vector(lcms.z2), type="l", main="Padzeros right side") points(2001:2250, zeros, col=2, lwd=0.08) plot(as.vector(lcms.z3), type="l", main="Padzeros both sides") points(1:250, zeros, col=2, lwd=0.08) points(2251:2500, zeros, col=2, lwd=0.08)
data(lcms) lcms.z1 <- padzeros(lcms[75,,1], 250, side="left") lcms.z2 <- padzeros(lcms[75,,1], 250, side="right") lcms.z3 <- padzeros(lcms[75,,1], 250, side="both") zeros <- rep(0, 250) layout(matrix(1:4,2,2, byrow=TRUE)) plot(lcms[75,,1], type="l", main="Original signal") plot(as.vector(lcms.z1), type="l", main="Padzeros left side") points(1:250, zeros, col=2, lwd=0.08) plot(as.vector(lcms.z2), type="l", main="Padzeros right side") points(2001:2250, zeros, col=2, lwd=0.08) plot(as.vector(lcms.z3), type="l", main="Padzeros both sides") points(1:250, zeros, col=2, lwd=0.08) points(2251:2500, zeros, col=2, lwd=0.08)
The function plots a ptw
object. It shows either the original and
warped signals, or the warping function.
## S3 method for class 'ptw' plot(x, what = c("signal", "function"), type = c("simultaneous", "individual"), ask = TRUE, ...)
## S3 method for class 'ptw' plot(x, what = c("signal", "function"), type = c("simultaneous", "individual"), ask = TRUE, ...)
x |
object of class 'ptw' |
what |
|
type |
|
ask |
logical, whether to ask before showing a new plot |
... |
further arguments to the plotting functions |
Jan Gerretzen, Ron Wehrens, Tom Bloemberg
data(gaschrom) ref <- gaschrom[1:8,] samp <- gaschrom[9:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", optim.crit = "RMS", init.coef = c(0, 1, 0)) plot(gaschrom.ptw) plot(gaschrom.ptw, what = "function")
data(gaschrom) ref <- gaschrom[1:8,] samp <- gaschrom[9:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", optim.crit = "RMS", init.coef = c(0, 1, 0)) plot(gaschrom.ptw) plot(gaschrom.ptw, what = "function")
Given a ptw object, predict either the signal at a certain warped time, or the warped time itself.
## S3 method for class 'ptw' predict(object, newdata, what = c("response", "time"), RTref = NULL, ...)
## S3 method for class 'ptw' predict(object, newdata, what = c("response", "time"), RTref = NULL, ...)
object |
An object of class "ptw" |
newdata |
Optional vector or matrix of new data points. If
|
what |
Either "response", in which case the function returns the warped signal, or "time", and then the function returns the warped time axis. That is, the time point in the warped sample corresponding to the given time point in the original sample. |
RTref |
Optional vector of retention times in the reference. |
... |
Further arguments, at the moment not used. |
The function returns a matrix (possibly containing only one
row) of either warped time points or signals, warped according to the
warping function defined in object
. When warping signals
individually, predict.ptw
will check the dimension of
newdata
: if this is a vector or a matrix of one row, every
single warping function will be applied to the one row. If the
number of rows equals the number of warping functions, each row will
be warped with its corresponding function. If the number of rows does
not match the number of warping functions and is not equal to one, an
error is given.
Ron Wehrens
Eilers, P.H.C. "Parametric Time Warping." Anal. Chem., 2004, 76, 404-411
Bloemberg, T.G. et al. "Improved parametric time warping for proteomics." Chemom. Intell. Lab. Syst., 2010, 104, pp. 65-74
## educational example, contributed by zeehio (Sergio Oller) x1 <- c(rep(0, 5), 1,1,1, 20, 40, 20, 1, 1, 1, rep(0, 5)) x2 <- c(rep(0, 6), 1,1,1, 20, 40, 20, 1, 1, 1, rep(0, 4)) time <- 1:length(x1) ## get time-warped object. Default: 'forward' warping, also works ## with backward warping w1b <- ptw(ref = x1, samp = x2) ## predict intensities of object x2 after warping at the times used in x1 x2wb <- predict(w1b, newdata = x2, what = "response") ## predict times where the original elements of x2 will end up t2wb <- as.numeric(predict(w1b, newdata = time, what = "time")) graphics.off() par(mfrow = c(2,1)) plot(x1, type = "h", col = 2, lwd = 2, main = "Orig data") points(x2, type = "h", col = 4) plot(x1, type = "h", col = 2, lwd = 2, main = "Backward warping") points(c(x2wb), type = "h", col = 4) # what = "response" points(t2wb, x2, col = 4) # what = "time" ## more relevant example data(gaschrom) ## Global warping: all samples warped with the same function ref <- gaschrom[1,] samp <- gaschrom[14:16,] gp <- ptw(ref, samp, init.coef = c(0, 1), warp.type = "global") matplot(t(samp), type = "l", xlim = c(2200, 2400), lty = 1, col = 1:3) lines(ref, type = "l", col = "gray", lwd = 2) ## plot predicted warped signal directly matlines(t(predict(gp)), lty = 2, col = 1:3) ## plot original signal at warped time axis matlines(t(predict(gp, newdata = 2001:2600, what = "time")), t(samp[,2001:2600]), col = 1:3, lwd = 3, lty = 2) ## OK ## result: good alignment with ref, differences between three profiles persist ## Individual warping: all samples warped individually gp <- ptw(ref, samp, init.coef = c(0, 1), warp.type = "indiv") predict(gp, what = "time", newdata = 2001:2600) matplot(t(samp), type = "l", xlim = c(2200, 2400), lty = 1, col = 1:3) lines(ref, type = "l", col = "gray", lwd = 2) matlines(t(predict(gp, what = "time")), t(samp), col = 1:3, lty = 2) ## result: each individual profile is aligned to the ref ## How would samples 11:13 be warped using the coefficients from samples ## 14:16 (silly but just to make the point)? samp.pred <- predict(gp, what = "response", newdata = gaschrom[11:13,])
## educational example, contributed by zeehio (Sergio Oller) x1 <- c(rep(0, 5), 1,1,1, 20, 40, 20, 1, 1, 1, rep(0, 5)) x2 <- c(rep(0, 6), 1,1,1, 20, 40, 20, 1, 1, 1, rep(0, 4)) time <- 1:length(x1) ## get time-warped object. Default: 'forward' warping, also works ## with backward warping w1b <- ptw(ref = x1, samp = x2) ## predict intensities of object x2 after warping at the times used in x1 x2wb <- predict(w1b, newdata = x2, what = "response") ## predict times where the original elements of x2 will end up t2wb <- as.numeric(predict(w1b, newdata = time, what = "time")) graphics.off() par(mfrow = c(2,1)) plot(x1, type = "h", col = 2, lwd = 2, main = "Orig data") points(x2, type = "h", col = 4) plot(x1, type = "h", col = 2, lwd = 2, main = "Backward warping") points(c(x2wb), type = "h", col = 4) # what = "response" points(t2wb, x2, col = 4) # what = "time" ## more relevant example data(gaschrom) ## Global warping: all samples warped with the same function ref <- gaschrom[1,] samp <- gaschrom[14:16,] gp <- ptw(ref, samp, init.coef = c(0, 1), warp.type = "global") matplot(t(samp), type = "l", xlim = c(2200, 2400), lty = 1, col = 1:3) lines(ref, type = "l", col = "gray", lwd = 2) ## plot predicted warped signal directly matlines(t(predict(gp)), lty = 2, col = 1:3) ## plot original signal at warped time axis matlines(t(predict(gp, newdata = 2001:2600, what = "time")), t(samp[,2001:2600]), col = 1:3, lwd = 3, lty = 2) ## OK ## result: good alignment with ref, differences between three profiles persist ## Individual warping: all samples warped individually gp <- ptw(ref, samp, init.coef = c(0, 1), warp.type = "indiv") predict(gp, what = "time", newdata = 2001:2600) matplot(t(samp), type = "l", xlim = c(2200, 2400), lty = 1, col = 1:3) lines(ref, type = "l", col = "gray", lwd = 2) matlines(t(predict(gp, what = "time")), t(samp), col = 1:3, lty = 2) ## result: each individual profile is aligned to the ref ## How would samples 11:13 be warped using the coefficients from samples ## 14:16 (silly but just to make the point)? samp.pred <- predict(gp, what = "response", newdata = gaschrom[11:13,])
The main functions of the ptw
package, performing
parametric time warping of one or more samples. Features in the
samples are optimally aligned with features in the reference(s). One
may align a single sample to a single reference, several samples to a
single reference, and several samples to several references. In the
latter case, the number of references and samples should be equal. One
may require that all samples are warped with the same warping
function, or one may allow individual warpings for all samples.
ptw(ref, samp, selected.traces, init.coef = c(0, 1, 0), try = FALSE, warp.type = c("individual", "global"), optim.crit = c("WCC", "RMS"), mode = c("forward", "backward"), smooth.param = ifelse(try, 0, 1e05), trwdth = 20, trwdth.res = trwdth, verbose = FALSE, ...) stptw(ref, samp, init.coef = c(0, 1, 0), trwdth = 20, trwdth.res = trwdth, nGlobal = ifelse(length(init.coef) > 3, 5, 0), ...) ## S3 method for class 'ptw' summary(object, ...) ## S3 method for class 'ptw' print(x, ...)
ptw(ref, samp, selected.traces, init.coef = c(0, 1, 0), try = FALSE, warp.type = c("individual", "global"), optim.crit = c("WCC", "RMS"), mode = c("forward", "backward"), smooth.param = ifelse(try, 0, 1e05), trwdth = 20, trwdth.res = trwdth, verbose = FALSE, ...) stptw(ref, samp, init.coef = c(0, 1, 0), trwdth = 20, trwdth.res = trwdth, nGlobal = ifelse(length(init.coef) > 3, 5, 0), ...) ## S3 method for class 'ptw' summary(object, ...) ## S3 method for class 'ptw' print(x, ...)
ref |
reference. Either a vector (containing one reference signal) or a matrix (one reference per row). If more than one reference is specified, the number of reference signals must equal the number of sample signals. |
samp |
sample. A vector (containing one sample signal) or a matrix (one sample per row). |
selected.traces |
optional vector containing the row numbers to
use from |
init.coef |
starting coefficients. The first number is the
zeroth-order coefficient (i.e., a constant shift); further numbers
indicate linear, quadratic, ... stretches. The default is to start
from the identity warping using a quadratic function
( |
try |
if |
warp.type |
default is to treat samples and references as single
entities and align them individually and independently. Using the
argument |
optim.crit |
either |
mode |
either "forward" or "backward": the latter was the original implementation, basically for a point i in the original signal predicting the point j in the signal that would be in position i in the warped signal. The interpretation of the coefficients is counterintuitive. Therefore the default is "forward", simply predicting the location (time) in the warped signal of a particular point. Apart from possible numerical optimisation issues, both warpings should give the same net result. |
smooth.param |
smoothing parameter for smoothing the reference
and sample when |
trwdth |
the width of the triangle in the WCC criterion during the optimization, given as a number of data points. Default: 20 |
trwdth.res |
the width of the triangle in the WCC calculation in
the calculation of the quality of the final result. Default: equal
to |
verbose |
logical, default is |
... |
further arguments to optim |
nGlobal |
for stick optimisations with more than three
coefficients, a more powerful optimisation method is used
( |
x , object
|
an object of class "ptw" |
Function ptw
and friends is meant for profile data,
where intensities have been recorded at regular time points; function
stptw
is meant for lists of peaks, for instance obtained after
peak-picking the profile data. The latter option is less flexible
(Euclidean distance and backward warping have not been implemented,
and only global alignment is possible)
but is much faster, especially for larger data sets.
In the optimization mode (try = FALSE
), the function
optimizes the warping coefficients using the chosen criterion (either
"WCC" or "RMS"). For "RMS", the data are smoothed before the
optimization, but the quality of the final warping is measured on the
unsmoothed data. For "WCC", the warping is performed using
trwdth
as the triangle width, but the quality of the final
solution is measured using trwdth.res
.
If try = TRUE
is used as an argument, the function does not
start an optimization, but just calculates the warping for the given
warp function (init.coef
); if smooth.param
is larger
than zero for the RMS criterion, the RMS of the smoothed patterns is
calculated. The WCC criterion uses trwidth.res
as the triangle
width in this case.
Five situations can be distinguished:
One sample and one reference: this obviously leads to one warping
function regardless of the setting of warp.type
.
Several samples, all warped to the same single reference, each with
its own warping function: this is the default behaviour
(warp.type = "individual"
)
Several samples, warped to an equal number of references
(pair-wise), with their own warping functions: this is the default
behaviour (warp.type = "individual"
)
Several samples, warped to one reference, with one warping function
(warp.type = "global"
)
Several samples, warped to an equal number of references
(pair-wise), with one warping function
(warp.type = "global"
)
A list of class "ptw" containing:
reference |
the reference(s) used as input |
sample |
the sample(s) used as input |
warped.sample |
the warped sample |
warp.coef |
the warping coefficients |
warp.fun |
the warped indices (not for |
crit.value |
the value of the chosen criterion, either "WCC" or "RMS" |
optim.crit |
the chosen criterion, either "WCC" or "RMS" |
warp.type |
the chosen type of warping, either "individual" or "global" |
Jan Gerretzen, Ron Wehrens
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
Bloemberg, T.G., et al. (2010) "Improved parametric time warping for Proteomics", Chemometrics and Intelligent Laboratory Systems, 104 (1), 65 – 74.
WCC
, RMS
, select.traces
,
gaschrom
, lcms
data(gaschrom) ref <- gaschrom[1,] samp <- gaschrom[16,] gaschrom.ptw <- ptw(ref, samp) summary(gaschrom.ptw) ## same with sticks (peak lists) refst <- gaschrom.st[1] sampst <- gaschrom.st[16] gaschrom.st.ptw <- stptw(refst, sampst, trwdth = 100) summary(gaschrom.st.ptw) ## Not run: ## comparison between backward and forward warping gaschrom.ptw <- ptw(ref, samp, init.coef = c(0, 1, 0, 0), mode = "backward") summary(gaschrom.ptw) gaschrom.ptw <- ptw(ref, samp, init.coef = c(-10, 1, 0, 0), mode = "forward") summary(gaschrom.ptw) ## ############################# ## many samples warped on one reference ref <- gaschrom[1,] samp <- gaschrom[2:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", verbose = TRUE, optim.crit = "WCC", trwdth = 100, init.coef = c(0, 1, 0)) summary(gaschrom.ptw) ## "individual" warping not implemented for sticks; do separate warpings ## instead refst <- gaschrom.st[1] sampst <- gaschrom.st[2:16] gaschrom.st.ptw.list <- lapply(sampst, function(smpl) stptw(refst, list(smpl), trwdth = 100, init.coef = c(0, 1, 0))) t(sapply(gaschrom.st.ptw.list, "[[", "warp.coef")) t(sapply(gaschrom.st.ptw.list, "[[", "crit.value")) ## ############################# ## several samples on several references individually ref <- gaschrom[1:8,] samp <- gaschrom[9:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", optim.crit = "WCC", trwdth = 100, init.coef = c(0, 1, 0)) summary(gaschrom.ptw) ## stick version gaschrom.st.ptw.list <- mapply(function(x, y) stptw(list(x), list(y), trwdth = 100, init.coef = c(0, 1, 0)), gaschrom.st[1:8], gaschrom.st[9:16], SIMPLIFY = FALSE) t(sapply(gaschrom.st.ptw.list, coef)) ## ############################# ## several samples on several references: one, global warping gaschrom.ptw <- ptw(ref, samp, warp.type = "global", optim.crit = "WCC", init.coef = c(0, 1, 0)) summary(gaschrom.ptw) refst <- gaschrom.st[1:8] sampst <- gaschrom.st[9:16] gaschrom.st.ptw <- stptw(refst, sampst, trwdth=100, init.coef = c(0, 1, 0)) summary(gaschrom.st.ptw) ## ################################################################# ## Example of a three-way data set# ## ################################################################# ## first bring all samples to the same scale data(lcms) lcms.scaled <- aperm(apply(lcms, c(1,3), function(x) x/mean(x) ), c(2,1,3)) ## add zeros to the start and end of the chromatograms lcms.s.z <- aperm(apply(lcms.scaled, c(1,3), function(x) padzeros(x, 250) ), c(2,1,3)) ## define a global 2nd degree warping warp1 <- ptw(lcms.s.z[,,2], lcms.s.z[,,3], warp.type="global") warp.samp <- warp1$warped.sample warp.samp[is.na(warp.samp)] <- 0 ## refine by adding 5th degree warpings for individual chromatograms warp2 <- ptw(lcms.s.z[,,2], warp.samp, init.coef=c(0,1,0,0,0,0)) warp.samp2 <- warp2$warped.sample warp.samp2[is.na(warp.samp2)] <- 0 ## compare TICs layout(matrix(1:2,2,1, byrow=TRUE)) plot(colSums(lcms.s.z[,,2]), type="l", ylab = "", main = "TIC: original data") lines(colSums(lcms.s.z[,,3]), col=2, lty=2) plot(colSums(lcms.s.z[,,2]), type="l", ylab = "", main = "TIC: warped data") lines(colSums(warp.samp2), lty=2, col=2) ## ########################### ## stick version of this warping - note that the peaks have been picked ## from the scaled profiles. Note that here we need to take list ## elements: every sample is a list of mz channels. warp1.st <- stptw(lcms.pks[[2]], lcms.pks[[3]], trwdth = 100) summary(warp1.st) ## End(Not run)
data(gaschrom) ref <- gaschrom[1,] samp <- gaschrom[16,] gaschrom.ptw <- ptw(ref, samp) summary(gaschrom.ptw) ## same with sticks (peak lists) refst <- gaschrom.st[1] sampst <- gaschrom.st[16] gaschrom.st.ptw <- stptw(refst, sampst, trwdth = 100) summary(gaschrom.st.ptw) ## Not run: ## comparison between backward and forward warping gaschrom.ptw <- ptw(ref, samp, init.coef = c(0, 1, 0, 0), mode = "backward") summary(gaschrom.ptw) gaschrom.ptw <- ptw(ref, samp, init.coef = c(-10, 1, 0, 0), mode = "forward") summary(gaschrom.ptw) ## ############################# ## many samples warped on one reference ref <- gaschrom[1,] samp <- gaschrom[2:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", verbose = TRUE, optim.crit = "WCC", trwdth = 100, init.coef = c(0, 1, 0)) summary(gaschrom.ptw) ## "individual" warping not implemented for sticks; do separate warpings ## instead refst <- gaschrom.st[1] sampst <- gaschrom.st[2:16] gaschrom.st.ptw.list <- lapply(sampst, function(smpl) stptw(refst, list(smpl), trwdth = 100, init.coef = c(0, 1, 0))) t(sapply(gaschrom.st.ptw.list, "[[", "warp.coef")) t(sapply(gaschrom.st.ptw.list, "[[", "crit.value")) ## ############################# ## several samples on several references individually ref <- gaschrom[1:8,] samp <- gaschrom[9:16,] gaschrom.ptw <- ptw(ref, samp, warp.type = "individual", optim.crit = "WCC", trwdth = 100, init.coef = c(0, 1, 0)) summary(gaschrom.ptw) ## stick version gaschrom.st.ptw.list <- mapply(function(x, y) stptw(list(x), list(y), trwdth = 100, init.coef = c(0, 1, 0)), gaschrom.st[1:8], gaschrom.st[9:16], SIMPLIFY = FALSE) t(sapply(gaschrom.st.ptw.list, coef)) ## ############################# ## several samples on several references: one, global warping gaschrom.ptw <- ptw(ref, samp, warp.type = "global", optim.crit = "WCC", init.coef = c(0, 1, 0)) summary(gaschrom.ptw) refst <- gaschrom.st[1:8] sampst <- gaschrom.st[9:16] gaschrom.st.ptw <- stptw(refst, sampst, trwdth=100, init.coef = c(0, 1, 0)) summary(gaschrom.st.ptw) ## ################################################################# ## Example of a three-way data set# ## ################################################################# ## first bring all samples to the same scale data(lcms) lcms.scaled <- aperm(apply(lcms, c(1,3), function(x) x/mean(x) ), c(2,1,3)) ## add zeros to the start and end of the chromatograms lcms.s.z <- aperm(apply(lcms.scaled, c(1,3), function(x) padzeros(x, 250) ), c(2,1,3)) ## define a global 2nd degree warping warp1 <- ptw(lcms.s.z[,,2], lcms.s.z[,,3], warp.type="global") warp.samp <- warp1$warped.sample warp.samp[is.na(warp.samp)] <- 0 ## refine by adding 5th degree warpings for individual chromatograms warp2 <- ptw(lcms.s.z[,,2], warp.samp, init.coef=c(0,1,0,0,0,0)) warp.samp2 <- warp2$warped.sample warp.samp2[is.na(warp.samp2)] <- 0 ## compare TICs layout(matrix(1:2,2,1, byrow=TRUE)) plot(colSums(lcms.s.z[,,2]), type="l", ylab = "", main = "TIC: original data") lines(colSums(lcms.s.z[,,3]), col=2, lty=2) plot(colSums(lcms.s.z[,,2]), type="l", ylab = "", main = "TIC: warped data") lines(colSums(warp.samp2), lty=2, col=2) ## ########################### ## stick version of this warping - note that the peaks have been picked ## from the scaled profiles. Note that here we need to take list ## elements: every sample is a list of mz channels. warp1.st <- stptw(lcms.pks[[2]], lcms.pks[[3]], trwdth = 100) summary(warp1.st) ## End(Not run)
Calculates values of the chosen optimization criterion (RMS or WCC) on a grid defined by the coefficients of the warping function.
ptwgrid(ref, samp, selected.traces, coef.mins, coef.maxs, coef.lengths, optim.crit = c("WCC", "RMS"), smooth.param = 1e05, trwdth = 20)
ptwgrid(ref, samp, selected.traces, coef.mins, coef.maxs, coef.lengths, optim.crit = c("WCC", "RMS"), smooth.param = 1e05, trwdth = 20)
ref |
reference. Either a vector (containing one reference signal) or a matrix (one reference per row). If more than one reference is specified, the number of reference signals must equal the number of sample signals |
samp |
sample. A vector (containing one sample signal) or a matrix (one sample per row). If more than one sample is specified, the number of sample signals must equal the number of reference signals |
selected.traces |
optional vector containing the row numbers to
use from |
coef.mins |
a vector containing the respective minimal values of coefficients for the grid. The first number is the minimal zeroth-order coefficient (i.e., a constant shift); further numbers indicate the minimal linear, quadratic, ... stretches |
coef.maxs |
a vector containing the maximal values of coefficients for the grid |
coef.lengths |
a vector of the same length as coef.maxs and coef.mins containing the number of steps in which to vary the respective coefficients between their minimum and maximum value |
optim.crit |
either |
smooth.param |
smoothing parameter for smoothing the reference
and sample when |
.
trwdth |
the width of the triangle in the WCC criterion during the optimization, given as a number of data points. Default: 20 |
In contrast to ptw, only the three situations leading to one warping function are distinguished here:
One sample and one reference;
Several samples, warped to an equal number of references (pair-wise);
Several samples, warped to a single reference.
Which situation is applicable is determined from the dimensions of
ref
and samp
.
An array of dimension length(coef.mins)
and maximal indices
per dimension specified by coef.lengths
Tom Bloemberg, Jan Gerretzen, Ron Wehrens
## Not run: data(gaschrom) mygrid <- ptwgrid(gaschrom[1,], gaschrom[16,], coef.mins = c(-10, .9), coef.max = c(10, 1.1), coef.lengths = c(21, 21)) image(seq(-10, 10, length = 21), seq(.9, 1.1, length = 21), mygrid) ## End(Not run)
## Not run: data(gaschrom) mygrid <- ptwgrid(gaschrom[1,], gaschrom[16,], coef.mins = c(-10, .9), coef.max = c(10, 1.1), coef.lengths = c(21, 21)) image(seq(-10, 10, length = 21), seq(.9, 1.1, length = 21), mygrid) ## End(Not run)
Functions to compare patterns with shifted features. These functions compare warped sample patterns to one or more reference patterns. RMS returns the usual root-mean-squared difference measure; WCC returns 1-wcc, where wcc indicates the weighted cross-correlation. Perfect alignment leads to a value of 0 for both criteria.
Internal function, not meant to be called directly by the user. In
particular, note that the identity warping may lead to slightly
different estimates than a direct comparison of the reference and
sample signals - a warping even slightly outside the original range of
1 : ncol(ref)
leads to NA values.
RMS(warp.coef, ref, samp, B, mode) WCC(warp.coef, ref, samp, B, trwdth = 20, wghts, mode, ref.acors = NULL)
RMS(warp.coef, ref, samp, B, mode) WCC(warp.coef, ref, samp, B, trwdth = 20, wghts, mode, ref.acors = NULL)
warp.coef |
a vector of warping coefficients |
ref |
reference signal; a matrix with one or more rows. If the
number of rows is greater than one, it should equal the number of
rows in |
samp |
sample signal; a matrix with one or more rows |
B |
basis for warping function |
mode |
either "forward" (new implementation, also used for warping peak lists) or "backward" (classical implementation). |
trwdth |
triangle width for the WCC function, expressed in the number of data points |
wghts |
optional weights vector, will be calculated from triangle width if necessary. Sometimes it is more efficient to pre-calculate it and give it as an argument |
ref.acors |
autocorrelation of the reference. Since the reference is often unchanged over multiple evaluations (e.g., during an optimization), it is useful to pre-calculate this number |
All patterns in samp
are warped using the same warping
function, and then compared to ref
, either pair-wise (when
ref
and samp
are of the same size), or with the one
pattern in ref
.
One number - either the WCC or RMS value
Jan Gerretzen, Tom Bloemberg, Ron Wehrens
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
de Gelder, R., Wehrens, R. and Hageman, J.A. (2001) "A generalized expression for the similarity of spectra: Application to powder diffraction pattern classification", Journal of Computational Chemistry, 22, 273 – 289.
For alignment purposes, it may be useful to select traces which show clear features, and to throw away traces that contain mainly noise. This function implements three ways to achieve this: CODA, a criterion similar to varimax, and a criterion based on the highest intensity.
select.traces(X, criterion = c("coda", "var", "int"), window = 5, smoothing = c("median", "mean"))
select.traces(X, criterion = c("coda", "var", "int"), window = 5, smoothing = c("median", "mean"))
X |
a data matrix or an array. The first dimension signifies the traces from which a selection is to be made. If X is a matrix, the first usually corresponds to samples and the second dimension is the spectral dimension. If X is an array, the data are assumed to come from a hyphenated experiment, with the first dimension the chromatographic dimension, the second the spectral dimension and the third dimension corresponding to samples |
criterion |
either Windig's CODA algorithm, a criterion calculating the variances of the length-scaled spectra, or a criterion giving the height of the highest peak |
window , smoothing
|
arguments to the coda function. |
The CODA criterion in essence selects traces with no baseline and no spikes, but still containing significant intensities. The variance criterion aims at something similar: it calculates the variance (or standard deviation) of every trace after length scaling - traces with a high value show few highly structured features, whereas traces with a low value show noise or a significant baseline. The intensity criterion simply returns the intensity of the highest peak. The latter two criteria are simpler than CODA but implicitly assume that the traces have been preprocessed (i.c., spikes have been removed).
The function returns a list with components
crit.val |
a vector containing the values of the criterion for all traces. If X is an array, the function is recursively applied to all samples (elements of the third dimension) - the results are multiplied to obtain one criterion value per trace |
trace.nrs |
the order of the traces (from large to small) |
Ron Wehrens
data(lcms) ntrace <- dim(lcms)[1] lcms.selection <- select.traces(lcms[,,1:2], criterion = "var") good <- lcms.selection$trace.nrs[1] bad <- lcms.selection$trace.nrs[ntrace] par(mfrow = c(1,2)) matplot(lcms[good,,1:2], type = 'l', lty = 1) matplot(lcms[bad,,1:2], type = 'l', lty = 1)
data(lcms) ntrace <- dim(lcms)[1] lcms.selection <- select.traces(lcms[,,1:2], criterion = "var") good <- lcms.selection$trace.nrs[1] bad <- lcms.selection$trace.nrs[ntrace] par(mfrow = c(1,2)) matplot(lcms[good,,1:2], type = 'l', lty = 1) matplot(lcms[bad,,1:2], type = 'l', lty = 1)
Warp time points according to a warping function.
warp.time(tp, coef)
warp.time(tp, coef)
tp |
A vector of time points, not necessarily equidistant. |
coef |
The coefficients of the parametric time warping function: the first coefficient is the constant shift, the second the linear stretch etcetera. |
The function returns a vector of warped time points.
Ron Wehrens
Wehrens, R. et al. (2015) "Fast parametric warping of peak lists", Bioinformatics. DOI: 10.1093/bioinformatics/btv299.
time <- 1:100 ## simple shift and compression warp.time(time, c(-10, .99))
time <- 1:100 ## simple shift and compression warp.time(time, c(-10, .99))
Functions to calculate weighted auto- and cross-correlation measures. The wcc is a suitable measure for the similarity of two patterns when features may be shifted. Identical patterns lead to a wcc value of 1.
Functions wcc
and wac
are meant for profile data
(intensities measured at equidistant time points), whereas
wcc.st
and wac.st
are meant for peak lists. In general,
wcc values calculated for profiles will be higher since they will also
include the large similarity in the empty spaces, i.e., parts of the
profiles where no peaks are present (and that will appear to be
perfectly aligned), whereas the peak-based version concentrates only
on the peaks.
wcc(pattern1, pattern2, trwdth, wghts = NULL, acors1 = NULL, acors2 = NULL) wac(pattern1, trwdth, wghts = NULL)
wcc(pattern1, pattern2, trwdth, wghts = NULL, acors1 = NULL, acors2 = NULL) wac(pattern1, trwdth, wghts = NULL)
pattern1 , pattern2
|
input patterns, typically spectra. Vectors |
trwdth |
triangle width, given in the number of data points for the profile functions, and in the actual retention times for the stick-based warpings. |
wghts |
optional weights vector, will be calculated from triangle width if necessary. Sometimes it is more efficient to pre-calculate it and give it as an argument |
acors1 , acors2
|
autocorrelations of the input patterns. If not provided, they are calculated |
Functions wcc
and wac
are defined such that the
triangle width stands for
the number of points on one side of and including the current
point. Thus, a trwdth
of 0 signifies a non-existent triangle and
results in an error; a trwdth
equal to 1 only includes the current
point with weight 1 and no neighbouring points. For the stick-based
equivalents, the units of the time axis are used for the triangle width.
One number, the weighted autocorrelation or crosscorrelation
Ron Wehrens
de Gelder, R., Wehrens, R. and Hageman, J.A. (2001) "A generalized expression for the similarity of spectra: Application to powder diffraction pattern classification", Journal of Computational Chemistry, 22, 273 – 289.
data(gaschrom) wcc(gaschrom[1,], gaschrom[2,], 20) wcc.st(gaschrom.st[[1]], gaschrom.st[[2]], 20)
data(gaschrom) wcc(gaschrom[1,], gaschrom[2,], 20) wcc.st(gaschrom.st[[1]], gaschrom.st[[2]], 20)
This function smoothes signals with a finite difference penalty of order 1.
whit1(y, lambda, w)
whit1(y, lambda, w)
y |
signal to be smoothed: a vector |
lambda |
smoothing parameter: larger values lead to more smoothing |
w |
weights: a vector of same length as y. Default weights are equal to one |
smoothed signal: a vector
Paul Eilers, Jan Gerretzen
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
Eilers, P.H.C. (2003) "A perfect smoother", Analytical Chemistry, 75, 3631 – 3636.
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(whit1(gaschrom[1,], lambda = 1e1), col = 2) lines(whit1(gaschrom[1,], lambda = 1e2), col = 3) lines(whit1(gaschrom[1,], lambda = 1e3), col = 4)
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(whit1(gaschrom[1,], lambda = 1e1), col = 2) lines(whit1(gaschrom[1,], lambda = 1e2), col = 3) lines(whit1(gaschrom[1,], lambda = 1e3), col = 4)
This function smoothes signals with a finite difference penalty of order 2.
whit2(y, lambda, w)
whit2(y, lambda, w)
y |
signal to be smoothed: a vector |
lambda |
smoothing parameter: larger values lead to more smoothing |
w |
weights: a vector of same length as y. Default weights are equal to one |
smoothed signal: a vector
Paul Eilers, Jan Gerretzen
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404 – 411.
Eilers, P.H.C. (2003) "A perfect smoother", Analytical Chemistry, 75, 3631 – 3636.
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(whit2(gaschrom[1,], lambda = 1e5), col = 2) lines(whit2(gaschrom[1,], lambda = 1e6), col = 3) lines(whit2(gaschrom[1,], lambda = 1e7), col = 4)
data(gaschrom) plot(gaschrom[1,], type = "l", ylim = c(0, 100)) lines(whit2(gaschrom[1,], lambda = 1e5), col = 2) lines(whit2(gaschrom[1,], lambda = 1e6), col = 3) lines(whit2(gaschrom[1,], lambda = 1e7), col = 4)