This vignette is a short reference to the primitives
and tools supplied by the mlrCPO package.
CPOs are first-class objects in R that represent
data manipulation. They can be combined to form networks of operation,
they can be attached to mlr Learners, and they
have tunable Hyperparameters that influence their behaviour.
CPOs go through a lifecycle from construction, to a
CPO, to a CPOTrained “retrafo” or “inverter”
object. The different stages of a CPO-related object can be
distinguished using getCPOClass(), which
returns one of five values:
getCPOClass(cpoPca)
#> [1] "CPOConstructor"
getCPOClass(cpoPca())
#> [1] "CPO"
getCPOClass(pid.task %>|% cpoPca())
#> [1] "CPORetrafo"
getCPOClass(inverter(bh.task %>>% cpoLogTrafoRegr()))
#> [1] "CPOInverter"
getCPOClass(NULLCPO)
#> [1] "NULLCPO"
CPOs are created using
CPOConstructors. These are R functions
that share a print method and have many parameters in common.
print(cpoAsNumeric) # example CPOConstructor
#> <<CPO as.numeric()>>
print(cpoAsNumeric, verbose = TRUE) # alternative: !cpoAsNumeric
#> <<CPO as.numeric()>>
#>
#> cpo.retrafo:
#> function (data)
#> {
#> as.data.frame(lapply(data, as.numeric), row.names = rownames(data))
#> }
#> <environment: namespace:mlrCPO>
class(cpoAsNumeric)
#> [1] "CPOConstructor" "function"
getCPOName(cpoPca) # same as getCPOName() of the *constructed* CPO
#> [1] "pca"
getCPOClass(cpoPca)
#> [1] "CPOConstructor"
The function parameters of a CPOConstructor are:
- the CPO hyperparameters
- id: the CPO id (defaults to the CPO’s name)
- affect.*: the columns the CPO operates on (the affect.* parameters)
- export: which of the CPO’s hyperparameters are “exported”, i.e. can later be manipulated using setHyperPars()
names(formals(cpoPca))
#> [1] "center" "scale"
#> [3] "tol" "rank"
#> [5] "id" "export"
#> [7] "affect.type" "affect.index"
#> [9] "affect.names" "affect.pattern"
#> [11] "affect.invert" "affect.pattern.ignore.case"
#> [13] "affect.pattern.perl" "affect.pattern.fixed"
(cpo = cpoScale()) # construct CPO with default Hyperparameter values
#> scale(center = TRUE, scale = TRUE)
print(cpo, verbose = TRUE) # detailed printing. Alternative: !cpo
#> Trafo chain of 1 cpos:
#> scale(center = TRUE, scale = TRUE)
#> Operating: feature
#> ParamSet:
#> Type len Def Constr Req Tunable Trafo
#> scale.center logical - TRUE - - TRUE -
#> scale.scale logical - TRUE - - TRUE -
class(cpo) # CPOs that are not compound are "CPOPrimitive"
#> [1] "CPOPrimitive" "CPO"
getCPOClass(cpo)
#> [1] "CPO"
The inner “state” of a CPO can be inspected and
manipulated using various getters and setters.
getParamSet(cpo)
#> Type len Def Constr Req Tunable Trafo
#> scale.center logical - TRUE - - TRUE -
#> scale.scale logical - TRUE - - TRUE -
getHyperPars(cpo)
#> $scale.center
#> [1] TRUE
#>
#> $scale.scale
#> [1] TRUE
setHyperPars(cpo, scale.center = FALSE)
#> scale(center = FALSE, scale = TRUE)
getCPOId(cpo)
#> [1] "scale"
setCPOId(cpo, "MYID")
#> MYID<scale>(center = TRUE, scale = TRUE)
getCPOName(cpo)
#> [1] "scale"
getCPOAffect(cpo) # empty, since no affect set
#> named list()
getCPOAffect(cpoPca(affect.pattern = "Width$"))
#> $pattern
#> [1] "Width$"
getCPOConstructor(cpo) # the constructor used to create the CPO
#> <<CPO scale(center = TRUE, scale = TRUE)>>
getCPOProperties(cpo) # see properties explanation below
#> $handling
#> [1] "numerics" "factors" "ordered" "missings" "cluster"
#> [6] "classif" "multilabel" "regr" "surv" "oneclass"
#> [11] "twoclass" "multiclass" "prob" "se"
#>
#> $adding
#> character(0)
#>
#> $needed
#> character(0)
getCPOPredictType(cpo)
#> response prob se
#> "response" "prob" "se"
getCPOClass(cpo)
#> [1] "CPO"
getCPOOperatingType(cpo) # Operating on feature, target, retrafoless?
#> [1] "feature"
Compare the predict type and operating type of a TOCPO or ROCPO:
getCPOPredictType(cpoResponseFromSE())
#> response se
#> "se" "se"
getCPOOperatingType(cpoResponseFromSE())
#> [1] "target"
getCPOOperatingType(cpoSample())
#> [1] "retrafoless"
The identicalCPO() function is used to check whether the
underlying operation of two CPOs is identical. In
this sense, CPOs with different hyperparameters can
still be “identical”.
identicalCPO(cpoScale(scale = TRUE), cpoScale(scale = FALSE))
#> [1] TRUE
identicalCPO(cpoScale(), cpoPca())
#> [1] FALSE
CPOs can be applied to data.frame and
Task objects using %>>% or
applyCPO.
head(iris) %>>% cpoPca()
#> Species PC1 PC2 PC3 PC4
#> 1 setosa -0.1634147 0.017230444 -0.11038321 -0.0231625616
#> 2 setosa 0.3324970 -0.189351624 -0.08152883 0.0005612917
#> 3 setosa 0.3268659 0.101103375 -0.02238439 0.0464537730
#> 4 setosa 0.4202367 0.005523981 0.17106514 -0.0222757931
#> 5 setosa -0.1768684 0.140149101 -0.04185224 -0.0194870755
#> 6 setosa -0.7393165 -0.074655279 0.08508352 0.0179103657
task = applyCPO(cpoPca(), iris.task)
head(getTaskData(task))
#> Species PC1 PC2 PC3 PC4
#> 1 setosa -2.684126 -0.3193972 0.02791483 0.002262437
#> 2 setosa -2.714142 0.1770012 0.21046427 0.099026550
#> 3 setosa -2.888991 0.1449494 -0.01790026 0.019968390
#> 4 setosa -2.745343 0.3182990 -0.03155937 -0.075575817
#> 5 setosa -2.728717 -0.3267545 -0.09007924 -0.061258593
#> 6 setosa -2.280860 -0.7413304 -0.16867766 -0.024200858
CPO composition can be done using
%>>% or composeCPO. It results in a new
CPO which mostly behaves like a primitive CPO. The exceptions are
that a compound CPO has no id and no affect arguments.
scale = cpoScale()
pca = cpoPca()
compound = scale %>>% pca
composeCPO(scale, pca) # same
#> (scale >> pca)(scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
class(compound)
#> [1] "CPOPipeline" "CPO"
!compound
#> Trafo chain of 2 cpos:
#> scale(center = TRUE, scale = TRUE)
#> Operating: feature
#> ParamSet:
#> Type len Def Constr Req Tunable Trafo
#> scale.center logical - TRUE - - TRUE -
#> scale.scale logical - TRUE - - TRUE -
#> ====>
#> pca(center = TRUE, scale = FALSE)[not exp'd: tol = <NULL>, rank = <NULL>]
#> Operating: feature
#> ParamSet:
#> Type len Def Constr Req Tunable Trafo
#> pca.center logical - TRUE - - TRUE -
#> pca.scale logical - FALSE - - TRUE -
getCPOName(compound)
#> [1] "pca.scale"
getHyperPars(compound)
#> $scale.center
#> [1] TRUE
#>
#> $scale.scale
#> [1] TRUE
#>
#> $pca.center
#> [1] TRUE
#>
#> $pca.scale
#> [1] FALSE
setHyperPars(compound, scale.center = TRUE, pca.center = FALSE)
#> (scale >> pca)(scale.center = TRUE, scale.scale = TRUE, pca.center = FALSE, pca.scale = FALSE)
getCPOId(compound) # error: no ID for compound CPOs
#> Error in getCPOId.CPO(compound): Compound CPOs have no IDs.
getCPOAffect(compound) # error: no affect for compound CPOs
#> Error in getCPOAffect.CPO(compound): Compound CPOs have no affect arguments.
getCPOOperatingType() always considers the operating
type of the whole CPO chain and may return multiple
values:
getCPOOperatingType(NULLCPO)
#> character(0)
getCPOOperatingType(cpoScale())
#> [1] "feature"
getCPOOperatingType(cpoScale() %>>% cpoLogTrafoRegr() %>>% cpoSample())
#> [1] "feature" "target" "retrafoless"
Composite CPO objects can be broken into their
constituent primitive CPOs using as.list().
The inverse of this operation is pipeCPO(), which composes
a list of CPOs in the given order.
as.list(compound)
#> [[1]]
#> scale(center = TRUE, scale = TRUE)
#>
#> [[2]]
#> pca(center = TRUE, scale = FALSE)[not exp'd: tol = <NULL>, rank = <NULL>]
pipeCPO(as.list(compound)) # pipeCPO: (list of CPO) -> CPO
#> (scale >> pca)(scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
pipeCPO(list())
#> NULLCPO
CPO-Learner attachment works using %>>% or
attachCPO.
lrn = makeLearner("classif.logreg")
(cpolrn = cpo %>>% lrn) # the new learner has the CPO hyperparameters
#> Learner classif.logreg.scale from package stats
#> Type: classif
#> Name: ; Short name:
#> Class: CPOLearner
#> Properties: numerics,factors,twoclass,prob
#> Predict-Type: response
#> Hyperparameters: model=FALSE
attachCPO(compound, lrn) # attaching compound CPO
#> Learner classif.logreg.pca.scale from package stats
#> Type: classif
#> Name: ; Short name:
#> Class: CPOLearner
#> Properties: numerics,factors,twoclass,prob
#> Predict-Type: response
#> Hyperparameters: model=FALSE
The new object is a CPOLearner, which performs the
operation given by the CPO before training the
Learner.
class(lrn)
#> [1] "classif.logreg" "RLearnerClassif" "RLearner" "Learner"
The work performed by a CPOLearner can also be performed
manually:
lrn = cpoLogTrafoRegr() %>>% makeLearner("regr.lm")
model = train(lrn, subsetTask(bh.task, 1:300))
predict(model, subsetTask(bh.task, 301:500))
#> Prediction: 200 observations
#> predict.type: response
#> threshold:
#> time: 0.00
#> id truth response
#> 301 1 24.8 28.69715
#> 302 2 22.0 27.89821
#> 303 3 26.4 28.33370
#> 304 4 33.1 33.80868
#> 305 5 36.1 34.93957
#> 306 6 28.4 28.77130
#> ... (#rows: 200, #cols: 3)
is equivalent to
trafo = subsetTask(bh.task, 1:300) %>>% cpoLogTrafoRegr()
model = train("regr.lm", trafo)
newdata = subsetTask(bh.task, 301:500) %>>% retrafo(trafo)
pred = predict(model, newdata)
invert(inverter(newdata), pred)
#> Prediction: 200 observations
#> predict.type: response
#> threshold:
#> time: 0.00
#> id truth response
#> 301 1 24.8 28.69715
#> 302 2 22.0 27.89821
#> 303 3 26.4 28.33370
#> 304 4 33.1 33.80868
#> 305 5 36.1 34.93957
#> 306 6 28.4 28.77130
#> ... (#rows: 200, #cols: 3)
It is possible to obtain both the underlying Learner and
the attached CPO from a CPOLearner. Note that
if a CPOLearner is wrapped by some method (e.g. a
TuneWrapper), this does not work, since CPO
cannot probe below the first wrapping layer.
getLearnerCPO(cpolrn) # the CPO
#> scale(center = TRUE, scale = TRUE)
getLearnerBare(cpolrn) # the Learner
#> Learner classif.logreg from package stats
#> Type: classif
#> Name: Logistic Regression; Short name: logreg
#> Class: classif.logreg
#> Properties: twoclass,numerics,factors,prob,weights
#> Predict-Type: response
#> Hyperparameters: model=FALSE
CPOs perform data-dependent operations. However, when such an operation
becomes part of a machine-learning process, the operation on
predict-data must depend only on the training data. A
CPORetrafo object represents the re-application of a
trained CPO. A CPOInverter object represents the
transformation of a prediction made on a transformed task back to the
form of the original data.
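The requirement that the operation on predict-data depends only on the training data can be illustrated without mlrCPO. The following base R sketch is only an analogy, not a description of how mlrCPO is implemented: it learns centering and scaling constants on a training subset of iris and re-applies them unchanged to held-out rows. The row split and all variable names are arbitrary choices made for illustration.

```r
# Base R analogy for a "retrafo": learn parameters on training data,
# then re-apply them unchanged to new data.
train.rows <- iris[1:100, 1:4]
new.rows <- iris[101:150, 1:4]

# "Training": scale() stores the constants it used as attributes.
scaled.train <- scale(train.rows)
center <- attr(scaled.train, "scaled:center")
spread <- attr(scaled.train, "scaled:scale")

# "Re-application": new data is transformed with the training constants.
scaled.new <- scale(new.rows, center = center, scale = spread)

# The new data is generally not centered at zero, because its own
# column means were never consulted.
round(colMeans(scaled.new), 3)
```

In mlrCPO the stored constants correspond to what a CPORetrafo carries along, so the user does not have to pass them around by hand.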
The CPOTrained objects generated by application of a
CPO (or application of another CPOTrained) can
be retrieved using the retrafo() or the
inverter() function.
transformed = iris %>>% cpoScale()
head(transformed)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
#> 2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
#> 3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
#> 4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
#> 5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
#> 6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
(ret = retrafo(transformed))
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
head(getTaskTargets(bh.task))
#> [1] 24.0 21.6 34.7 33.4 36.2 28.7
transformed = bh.task %>>% cpoLogTrafoRegr()
head(getTaskTargets(transformed))
#> [1] 3.178054 3.072693 3.546740 3.508556 3.589059 3.356897
(inv = inverter(transformed))
#> CPO Inverter chain {type:regr} (able to predict 'response', 'se')
#> [INVERTER fun.apply.regr.target(){type:regr}]
head(invert(inv, getTaskTargets(transformed)))
#> [1] 24.0 21.6 34.7 33.4 36.2 28.7
Retrafos and inverters are stored as attributes:
attributes(transformed)
#> $names
#> [1] "type" "env" "weights" "blocking" "coordinates"
#> [6] "task.desc"
#>
#> $class
#> [1] "RegrTask" "SupervisedTask" "Task"
#>
#> $retrafo
#> CPO Retrafo / Inverter chain {type:regr} (able to predict 'response', 'se')
#> [RETRAFO fun.apply.regr.target(){type:regr}]
#>
#> $inverter
#> CPO Inverter chain {type:regr} (able to predict 'response', 'se')
#> [INVERTER fun.apply.regr.target(){type:regr}]
It is possible to set the "retrafo" and
"inverter" attributes of an object using
retrafo() and inverter(). This can be useful
for writing elegant scripts, especially since CPOTrained are
automatically chained. To delete the CPOTrained
attribute of an object, set it to NULL or
NULLCPO, or use clearRI().
bh2 = bh.task
retrafo(bh2) = ret
attributes(bh2)
#> $names
#> [1] "type" "env" "weights" "blocking" "coordinates"
#> [6] "task.desc"
#>
#> $class
#> [1] "RegrTask" "SupervisedTask" "Task"
#>
#> $retrafo
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
retrafo(bh2) = NULLCPO
# equivalent:
# retrafo(bh2) = NULL
attributes(bh2)
#> $names
#> [1] "type" "env" "weights" "blocking" "coordinates"
#> [6] "task.desc"
#>
#> $class
#> [1] "RegrTask" "SupervisedTask" "Task"
# clearRI returns the object without retrafo or inverter attributes
bh3 = clearRI(transformed)
attributes(bh3)
#> $names
#> [1] "type" "env" "weights" "blocking" "coordinates"
#> [6] "task.desc"
#>
#> $class
#> [1] "RegrTask" "SupervisedTask" "Task"
Some general methods work on a CPOTrained object to
inspect its properties. Many methods that work on a
CPO also work on a CPOTrained and give the
same result.
getCPOName(ret)
#> [1] "scale"
getParamSet(ret)
#> Type len Def Constr Req Tunable Trafo
#> center logical - TRUE - - TRUE -
#> scale logical - TRUE - - TRUE -
getHyperPars(ret)
#> $center
#> [1] TRUE
#>
#> $scale
#> [1] TRUE
getCPOProperties(ret)
#> $handling
#> [1] "numerics" "factors" "ordered" "missings" "cluster"
#> [6] "classif" "multilabel" "regr" "surv" "oneclass"
#> [11] "twoclass" "multiclass" "prob" "se"
#>
#> $adding
#> character(0)
#>
#> $needed
#> character(0)
getCPOPredictType(ret)
#> response prob se
#> "response" "prob" "se"
getCPOOperatingType(ret) # Operating on feature, target, both?
#> [1] "feature"
getCPOOperatingType(inv)
#> [1] "target"
A CPOTrained has information about whether it can be
used as a CPORetrafo object (and be applied to new data
using %>>%), or as a CPOInverter object
(and used by invert()), or possibly both. This is given by
getCPOTrainedCapability(), which returns a 1
if the object has an effect in the given role, 0 if the
object has no effect (but can be used), or -1 if the object
cannot be used in the role.
getCPOTrainedCapability(ret)
#> retrafo invert
#> 1 0
getCPOTrainedCapability(inv)
#> retrafo invert
#> -1 1
getCPOTrainedCapability(NULLCPO)
#> retrafo invert
#> 0 0
The “CPO class” of a CPOTrained is
determined by this as well. A pure inverter is CPOInverter,
an object that can be used for retrafo is a CPORetrafo.
getCPOClass(ret)
#> [1] "CPORetrafo"
getCPOClass(inv)
#> [1] "CPOInverter"
The CPO and the CPOConstructor used to
create the CPOTrained can be queried.
getCPOTrainedCPO(ret)
#> scale(center = TRUE, scale = TRUE)
getCPOConstructor(ret)
#> <<CPO scale(center = TRUE, scale = TRUE)>>
CPOTrained objects can be inspected using
getCPOTrainedState(). The state contains the
hyperparameters, the control object (CPO dependent data
representing the data information needed to re-apply the operation), and
information about the Task / data.frame layout
used for training (column names, column types) in
data$shapeinfo.input and
data$shapeinfo.output.
The state can be manipulated and used to create new
CPOTraineds, using
makeCPOTrainedFromState().
(state = getCPOTrainedState(retrafo(iris %>>% cpoScale())))
#> $center
#> [1] TRUE
#>
#> $scale
#> [1] TRUE
#>
#> $control
#> $control$center
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 5.843333 3.057333 3.758000 1.199333
#>
#> $control$scale
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 0.8280661 0.4358663 1.7652982 0.7622377
#>
#>
#> $data
#> $data$shapeinfo.input
#> <ShapeInfo (input) Sepal.Length: num, Sepal.Width: num, Petal.Length: num, Petal.Width: num, Species: fac>
#>
#> $data$shapeinfo.output
#> <ShapeInfo (output)>:
#> numeric:
#> <ShapeInfo Sepal.Length: num, Sepal.Width: num, Petal.Length: num, Petal.Width: num>
#> factor:
#> <ShapeInfo Species: fac>
#> other:
#> <ShapeInfo (empty)>
state$control$center[1] = 1000 # will now subtract 1000 from the first column
new.retrafo = makeCPOTrainedFromState(cpoScale, state)
head(iris %>>% new.retrafo)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -1201.474 1.01560199 -1.335752 -1.311052 setosa
#> 2 -1201.716 -0.13153881 -1.335752 -1.311052 setosa
#> 3 -1201.957 0.32731751 -1.392399 -1.311052 setosa
#> 4 -1202.078 0.09788935 -1.279104 -1.311052 setosa
#> 5 -1201.595 1.24503015 -1.335752 -1.311052 setosa
#> 6 -1201.112 1.93331463 -1.165809 -1.048667 setosa
When executing data %>>% CPO, the result has an
associated CPORetrafo and CPOInverter object.
When applying another CPO, the CPORetrafo and
CPOInverter will be chained automatically. This is to make
(data %>>% CPO1) %>>% CPO2 work the same as
data %>>% (CPO1 %>>% CPO2).
data = head(iris) %>>% cpoPca()
retrafo(data)
#> CPO Retrafo chain
#> [RETRAFO pca(center = TRUE, scale = FALSE)]
data2 = data %>>% cpoScale()
retrafo(data2) is the same as
retrafo(data %>>% pca %>>% scale):
retrafo(data2)
#> CPO Retrafo chain
#> [RETRAFO pca(center = TRUE, scale = FALSE)] =>
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
To interrupt this chain, set retrafo to NULL either
explicitly, or using clearRI().
data = clearRI(data)
data2 = data %>>% cpoScale()
retrafo(data2)
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
This is equivalent to
retrafo(data) = NULL
inverter(data) = NULL
data3 = data %>>% cpoScale()
retrafo(data3)
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
CPOTrained objects can be composed using %>>%
and pipeCPO(), just like CPOs. They can also
be split apart into primitive parts using as.list. It is
recommended to only chain CPOTrained objects if they were
created in the given order by preprocessing operations, since
CPOTraineds are very dependent on their position within a
preprocessing pipeline.
compound.retrafo = retrafo(head(iris) %>>% compound)
compound.retrafo
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)] =>
#> [RETRAFO pca(center = TRUE, scale = FALSE)]
(retrafolist = as.list(compound.retrafo))
#> [[1]]
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)]
#>
#> [[2]]
#> CPO Retrafo chain
#> [RETRAFO pca(center = TRUE, scale = FALSE)]
retrafolist[[1]] %>>% retrafolist[[2]]
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)] =>
#> [RETRAFO pca(center = TRUE, scale = FALSE)]
pipeCPO(retrafolist)
#> CPO Retrafo chain
#> [RETRAFO scale(center = TRUE, scale = TRUE)] =>
#> [RETRAFO pca(center = TRUE, scale = FALSE)]
Similarly to CPOs, CPOTrained objects can
be applied to data using %>>%, applyCPO,
or predict. This only works with objects that have the
"retrafo" capability and hence the CPORetrafo
class.
transformed = iris %>>% cpoScale()
head(iris) %>>% retrafo(transformed)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
#> 2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
#> 3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
#> 4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
#> 5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
#> 6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
This should in general give the same result as head(transformed),
since the same data was used:
head(transformed)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
#> 2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
#> 3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
#> 4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
#> 5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
#> 6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
applyCPO() and
predict() are synonyms of
%>>% when used for CPORetrafo
objects:
applyCPO(retrafo(transformed), head(iris))
predict(retrafo(transformed), head(iris))
To use CPOTrained objects for inversion, the
invert() function is used. Besides the
CPOTrained, it takes the data to invert, and optionally the
predict.type. Typically CPOTrained objects
that were retrieved using inverter() from a transformed
dataset should be used for inversion. Retrafo CPOTrained
objects retrieved from a transformed data set using
retrafo() sometimes have both the "retrafo" as
well as the "invert" capability (precisely when all TOCPOs
used had the constant.invert flag set, see Building Custom CPOs) and can then also
be used for inversion. In that case, however, the "truth"
column of an inverted prediction is dropped.
transformed = bh.task %>>% cpoLogTrafoRegr()
prediction = predict(train("regr.lm", transformed), transformed)
inv = inverter(transformed)
invert(inv, prediction)
#> Prediction: 506 observations
#> predict.type: response
#> threshold:
#> time: 0.00
#> id truth response
#> 1 1 24.0 29.46569
#> 2 2 21.6 24.65039
#> 3 3 34.7 30.48177
#> 4 4 33.4 28.91454
#> 5 5 36.2 27.40745
#> 6 6 28.7 25.77416
#> ... (#rows: 506, #cols: 3)
ret = retrafo(transformed)
invert(ret, prediction)
#> Prediction: 506 observations
#> predict.type: response
#> threshold:
#> time: 0.00
#> id response
#> 1 1 29.46569
#> 2 2 24.65039
#> 3 3 30.48177
#> 4 4 28.91454
#> 5 5 27.40745
#> 6 6 25.77416
#> ... (#rows: 506, #cols: 2)
Inversion can be done on both predictions given by mlr
Learners, as well as plain vectors,
data.frames, and matrix objects.
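For a plain numeric vector, inversion amounts to applying the inverse of the target transformation. The base R sketch below mimics this for the log transformation used in this vignette, without calling mlrCPO. The transformed targets printed earlier (3.178054, 3.072693, ...) agree with the natural logarithm of the original values, so exp() is used as the inverse here; that mapping is inferred from the printed numbers and is an assumption about cpoLogTrafoRegr(), not taken from its source.

```r
# Base R analogy for inverting a plain vector of predictions.
# First six bh.task targets as printed earlier in this vignette.
truth <- c(24.0, 21.6, 34.7, 33.4, 36.2, 28.7)

# Forward target transformation (natural log) ...
log.scale <- log(truth)
# ... and its inverse, applied to values on the transformed scale.
back <- exp(log.scale)

# The round trip recovers the original targets up to floating point error.
all.equal(back, truth)
```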
Note that the prediction being inverted must have the form of a
prediction done with the predict.type that an inverter
expects as input for the predict.type given to
invert() as an argument. This can be queried using the
getCPOPredictType() function. If invert() is
called with predict.type = p, then the prediction must be
one made with a Learner that has predict.type
set to getCPOPredictType(cpo)[p].
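The lookup getCPOPredictType(cpo)[p] can be pictured with a plain named character vector. The sketch below does not call mlrCPO; it copies the values that this vignette prints for getCPOPredictType(cpoResponseFromSE()), and the variable names are made up for illustration.

```r
# Values copied from the printed output of
# getCPOPredictType(cpoResponseFromSE()) in this vignette.
predict.map <- c(response = "se", se = "se")

# predict.type that invert() is asked to produce
requested <- "response"

# predict.type the underlying Learner must have been set to
needed <- predict.map[[requested]]
needed
```

Read this way, a "response" prediction can only be recovered through this CPO if the Learner itself was asked for "se" predictions.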
NULLCPO is the neutral element of
%>>% and the operations it represents
(composeCPO(), applyCPO(), and
attachCPO()), i.e. when it is used as an argument of these
functions, the data, Learner or CPO is not
changed. NULLCPO is also the result of pipeCPO()
called with the empty list, and of retrafo() and
inverter() when they are called for objects with no
CPOTrained objects attached.
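The role of a neutral element can be pictured with ordinary function composition, where the identity function plays the part that NULLCPO plays for %>>%. The following base R sketch is purely an analogy; compose.all() is a hypothetical helper and not part of mlrCPO.

```r
# Hypothetical helper: compose a list of functions left to right,
# starting from the identity function as the neutral element.
compose.all <- function(funs) {
  Reduce(function(f, g) function(x) g(f(x)), funs, identity)
}

add.one <- function(x) x + 1
times.two <- function(x) x * 2

# Two steps applied in list order: (3 + 1) * 2
both <- compose.all(list(add.one, times.two))(3)

# An empty list yields the neutral element, which changes nothing.
none <- compose.all(list())(3)

c(both = both, none = none)
```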
pipeCPO(list())
#> NULLCPO
as.list(NULLCPO) # the inverse of pipeCPO
#> list()
retrafo(bh.task)
#> NULLCPO
inverter(bh.task %>>% cpoPca()) # cpoPca is not a TOCPO, so no inverter is created
#> NULLCPO
Many getters give characteristic results for
NULLCPO.
getCPOClass(NULLCPO)
#> [1] "NULLCPO"
getCPOName(NULLCPO)
#> [1] "NULLCPO"
getCPOId(NULLCPO)
#> [1] "NULLCPO"
getHyperPars(NULLCPO)
#> named list()
getParamSet(NULLCPO)
#> [1] "Empty parameter set."
getCPOAffect(NULLCPO)
#> named list()
getCPOOperatingType(NULLCPO) # operates neither on features nor on targets.
#> character(0)
getCPOProperties(NULLCPO)
#> $handling
#> [1] "numerics" "factors" "ordered" "missings" "cluster"
#> [6] "classif" "multilabel" "regr" "surv" "oneclass"
#> [11] "twoclass" "multiclass" "prob" "se"
#>
#> $adding
#> character(0)
#>
#> $needed
#> character(0)
# applying NULLCPO leads to a retrafo() of NULLCPO, so it is its own CPOTrainedCPO
getCPOTrainedCPO(NULLCPO)
#> NULLCPO
# NULLCPO has no effect on applyCPO and invert, so NULLCPO's capabilities are 0.
getCPOTrainedCapability(NULLCPO)
#> retrafo invert
#> 0 0
getCPOTrainedState(NULLCPO)
#> NULL
Some helper functions convert NULLCPO to
NULL and back, while leaving other values as they are.
nullToNullcpo(NULL)
#> NULLCPO
nullcpoToNull(NULLCPO)
#> NULL
nullToNullcpo(10) # not changed
#> [1] 10
nullcpoToNull(10) # ditto
#> [1] 10
A CPO has a “name” which identifies the general
operation done by this CPO. For example, it is
"pca" for a CPO created using
cpoPca(). Furthermore, a CPO has an “ID” which
is associated with the particular CPO object at hand. For
primitive CPOs, it can be queried and set using
getCPOId() and setCPOId(), and it can be set
during construction, but it defaults to the CPO’s
name. The ID will also be prefixed to the CPO’s
hyperparameters after construction, if they are exported. This can help
prevent hyperparameter name clashes when composing CPOs
with otherwise identical hyperparameter names. It is possible to set the
ID to NULL to have no prefix for hyperparameter names.
cpo = cpoPca()
getCPOId(cpo)
#> [1] "pca"
getParamSet(cpo)
#> Type len Def Constr Req Tunable Trafo
#> pca.center logical - TRUE - - TRUE -
#> pca.scale logical - FALSE - - TRUE -
getParamSet(setCPOId(cpo, "my.id"))
#> Type len Def Constr Req Tunable Trafo
#> my.id.center logical - TRUE - - TRUE -
#> my.id.scale logical - FALSE - - TRUE -
getParamSet(setCPOId(cpo, NULL))
#> Type len Def Constr Req Tunable Trafo
#> center logical - TRUE - - TRUE -
#> scale logical - FALSE - - TRUE -
In the following (silly) example an error is thrown because of a hyperparameter name clash. This can be avoided by setting the ID of one of the constituents to a different value.
cpo %>>% cpo
#> Error in parameterClashAssert(cpo1, cpo2, cpo1$debug.name, cpo2$debug.name): Parameters "pca.center", "pca.scale" occur in both pca and pca
#> Use the id parameter when constructing, or setCPOId, to prevent name collisions.
cpo %>>% setCPOId(cpo, "two")
#> (pca >> two<pca>)(pca.center = TRUE, pca.scale = FALSE, two.center = TRUE, two.scale = FALSE)
CPOs contain information about the kind of data they can work with,
and what kind of data they produce. getCPOProperties
returns a list with the slots handling,
adding, needed.
properties$handling indicates the kind of data a CPO can
handle, properties$needed indicates the kind of data it
needs the data receiver (e.g. attached learner) to have, and
properties$adding lists the properties it adds to a given
learner. An example is cpoDummyEncode(), a CPO that
converts factors to numerics: The receiving learner needs to handle
numerics, so properties$needed == "numerics", but it
adds the ability to handle factors (since they are converted),
so properties$adding = c("factors", "ordered").
getCPOProperties(cpoDummyEncode())
#> $handling
#> [1] "numerics" "factors" "ordered" "missings" "cluster"
#> [6] "classif" "multilabel" "regr" "surv" "oneclass"
#> [11] "twoclass" "multiclass" "prob" "se"
#>
#> $adding
#> [1] "factors" "ordered"
#>
#> $needed
#> [1] "numerics"
As a result, cpoDummyEncode endows a
Learner with the ability to train on data with factor
variables:
train("classif.fnn", bc.task) # gives an error
#> Error in checkLearnerBeforeTrain(task, learner, weights): Task 'BreastCancer-example' has factor inputs in 'Cl.thickness, Cell.size, Cell.shape, Marg.adhes...', but learner 'classif.fnn' does not support that!
train(cpoDummyEncode(reference.cat = TRUE) %>>% makeLearner("classif.fnn"), bc.task)
#> Model for learner.id=classif.fnn.dummyencode; learner.class=CPOLearner
#> Trained on: task.id = BreastCancer-example; obs = 683; features = 9
#> Hyperparameters:
getLearnerProperties("classif.fnn")
#> [1] "twoclass" "multiclass" "numerics"
getLearnerProperties(cpoDummyEncode(TRUE) %>>% makeLearner("classif.fnn"))
#> [1] "numerics" "factors" "ordered" "twoclass" "multiclass"
.sometimes-Properties
As described in more detail in the Building Custom CPOs vignette,
CPOs can have properties that are considered only when
composing CPOs, or only when checking data returned by
CPOs. In short, consider a CPO that does
imputation, but only for factorial features. This CPO would
need to have "missings" in its $adding
properties slot, since it enables Learner to handle (some)
Tasks that have missing values. However, this
CPO may under certain circumstances still return data that
has missing values. This discrepancy is recorded internally by having
two “hidden” sets of properties that can be retrieved with
getCPOProperties() with get.internal set to
TRUE. These properties are adding.min, the
minimal set of properties added, and needed.max, the
maximal set of properties needed by consecutive operators. These can be
understood as a description of the “worst case” behaviour of the
CPO, since behaviour that is out of bounds of these sets
causes an error by the mlrCPO-framework.
An example is the cpoApplyFun CPO: When it
is constructed, it is not known what kind of properties will be added or
needed, so adding.min is empty while
needed.max is the set of all data properties. When
composing CPOs, this CPO is handled as if it
magically does exactly the data conversion necessary to make the
CPOs or Learner coming after it work with the
data. If this ends up not being the case, an error is thrown during
application or training by the following CPO or
Learner.
getCPOProperties(cpoApplyFun(export = "export.all"), get.internal = TRUE)
#> $handling
#> [1] "numerics" "factors" "ordered" "missings" "cluster"
#> [6] "classif" "multilabel" "regr" "surv" "oneclass"
#> [11] "twoclass" "multiclass" "prob" "se"
#>
#> $adding
#> [1] "numerics" "factors" "ordered" "missings"
#>
#> $needed
#> character(0)
#>
#> $adding.min
#> character(0)
#>
#> $needed.max
#> [1] "numerics" "factors" "ordered" "missings"
When constructing a CPO, it is possible to restrict the
columns on which the CPO operates using the
affect.* parameters of the CPOConstructor.
These parameters are:
- affect.index: Identify affected columns by a vector of column indices.
- affect.names: Identify affected columns by a vector of column names.
- affect.pattern: Match column names against a grep() style regex pattern.
- affect.pattern.ignore.case: Ignore case when matching by pattern.
- affect.pattern.perl: Use “perl” syntax in affect.pattern.
- affect.pattern.fixed: Use fixed pattern instead of regex in affect.pattern.
- affect.invert: Invert the columns to affect: Only columns not matched by any of the other affect.* parameters are affected.
# apply PCA only to columns that have '.Length' in their name
cpo = cpoPca(affect.pattern = ".Length")
getCPOAffect(cpo)
#> $pattern
#> [1] ".Length"
triris = iris %>>% cpo
head(triris)
#> Sepal.Width Petal.Width Species PC1 PC2
#> 1 3.5 0.2 setosa -2.460241 -0.24479165
#> 2 3.0 0.2 setosa -2.538962 -0.06093579
#> 3 3.2 0.2 setosa -2.709611 0.08355948
#> 4 3.1 0.2 setosa -2.565116 0.25420858
#> 5 3.6 0.2 setosa -2.499602 -0.15286372
#> 6 3.9 0.4 setosa -2.066375 -0.40249369
Sometimes when using many CPOs, their hyperparameters may get messy.
mlrCPO enables the user to control which hyperparameters get
exported. The parameter “export” can be one of
"export.default", "export.set",
"export.unset", "export.default.set",
"export.default.unset", "export.all",
"export.none". “all” and “none” do what one expects;
“default” exports the “recommended” parameters; “set” exports only
the values that were set (and are not left as default), while “unset”
exports only the values that have not been set. “default.set” and
“default.unset” work as “set” and “unset”, but restricted to the
default exported parameters.
!cpoScale()
#> Trafo chain of 1 cpos:
#> scale(center = TRUE, scale = TRUE)
#> Operating: feature
#> ParamSet:
#> Type len Def Constr Req Tunable Trafo
#> scale.center logical - TRUE - - TRUE -
#> scale.scale logical - TRUE - - TRUE -
!cpoScale(export = "export.none")
#> Trafo chain of 1 cpos:
#> scale()[not exp'd: center = TRUE, scale = TRUE]
#> Operating: feature
#> ParamSet:
#> [1] "Empty parameter set."
!cpoScale(scale = FALSE, export = "export.unset")
#> Trafo chain of 1 cpos:
#> scale(center = TRUE)[not exp'd: scale = FALSE]
#> Operating: feature
#> ParamSet:
#> Type len Def Constr Req Tunable Trafo
#> scale.center logical - TRUE - - TRUE -
There are some %>>%-related operators that perform
similar operations but may be more concise in certain applications. In
general these operators are left-associative, i.e. they are evaluated
after the expressions to their left were evaluated. Therefore, for
example, a %>>% b %<<% c is equivalent to
(a %>>% b) %<<% c. Exceptions are the
assignment operators, %<>>% and
%<<<%, as well as the %>|%
operator, see below.
The operators are:
- %>>%: The application, composition or attachment operator.
- %<<%: The above with exchanged arguments. a %<<% b is equivalent to
  b %>>% a.
- %<>>%: %>>%, followed by assignment to the left. This
  operator evaluates the arguments to its right before being evaluated
  itself. a %<>>% b %>>% c is equivalent to
  a = (a %>>% b %>>% c).
- %<<<%: %<<%, followed by assignment to the left. Note this
  is not the %<>>% operator with its
  arguments flipped. This operator evaluates the arguments to its right
  before being evaluated itself.
  a %<<<% b %>>% c is equivalent to
  a = (a %<<% (b %>>% c)).
- %>|%: %>>%, followed by application of retrafo(). This operator
  evaluates the arguments to its right before being evaluated itself.
  a %>|% b %<<% c is equivalent to
  retrafo(a %>>% (b %<<% c)).
- %|<%: The above with exchanged arguments. Like most R operators, this
  one evaluates arguments to its left before being evaluated itself.
  a %>>% b %|<% c is equivalent to
  retrafo((a %>>% b) %<<% c).
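How R itself groups a chain of %op% operators can be checked in base R without loading mlrCPO. The sketch below only inspects the parse tree of an unevaluated expression; a, b and c are placeholder symbols. It shows the default left-to-right grouping and says nothing about how mlrCPO makes the right-evaluating operators work, which presumably relies on non-standard evaluation (an assumption, not something this vignette states).

```r
# Base R only: inspect how the parser groups a chain of %op% operators.
# The symbols a, b, c are placeholders and are never evaluated.
expr <- quote(a %>>% b %<<% c)

# The outermost call is the rightmost operator ...
outer.op <- as.character(expr[[1]])
# ... and its left argument is the whole sub-expression to its left.
left.arg <- deparse(expr[[2]])
right.arg <- deparse(expr[[3]])

print(outer.op)   # "%<<%"
print(left.arg)   # "a %>>% b"
print(right.arg)  # "c"
```

This matches the statement that a %>>% b %<<% c is read as (a %>>% b) %<<% c.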