New models for tabular data:
Regularization Learning Networks (brulee_rln()) use
a conventional MLP architecture but each weight learns its own adaptive
regularization coefficient.
ResNet (brulee_resnet()) can fit a multilayer neural
network with skip (i.e. residual) connections and batch
normalization.
AutoInt (brulee_auto_int()) uses residual
connections and columnwise attention mechanisms to create embeddings
that encourage in-context learning of features.
Saint (brulee_saint()) uses column and/or row
attention mechanisms.
Chronos2 (brulee_chronos()) is a foundational model
for forecasting.
All modeling functions now support GPU acceleration via the
device parameter. Users can specify
device = "cpu", device = "cuda", or
device = "mps" (Apple Silicon). When
device = NULL (default), the package automatically selects
CUDA if available, otherwise defaults to CPU. Note: MPS is not
auto-selected because it doesn’t support float64 dtype required by
brulee. See?training_efficiency for some related
notes.
Float tensors were changed from 64-bit floats to 32-bit. This is to enable GPU usage on MPS devices.
Parameters are initialized on CPU devices and then converted to the chosen device. In some cases, the RNG initialization code is independent of the seed.
For classification, the softmax was moved out of every model’s
forward pass so the loss can use torch::nnf_cross_entropy()
(which applies the log-sum-exp trick internally) instead of
nll_loss(log(softmax(x))). This avoids log(0)
underflow that produced NaN losses and “numerical overflow”
early stopping on overspecified brulee_saint() /
brulee_auto_int() fits. Affects brulee_mlp(),
brulee_logistic_reg(),
brulee_multinomial_reg(), brulee_resnet(),
brulee_auto_int(), and brulee_saint(). New
fits carry output_type = "logits" so the predict path
applies softmax; serialized fits from earlier versions of brulee
continue to predict correctly.
Transition from the magrittr pipe to the base R pipe.
To try to help avoiding numeric overflow in the loss functions:
Tensors are stored as a 64-bit float instead of 32-bit.
Starting values were transitioned to using Gaussian distribution (instead of uniform) with a smaller standard deviation.
The results always contain the initial results to use as a fallback if there is overflow during the first epoch.
brulee_mlp() has two additional parameters,
grad_value_clip and grad_value_clip, that
prevent issues.
The warning was changed to “Early stopping occurred at epoch {X} due to numerical overflow of the loss function.”
Several new SGD optimizers were added: "ADAMw",
"Adadelta", "Adagrad", and
"RMSprop".
Mixture parameter values different than zero cannot be used for several optimizers since they require L2 penalties.
Added a convenience function,
brulee_mlp_two_layer(), to more easily fit two-layer
networks with parsnip.
Various changes and improvements to error and warning messages.
Fixed a bug that occurred when linear activation was used for neural networks (#68).
Fixed bug where coef() didn’t would error if used on
a brulee_logistic_reg() that was trained with a recipe.
(#66)
Fixed a bug where SGD always being used as the optimizer (#61).
Additional activation functions were added (#74).
Several learning rate schedulers were added to the modeling functions (#12).
An optimizer was added to [brulee_mlp()], with a new
default being LBFGS instead of stochastic gradient descent.
Modeling functions gained a mixture argument for the
proportion of L1 penalty that is used. (#50)
Penalization was not occurring when quasi-Newton optimization was chosen. (#50)
First CRAN release.