DeepLearning__L1 | L1 regularization (can add stability and improve generalization, causes many weights to become 0) | default: 1.0E-5 |
DeepLearning__L2 | L2 regularization (can add stability and improve generalization, causes many weights to be small); both the L1 and L2 penalties are illustrated in the regularization sketch after this parameter list | default: 0.0 |
DeepLearning__activation | The activation function (non-linearity) to be used by the neurons in the hidden layers.
Tanh: Hyperbolic tangent function (same as a scaled and shifted sigmoid).
Rectifier: Chooses the maximum of (0, x), where x is the input value.
Maxout: Chooses the maximum coordinate of the input vector.
With Dropout: Zeroes out a random, user-given fraction of the incoming weights to each hidden layer during training, for each training row. This effectively trains exponentially many models at once and can improve generalization (see the dropout sketch after this parameter list). | default: Rectifier |
DeepLearning__adaptive_rate | Adaptive learning rate | default: true |
DeepLearning__compute_variable_importances | Compute variable importances for input features (Gedeon method) - can be slow for large networks | default: false |
DeepLearning__distribution_function | Distribution function. | default: AUTO |
DeepLearning__early_stopping | If true, the parameters for early stopping need to be specified. | default: false |
DeepLearning__epochs | How many times the dataset should be iterated (streamed); can be fractional. | default: 10.0 |
DeepLearning__epsilon | The optimization is stopped if the training error gets below this epsilon value. | default: 1.0E-8 |
DeepLearning__expert_parameters | Advanced parameters that can be set. | |
DeepLearning__expert_parameters_ | Advanced parameters that can be set. | |
DeepLearning__hidden_dropout_ratios | A fraction of the inputs for each hidden layer to be omitted from training in order to improve generalization. Defaults to 0.5 for each hidden layer if omitted. | |
DeepLearning__hidden_layer_sizes | The sizes of the hidden layers, given as a comma-separated list; the number of entries determines the number of hidden layers. | default: 50,50 |
DeepLearning__learning_rate | The learning rate, alpha. Higher values lead to less stable models, while lower values lead to slower convergence (see the learning-rate schedule sketch after this parameter list). | default: 0.005 |
DeepLearning__learning_rate_annealing | Learning rate annealing reduces the learning rate to "freeze" into local minima in the optimization landscape. The annealing rate is the inverse of the number of training samples it takes to cut the learning rate in half (e.g., 1e-6 means that it takes 1e6 training samples to halve the learning rate). This parameter is only active if adaptive learning rate is disabled. | default: 1.0E-6 |
DeepLearning__learning_rate_decay | The learning rate decay parameter controls the change of learning rate across layers. For example, assume the rate parameter is set to 0.01, and the rate_decay parameter is set to 0.5. Then the learning rate for the weights connecting the input and first hidden layer will be 0.01, the learning rate for the weights connecting the first and the second hidden layer will be 0.005, and the learning rate for the weights connecting the second and third hidden layer will be 0.0025, etc. This parameter is only active if adaptive learning rate is disabled. | default: 1.0 |
DeepLearning__local_random_seed | Specifies the local random seed | default: 1992 |
DeepLearning__loss_function | Loss function. | default: Automatic |
DeepLearning__max_runtime_seconds | Maximum allowed runtime in seconds for model training. Use 0 to disable. | default: 0 |
DeepLearning__max_w2 | Constraint for the squared sum of incoming weights per unit (see the weight-constraint sketch after this parameter list). | default: 10.0 |
DeepLearning__missing_values_handling | Handling of missing values. Either Skip or MeanImputation. | default: MeanImputation |
DeepLearning__momentum_ramp | The momentum_ramp parameter controls the number of training samples over which momentum increases from momentum_start to momentum_stable (assuming momentum_stable is larger than momentum_start). This parameter is only active if adaptive learning rate is disabled (see the momentum schedule sketch after this parameter list). | default: 1000000.0 |
DeepLearning__momentum_stable | The momentum_stable parameter controls the final momentum value reached after momentum_ramp training samples. The momentum used for training will remain the same for training beyond reaching that point. This parameter is only active if adaptive learning rate is disabled. | default: 0.0 |
DeepLearning__momentum_start | The momentum_start parameter controls the amount of momentum at the beginning of training. This parameter is only active if adaptive learning rate is disabled. | default: 0.0 |
DeepLearning__nesterov_accelerated_gradient | The Nesterov accelerated gradient descent method is a modification to traditional gradient descent for convex functions. The method relies on gradient information at various points to build a polynomial approximation that minimizes the residuals in fewer iterations of the descent. This parameter is only active if adaptive learning rate is disabled. | default: true |
DeepLearning__reproducible_(uses_1_thread) | Force reproducibility on small data (WARNING: will be slow - only uses 1 thread). | default: false |
DeepLearning__rho | Similar to momentum; relates to the memory of prior weight updates. Typical values are between 0.9 and 0.999. | default: 0.99 |
DeepLearning__standardize | If enabled, automatically standardize the data. If disabled, the user must provide properly scaled input data. | default: true |
DeepLearning__stopping_metric | Metric to use for early stopping (AUTO: logloss for classification, deviance for regression) | default: AUTO |
DeepLearning__stopping_rounds | Early stopping based on convergence of stopping_metric. Stop if the simple moving average of length k of the stopping_metric does not improve for k:=stopping_rounds scoring events (see the early-stopping sketch after this parameter list). | default: 1 |
DeepLearning__stopping_tolerance | Relative tolerance for metric-based stopping criterion (stop if relative improvement is not at least this much). | default: 0.001 |
DeepLearning__train_samples_per_iteration | The number of training data rows to be processed per iteration. Note that independent of this parameter, each row is used immediately to update the model with (online) stochastic gradient descent. This parameter controls the frequency at which scoring and model cancellation can happen. Special values are 0 for one epoch per iteration, -1 for processing the maximum amount of data per iteration. Special value of -2 turns on automatic mode (auto-tuning). | default: -2 |
DeepLearning__use_local_random_seed | Indicates if a local random seed should be used. | default: false |
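To make the L1 and L2 penalties concrete, here is a minimal sketch (plain NumPy, not H2O code; the function name and the list-of-arrays weight layout are illustrative, and possible constant factors are ignored) of how the two terms enter the training objective:

```python
import numpy as np

def penalized_loss(data_loss, weights, l1=1e-5, l2=0.0):
    """Training objective with L1/L2 penalties added to the plain data loss.

    The L1 term drives many weights to exactly zero (sparsity); the L2 term
    keeps all weights small. `weights` is a list of NumPy arrays, one per layer.
    """
    l1_term = l1 * sum(np.abs(w).sum() for w in weights)
    l2_term = l2 * sum((w ** 2).sum() for w in weights)
    return data_loss + l1_term + l2_term
```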
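The "With Dropout" activations and hidden_dropout_ratios randomly drop part of each hidden layer per training row. A rough sketch of that idea, written as inverted dropout applied to hidden activations; the helper name is made up and the actual H2O implementation differs in detail:

```python
import numpy as np

def hidden_dropout(activations, ratio, rng):
    """Zero a random `ratio` fraction of a hidden layer's activations for one
    training row. Scaling the survivors by 1/(1 - ratio) keeps the expected
    activation unchanged, so nothing extra is needed at prediction time.
    """
    if ratio <= 0.0:
        return activations
    keep = rng.random(activations.shape) >= ratio   # keep each unit with prob 1 - ratio
    return activations * keep / (1.0 - ratio)

# Drop half of a 50-unit hidden layer for one training row.
rng = np.random.default_rng(1992)                   # seed chosen only to mirror local_random_seed
h = rng.standard_normal(50)
h_after = hidden_dropout(h, ratio=0.5, rng=rng)
```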
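When the adaptive learning rate is disabled, learning_rate, learning_rate_annealing, and learning_rate_decay combine roughly as below. This is a sketch that follows the parameter descriptions above; the helper name is invented and the exact H2O implementation may differ:

```python
def layer_learning_rate(base_rate, samples_seen, layer_index,
                        annealing=1e-6, decay=1.0):
    """Effective learning rate for the weights feeding hidden layer `layer_index`
    (0 = input -> first hidden) after `samples_seen` training samples.
    Annealing halves the rate every 1/annealing samples; decay shrinks it by a
    constant factor per layer deeper in the network.
    """
    annealed = base_rate / (1.0 + annealing * samples_seen)
    return annealed * (decay ** layer_index)

# The worked example from learning_rate_decay: base rate 0.01 and decay 0.5
# give 0.01, 0.005, 0.0025 for the first three weight groups at the start of training.
print([layer_learning_rate(0.01, 0, k, decay=0.5) for k in range(3)])
```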
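A sketch of the momentum schedule implied by momentum_start, momentum_ramp, and momentum_stable, assuming a linear ramp (the function name is illustrative, not H2O's API):

```python
def momentum_at(samples_seen, start=0.0, ramp=1_000_000.0, stable=0.0):
    """Momentum after `samples_seen` training samples: ramps from momentum_start
    to momentum_stable over momentum_ramp samples, then stays at momentum_stable.
    Only relevant when the adaptive learning rate is disabled.
    """
    if samples_seen >= ramp:
        return stable
    return start + (stable - start) * (samples_seen / ramp)
```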
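The max_w2 parameter caps the squared sum of a unit's incoming weights. One common way to enforce such a constraint is to rescale the weight vector whenever it exceeds the cap, sketched below (illustrative code, not the H2O implementation):

```python
import numpy as np

def apply_max_w2(incoming_weights, max_w2=10.0):
    """Rescale one unit's incoming weight vector whenever the sum of its squared
    weights exceeds max_w2, so the constraint holds with equality afterwards.
    """
    sq_sum = float(np.dot(incoming_weights, incoming_weights))
    if sq_sum > max_w2:
        return incoming_weights * np.sqrt(max_w2 / sq_sum)
    return incoming_weights
```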
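Finally, a simplified reading of the stopping_rounds / stopping_tolerance rule: compare the moving average of the most recent k scoring events against the moving average of the k events before them, and stop when the relative improvement falls below the tolerance. This is a sketch of the rule as described above, not the exact H2O logic, and it assumes a metric where lower is better (e.g. logloss or deviance):

```python
def should_stop(metric_history, stopping_rounds=1, stopping_tolerance=1e-3):
    """Return True when the stopping_metric has stopped improving.

    `metric_history` holds one value per scoring event; lower is better.
    """
    k = stopping_rounds
    if len(metric_history) < 2 * k:
        return False                          # not enough scoring events yet
    recent = sum(metric_history[-k:]) / k
    earlier = sum(metric_history[-2 * k:-k]) / k
    if earlier == 0:
        return True                           # cannot improve below zero
    improvement = (earlier - recent) / abs(earlier)
    return improvement < stopping_tolerance
```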