19159
32117
sklearn.pipeline.Pipeline(numerical=sklearn.pipeline.Pipeline(Imputer=sklearn.impute._base.SimpleImputer,scaler=sklearn.preprocessing._data.StandardScaler),model=sklearn.linear_model._stochastic_gradient.SGDClassifier)
sklearn.Pipeline(Pipeline,SGDClassifier)
sklearn.pipeline.Pipeline
1
openml==0.12.2,sklearn==1.0.2
Pipeline of transforms with a final estimator.
Sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be 'transforms', that is, they
must implement `fit` and `transform` methods.
The final estimator only needs to implement `fit`.
The transformers in the pipeline can be cached using the ``memory`` argument.
The purpose of the pipeline is to assemble several steps that can be
cross-validated together while setting different parameters. For this, it
enables setting parameters of the various steps using their names and the
parameter name separated by a `'__'`, as in the example below. A step's
estimator may be replaced entirely by setting the parameter with its name
to another estimator, or a transformer removed by setting it to
`'passthrough'` or `None`.
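A minimal sketch of the flow described here: a nested preprocessing pipeline ("numerical", containing the "Imputer" and "scaler" steps) feeding the "model" step, with nested hyperparameters addressed via the `'__'` convention mentioned above. The data and parameter values are illustrative only.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

# Same step names as in this flow: numerical -> (Imputer, scaler), then model.
pipe = Pipeline([
    ("numerical", Pipeline([
        ("Imputer", SimpleImputer()),
        ("scaler", StandardScaler()),
    ])),
    ("model", SGDClassifier(random_state=0)),
])

# Parameters of nested steps are addressed by joining names with '__'.
pipe.set_params(
    numerical__Imputer__strategy="median",
    model__alpha=1e-3,
)

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0], [4.0, 5.0]])
y = np.array([0, 1, 1, 0])
pipe.fit(X, y)
print(pipe.named_steps["numerical"].named_steps["Imputer"].strategy)  # median
```

Inspecting estimators via ``named_steps``, as the ``memory`` documentation below recommends, works the same whether or not caching is enabled.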
2022-09-25T22:43:28
English
sklearn==1.0.2
numpy>=1.14.6
scipy>=1.1.0
joblib>=0.11
threadpoolctl>=2.0.0
memory
str or object with the joblib.Memory interface
null
Used to cache the fitted transformers of the pipeline. By default,
no caching is performed. If a string is given, it is the path to
the caching directory. Enabling caching triggers a clone of
the transformers before fitting. Therefore, the transformer
instance given to the pipeline cannot be inspected
directly. Use the attribute ``named_steps`` or ``steps`` to
inspect estimators within the pipeline. Caching the
transformers is advantageous when fitting is time consuming.
steps
list of tuple
[{"oml-python:serialized_object": "component_reference", "value": {"key": "numerical", "step_name": "numerical"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "model", "step_name": "model"}}]
List of (name, transform) tuples (implementing `fit`/`transform`) that
are chained, in the order in which they are chained, with the last
object an estimator.
verbose
bool
false
If True, the time elapsed while fitting each step will be printed as it
is completed.
numerical
19156
32117
sklearn.pipeline.Pipeline(Imputer=sklearn.impute._base.SimpleImputer,scaler=sklearn.preprocessing._data.StandardScaler)
sklearn.Pipeline(SimpleImputer,StandardScaler)
sklearn.pipeline.Pipeline
2
openml==0.12.2,sklearn==1.0.2
Pipeline of transforms with a final estimator.
Sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be 'transforms', that is, they
must implement `fit` and `transform` methods.
The final estimator only needs to implement `fit`.
The transformers in the pipeline can be cached using the ``memory`` argument.
The purpose of the pipeline is to assemble several steps that can be
cross-validated together while setting different parameters. For this, it
enables setting parameters of the various steps using their names and the
parameter name separated by a `'__'`, as in the example below. A step's
estimator may be replaced entirely by setting the parameter with its name
to another estimator, or a transformer removed by setting it to
`'passthrough'` or `None`.
2022-09-25T00:28:40
English
sklearn==1.0.2
numpy>=1.14.6
scipy>=1.1.0
joblib>=0.11
threadpoolctl>=2.0.0
memory
str or object with the joblib.Memory interface
null
Used to cache the fitted transformers of the pipeline. By default,
no caching is performed. If a string is given, it is the path to
the caching directory. Enabling caching triggers a clone of
the transformers before fitting. Therefore, the transformer
instance given to the pipeline cannot be inspected
directly. Use the attribute ``named_steps`` or ``steps`` to
inspect estimators within the pipeline. Caching the
transformers is advantageous when fitting is time consuming.
steps
list of tuple
[{"oml-python:serialized_object": "component_reference", "value": {"key": "Imputer", "step_name": "Imputer"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "scaler", "step_name": "scaler"}}]
List of (name, transform) tuples (implementing `fit`/`transform`) that
are chained, in the order in which they are chained, with the last
object an estimator.
verbose
bool
false
If True, the time elapsed while fitting each step will be printed as it
is completed.
scaler
19075
29787
sklearn.preprocessing._data.StandardScaler
sklearn.StandardScaler
sklearn.preprocessing._data.StandardScaler
11
openml==0.12.2,sklearn==1.0.2
Standardize features by removing the mean and scaling to unit variance.
The standard score of a sample `x` is calculated as:
z = (x - u) / s
where `u` is the mean of the training samples or zero if `with_mean=False`,
and `s` is the standard deviation of the training samples or one if
`with_std=False`.
Centering and scaling happen independently on each feature by computing
the relevant statistics on the samples in the training set. Mean and
standard deviation are then stored to be used on later data using
:meth:`transform`.
Standardization of a dataset is a common requirement for many
machine learning estimators: they might behave badly if the
individual features do not more or less look like standard normally
distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance many elements used in the objective function of
a learning algorithm (such as the RBF kernel of Support Vector
Machines or the L1 and L2 regularizers of linear models) assume that
all features are centered around 0 ...
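A short check of the standardization formula quoted above, `z = (x - u) / s`: the scaler stores the training mean and standard deviation in its ``mean_`` and ``scale_`` attributes and reuses them in :meth:`transform`, so applying the formula by hand gives the same result.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])
scaler = StandardScaler().fit(X)  # learns u (mean_) and s (scale_)

z = scaler.transform(X)
manual = (X - scaler.mean_) / scaler.scale_  # z = (x - u) / s, by hand
print(np.allclose(z, manual))  # True
print(z.mean(), z.std())       # 0.0 and 1.0 after standardization
```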
2022-02-18T05:01:48
English
sklearn==1.0.2
numpy>=1.14.6
scipy>=1.1.0
joblib>=0.11
threadpoolctl>=2.0.0
copy
bool
true
If False, try to avoid a copy and do inplace scaling instead.
This is not guaranteed to always work inplace; e.g. if the data is
not a NumPy array or scipy.sparse CSR matrix, a copy may still be
returned.
with_mean
bool
true
If True, center the data before scaling.
This does not work (and will raise an exception) when attempted on
sparse matrices, because centering them entails building a dense
matrix which in common use cases is likely to be too large to fit in
memory.
with_std
bool
true
If True, scale the data to unit variance (or equivalently,
unit standard deviation).
openml-python
python
scikit-learn
sklearn
sklearn_1.0.2
Imputer
19084
29930
sklearn.impute._base.SimpleImputer
sklearn.SimpleImputer
sklearn.impute._base.SimpleImputer
30
openml==0.12.2,sklearn==1.0.2
Imputation transformer for completing missing values.
2022-03-08T11:40:33
English
sklearn==1.0.2
numpy>=1.14.6
scipy>=1.1.0
joblib>=0.11
threadpoolctl>=2.0.0
add_indicator
bool
false
If True, a :class:`MissingIndicator` transform will stack onto output
of the imputer's transform. This allows a predictive estimator
to account for missingness despite imputation. If a feature has no
missing values at fit/train time, the feature won't appear on
the missing indicator even if there are missing values at
transform/test time.
copy
bool
true
If True, a copy of `X` will be created. If False, imputation will
be done in-place whenever possible. Note that, in the following cases,
a new copy will always be made, even if `copy=False`:
- If `X` is not an array of floating values;
- If `X` is encoded as a CSR matrix;
- If `add_indicator=True`.
fill_value
str or numerical value
null
When strategy == "constant", fill_value is used to replace all
occurrences of missing_values.
If left to the default, fill_value will be 0 when imputing numerical
data and "missing_value" for strings or object data types.
missing_values
int, float, str, np.nan or None
NaN
The placeholder for the missing values. All occurrences of
`missing_values` will be imputed. For pandas' dataframes with
nullable integer dtypes with missing values, `missing_values`
should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`.
strategy
str
"mean"
The imputation strategy:
- If "mean", then replace missing values using the mean along
each column. Can only be used with numeric data.
- If "median", then replace missing values using the median along
each column. Can only be used with numeric data.
- If "most_frequent", then replace missing using the most frequent
value along each column. Can be used with strings or numeric data.
If there is more than one such value, only the smallest is returned.
- If "constant", then replace missing values with fill_value. Can be
used with strings or numeric data.
.. versionadded:: 0.20
strategy="constant" for fixed value imputation
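The four strategies listed above can be sketched on a single numeric column (the flows here keep the default ``strategy="mean"``); only the missing entry in the last row changes between strategies.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [2.0], [2.0], [np.nan]])

# What each strategy substitutes for the NaN in the last row.
filled = {
    s: SimpleImputer(strategy=s).fit_transform(X)[-1, 0]
    for s in ("mean", "median", "most_frequent")
}
# "constant" uses fill_value instead of a column statistic.
filled["constant"] = SimpleImputer(
    strategy="constant", fill_value=0.0
).fit_transform(X)[-1, 0]

print(filled)  # mean≈1.667, median=2.0, most_frequent=2.0, constant=0.0
```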
verbose
int
0
Controls the verbosity of the imputer.
openml-python
python
scikit-learn
sklearn
sklearn_1.0.2
openml-python
python
scikit-learn
sklearn
sklearn_1.0.2
model
19160
32117
sklearn.linear_model._stochastic_gradient.SGDClassifier
sklearn.SGDClassifier
sklearn.linear_model._stochastic_gradient.SGDClassifier
3
openml==0.12.2,sklearn==1.0.2
Linear classifiers (SVM, logistic regression, etc.) with SGD training.
This estimator implements regularized linear models with stochastic
gradient descent (SGD) learning: the gradient of the loss is estimated
each sample at a time and the model is updated along the way with a
decreasing strength schedule (aka learning rate). SGD allows minibatch
(online/out-of-core) learning via the `partial_fit` method.
For best results using the default learning rate schedule, the data should
have zero mean and unit variance.
This implementation works with data represented as dense or sparse arrays
of floating point values for the features. The model it fits can be
controlled with the loss parameter; by default, it fits a linear support
vector machine (SVM).
The regularizer is a penalty added to the loss function that shrinks model
parameters towards the zero vector using either the squared euclidean norm
L2 or the absolute norm L1 or a combination of both (Elastic Net). If the
parameter update crosses the 0.0 value ...
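A minimal sketch of the default configuration described above: hinge loss yields a linear SVM, and the features are standardized first since, as noted, the default learning rate schedule works best with zero-mean, unit-variance data. The synthetic two-blob data is illustrative only.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Two well-separated Gaussian blobs as toy classification data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Standardize, as the docstring recommends for the default schedule.
X = StandardScaler().fit_transform(X)

clf = SGDClassifier(loss="hinge", random_state=0).fit(X, y)  # linear SVM
print(clf.score(X, y))  # separable classes give high training accuracy
```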
2022-09-25T22:43:28
English
sklearn==1.0.2
numpy>=1.14.6
scipy>=1.1.0
joblib>=0.11
threadpoolctl>=2.0.0
alpha
float
0.0001
Constant that multiplies the regularization term. The higher the
value, the stronger the regularization.
Also used to compute the learning rate when `learning_rate` is
set to 'optimal'.
average
bool or int
false
When set to True, computes the averaged SGD weights across all
updates and stores the result in the ``coef_`` attribute. If set to
an int greater than 1, averaging will begin once the total number of
samples seen reaches `average`. So ``average=10`` will begin
averaging after seeing 10 samples.
class_weight
dict
null
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes
are supposed to have weight one.
The "balanced" mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as ``n_samples / (n_classes * np.bincount(y))``.
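The "balanced" formula quoted above can be checked by hand against sklearn's own helper for a small label vector:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 1])

# n_samples / (n_classes * np.bincount(y)) with 2 classes.
manual = len(y) / (2 * np.bincount(y))
balanced = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)

print(manual, balanced)  # the rarer class 1 gets the larger weight
```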
early_stopping
bool
false
Whether to use early stopping to terminate training when validation
score is not improving. If set to True, it will automatically set aside
a stratified fraction of training data as validation and terminate
training when validation score returned by the `score` method is not
improving by at least tol for n_iter_no_change consecutive epochs.
.. versionadded:: 0.20
Added 'early_stopping' option
epsilon
float
0.1
Epsilon in the epsilon-insensitive loss functions; only if `loss` is
'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'.
For 'huber', determines the threshold at which it becomes less
important to get the prediction exactly right.
For epsilon-insensitive, any differences between the current prediction
and the correct label are ignored if they are less than this threshold.
eta0
float
0.0
The initial learning rate for the 'constant', 'invscaling' or
'adaptive' schedules. The default value is 0.0 as eta0 is not used by
the default schedule 'optimal'.
fit_intercept
bool
true
Whether the intercept should be estimated or not. If False, the
data is assumed to be already centered.
l1_ratio
float
0.15
The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1.
l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1.
Only used if `penalty` is 'elasticnet'.
learning_rate
str
"optimal"
The learning rate schedule:
- 'constant': `eta = eta0`
- 'optimal': `eta = 1.0 / (alpha * (t + t0))`,
where t0 is chosen by a heuristic proposed by Leon Bottou
- 'invscaling': `eta = eta0 / pow(t, power_t)`
- 'adaptive': `eta = eta0`, as long as the training keeps decreasing.
Each time n_iter_no_change consecutive epochs fail to decrease the
training loss by tol or fail to increase validation score by tol if
early_stopping is True, the current learning rate is divided by 5.
.. versionadded:: 0.20
Added 'adaptive' option
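The 'constant' and 'invscaling' schedules listed above can be evaluated by hand for the first few updates t; here ``eta0`` and ``power_t`` stand for the SGDClassifier parameters of the same names.

```python
# Illustrative values for the eta0 and power_t parameters.
eta0, power_t = 0.1, 0.5

# 'constant': eta = eta0 for every update t.
constant = [eta0 for t in (1, 2, 3, 4)]

# 'invscaling': eta = eta0 / pow(t, power_t), decaying like 1/sqrt(t).
invscaling = [eta0 / t ** power_t for t in (1, 2, 3, 4)]

print(constant)    # stays at eta0
print(invscaling)  # shrinks as t grows
```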
loss
str
"hinge"
The loss function to be used. Defaults to 'hinge', which gives a
linear SVM.
The possible options are 'hinge', 'log', 'modified_huber',
'squared_hinge', 'perceptron', or a regression loss: 'squared_error',
'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'.
The 'log' loss gives logistic regression, a probabilistic classifier.
'modified_huber' is another smooth loss that brings tolerance to
outliers as well as probability estimates.
'squared_hinge' is like hinge but is quadratically penalized.
'perceptron' is the linear loss used by the perceptron algorithm.
The other losses are designed for regression but can be useful in
classification as well; see
:class:`~sklearn.linear_model.SGDRegressor` for a description.
More details about the losses formulas can be found in the
:ref:`User Guide <sgd_mathematical_formulation>`.
.. deprecated:: 1.0
The loss 'squared_loss' was deprecated in v1.0 and will be removed
in version 1.2. Us...
max_iter
int
1000
The maximum number of passes over the training data (aka epochs).
It only impacts the behavior in the ``fit`` method, and not the
:meth:`partial_fit` method
.. versionadded:: 0.19
n_iter_no_change
int
5
Number of iterations with no improvement to wait before stopping
fitting.
Convergence is checked against the training loss or the
validation loss depending on the `early_stopping` parameter.
.. versionadded:: 0.20
Added 'n_iter_no_change' option
n_jobs
int
null
The number of CPUs to use to do the OVA (One Versus All, for
multi-class problems) computation.
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
for more details.
penalty
str
"l2"
The penalty (aka regularization term) to be used. Defaults to 'l2'
which is the standard regularizer for linear SVM models. 'l1' and
'elasticnet' might bring sparsity to the model (feature selection)
not achievable with 'l2'.
power_t
float
0.5
The exponent for inverse scaling learning rate [default 0.5].
random_state
int
null
Used for shuffling the data, when ``shuffle`` is set to ``True``.
Pass an int for reproducible output across multiple function calls.
See :term:`Glossary <random_state>`.
shuffle
bool
true
Whether or not the training data should be shuffled after each epoch.
tol
float
0.001
The stopping criterion. If it is not None, training will stop
when (loss > best_loss - tol) for ``n_iter_no_change`` consecutive
epochs.
Convergence is checked against the training loss or the
validation loss depending on the `early_stopping` parameter.
.. versionadded:: 0.19
validation_fraction
float
0.1
The proportion of training data to set aside as validation set for
early stopping. Must be between 0 and 1.
Only used if `early_stopping` is True.
.. versionadded:: 0.20
Added 'validation_fraction' option
verbose
int
0
The verbosity level.
warm_start
bool
false
When set to True, reuse the solution of the previous call to fit as
initialization, otherwise, just erase the previous solution.
See :term:`the Glossary <warm_start>`.
Repeatedly calling fit or partial_fit when warm_start is True can
result in a different solution than when calling fit a single time
because of the way the data is shuffled.
If a dynamic learning rate is used, the learning rate is adapted
depending on the number of samples already seen. Calling ``fit`` resets
this counter, while ``partial_fit`` will result in increasing the
existing counter.
openml-python
python
scikit-learn
sklearn
sklearn_1.0.2
openml-python
python
scikit-learn
sklearn
sklearn_1.0.2