OpenML flow 18591 (uploader 8323)
Name: sklearn.pipeline.Pipeline(simpleimputer=sklearn.impute._base.SimpleImputer,standardscaler=sklearn.preprocessing.data.StandardScaler,logisticregression=sklearn.linear_model.logistic.LogisticRegression)
Custom name: sklearn.Pipeline(SimpleImputer,StandardScaler,LogisticRegression)
Class: sklearn.pipeline.Pipeline (flow version 1)
Uploaded: 2020-07-10T21:14:10 (English)
Dependencies: openml==0.10.2, sklearn==0.21.2, numpy>=1.6.1, scipy>=0.9

Pipeline of transforms with a final estimator. Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be 'transforms', that is, they must implement fit and transform methods. The final estimator only needs to implement fit. The transformers in the pipeline can be cached using the ``memory`` argument.

The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters. For this, it enables setting parameters of the various steps using their names and the parameter name separated by a '__', as in the example below. A step's estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer may be removed by setting it to 'passthrough' or ``None``.

Parameters:

memory (default: null)
    Used to cache the fitted transformers of the pipeline. By default, no caching is performed. If a string is given, it is the path to the caching directory. Enabling caching triggers a clone of the transformers before fitting. Therefore, the transformer instance given to the pipeline cannot be inspected directly. Use the attribute ``named_steps`` or ``steps`` to inspect estimators within the pipeline. Caching the transformers is advantageous when fitting is time consuming.

steps (list, default: [{"oml-python:serialized_object": "component_reference", "value": {"key": "simpleimputer", "step_name": "simpleimputer"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "standardscaler", "step_name": "standardscaler"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "logisticregression", "step_name": "logisticregression"}}])
    List of (name, transform) tuples (implementing fit/transform) that are chained, in the order in which they are chained, with the last object an estimator.

verbose (boolean, default: false)
    If True, the time elapsed while fitting each step will be printed as it is completed.
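Below is a minimal sketch (not part of the flow record) of the '__' addressing scheme just described, using the step names this flow defines; the C=10.0 value and the 'passthrough' substitution are purely illustrative:

    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Step names below match the component keys recorded in this flow.
    pipe = Pipeline(steps=[
        ("simpleimputer", SimpleImputer()),
        ("standardscaler", StandardScaler()),
        ("logisticregression", LogisticRegression()),
    ])

    # Nested parameters are addressed as <step name>__<parameter name>.
    pipe.set_params(logisticregression__C=10.0)

    # A whole step can be replaced by name, or disabled with 'passthrough'/None.
    pipe.set_params(standardscaler="passthrough")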
Component (step "standardscaler"): OpenML flow 17405 (uploader 1)
Name/class: sklearn.preprocessing.data.StandardScaler (flow version 35)
Custom name: sklearn.StandardScaler
Uploaded: 2019-11-22T01:19:36 (English)
Dependencies: openml==0.10.2, sklearn==0.21.2, numpy>=1.6.1, scipy>=0.9

Standardize features by removing the mean and scaling to unit variance. The standard score of a sample `x` is calculated as z = (x - u) / s, where `u` is the mean of the training samples or zero if `with_mean=False`, and `s` is the standard deviation of the training samples or one if `with_std=False`.

Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the `transform` method.

Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance). For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order ...

Parameters:

copy (boolean, default: true)
    If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.

with_mean (boolean, default: true)
    If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_std (boolean, default: true)
    If True, scale the data to unit variance (or equivalently, unit standard deviation).

Tags: openml-python, python, scikit-learn, sklearn, sklearn_0.21.2

Component (step "simpleimputer"): OpenML flow 17407 (uploader 1)
Name/class: sklearn.impute._base.SimpleImputer (flow version 11)
Custom name: sklearn.SimpleImputer
Uploaded: 2019-11-22T01:19:36 (English)
Dependencies: openml==0.10.2, sklearn==0.21.2, numpy>=1.6.1, scipy>=0.9

Imputation transformer for completing missing values.

Parameters:

add_indicator (boolean, default: false)
    If True, a `MissingIndicator` transform will stack onto the output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time.

copy (boolean, default: true)
    If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if `copy=False`:
    - If X is not an array of floating values;
    - If X is encoded as a CSR matrix;
    - If add_indicator=True.

fill_value (string or numerical value, default: -1)
    When strategy == "constant", fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and "missing_value" for strings or object data types.

missing_values (number, default: NaN)
    The placeholder for the missing values. All occurrences of `missing_values` will be imputed.

strategy (string, default: "constant")
    The imputation strategy.
    - If "mean", then replace missing values using the mean along each column. Can only be used with numeric data.
    - If "median", then replace missing values using the median along each column. Can only be used with numeric data.
    - If "most_frequent", then replace missing using the most frequent value along each column. Can be used with strings or numeric data.
    - If "constant", then replace missing values with fill_value. Can be used with strings or numeric data.
    .. versionadded:: 0.20
       strategy="constant" for fixed value imputation.

verbose (integer, default: 0)
    Controls the verbosity of the imputer.

Tags: openml-python, python, scikit-learn, sklearn, sklearn_0.21.2
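As a hedged illustration of what the two preprocessing components above do with the values stored in this flow (constant imputation with -1, then per-feature standardization z = (x - u) / s); the toy matrix is made up, and the public import paths sklearn.impute / sklearn.preprocessing are used in place of the internal module paths recorded in the flow:

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler

    X = np.array([[1.0, np.nan],
                  [2.0, 4.0],
                  [3.0, 5.0]])

    # Stored settings: every NaN is replaced by the constant -1.
    imputer = SimpleImputer(missing_values=np.nan, strategy="constant",
                            fill_value=-1)
    X_imputed = imputer.fit_transform(X)

    # Each column is then standardized to mean 0 and unit variance.
    scaler = StandardScaler(copy=True, with_mean=True, with_std=True)
    X_scaled = scaler.fit_transform(X_imputed)

    print(X_scaled.mean(axis=0))  # approximately 0 per feature
    print(X_scaled.std(axis=0))   # approximately 1 per feature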
Component (step "logisticregression"): OpenML flow 17462 (uploader 10792)
Name/class: sklearn.linear_model.logistic.LogisticRegression (flow version 33)
Custom name: sklearn.LogisticRegression
Uploaded: 2019-12-16T00:38:14 (English)
Dependencies: openml==0.10.2, sklearn==0.21.2, numpy>=1.6.1, scipy>=0.9

Logistic Regression (aka logit, MaxEnt) classifier. In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the 'multi_class' option is set to 'ovr', and uses the cross-entropy loss if the 'multi_class' option is set to 'multinomial'. (Currently the 'multinomial' option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.) This class implements regularized logistic regression using the 'liblinear' library, 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers. **Note that regularization is applied by default**. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied). The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization with primal formulation, or no regularization. The 'liblinear' solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the 'saga' solver ...

Parameters:

C (float, default: 100000000)
    Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.

class_weight (dict or null)
    Weights associated with classes in the form ``{class_label: weight}``. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
    .. versionadded:: 0.17
       *class_weight='balanced'*

dual (bool, default: false)
    Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

fit_intercept (bool, default: true)
    Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling (float, default: 1)
    Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a "synthetic" feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``. Note: the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.

l1_ratio (float or None, default: null)
    The Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. Setting ``l1_ratio=0`` is equivalent to using ``penalty='l2'``, while setting ``l1_ratio=1`` is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio < 1``, the penalty is a combination of L1 and L2.

max_iter (int, default: 100)
    Maximum number of iterations taken for the solvers to converge.

multi_class (str, default: "warn")
    If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
    .. versionadded:: 0.18
       Stochastic Average Gradient descent solver for 'multinomial' case.
    .. versionchanged:: 0.20
       Default will change from 'ovr' to 'auto' in 0.22.

n_jobs (int or None, default: null)
    Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the ``solver`` is set to 'liblinear' regardless of whether 'multi_class' is specified or not. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

penalty (str, default: "l2")
    Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver. If 'none' (not supported by the liblinear solver), no regularization is applied.
    .. versionadded:: 0.19
       l1 penalty with SAGA solver (allowing 'multinomial' + L1)

random_state (int, default: 22823)
    The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by `np.random`. Used when ``solver`` == 'sag' or 'liblinear'.

solver (str, default: "warn")
    Algorithm to use in the optimization problem.
    - For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
    - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
    - 'newton-cg', 'lbfgs', 'sag' and 'saga' handle L2 or no penalty.
    - 'liblinear' and 'saga' also handle L1 penalty.
    - 'saga' also supports 'elasticnet' penalty.
    - 'liblinear' does not handle no penalty.
    Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
    .. versionadded:: 0.17
       Stochastic Average Gradient descent solver.
    .. versionadded:: 0.19
       SAGA solver.
    .. versionchanged:: 0.20
       Default will change from 'liblinear' to 'lbfgs' in 0.22.

tol (float, default: 0.0001)
    Tolerance for stopping criteria.

verbose (int, default: 0)
    For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

warm_start (bool, default: false)
    When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution. Useless for liblinear solver. See :term:`the Glossary <warm_start>`.
    .. versionadded:: 0.17
       *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.

Tags: openml-python, python, scikit-learn, sklearn, sklearn_0.21.2
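As stored, the classifier is effectively unregularized (C = 100000000) and carries a fixed seed. A sketch instantiating it with those recorded values; solver and multi_class are left at their sklearn 0.21 'warn' defaults, which resolve to 'liblinear' and 'ovr' with a FutureWarning:

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(
        C=100000000,          # stored value: effectively no regularization
        penalty="l2",
        tol=0.0001,
        fit_intercept=True,
        intercept_scaling=1,
        max_iter=100,
        random_state=22823,   # seed recorded in the flow
    )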
Flow tags: openml-python, python, scikit-learn, sklearn, sklearn_0.21.2
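Putting the pieces together, a sketch reconstructing the whole flow from the stored values above. The manual build uses the public sklearn import paths; the openml-python call at the end follows the 0.10-era get_flow signature and assumes sklearn==0.21.2 is installed to satisfy the recorded dependencies (that the reinstantiated estimator is exposed as flow.model is an assumption here):

    import numpy as np
    import openml
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Manual reconstruction from the stored parameter values above.
    model = Pipeline(steps=[
        ("simpleimputer", SimpleImputer(missing_values=np.nan,
                                        strategy="constant", fill_value=-1)),
        ("standardscaler", StandardScaler()),
        ("logisticregression", LogisticRegression(C=100000000,
                                                  random_state=22823)),
    ])

    # openml-python (0.10.x) can also rebuild the flow directly; an exact
    # round-trip needs the recorded dependency sklearn==0.21.2.
    flow = openml.flows.get_flow(18591, reinstantiate=True)
    model_from_flow = flow.model  # assumption: reinstantiated estimator lives here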