Flow
weka.SimpleKMeans_EuclideanDistance

Visibility: public. Uploaded 07-10-2014 by Joaquin Vanschoren. Weka_3.7.12-SNAPSHOT. 0 runs.


D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
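
The seeding scheme from the paper cited above can be illustrated with a minimal sketch. It is not Weka's implementation; the function names `squared_distance` and `kmeans_pp_seeds` are hypothetical, and the seed value 10 is borrowed from the S parameter default listed on this page. Each new center is drawn with probability proportional to its squared distance from the nearest center already chosen.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=10):
    """Pick k initial centers using D^2 weighting (hypothetical helper)."""
    rng = random.Random(seed)
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight of each point: squared distance to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(points))
            continue
        # Points with weight 0 (already chosen) are never drawn here.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

Because far-away points carry more weight, the initial centers tend to be spread across the data rather than clumped together, which is the motivation for selecting init = 1 over purely random starting points.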

Components

A: weka.EuclideanDistance(2). Distance function to use. (default: weka.core.EuclideanDistance)
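
As a rough illustration of what a Euclidean distance component computes, here is a minimal sketch on numeric attributes only. The helper names are hypothetical, and the optional min-max scaling of each attribute is an assumption made for illustration; this page does not state how the component treats attribute ranges, nominal attributes, or missing values.

```python
import math


def attribute_ranges(rows):
    """Per-attribute (min, max) pairs over a list of numeric rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def euclidean_distance(a, b, ranges=None):
    """Euclidean distance; scales attributes to [0, 1] when ranges are given."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if ranges is not None:
            lo, hi = ranges[i]
            span = hi - lo
            # A constant attribute (span == 0) contributes nothing.
            x = (x - lo) / span if span else 0.0
            y = (y - lo) / span if span else 0.0
        total += (x - y) ** 2
    return math.sqrt(total)
```

Scaling keeps an attribute with a large numeric range from dominating the distance, which matters when attributes are measured in different units.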

Parameters

-do-not-check-capabilities: If set, clusterer capabilities are not checked before clusterer is built (use with caution).
-max-candidates: Maximum number of candidate canopies to retain in memory at any one time when using canopy clustering. The T2 distance, together with the data characteristics, determines how many candidate canopies are formed before periodic and final pruning are performed, which might result in excess memory consumption. This setting prevents large numbers of candidate canopies from consuming memory. (default: 100)
A: Distance function to use. (default: weka.core.EuclideanDistance)
C: Use canopies to reduce the number of distance calculations.
I: Maximum number of iterations. (default: 500)
M: Don't replace missing values with mean/mode.
N: Number of clusters. (default: 2)
O: Preserve order of instances.
S: Random number seed. (default: 10)
V: Display std. deviations for centroids.
fast: Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances.
init: Initialization method to use. 0 = random, 1 = k-means++, 2 = canopy, 3 = farthest first. (default: 0)
min-density: Minimum canopy density, when using canopy clustering, below which a canopy will be pruned during periodic pruning. (default: 2.0 instances)
num-slots: Number of execution slots. (default: 1, i.e. no parallelism)
output-debug-info: If set, clusterer is run in debug mode and may output additional info to the console.
periodic-pruning: How often to prune low-density canopies when using canopy clustering. (default: every 10,000 training instances)
t1: The T1 distance to use when using canopy clustering. A value < 0 is taken as a positive multiplier for T2. (The option text states a default of -1.5; the default value listed for this flow is -1.25.)
t2: The T2 distance to use when using canopy clustering. Values < 0 indicate that a heuristic based on attribute std. deviation should be used to set this. (default: -1.0)
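
To show how the N, I, and S parameters interact, the following is a minimal sketch of a plain k-means loop with random initialization (init = 0). It is an illustration under stated assumptions, not Weka's code: the function name `simple_kmeans` and its argument names are hypothetical, it handles numeric attributes only, and it ignores canopies, missing values, and parallel execution slots.

```python
import random


def simple_kmeans(points, num_clusters=2, max_iterations=500, seed=10):
    """Plain k-means; defaults mirror N = 2, I = 500, S = 10 from this page."""
    rng = random.Random(seed)
    # init = 0: use distinct randomly chosen instances as starting centroids.
    centroids = [list(p) for p in rng.sample(points, num_clusters)]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(
                range(num_clusters),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged before reaching the iteration limit
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(num_clusters):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments
```

A possible call, using made-up data:

```python
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = simple_kmeans(data, num_clusters=2)
```

Fixing the seed makes the starting centroids, and therefore the final clustering, reproducible across runs on the same data.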
