Flow
weka.filters.unsupervised.attribute.StringToWordVector(weka.core.tokenizers.WordTokenizer)


Visibility: public. Uploaded 16-03-2019 by Jan van Rijn. Weka_3.9.3. 0 runs.


Weka implementation.

Parameters

-stopwords-handler: The stopwords handler to use (default: Null). default: []
-C: Output word counts rather than boolean word presence. default: ["false"]
-I: Transform each word frequency into fij*log(number of documents / number of documents containing word i), where fij is the frequency of word i in the jth document (instance). default: ["false"]
-L: Convert all tokens to lowercase before adding them to the dictionary. default: ["false"]
-M: The minimum term frequency (default: 1). default: ["1"]
-N: Whether to normalize document lengths to the average length of the training documents: 0 = do not normalize, 1 = normalize all data, 2 = normalize test data only (default: 0). default: ["0"]
-O: If set, the maximum number of words and the minimum term frequency are not enforced on a per-class basis but on the documents in all classes (even if a class attribute is set). default: ["false"]
-P: Prefix for the created attribute names (default: ""). default: []
-R: List of string attributes to convert to words, as a Weka Range (default: select all string attributes). default: ["first-last"]
-T: Transform the word frequencies into log(1+fij), where fij is the frequency of word i in the jth document (instance). default: ["false"]
-V: Invert the matching sense of the column indexes. default: ["false"]
-W: Approximate number of word fields to create. Surplus words are discarded (default: 1000). default: ["1000"]
-binary-dict: Save the dictionary file as a binary serialized object instead of in plain text form. Use in conjunction with -dictionary. default: ["false"]
-dictionary: The file to save the dictionary to (default: do not save the dictionary). default: []
-prune-rate: The rate (e.g., every 10% of the input dataset) at which to periodically prune the dictionary. -W prunes only after creating a full dictionary, which may require more memory than is available (default: no periodic pruning). default: ["-1.0"]
-stemmer: The stemming algorithm (classname plus parameters) to use. default: ["weka.core.stemmers.NullStemmer"]
-tokenizer: The tokenizing algorithm (classname plus parameters) to use (default: weka.core.tokenizers.WordTokenizer). default: ["weka.core.tokenizers.WordTokenizer"]
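
The weighting formulas quoted for -C, -T and -I can be made concrete with a small worked example. The sketch below is plain Java and does not call Weka; it only restates the formulas from the parameter descriptions above. The class name WordWeightingSketch, its method names, and the toy corpus are hypothetical, and the use of the natural logarithm is an assumption, since the description does not state the log base.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: restates the formulas from the parameter
// descriptions. This is not Weka's implementation, and all names here
// are hypothetical.
class WordWeightingSketch {

    // -T: log(1 + fij), where fij is the frequency of word i in document j.
    // Assumption: natural logarithm.
    static double tfTransform(double fij) {
        return Math.log(1.0 + fij);
    }

    // -I: fij * log(numDocs / docsWithWord).
    // Assumption: natural logarithm; docsWithWord must be at least 1.
    static double idfTransform(double fij, int numDocs, int docsWithWord) {
        return fij * Math.log((double) numDocs / docsWithWord);
    }

    // Raw count of a word in one tokenized document (the value -C outputs
    // instead of a boolean presence flag).
    static int count(List<String> doc, String word) {
        int n = 0;
        for (String token : doc) {
            if (token.equals(word)) {
                n++;
            }
        }
        return n;
    }

    // Number of documents that contain the word at least once.
    static int documentFrequency(List<List<String>> docs, String word) {
        int df = 0;
        for (List<String> doc : docs) {
            if (doc.contains(word)) {
                df++;
            }
        }
        return df;
    }

    public static void main(String[] args) {
        // Toy corpus, already tokenized and lowercased.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("the", "cat", "sat"),
            Arrays.asList("the", "cat", "saw", "the", "dog"),
            Arrays.asList("a", "dog", "barked"));

        String word = "the";
        int fij = count(docs.get(1), word);
        int df = documentFrequency(docs, word);

        System.out.println("count (-C):        " + fij);
        System.out.println("log(1+fij) (-T):   " + tfTransform(fij));
        System.out.println("fij*log(N/df) (-I): " + idfTransform(fij, docs.size(), df));
    }
}
```

A word that occurs in every document gets an -I weight of zero, because log(N/N) = 0; this is why the IDF transform suppresses very common words.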
