Output word counts rather than boolean word presence.
default: ["false"]
I
Transform each word frequency f_ij into:
f_ij * log(number of documents / number of documents containing word i),
where f_ij is the frequency of word i in the jth document (instance);
see the sketch below.
default: ["false"]
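
For concreteness, a minimal Java sketch of this weighting, using plain
math rather than Weka's internal code (the method and variable names are
illustrative):

    // IDF transform: scale the raw count f_ij of word i in document j
    // by how rare word i is across the whole corpus.
    double idfWeight(double fij, int numDocs, int docsContainingWord) {
        return fij * Math.log((double) numDocs / docsContainingWord);
    }

A word that appears in every document gets log(1) = 0, so ubiquitous
words contribute nothing after this transform.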
L
Convert all tokens to lowercase before adding to the dictionary.
default: ["false"]
M
The minimum term frequency (default = 1).
default: ["1"]
N
Whether to normalize word frequencies to the average length of the
training documents: 0 = don't normalize, 1 = normalize all data,
2 = normalize test data only (default 0 = don't normalize).
A sketch of the computation follows this option.
default: ["0"]
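
A sketch of one plausible reading of this normalization, where a
document's length is taken to be the sum of its word counts (an
assumption; Weka's exact length measure may differ):

    // Rescale a document's counts so its total length equals the
    // average length of the training documents.
    double[] normalizeToAvgLength(double[] counts, double avgTrainDocLength) {
        double docLength = 0;
        for (double c : counts) docLength += c; // assumed length measure
        double[] scaled = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            scaled[i] = counts[i] * avgTrainDocLength / docLength;
        }
        return scaled;
    }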
O
If this is set, the maximum number of words and the
minimum term frequency are not enforced on a per-class
basis but are based on the documents in all the classes
(even if a class attribute is set).
default: ["false"]
P
Specify a prefix for the created attribute names.
(default: "")
default: []
R
Specify the list of string attributes to convert to words (as a Weka Range).
(default: select all string attributes)
default: ["first-last"]
T
Transform the word frequencies into log(1 + f_ij),
where f_ij is the frequency of word i in the jth document (instance);
see the sketch below.
default: ["false"]
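
A one-line sketch of this transform (illustrative names):

    // TF transform: dampen raw counts logarithmically so that a word
    // occurring 100 times does not weigh 100x a word occurring once.
    double tfTransform(double fij) {
        return Math.log(1 + fij);
    }

When -T and -I are both set, the result is a TF-IDF style weight: the
log-damped frequency times the IDF factor shown under -I above.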
V
Invert matching sense of column indexes.
default: ["false"]
W
Specify the approximate number of word fields to create.
Surplus words will be discarded.
(default: 1000)
default: ["1000"]
binary-dict
Save the dictionary file as a binary serialized object
instead of in plain text form. Use in conjunction with
-dictionary.
default: ["false"]
dictionary
The file to save the dictionary to.
(default is not to save the dictionary; see the example below)
default: []
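
For example, saving the dictionary while batch-filtering a dataset from
the command line might look like this (a sketch; -i and -o are Weka's
standard batch-filtering flags, and the file paths are placeholders):

    java weka.filters.unsupervised.attribute.StringToWordVector \
        -i input.arff -o output.arff \
        -dictionary dict.ser -binary-dict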
prune-rate
Specify the rate (e.g., every 10% of the input dataset) at which to
periodically prune the dictionary. -W alone prunes only after the full
dictionary has been built, which may require more memory than is available.
(default: no periodic pruning)
default: ["-1.0"]
stemmer
The stemming algorithm (classname plus parameters) to use.
default: ["weka.core.stemmers.NullStemmer"]
tokenizer
The tokenizing algorithm (classname plus parameters) to use.
(default: weka.core.tokenizers.WordTokenizer)
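
Putting several of these options together programmatically, a minimal
sketch assuming Weka's standard StringToWordVector setters (check the
method names against your Weka version; the input path is a placeholder):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.tokenizers.WordTokenizer;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class VectorizeText {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("input.arff"); // placeholder path

            StringToWordVector filter = new StringToWordVector();
            filter.setOutputWordCounts(true);         // -C: counts, not presence
            filter.setIDFTransform(true);             // -I
            filter.setTFTransform(true);              // -T
            filter.setLowerCaseTokens(true);          // -L
            filter.setWordsToKeep(1000);              // -W
            filter.setTokenizer(new WordTokenizer()); // tokenizer option
            filter.setInputFormat(data);

            Instances vectorized = Filter.useFilter(data, filter);
            System.out.println(vectorized.numAttributes() + " attributes created");
        }
    }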