Output word counts rather than boolean word presence.
default: ["false"]
I
Transform each word frequency f_ij into:
f_ij * log(number of documents / number of documents containing word i),
where f_ij is the frequency of word i in the jth document (instance);
see the sketch below.
default: ["false"]
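
For concreteness, a minimal Java sketch of this weighting, using plain
math rather than Weka's internal code (the method and variable names are
illustrative):

    // IDF transform: scale the raw count f_ij of word i in document j
    // by how rare word i is across the whole corpus.
    double idfWeight(double fij, int numDocs, int docsContainingWord) {
        return fij * Math.log((double) numDocs / docsContainingWord);
    }

A word that appears in every document gets log(1) = 0, so ubiquitous
words contribute nothing after this transform.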
L
Convert all tokens to lowercase before adding to the dictionary.
default: ["false"]
M
The minimum term frequency (default = 1).
default: ["1"]
N
Whether to normalize word frequencies to the average length of the
training documents: 0 = don't normalize, 1 = normalize all data,
2 = normalize test data only (default 0 = don't normalize).
A sketch of the computation follows this option.
default: ["0"]
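
A sketch of one plausible reading of this normalization, where a
document's length is taken to be the sum of its word counts (an
assumption; Weka's exact length measure may differ):

    // Rescale a document's counts so its total length equals the
    // average length of the training documents.
    double[] normalizeToAvgLength(double[] counts, double avgTrainDocLength) {
        double docLength = 0;
        for (double c : counts) docLength += c; // assumed length measure
        double[] scaled = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            scaled[i] = counts[i] * avgTrainDocLength / docLength;
        }
        return scaled;
    }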
O
If this is set, the maximum number of words and the
minimum term frequency are not enforced on a per-class
basis but are based on the documents in all the classes
(even if a class attribute is set).
default: ["false"]
P
Specify a prefix for the created attribute names.
(default: "")
default: []
R
Specify the list of string attributes to convert to words (as a Weka Range).
(default: select all string attributes)
default: ["first-last"]
T
Transform the word frequencies into log(1 + f_ij),
where f_ij is the frequency of word i in the jth document (instance);
see the sketch below.
default: ["false"]
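
A one-line sketch of this transform (illustrative names):

    // TF transform: dampen raw counts logarithmically so that a word
    // occurring 100 times does not weigh 100x a word occurring once.
    double tfTransform(double fij) {
        return Math.log(1 + fij);
    }

When -T and -I are both set, the result is a TF-IDF style weight: the
log-damped frequency times the IDF factor shown under -I above.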
V
Invert matching sense of column indexes.
default: ["false"]
W
Specify the approximate number of word fields to create.
Surplus words will be discarded.
(default: 1000)
default: ["1000"]
binary-dict
Save the dictionary file as a binary serialized object
instead of in plain text form. Use in conjunction with
-dictionary.
default: ["false"]
dictionary
The file to save the dictionary to.
(default is not to save the dictionary; see the example below)
default: []
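
For example, saving the dictionary while batch-filtering a dataset from
the command line might look like this (a sketch; -i and -o are Weka's
standard batch-filtering flags, and the file paths are placeholders):

    java weka.filters.unsupervised.attribute.StringToWordVector \
        -i input.arff -o output.arff \
        -dictionary dict.ser -binary-dict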
prune-rate
Specify the rate (e.g., every 10% of the input dataset) at which to
periodically prune the dictionary. -W alone prunes only after the full
dictionary has been built, which may require more memory than is available.
(default: no periodic pruning)
default: ["-1.0"]
stemmer
The stemming algorithm (classname plus parameters) to use.
default: ["weka.core.stemmers.NullStemmer"]
tokenizer
The tokenizing algorithm (classname plus parameters) to use.
(default: weka.core.tokenizers.WordTokenizer)
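
Putting several of these options together programmatically, a minimal
sketch assuming Weka's standard StringToWordVector setters (check the
method names against your Weka version; the input path is a placeholder):

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.tokenizers.WordTokenizer;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class VectorizeText {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("input.arff"); // placeholder path

            StringToWordVector filter = new StringToWordVector();
            filter.setOutputWordCounts(true);         // -C: counts, not presence
            filter.setIDFTransform(true);             // -I
            filter.setTFTransform(true);              // -T
            filter.setLowerCaseTokens(true);          // -L
            filter.setWordsToKeep(1000);              // -W
            filter.setTokenizer(new WordTokenizer()); // tokenizer option
            filter.setInputFormat(data);

            Instances vectorized = Filter.useFilter(data, filter);
            System.out.println(vectorized.numAttributes() + " attributes created");
        }
    }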