preprocess module
-
class
preprocess.
PreProcessor
Bases:
object
The preprocessor class that handles binarization and discretization of a dataset
-
add_transaction
(fields) Adds the preprocessed fields to the transaction list
Parameters: fields – Preprocessed fields (list) Returns:
-
binarize
(mapper, col_data) Binarizes the attribute data using the mapper
Parameters: - col_data – Categorical data assumed to be between -9 and 25
- mapper – Corresponding mapper of this field
Returns: Binarized field (str)
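The following is a minimal sketch of what binarizing a categorical value could look like; it is not the module's actual code. The `CategoryMapper` class, its `name` and `labels` attributes, and the `name=label` output format are all assumptions made for illustration. Only the -9 to 25 value range and the string return type come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CategoryMapper:
    """Hypothetical stand-in for the mapper of a categorical field."""
    name: str                                             # field name, e.g. "sex"
    labels: Dict[int, str] = field(default_factory=dict)  # category code -> label


def binarize(mapper: CategoryMapper, col_data: int) -> str:
    """Turn one categorical code into a single item name (assumed format)."""
    # The documentation states that categorical data lies between -9 and 25.
    if not -9 <= col_data <= 25:
        raise ValueError(f"category code out of range: {col_data}")
    # Fall back to the raw code when the mapper has no label for it.
    label = mapper.labels.get(col_data, str(col_data))
    return f"{mapper.name}={label}"


sex = CategoryMapper(name="sex", labels={1: "M", 2: "F"})
print(binarize(sex, 2))  # sex=F
```

Emitting one item name per (field, value) pair is a common way to feed categorical data to itemset mining, since each item is then either present in a transaction or absent from it.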
-
count_unique
(fields) Updates the count of unique fields
Parameters: fields – List of fields Returns:
-
discretize
(mapper, col_data) Discretizes a continuous value using the given mapper
Parameters: - mapper – Mapper of the continuous field (Mapper class)
- col_data – Value of the continuous field (float)
Returns: Discretized name of the field (str)
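Discretization of this kind can be sketched as a lookup of the value in a sorted list of bin edges. The `RangeMapper` class below, its `edges` and `labels` attributes, and the `name=label` output format are hypothetical; the real Mapper class is not documented here and may be structured differently.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class RangeMapper:
    """Hypothetical stand-in for the mapper of a continuous field."""
    name: str           # field name, e.g. "age"
    edges: List[float]  # ascending interior bin edges
    labels: List[str]   # one label per bin, so len(labels) == len(edges) + 1


def discretize(mapper: RangeMapper, col_data: float) -> str:
    """Return the name of the bin that a continuous value falls into."""
    # bisect_right counts the edges that are <= col_data, which is the bin index.
    index = bisect_right(mapper.edges, col_data)
    return f"{mapper.name}={mapper.labels[index]}"


age = RangeMapper(name="age", edges=[18.0, 65.0], labels=["young", "adult", "senior"])
print(discretize(age, 42.0))  # age=adult
```

With `bisect_right`, a value equal to an edge lands in the upper bin, so 18.0 maps to `adult` in this sketch. Using `bisect_left` instead would put it in the lower bin.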
-
get_field
(line, mapper) Selects the appropriate preprocessing method according to the line and mapper structure
Parameters: - line – New line of data (str)
- mapper – Corresponding mapper structure of the field
Returns: Preprocessed field (str)
-
get_transaction_count
() Getter method for the transaction count after parsing the file
Returns: Transaction count (int)
-
get_transactions
() Getter method for the transactions list of OrderedDict
Returns: Transactions list (list(OrderedDict))
-
get_uniques
() Getter method for unique itemsets (dictionary)
Returns: Unique itemsets (dict), key: items, value: counts
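The sketch below shows one way the bookkeeping behind add_transaction, count_unique and the three getters could fit together. `TransactionStore` is a hypothetical stand-in, not the real PreProcessor. Storing each transaction as an OrderedDict keyed by item name, and counting each item once per transaction, are both assumptions.

```python
from collections import Counter, OrderedDict
from typing import Dict, List


class TransactionStore:
    """Hypothetical stand-in for the transaction bookkeeping of PreProcessor."""

    def __init__(self) -> None:
        self._transactions: List[OrderedDict] = []
        self._uniques: Counter = Counter()

    def add_transaction(self, fields: List[str]) -> None:
        # Assumed layout: item name -> True, kept in field order.
        self._transactions.append(OrderedDict.fromkeys(fields, True))

    def count_unique(self, fields: List[str]) -> None:
        # set() ensures an item is counted at most once per transaction.
        self._uniques.update(set(fields))

    def get_transaction_count(self) -> int:
        return len(self._transactions)

    def get_transactions(self) -> List[OrderedDict]:
        return self._transactions

    def get_uniques(self) -> Dict[str, int]:
        return dict(self._uniques)


store = TransactionStore()
for fields in (["age=adult", "sex=F"], ["age=adult", "sex=M"]):
    store.add_transaction(fields)
    store.count_unique(fields)

print(store.get_transaction_count())      # 2
print(store.get_uniques()["age=adult"])   # 2
```

Under these assumptions, dividing a count from get_uniques by get_transaction_count gives the support of a single item.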
-