preprocess module

class preprocess.PreProcessor[source]

Bases: object

The preprocessor class that handles binarization and discretization of a dataset

class Mapper[source]

Bases: object

Used to map the fields

add_transaction(fields)[source]

Adds the preprocessed fields to the transaction list

Parameters: fields – Preprocessed fields (list)
Returns:
binarize(mapper, col_data)[source]

Binarize the attribute data using mapper

Parameters:
  • mapper – Corresponding mapper of this field
  • col_data – Categorical value, assumed to lie between -9 and 25
Returns:

Binarized field (str)
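The source does not show how binarize builds its result, so the following is only a minimal sketch of one plausible approach: turning a categorical code into a single item label. The function name binarize_sketch, the dict-shaped mapper, and the "NAME=code" label format are all assumptions made for illustration, not the module's actual behaviour.

```python
def binarize_sketch(mapper, col_data):
    """Turn one categorical code into an item label (hypothetical helper)."""
    code = int(col_data)
    # The documentation states that categorical codes lie between -9 and 25.
    if not -9 <= code <= 25:
        raise ValueError(f"categorical code out of range: {code}")
    # `mapper` is a plain dict invented for this sketch; the real Mapper
    # class may expose the field name differently.
    return f"{mapper['name']}={code}"


label = binarize_sketch({"name": "GENDER"}, "2")
```

Rejecting out-of-range codes early is a design choice of this sketch; the real method may handle such values differently.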

count_unique(fields)[source]

Updates the count of unique fields

Parameters: fields – List of fields
Returns:
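As an illustration of what count_unique might do, here is a minimal sketch that keeps a running tally of item occurrences, matching the "key: items, value: counts" shape that get_uniques documents. The function name count_unique_sketch and the use of collections.Counter are assumptions; the real class may store its counts differently.

```python
from collections import Counter


def count_unique_sketch(uniques, fields):
    """Add every field of one transaction to the running counts (hypothetical)."""
    # Counter.update increments the count of each element in the iterable.
    uniques.update(fields)


uniques = Counter()
count_unique_sketch(uniques, ["GENDER=2", "AGE=young"])
count_unique_sketch(uniques, ["GENDER=2", "AGE=senior"])
```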
discretize(mapper, col_data)[source]

Discretizes a continuous value using the given mapper

Parameters:
  • mapper – Mapper of the continuous field (Mapper class)
  • col_data – Value of the continuous field (float)
Returns:

Discretized name of the field (str)
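The binning rule is not given in the source, so the sketch below shows one common way to discretize a continuous value: looking it up against sorted cut points. The function name discretize_sketch, the dict-shaped mapper, and its "edges" and "labels" keys are hypothetical.

```python
from bisect import bisect_right


def discretize_sketch(mapper, col_data):
    """Map a continuous value to a named bin (hypothetical helper)."""
    edges = mapper["edges"]    # sorted cut points, e.g. [18, 40, 65]
    labels = mapper["labels"]  # one label per bin, so len(edges) + 1 entries
    # bisect_right counts how many cut points are <= the value,
    # which is the index of the bin the value falls into.
    index = bisect_right(edges, float(col_data))
    return f"{mapper['name']}={labels[index]}"


age_mapper = {
    "name": "AGE",
    "edges": [18, 40, 65],
    "labels": ["child", "young", "middle", "senior"],
}
label = discretize_sketch(age_mapper, 17.9)
```

With bisect_right, a value equal to a cut point goes into the upper bin; using bisect_left instead would put it in the lower one.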

get_field(line, mapper)[source]

Selects the appropriate preprocessing method according to the line and the mapper structure

Parameters:
  • line – New line of data (str)
  • mapper – Corresponding mapper structure of the field
Returns:

Preprocessed field (str)

get_transaction_count()[source]

Getter method for transaction count after parsing the file

Returns: Transaction count (int)
get_transactions()[source]

Getter method for the list of transactions, each stored as an OrderedDict

Returns: Transactions list (list(OrderedDict))
get_uniques()[source]

Getter method for unique itemsets (dictionary)

Returns: Unique itemsets (dict), key: items, value: counts
parse_file(file)[source]

The main function that parses the file and runs the preprocessor methods

Parameters: file – Filepath of the data
Returns: Number of transactions parsed (int)
save_transactions(path='transactions.csv')[source]

Saves the preprocessed transactions to a file

Parameters: path – Path of the output file
Returns: True on successful save
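The output layout of save_transactions is not described beyond the default file name, so the following is only a sketch under assumptions: each transaction (an OrderedDict, as returned by get_transactions) is written as one CSV row of its values, and a failed write returns False. The function name save_transactions_sketch and the row layout are hypothetical.

```python
import csv
from collections import OrderedDict


def save_transactions_sketch(transactions, path="transactions.csv"):
    """Write one CSV row per transaction; return True on success (hypothetical)."""
    try:
        # newline="" lets the csv module control line endings itself.
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            for transaction in transactions:
                # OrderedDict keeps the fields in insertion order.
                writer.writerow(list(transaction.values()))
    except OSError:
        # Assumed failure behaviour; the real method may raise instead.
        return False
    return True


example = [OrderedDict([("GENDER", "GENDER=2"), ("AGE", "AGE=young")])]
```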