Welcome to Apriori’s documentation!

Summary

This project mines student-level data (ICPSR 4275) for frequent itemsets and extracts association rules from them. We hope to find social, educational, and cognitive relations; for instance, students’ performance, gender, family income, and parents’ education level may correlate with one another. To do so, we use the Apriori algorithm to mine frequent itemsets. We also provide some visualization methods for analyzing the results.
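As a rough illustration of how Apriori mines frequent itemsets, here is a minimal sketch. It is not the code in apriori.py; the function name, the item labels, and the support threshold are assumptions made for this example only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Hypothetical helper for illustration; not the project's own API.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {}
    for item in {item for t in transactions for item in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Toy transactions with made-up items (not taken from the real dataset).
demo_transactions = [
    {"gender=F", "income=high", "math=high"},
    {"gender=F", "income=high", "math=high"},
    {"gender=M", "income=low", "math=low"},
    {"gender=F", "income=low", "math=high"},
]
demo_itemsets = frequent_itemsets(demo_transactions, min_support=0.5)
```

With a minimum support of 0.5, the toy data yields four frequent single items, three frequent pairs, and one frequent triple. The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already found to be frequent.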

Dataset

We use student-level data to study a classic data mining algorithm, Apriori. Briefly, the dataset consists of approximately 15,000 records and 1,600 features, but we focus on only 16 of them. The list of selected features can be found in preprocessor.py.
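Before Apriori can run, each record has to become a set of items. The sketch below shows one common way to do that: keep only the selected features and encode each as a "feature=value" item. It is an assumption about the general approach, not a copy of preprocessor.py, and the column names and values are hypothetical.

```python
def to_transactions(rows, selected, missing=(None, "")):
    """Turn dict-like records into sets of 'feature=value' items.

    Hypothetical helper for illustration; only the columns listed in
    `selected` are kept, and missing values are skipped.
    """
    transactions = []
    for row in rows:
        items = {
            f"{col}={row[col]}"
            for col in selected
            if col in row and row[col] not in missing
        }
        transactions.append(items)
    return transactions


# Made-up records; the real dataset uses different column names and codes.
demo_rows = [
    {"student_id": 1, "sex": "F", "income": "high", "region": "north"},
    {"student_id": 2, "sex": "M", "income": "", "region": "south"},
]
demo_items = to_transactions(demo_rows, selected=["sex", "income"])
```

Columns outside the selection (here student_id and region) are dropped, and the empty income value in the second record produces no item rather than a misleading "income=" entry.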

The dataset can be downloaded from the following link: ELS 2002 Dataset.

Installing and running the program

You can install and run the program from scratch as follows:

git clone https://github.com/HEL-DMP17/Apriori.git
cd Apriori
pip install -r requirements.txt
python arules.py

Source code layout

(All files are in the src directory of our repository.)

preprocessor.py - Provides several methods and a mapper class to preprocess the dataset
apriori.py - Generates frequent itemsets and extracts association rules
ppml_exporter.py - PMML file format exporter, used to visualize the results in R
utils.py - Utility functions
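To make the rule extraction step concrete, the following sketch derives association rules from a table of itemset supports, using confidence(A -> B) = support(A ∪ B) / support(A). The function name, the tuple layout, and the threshold are assumptions for illustration and may differ from what apriori.py actually does.

```python
from itertools import combinations


def association_rules(supports, min_confidence):
    """Yield rules (antecedent, consequent, support, confidence) as a list.

    `supports` maps frozenset itemsets to their support and is assumed to
    contain every subset of each itemset, as Apriori output does.
    Hypothetical helper for illustration; not the project's own API.
    """
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, supp, confidence))
    return rules


# Hand-made supports for two made-up items.
demo_supports = {
    frozenset({"math=high"}): 0.75,
    frozenset({"income=high"}): 0.5,
    frozenset({"math=high", "income=high"}): 0.5,
}
demo_rules = association_rules(demo_supports, min_confidence=0.8)
```

Here the rule income=high -> math=high has confidence 0.5 / 0.5 = 1.0 and is kept, while the reverse rule has confidence 0.5 / 0.75 and falls below the 0.8 threshold.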

Source code documentation

Module Index

Results

Interactive table of association rules: inspectDT.
Visual network of the top 50 rules: visNetwork.
Scatter plot of all association rules: scatterPlotly.