.. Apriori documentation master file, created by sphinx-quickstart on Mon May 1 21:58:48 2017. You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. Welcome to Apriori's documentation! =================================== .. toctree:: :maxdepth: 2 :caption: Contents: Summary ======= The project deals with the student-level data (ICPSR 4275) to find frequent itemsets and extract associaton rules. We are hoping to find social, educational, and cognitive relations. For instance, the students' performance, gender, family income, parents' education level may correlate with each other. In order to do that we are using Apriori algorithm to mine frequent itemsets. We also have provided some visualization methods, which are feasible to analyze the results. Dataset ======= We are using student-level data to study the classic data mining algorithm (Apriori). Briefly, the dataset approximately consists of 15000 records and 1600 features. However, we are focused only on 16 features. List of selected features could be seen in ``preprocessor.py`` file. The dataset can be downloaded from the following link: `ELS 2002 Dataset `_. Installing and running the program ================================== You can install and run the program from scratch as follows :: git clone https://github.com/HEL-DMP17/Apriori.git cd Apriori pip install -r requirements.txt python arules.py Source code layout ================== (All the files are in the directory ``src`` of our repository.) * ``preprocessor.py`` - Provides several methods and mapper class to preprocess the dataset * ``apriori.py`` - Frequent itemset generation and association rules extraction in this file * ``ppml_exporter.py`` - PMML file format exporter used to visualize the result in R * ``utils.py`` - Utility functions are here Source code documentation ========================= * :ref:`modindex` Results ======== * Project's Wiki page presents the analysis of the results `Wiki Page `_. * Interactive association rules shown in a table `inspectDT `_. * Visual network of top 50 rules `visNetwork `_. * Scatter plot of all association rules `scatterPlotly `_.