Source code to accompany the paper:

Parallelization of Transition Counting for Process Mining on Multi-core CPUs and GPUs
Diogo R. Ferreira, Rui M. Santos
12th International Workshop on Business Process Intelligence (BPI 2016)
in Business Process Management Workshops, LNBIP 281, pp. 36-48, Springer, 2017
[PDF] [BibTeX]


This page contains the source code and data used to generate the results reported in the paper presented at the BPI 2016 workshop.

The source code comprises the following files:

The data comprises the following files:

The files above are zipped. Before proceeding, you will have to unzip them.


How to compile the code

Operating system: Ubuntu (or another Linux distro).

Check that you have all the necessary pre-requisites, namely CUDA. Also, make sure that CUDA is properly configured, as described here. A good place to configure the environment variables (PATH and LD_LIBRARY_PATH) is in ~/.bashrc.

Use this Makefile to compile the code. However, before running make, change the value of the ARCH variable to the compute capability of your GPU. For a list of GPUs and their compute capabilities, check this page.


Preprocessing the event logs

Start by preprocessing the event logs with the following instructions:

The arguments 0 1 2 3 specify that the case id is in column 0 (first column), task is in column 1, user is in column 2, and timestamp is in column 3, respectively.

You may be surprised that preprocessing the largest event log can take about 5min. Most of this time is spent on reading the data (i.e. parsing the CSV file). This is precisely the reason why we decided to separate the preprocessing from the main programs. This way we can preprocess the event log only once, and then run several different programs on it, without having to reprocess it again.


Running the programs

To run the multi-threaded CPU version of each algorithm, you can use instructions similar to the following:

where the first argument is the number of runs, and the second argument is the number of threads. As a rule of thumb, set the number of threads to be equal to the number of physical cores available in the CPU.

To run the GPU version of each algorithm, you can use instructions similar to the following:

where the first argument is the number of runs, and the second argument is the number of threads per block to be used on the GPU. Please check the hardware architecture of your GPU to determine the number of threads per block that should be used. As a rule of thumb, use a number of threads per block that is equal to the number of cores per SM (streaming multiprocessor). For example, all Maxwell generation cards (e.g. GTX 750 Ti, GTX 980, GTX Titan X) have 128 cores per SM.


Running a series of tests

For convenience, we provide a Python script that runs all algorithms on all event logs. Note that the event logs must have been preprocessed first by preproc.py (see above).

To run the tests, execute:

This will present the results in a similar form to Table 2 in the paper.


Testing on the BPI Challenge 2016 event logs

To run the experiment that we describe in the paper, do the following:

How to cite this work

Diogo R. Ferreira, Rui M. Santos, Parallelization of Transition Counting for Process Mining on Multi-core CPUs and GPUs, in Business Process Management Workshops, LNBIP 281, pp. 36-48, Springer, 2017

@InProceedings{ferreira17parallelization,
  author    = {Diogo R. Ferreira and Rui M. Santos},
  title     = {Parallelization of Transition Counting for Process Mining on Multi-core {CPUs} and {GPUs}},
  booktitle = {Business Process Management Workshops},
  year      = {2017},
  volume    = {281},
  series    = {Lecture Notes in Business Information Processing},
  pages     = {36--48},
  publisher = {Springer}
}

Last modified: 2018-02-12