Data mining revs up
Tampere University of Technology (TUT) and the University of Economics, Prague are combining the data mining method GUHA with Grid computing. The project, launched in autumn 2008, aims to make GUHA data processing substantially faster.
It became clear decades ago that people alone cannot process the statistical data gathered on many research subjects. The problem was recognized in Czechoslovakia as early as the 1960s, and some of the brightest minds in the field took on the challenge.
The data mining method GUHA emerged from their efforts. The method can analyse massive amounts of data, identifying hidden patterns and dependencies between data items. Over decades of development, GUHA has become a household name in the field of data mining.
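The core idea of a GUHA-style search can be illustrated with a minimal sketch. It assumes boolean attributes and the "founded implication" quantifier, under which a rule antecedent ⇒ succedent is accepted when, in its four-fold table (a = rows satisfying both, b = antecedent only, c = succedent only, d = neither), the confidence a/(a+b) is at least p and the support a is at least Base. The traffic attributes, the toy data and the parameter values are hypothetical, and real GUHA implementations use far more elaborate representations and quantifiers.

```python
from itertools import combinations

# Hypothetical toy data: each row is one observation with boolean attributes.
ATTRIBUTES = ["rain", "rush_hour", "jam"]
ROWS = [
    {"rain": True, "rush_hour": True, "jam": True},
    {"rain": True, "rush_hour": False, "jam": True},
    {"rain": False, "rush_hour": True, "jam": True},
    {"rain": False, "rush_hour": False, "jam": False},
    {"rain": True, "rush_hour": True, "jam": True},
]


def fourfold(rows, antecedent, succedent):
    """Return the four-fold table (a, b, c, d) for antecedent => succedent."""
    a = b = c = d = 0
    for row in rows:
        phi = all(row[attr] for attr in antecedent)  # conjunction of attributes
        psi = row[succedent]
        if phi and psi:
            a += 1
        elif phi:
            b += 1
        elif psi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(a, b, p, base):
    """Accept if confidence a/(a+b) >= p and support a >= base."""
    return a + b > 0 and a / (a + b) >= p and a >= base


def mine(rows, attributes, max_len=2, p=0.8, base=2):
    """Try every conjunction of up to max_len attributes as the antecedent
    against every other single attribute as the succedent."""
    found = []
    for length in range(1, max_len + 1):
        for antecedent in combinations(attributes, length):
            for succedent in attributes:
                if succedent in antecedent:
                    continue
                a, b, c, d = fourfold(rows, antecedent, succedent)
                if founded_implication(a, b, p, base):
                    found.append((antecedent, succedent, (a, b, c, d)))
    return found


print(mine(ROWS, ATTRIBUTES))
```

Even with three attributes the number of candidate rules grows combinatorially with max_len, which is why large GUHA runs become slow.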
"I first came across GUHA in 2002 when I was researching the predictability of traffic jams and found it a very useful method", says Adjunct Professor Esko Turunen from the Department of Mathematics. Today he is leading a collaboration project on the subject at TUT.
GUHA meets Grid
Turunen continued to use GUHA with his students for a range of tasks. As they gained experience with the method, they found that they often ended up with so much data to process that every available way of speeding up the computation was needed. While looking for a solution, Turunen came across TUT's Grid project.
With Grid, the computational capacity of multiple computers on a network can be combined to complete demanding tasks or calculations. Turunen immediately saw the potential of Grid for data mining and wanted to find out whether GUHA could be gridified. The GUHA research team at the University of Economics, Prague, led by Dr. Jan Rauch, was invited to Finland to assess whether the task was feasible.
"We saw that the project was viable, though not all that simple", says Turunen.
From two days to a few hours
Turunen has high expectations for the project. Running the analysis on multiple computers with Grid is expected to speed up the data mining process significantly.
"If the GUHA analysis that previously took two days can be reduced to, let's say, five hours, that will already be a giant step forward. With Grid this is truly attainable", affirms Turunen.
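To put the hoped-for gain in numbers, assume "two days" means 48 hours of continuous computation (an assumption; the article does not say whether the analysis ran around the clock). The target then corresponds to a speedup S of about 9.6. Under the idealized assumption that the work divides perfectly among N computers with no communication overhead, the run time becomes T_before / N, so at least ten machines would be needed; real grids have overhead, so likely more.

```latex
S = \frac{T_{\text{before}}}{T_{\text{after}}}
  = \frac{48\,\text{h}}{5\,\text{h}} = 9.6,
\qquad
\frac{T_{\text{before}}}{N} \le T_{\text{after}}
\;\Rightarrow\; N \ge 9.6
\;\Rightarrow\; N \ge 10 .
```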
Researchers in Prague are already working on gridifying GUHA. The first phase of the work is expected to be completed this year.
"In the second phase, we are planning to improve the algorithm that divides the GUHA task between the computers in the grid according to what we learn from the first phase. We are also interested in gridifying other applications developed at the University of Economics, Prague," says Dr. Rauch.
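The article does not describe how the GUHA task is divided between computers. As a purely hypothetical sketch, one simple strategy treats each candidate antecedent as an independent unit of work, deals the units out round-robin to workers, and merges the partial results. Threads from Python's `concurrent.futures` stand in for grid nodes here; because of Python's global interpreter lock they will not speed up this CPU-bound loop, so the sketch illustrates only division and merging, not performance. The function names, toy data and round-robin rule are assumptions, not the Prague team's algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy data with boolean attributes.
ATTRIBUTES = ["rain", "rush_hour", "jam"]
ROWS = [
    {"rain": True, "rush_hour": True, "jam": True},
    {"rain": True, "rush_hour": False, "jam": True},
    {"rain": False, "rush_hour": True, "jam": True},
    {"rain": False, "rush_hour": False, "jam": False},
    {"rain": True, "rush_hour": True, "jam": True},
]


def candidate_antecedents(attributes, max_len):
    """All conjunctions of up to max_len attributes: the search space to divide."""
    return [c for n in range(1, max_len + 1) for c in combinations(attributes, n)]


def split_round_robin(tasks, n_workers):
    """Hypothetical division rule: deal tasks out like cards, one list per worker."""
    return [tasks[i::n_workers] for i in range(n_workers)]


def evaluate_chunk(rows, attributes, chunk, p, base):
    """Test antecedent => succedent for each antecedent in one worker's chunk,
    accepting rules with confidence >= p and support >= base."""
    found = []
    for antecedent in chunk:
        matching = [r for r in rows if all(r[x] for x in antecedent)]
        for succedent in attributes:
            if succedent in antecedent:
                continue
            both = sum(1 for r in matching if r[succedent])
            if matching and both / len(matching) >= p and both >= base:
                found.append((antecedent, succedent))
    return found


def mine_distributed(rows, attributes, n_workers=3, max_len=2, p=0.8, base=2):
    chunks = split_round_robin(candidate_antecedents(attributes, max_len), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(
            lambda chunk: evaluate_chunk(rows, attributes, chunk, p, base), chunks))
    # Merge partial results and sort so the output does not depend on the split.
    return sorted(hit for part in parts for hit in part)


print(mine_distributed(ROWS, ATTRIBUTES))
```

Because every chunk is evaluated independently, the merged result is the same for any number of workers; a smarter division rule would mainly try to balance how long each chunk takes.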
Applications in medicine and traffic
As the project is still ongoing, no results are available yet. The researchers have, however, considered the first potential application areas for the finished system. Tampere University Hospital has extensive data on eye diseases which, according to Turunen, is likely to be the first data set processed with the gridified GUHA. Turunen mentions economics and traffic technology as other potentially fruitful application areas.
Both parties look forward to continuing their collaboration after the gridification project. The method already has wide potential, and the researchers are mapping out further application areas.
"We hope for further cooperation. The potential of the Grid can be used in various ways, and what we are doing with GUHA is just one of many options. At the moment we have two research projects at hand that could potentially use the Grid", Dr. Rauch states.