Making sense of the data maze
AUTHOR: Lachlan Colquhoun DATE: 06.09.07 ISSUE 1, 2007
A new research project is exploring ways to help organisations deal with information overload.
Information technology has been a boon for modern organisations, but one of its most challenging consequences has been the creation of more and more data, some of it man-made, some automatically generated by machines.
We are investigating how to efficiently gather, store, retrieve and process data grids from the perspective of managers and end-users.
Illustration: Ron Monnier
From satellite sensors to new electronic networks, organisations that want to harness business intelligence for better decision-making need to ‘tame’ data with appropriate technology and structured systems and processes.
The deployment of applications running on single platforms has boosted costs, spread resources too thinly and limited scalability. Combining data from these disparate applications has created a major problem for organisations.
The question of what to do with the huge amount of data confronting organisations is the subject of a major Australian School of Business research project known as ADAGE (Ad-hoc DAta Grid Environments). The three-year project, which began in January, aims to investigate how to efficiently gather, store, retrieve and process data grids from the perspective of managers and end-users.
“A side effect of the explosion in IT and networks has been the explosion in data,” says project leader Associate Professor Fethi Rabhi. “Most correspondence now, for example, is done by email. If you go to an email archive, there is a lot of knowledge already there that you didn’t have previously. The data grid project is looking at these challenges.”
One way to consider data, says Professor Rabhi, is as the product of a “manufacturing process” in which some data is raw and some is manufactured and in a usable form. “If you go to the manufactured data it doesn’t necessarily cost you a lot to get it into a usable form, but if you go to the raw data then you start to have problems analysing it,” he says.
Raw data can come in several forms and vary in integrity. “Sometimes the data link is down, sometimes you have data that is busy, and sometimes data is entered incorrectly, so you have all sorts of gaps and inconsistencies which make the application of normal techniques very hard, and that is why we have this research project.”
Further complicating the situation are creators of aggregated data such as Reuters and Bloomberg, who gather and combine data and sell it on to third parties.
The project has a commercial partner in SIRCA, the Securities Industry Research Centre of the Asia Pacific, which is providing case studies for research.
By working with real-life case studies, the researchers hope to create a methodology, and even their own IT applications, that can process raw data into a more usable form. Professor Rabhi believes the grid computing model offers a way forward. “Grid computing is where you have different machines which combine to function and behave as one, and that presents a good way to manage the data challenge,” he says. “In most cases, data is spread around different sites. It is unfeasible to try to gather it all together and it would cost too much. However, using a grid computing approach is new and offers a way forward.”
ADAGE involves the UNSW School of Computer Science and Engineering in the Engineering Faculty, with a major contribution from Associate Professor Boualem Benatallah.
The project also has links to a European project called SORMA, which aims to develop a platform to support the dynamic trading of ICT resources “on-demand” on a transparent management resource. Funding from the DEST-ISL Competitive Grants Program gives the Australian research effort a greater ability to contribute to SORMA.