IBM researchers are working on projects to improve heterogeneous data integration and to enable grid computing.
One endeavor, known as Project Clio, involves providing a mapping tool to look at data schemas and collections and deduce what transformations need to be made to integrate data sets, according to Laura Haas, IBM distinguished engineer for information integration and DB query processing, in San Jose, Calif.
Data will be viewed holistically so that inconsistencies can be reconciled, Haas said. “This is an example of a very advanced tool that might help somebody do an integration because they want a customer record to fit a particular format and they’ve got [the record] in three different databases,” Haas said.
The goal of Clio, according to IBM’s Web page on the subject, is to build a tool for creating mappings between two data representations semi-automatically, with user input. The technology is usable in applications such as data warehousing,
Project Clio technology is database-blind and generates SQL, so users can use any SQL system for data transformation, Haas said. The technology, which may appear in products next year or in 2004, functions with IBM’s federated database technology for extracting data from various sources. The project will enable a better understanding of relationships between data in various databases, said Haas.
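To make the idea of generating SQL from a schema mapping concrete, here is a minimal Python sketch. It is not Clio's actual algorithm or interface, which the article does not describe; the table names, column names, and the `generate_mapping_sql` helper are all hypothetical, and SQLite stands in for "any SQL system."

```python
import sqlite3


def generate_mapping_sql(source_table, target_table, correspondences):
    """Build an INSERT ... SELECT statement from a mapping of
    target column -> source expression (hypothetical helper)."""
    target_cols = ", ".join(correspondences.keys())
    source_exprs = ", ".join(correspondences.values())
    return (
        f"INSERT INTO {target_table} ({target_cols}) "
        f"SELECT {source_exprs} FROM {source_table}"
    )


# Hypothetical correspondences between a source schema and a target schema.
correspondences = {
    "full_name": "first_name || ' ' || last_name",
    "phone": "tel",
}
sql = generate_mapping_sql("crm_contacts", "customer", correspondences)

# Run the generated statement against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_contacts (first_name TEXT, last_name TEXT, tel TEXT)")
conn.execute("CREATE TABLE customer (full_name TEXT, phone TEXT)")
conn.execute("INSERT INTO crm_contacts VALUES ('Ada', 'Lovelace', '555-0100')")
conn.execute(sql)
rows = conn.execute("SELECT full_name, phone FROM customer").fetchall()
print(sql)
print(rows)
```

Because the output is plain SQL, the same mapping could in principle be run on whatever database holds the data, which is the sense in which Haas describes the approach as database-blind.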
The Clio architecture features a GUI presenting schema and data views; a correspondence engine for managing schemas, mapping and schema engines; an integration knowledgebase; and an underlying database.
An offshoot of Project Clio, code-named chocolate, is intended to enable mapping of XML documents to other XML documents. “One of the problems that’s emerging is everybody’s publishing in XML but we don’t have standards for what a customer looks like in XML,” Haas said. Chocolate in this instance would devise a common format for defining a customer, she said.
IBM anticipates there will be different ways of packaging the technology and views it as a tool for federated data sources, Haas said. XML documents can be mapped from the way they are originally structured to the way they are to be presented to the user, said Haas.
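The kind of XML-to-XML mapping described here can be illustrated with a short Python sketch. It assumes a hand-written field mapping rather than one a tool has deduced, and it does not reflect how chocolate itself works; the element names, `FIELD_MAP`, and `map_customer` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: target element name -> path in the source document.
FIELD_MAP = {
    "name": "contact/fullName",
    "email": "contact/emailAddress",
}


def map_customer(source_xml, field_map):
    """Restructure a source document into a common <customer> format."""
    source = ET.fromstring(source_xml)
    target = ET.Element("customer")
    for target_tag, source_path in field_map.items():
        node = source.find(source_path)
        if node is not None:  # skip fields the source does not provide
            ET.SubElement(target, target_tag).text = node.text
    return target


source_doc = (
    "<client><contact>"
    "<fullName>Ada Lovelace</fullName>"
    "<emailAddress>ada@example.com</emailAddress>"
    "</contact></client>"
)
result = map_customer(source_doc, FIELD_MAP)
output = ET.tostring(result, encoding="unicode")
print(output)
```

A second publisher with a different layout would need only its own field mapping to arrive at the same target format.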
Pieces of the chocolate technology may appear in the DB2 database in the coming year.
An analyst said IBM’s data integration strategy is intended to give it a foothold in shops using other vendors’ products. “Effectively, they’re providing a higher order solution that can stand on the shoulders of the technologies that are already in place,” said analyst Carl Olofson, program director at IDC in Framingham, Mass. IBM’s strategy is to provide technologies that enable unification of heterogeneous environments, Olofson said.
Also possibly in the works at IBM is an “e-utility” that would enable the setup of computing grids, which unify multiple computers in different locations for a single purpose.
Such a utility, however, would likely consist of a package featuring products such as IBM WebSphere, Tivoli management software, and even hardware and storage, Haas said.
“This is a way of deploying the technology that we’re looking at that might be more easily packaged up, but I don’t think you’re going to go into a store to buy a grid. In some sense, grid is about reuse,” said Haas.
The e-utility, which is currently in a conceptual stage, would create a virtual computer and enable distribution of computing power similar to the old-style time-sharing method, according to Haas.