Recently, there has been a lot of news and development around advances in parallel processing frameworks such as Hadoop. Some innovative data warehouse software vendors are starting to research the new development strategies that parallel processing offers. So far, the majority of these efforts have targeted query performance and query optimization within traditional physical data warehouse architectures. For example, traditional data warehouse vendors like Teradata are joining the Hadoop movement and applying parallel processing to their physical DW infrastructures. Companies like Yahoo and Amazon are also spearheading the adoption of Hadoop map/reduce for large-scale data analytics.
I have been following advances on the Hadoop front in particular, as I believe it will provide common ground for our products and a new development direction for Queplix Data Virtualization. Data virtualization and Hadoop are born out of the same premise: provide data storage scalability and ease of information access and sharing. I see how the two technologies complement each other perfectly.
Hadoop’s data warehouse infrastructure (Hive) is what we are now researching for integration with Queplix Data Virtualization products. Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called Hive QL.
Queplix Data Virtualization will soon combine the flexibility of its object-oriented data modeling with the massive power of Hadoop parallel processing to build virtual data warehouse solutions. Imagine the analytical performance of a virtual data warehouse that uses the Virtual Metadata Catalog and Virtual Entities as its organizational and hierarchical units (instead of traditional tables, columns and SQL-driven access). Such a “virtual” data warehouse would be a perfect fit for large-scale operational and analytical processing, data quality and data governance projects, with the full power of Queplix heuristic and semantic data analysis.
Today, data virtualization solutions are deployed by many larger enterprises to gain visibility into dispersed application data silos without disrupting the original sources and applications. In the near future, Data Virtualization and Hadoop-based virtual data warehouse solutions will be deployed in tandem to implement full-spectrum enterprise data management solutions, ranging from larger-scale data integration projects (e.g. massive application data store mergers resulting from M&A between large companies) all the way to Virtual Master Data Management, pioneered by Queplix. Such solutions will not only provide better abstraction and business continuity for enterprise applications, but will also use the full power of parallel processing and bring immense scalability to Queplix semantic data analytics and data alignment products.
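For readers new to Hive, it may help to see what a summarization query looks like once it is broken into map and reduce steps. The following is only a minimal, single-process Python sketch of that idea, not Hive or Queplix code: the `orders` table, the `region` and `amount` fields, the sample records and all function names are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each "split" stands in for one chunk of a large file
# stored in Hadoop. Records are (region, amount) pairs from an imaginary
# "orders" table.
splits = [
    [("east", 10.0), ("west", 5.0)],
    [("east", 2.5), ("north", 7.0)],
]


def map_phase(split):
    """Emit one (key, value) pair per record in a single split."""
    for region, amount in split:
        yield region, amount


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def total_by_region(splits):
    """Conceptually similar to the Hive QL query:
    SELECT region, SUM(amount) FROM orders GROUP BY region
    """
    pairs = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle(pairs))


totals = total_by_region(splits)
print(totals)
```

In a real Hadoop cluster the map calls would run in parallel on the nodes holding each split, and the grouping step would move data across the network; the sketch only shows the shape of the computation.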
Here are some of the new and exciting ideas Queplix is working on now: utilizing Hadoop for Virtual CEP (Complex Event Processing) within the Queplix Virtual Metadata Catalog; generating real-time “data steward” alerts using predictive data lineage analysis before data quality problems start to affect your enterprise applications; implementing Hadoop-based virtual data warehouse solutions to provide High Availability for large application stores that require massive analytics and semantic data processing; large-scale Virtual Master Data Management initiatives involving enterprise-wide Customer or Product Catalog building; and large-scale Business Intelligence projects based on the Queplix Virtual Metadata Catalog.
Watch this blog for new developments and advances in Queplix technology for integrating Hadoop and Data Virtualization as we make announcements throughout this year!
Very interesting information. I also use a virtual data room for documents and data operations. I use the Ideals virtual data room and I am fully satisfied with it.