Tuesday, March 22, 2011

Thoughts on Big Data and Data Virtualization

Big Data Analysis in Relationship to Queplix Data Virtualization Solution
On the plus side for obtaining IT and business alignment, more companies are beginning to combine business and information management responsibilities in a single role, carried out by a single person, rather than a “business and IT partnership” with two people, two hierarchies and two sets of reporting relationships. Gartner expects 20 percent of companies to employ business information managers by 2013, compared with 5 per cent in 2009.
- Massive Data News, in a report from April 2010
 
Here are the next ten things you should know about big data:
1.    Big data means the amount of data you’re working with today will look trivial within five years.
2.    Huge amounts of data will be kept longer and have way more value than today’s archived data.
3.    Business people will covet a new breed of alpha geeks. You will need new skills around data science, new types of programming, more math and statistics skills and data hackers…lots of data hackers.
4.    You are going to have to develop new techniques to access, secure, move, analyze, process, visualize and enhance data; in near real time.
5.    You will be minimizing data movement wherever possible by moving function to the data instead of data to function. You will be leveraging or inventing specialized capabilities to do certain types of processing- e.g. early recognition of images or content types – so you can do some processing close to the head.
6.    The cloud will become the compute and storage platform for big data which will be populated by mobile devices and social networks. 
7.    Metadata management will become increasingly important.
8.    You will have opportunities to separate data from applications and create new data products.
9.    You will need orders of magnitude cheaper infrastructure that emphasizes bandwidth, not iops and data movement and efficient metadata management.
10.  You will realize sooner or later that data and your ability to exploit it is going to change your business, social and personal life; permanently.
- David Vellante in Big Data on February 16, 2011

Queplix Virtual Data Manager (QVDM) provides a continuum of data management capabilities, from integration of multiple disparate data sources through to Master Data Management, all in a single “dashboard” view. It maintains an automated, object-oriented NoSQL representation of business objects, abstracted from multiple sources and described in a metadata repository. QVDM is a persistent solution that operates automatically, with minimal disruption to the sources.

Today QVDM offers enhanced Data Alignment, Data Quality and Data Enrichment, a proactive Data Stewardship interface and a Global Data Dictionary. As such, QVDM is a full-spectrum data management solution. One of our strengths is the ability to intelligently identify and describe business objects from a variety of data sources using our Application Software Blades™. In doing so, we eliminate the need to deal with proprietary data storage formats and the need to copy large amounts of data in order to make it available for analytics.
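To make the idea of a metadata-described business object concrete, here is a minimal sketch in Python. It is not QVDM code: the source names (`crm`, `erp`), the field names and the `read_business_object` helper are all hypothetical, chosen only to show how a mapping held in a metadata repository can assemble one object from several sources at read time without copying the underlying data.

```python
# Hypothetical source systems, each with its own field names.
SOURCES = {
    "crm": {"42": {"cust_name": "Acme Corp", "email_addr": "ops@acme.example"}},
    "erp": {"42": {"CUSTOMER_BALANCE": 1200.50}},
}

# Hypothetical metadata entry: business-object field -> (source, source field).
CUSTOMER_METADATA = {
    "name": ("crm", "cust_name"),
    "email": ("crm", "email_addr"),
    "balance": ("erp", "CUSTOMER_BALANCE"),
}


def read_business_object(metadata, key):
    """Assemble one business object on demand from the mapped sources."""
    obj = {}
    for field, (source, source_field) in metadata.items():
        record = SOURCES[source].get(key, {})
        obj[field] = record.get(source_field)  # None if the source lacks it
    return obj


customer = read_business_object(CUSTOMER_METADATA, "42")
print(customer)
```

The consumer sees only `name`, `email` and `balance`; which system each value came from, and what it was called there, stays inside the metadata mapping.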

The Big Data industry has developed rapidly in recent years, even though the underlying technology was created years earlier (Google Bigtable, etc.). The goal of Big Data is to store and process large amounts of data for analytical purposes using MapReduce technology. Big Data technology does not itself provide an analytical engine; rather, it enables one to operate on large volumes of data. Big Data storage platforms (e.g. Hadoop) provide a flat, non-relational storage facility and a multi-processing engine to address BI queries over large data volumes. A traditional RDBMS generally cannot match this performance at such volumes.
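For readers unfamiliar with the MapReduce model mentioned above, the following is a small single-process sketch in plain Python. It does not use Hadoop or any of its APIs; the function names and the sample sales records are assumptions made for illustration. A real engine would run the map and reduce steps in parallel across many machines, but the three phases are the same.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: here (region, amount)."""
    region, amount = record
    yield region, amount


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical flat sales records: (region, amount)
sales = [("east", 100), ("west", 250), ("east", 50), ("west", 25)]
totals = map_reduce(sales)
print(totals)  # {'east': 150, 'west': 275}
```

Because each record is mapped independently and each key is reduced independently, the work can be split across nodes, which is what lets this approach scale to large volumes.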

The most obvious synergy between QVDM and Big Data technologies is the NoSQL approach to data management. Big Data vendors pursue a common goal: enabling BI solutions to work on large data volumes. Consider the example of Jaspersoft BI:
“Jaspersoft’s vision goes well beyond Big Data. Our modern architecture and agnostic data source support is tailored for the cloud, from IaaS to PaaS, either public or private variations. In particular, NoSQL support puts us in the driver’s seat to become the de facto embedded standard for reporting and analysis within PaaS cloud environments.”
— Brian Gentile, CEO of Jaspersoft.
Jaspersoft’s BI engine can be deployed on top of Hadoop, and to work effectively with large data sets it needs an abstraction of business entities. Queplix Virtual Data Manager is built on a NoSQL architecture and can provide the abstraction required for Jaspersoft and other BI vendors today.

The QVDM architecture works natively with Big Data engines by abstracting data from siloed sources, and it can optionally be used to migrate that data to Big Data storage such as Hadoop. With QVDM it is now possible to create the abstracted metadata layer and then use it to recreate the business objects in Hadoop, copying or moving the data so that BI can work on Big Data engines. In other words, through the Queplix Hadoop Application Blade, we can now enable BI vendors to virtualize Big Data repositories and provide persistence.
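As a rough illustration of the migration step, the sketch below serializes business objects into tab-separated lines, a flat format commonly used as input for Hadoop jobs. It is an assumption-laden example rather than the Queplix Blade itself: the field list, the sample objects and the `to_flat_records` helper are hypothetical, and the actual upload to Hadoop storage is left out.

```python
import csv
import io

# Hypothetical field order, as it might be taken from the metadata layer.
FIELDS = ["id", "name", "balance"]


def to_flat_records(objects, fields):
    """Serialize business objects as tab-separated lines, one object per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for obj in objects:
        # Missing fields become empty columns so every line has the same shape.
        writer.writerow([obj.get(f, "") for f in fields])
    return buffer.getvalue()


# Hypothetical business objects already assembled from the siloed sources.
customers = [
    {"id": "42", "name": "Acme Corp", "balance": 1200.5},
    {"id": "43", "name": "Globex", "balance": 80},
]
flat = to_flat_records(customers, FIELDS)
print(flat, end="")
```

Driving the column order from the metadata keeps the flat file and the business-object definition in step: if a field is added to the definition, it appears in the export without changes to the serializer.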
