Monday 1 August 2011

Real-world strategies and tips for coming to terms with "big data" challenges

Start small and fast, demonstrate business value, and stay in close contact with the end users of the information. That, according to Eric Williams, is the recipe for taking advantage of "big data" and for avoiding the data management problems that can undermine the analytical queries those users run.

Williams is vice president and CIO of Catalina Marketing Corp., a St. Petersburg, Florida-based company that tracks and predicts the buying habits of consumers in various countries around the world using information obtained from retail loyalty cards. Catalina, which feeds its voluminous data into a combination of computing and data storage appliances and in-house predictive analytics software, has long had a feel for managing large data sets.

On a typical day, the company takes in 525 million data records from U.S. retailers alone. Its own systems hold 800 billion rows of store customer data, including the purchase histories of 200 million Americans over the last three years.

Williams' key advice for organizations launching big data management and analysis efforts is simple: Avoid the temptation to collect every piece of existing information and throw it all into a data warehouse for business users and analytics professionals to sort out. Rather than loading high volumes of information into a data warehouse or other database, start with a subset of the business data and a model that shows its value and the important trends. Reviewing and analyzing that subset is a way to gain experience with big data and to overcome the challenges it poses.

"Take a limited time period or a limited number of products and get a read on the information," said Williams. "If you have a person on board, probably just one, who can do the analysis, that may be of some help. It certainly doesn't need to be a person with a PhD. Often it's just someone with business expertise who can make business decisions."

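The start-small advice above can be tried on a handful of records before anything is loaded into a warehouse. The following is a minimal Python sketch, not Catalina's method: the record layout, the sample values and the function name `summarize_subset` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical purchase records; field names and values are invented for this sketch.
SAMPLE_PURCHASES = [
    {"day": date(2011, 6, 28), "product": "coffee", "amount": 4.5},
    {"day": date(2011, 7, 2), "product": "coffee", "amount": 3.5},
    {"day": date(2011, 7, 3), "product": "tea", "amount": 2.0},
    {"day": date(2011, 7, 5), "product": "milk", "amount": 1.5},
    {"day": date(2011, 7, 9), "product": "coffee", "amount": 4.5},
    {"day": date(2011, 8, 1), "product": "tea", "amount": 2.0},
]


def summarize_subset(purchases, start, end, products):
    """Return total spend per product for a limited date window and product list."""
    totals = defaultdict(float)
    for purchase in purchases:
        in_window = start <= purchase["day"] <= end
        if in_window and purchase["product"] in products:
            totals[purchase["product"]] += purchase["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Read only July 2011 and only two products instead of the full history.
    july = summarize_subset(
        SAMPLE_PURCHASES, date(2011, 7, 1), date(2011, 7, 31), {"coffee", "tea"}
    )
    print(july)
```
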
Big data management, which addresses the challenges organizations face in storing large data sets and mining them for nuggets of information, has become one of the most talked-about trends in the IT industry, and it can provide a significant competitive advantage. Beyond structured data, which can itself be difficult to load from mainframe processing systems, big data can include a variety of unstructured information from internal sources such as system logs, as well as call detail records and content from social media sites like Facebook and Twitter.

The role of distributed processing in big data management
For example, big data lets companies keep track of the hits people make on the Web from their computers and mobile devices. That produces very large amounts of data, said Tony Iams, vice president and senior analyst at Ideas International, an IT research company based in Rye Brook, New York. Organizations can use this data, Iams said, to create "a potentially much more accurate picture of user behavior than ever before." For that to be possible, however, the data must be properly organized and managed.

Classifying data is an important first step in the process of managing big data, said Jill Dyche of Baseline Consulting Group in Sherman Oaks, California. "We get to data classification very quickly when we talk to our clients," Dyche said at the 2011 Pacific Northwest BI Summit in Grants Pass, Oregon. "That way, they aren't just dumping data onto a platform, a data warehouse or a data mart, but are actually looking at what the data is and how it is used."

Often, one of the defining characteristics of big data is that it is too large to be processed effectively on a single server. In addition, unstructured data types such as blogs and social media interactions ("the other big data," according to Gartner Inc. analyst Merv Adrian) are not always a good fit for traditional relational databases. As a result, distributed or scale-out computing models, often built around open source technologies such as Hadoop, MapReduce and NoSQL databases, are coming into play at many organizations as an important complement to data warehouses.
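
To make the scale-out idea concrete, here is a minimal single-process Python sketch of the map, shuffle and reduce steps that MapReduce-style systems spread across many servers. It does not use Hadoop or any of its APIs; the partition contents and the function names are hypothetical, and each inner list simply stands in for the slice of data one server would hold.

```python
from collections import defaultdict

# Hypothetical data: each inner list stands in for the records held by one server.
PARTITIONS = [
    ["coffee", "tea", "coffee"],
    ["tea", "milk"],
    ["coffee"],
]


def map_partition(records):
    """Map step: emit a (product, 1) pair for every record in one partition."""
    return [(product, 1) for product in records]


def shuffle(mapped_outputs):
    """Shuffle step: group the emitted values by key across all partitions."""
    grouped = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def count_products(partitions):
    """Run map, shuffle and reduce in sequence; a real cluster runs the maps in parallel."""
    mapped = [map_partition(part) for part in partitions]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_products(PARTITIONS))
```

Because each map call sees only its own partition, the work can be divided among machines that never need the whole data set in one place, which is the property that matters when the data is too large for a single server.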

According to Williams, a distributed approach has worked for Catalina Marketing. "The whole idea is to take all these networked computers, or standard PC-class devices, and make them work together," he said. "That lets us grow in size, and it really is a very cost-effective and efficient way to do it."

Another strategy Williams recommends, designed in part to keep the data storage appliances running at an optimum level, is regular contact with users: Catalina holds a user group meeting every month. Because the needs of business users, and the queries they want to run, change over time, the meetings are essential for seeing how the systems are being used, Williams said.

"We work with them to understand how they are working, what their analysis shows and what works," he said. If the data structures and query parameters are not optimized, he added, users do not get the value they need. Williams said Catalina has had to modify its data structures to accommodate new types of queries.

Big data management challenges that require close monitoring
For some organizations, one of the biggest problems in managing and analyzing very large data sets is finding the valuable information that can lead to business benefits, and deciding which data can be discarded.

At UPMC, for example, a Pittsburgh-based health system with more than 20 hospitals and more than 50,000 employees, data stores have grown rapidly in recent years, largely because workers are afraid to delete any information, according to William Costantini, a business leader for the integrated operations center.

"The biggest problem at the moment is [figuring out] who is responsible for what, and when it can be purged, when it can be deleted," said Costantini. "Everyone is afraid of throwing something away or destroying it. At the same time, everyone is conscious of their budget and wants to keep the size down."

Another major challenge for organizations, analysts said, is the growing popularity of "virtual" environments for exploring data experimentally, usually outside the data warehouse infrastructure. Companies should keep a close watch on those virtual environments to make sure they do not end up with inconsistent partitions of data, the analysts said.

In addition, Hadoop installations and other facilities used to store transaction-level data are usually created by application developers independently of the IT department. "This is being done by different people with different tools than the usual approach," Adrian said at the Pacific Northwest BI Summit. "It's probably managed too loosely for the long term."


Organizations need an information management infrastructure that can integrate information across data types that may not be compatible with one another, the Gartner analyst added.
