Designing a Technology Infrastructure for Analytics
Best Practices published in DM Review Magazine
April 2005 Issue
By Sudip Chakraborty
 
Overview: This article provides an overview of the four major technology selection decisions that need to be made by technology managers and directors who are responsible for designing a technology infrastructure that will support the analytical processing needs of the business.

The use of data analytics is expanding rapidly as companies attempt to leverage huge volumes of data in their operational systems and data warehouses to attain greater profits. To ensure that you are able to use data analytics effectively and efficiently, it is critical to design the right technology infrastructure. To a large extent, the speed at which queries and models run, the ease with which analysts can use their analytical tools and how quickly the infrastructure can be customized for users all depend on the technology infrastructure.

The four decisions covered here are the application(s), the database, the operating system and the storage system. This article will also help business managers understand some of the complexities and challenges in designing and building an efficient analytical processing infrastructure.

A business may require a separate infrastructure to meet data analytics needs because the requirements are very different from those of transaction processing. Most companies have made significant technology investments in the latter and have optimized their technology infrastructure for that sole purpose.

Making technology selection decisions is an iterative process that is performed jointly by business users and technology professionals. Your users may tell you that they need 100% uptime and high performance for all their applications. However, delivering such a technology platform may be much more expensive than users anticipated and, more importantly, more than they or your finance organization can afford to pay. You will need to strike a balance between what users say they require and what they truly need. Because you will have a broader knowledge of technology and cost drivers, you should be able to guide users and help define the business requirements. This process often requires two to three iterations.

Business Requirements

As a first step, you should capture the business requirements. You need to gain a good understanding of the functionality required by the users of the system. In an analytic environment, your users would typically need to load, merge, clean, profile and segment data in multiple ways and then create predictive models employing the data. You need to decide who the likely users of this infrastructure are. Are they business users who prefer to have GUI-based applications to guide their analysis? Are they technical users with knowledge of SQL and other programming languages? The size of typical data sets today and an estimate of how they are likely to grow over the next three to five years are critical information to capture. You also need to consider business policies regarding data privacy and security.

Given that designing and implementing the right infrastructure for analytics is often a significant investment of time and money, you should plan on designing an infrastructure that will be adequate to meet needs for three to five years before major, additional upgrades are necessary.
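
As a rough illustration of the sizing estimate described above, the following sketch projects data volume over a five-year planning horizon using simple annual compounding. The starting size (500 GB) and the growth rate (40 percent per year) are hypothetical inputs chosen for the example, not figures from this article; substitute your own estimates.

```python
def project_data_size_gb(current_gb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected data size at the end of each year.

    Assumes growth compounds once per year at a constant rate, which is a
    simplification; real growth is rarely this smooth.
    """
    sizes = []
    size = current_gb
    for _ in range(years):
        size *= 1.0 + annual_growth
        sizes.append(round(size, 1))
    return sizes


# Hypothetical inputs: 500 GB today, growing 40% per year, five-year horizon.
projection = project_data_size_gb(500.0, 0.40, 5)
for year, size in enumerate(projection, start=1):
    print(f"Year {year}: {size:,.1f} GB")
```

Even a crude projection like this makes it easier to discuss with users whether the infrastructure should be sized for today's data sets or for the volume expected at the end of the planning period.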

Architecture

Based on the business requirements, you will need to create the architecture for your infrastructure. The architecture will require you to make four major technology selection decisions: application(s), database, operating system and storage. Note that there are dependencies between all four of these decisions; therefore, you should not plan to make a decision on each in a sequential manner, but rather start with a broad universe of options for all four components and narrow them down through an iterative process to the final solution set.
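
One way to picture this narrowing process is to enumerate every combination of candidates and discard those that violate a known dependency. The sketch below does that with placeholder names (AppA, DbX, OS1, StoreP and so on) and an invented incompatibility list; none of these represent real products or real compatibility facts.

```python
from itertools import product

# Hypothetical candidates for each of the four decisions.
options = {
    "application": ["AppA", "AppB"],
    "database": ["DbX", "DbY"],
    "os": ["OS1", "OS2"],
    "storage": ["StoreP", "StoreQ"],
}

# Hypothetical pairs of choices that cannot be deployed together.
incompatible = {
    frozenset({"AppA", "OS2"}),
    frozenset({"DbY", "OS1"}),
    frozenset({"DbX", "StoreQ"}),
}


def is_compatible(combo: tuple[str, ...]) -> bool:
    """A combination is viable only if no pair of its choices is incompatible."""
    return not any(
        frozenset({a, b}) in incompatible
        for i, a in enumerate(combo)
        for b in combo[i + 1:]
    )


# Start from the full universe of combinations, then filter.
viable = [combo for combo in product(*options.values()) if is_compatible(combo)]
for combo in viable:
    print(dict(zip(options, combo)))
```

In practice each iteration adds new constraints (budget, skills, performance targets) to the filter, which is why deciding on the four components one at a time can rule out combinations that would have been the best overall fit.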

Application

The analytic tool or suite of tools will be the primary interface between the users and the analytical environment. The application(s) should satisfy all critical functional needs of the users. At the same time, it should provide adequate performance, scalability and ease of use.

Other criteria you should consider are the application's availability on multiple platforms (such as Windows and UNIX) and the availability of the skill set needed to support it. Finally, you should make sure that the licensing costs are within your budget. You may decide to use one tool, such as SAS, for all your needs or select multiple tools to provide users with the best tool for each specific function. However, you need to take into account the licensing and support costs for each additional application or tool and weigh them against the added benefit of providing specialized tools for specific functions. Our experience at Inductis indicates that specialized tools such as CART, MARS and TreeNet from Salford Systems provide a tremendous benefit in building predictive models with a high degree of accuracy.

Database

Selecting the right database is very important because data storage can make a huge difference in performance and ease of use. In some cases, you may choose to use flat files (e.g., with SAS). You can choose a relational database such as Oracle, IBM DB2 or Microsoft SQL Server, a fast and specialized analytical database such as QueraBase (from Enquera) or even a specialized system such as Teradata. You should expect to achieve higher performance as you use more specialized databases. For example, our benchmarking with Oracle and QueraBase has shown, in some cases, a three to five times improvement in the performance of commonly used analytical functions.

While database selection can often be dictated by enterprise-wide standards, you should weigh the benefits of using a more specialized database against the additional cost before making the decision rather than blindly following the company standard.

Operating System

In many cases, the set of options for the operating system (OS) will be dictated by your IT department. For example, your company may support only Windows and HP-UX. You need to make sure that the application(s) and the database that you have selected are supported by the chosen OS platform. You also need to decide whether you want a single OS environment (e.g., all-Windows or all-UNIX) or a mixed environment (e.g., users with Windows desktops connected to UNIX servers).

As with database selection, you will need to consider the end use of the infrastructure. Will it be used by hundreds of users submitting ad hoc queries? Or will it carry a more structured load? You will also need to consider the level of user familiarity with the chosen OS as well as the cost of application and database licensing on the selected OS because these can vary significantly. As a general rule of thumb, you should select Windows if you want to optimize for ease of use (as often demanded by business users) but seriously consider UNIX if uptime is your primary concern and you want to be able to scale to hundreds of concurrent users.

Storage System

Last, but not least, is the type of storage system you want to use for your analytical technology infrastructure. Several options are available, and the cost of raw storage ranges from $1 per gigabyte to approximately $25 per gigabyte. For example, if you choose to provide users with individual storage units, you may be able to provide them with inexpensive 500 GB USB drives. However, sharing common data and providing backup, retrieval and security will be very cumbersome and expensive. Alternatively, you may choose to go with a storage area network (SAN), which costs approximately $20 to $25 per gigabyte in addition to infrastructure setup costs. The leader in this space is EMC; however, products from EMC tend to be more expensive than products from emerging companies such as 3PAR. In the middle of the spectrum is a range of technologies, including direct attached storage (DAS), network attached storage (NAS) and iSCSI, each of which offers a different trade-off between cost and performance.
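
To make the cost range concrete, the sketch below multiplies the per-gigabyte figures quoted above by a capacity requirement. The 2,000 GB capacity and the tier labels are illustrative assumptions, and the calculation covers raw storage only; it leaves out the infrastructure setup, backup and administration costs mentioned in the text.

```python
# Per-gigabyte figures (USD) are the ones quoted in this article; the tier
# labels and the 2,000 GB capacity are assumptions made for this example.
cost_per_gb = {
    "low end of raw storage": (1.0, 1.0),
    "storage area network (SAN)": (20.0, 25.0),
}


def raw_storage_cost(capacity_gb: float, low: float, high: float) -> tuple[float, float]:
    """Return the (low, high) raw storage cost for the given capacity."""
    return capacity_gb * low, capacity_gb * high


capacity_gb = 2000  # hypothetical requirement
estimates = {
    tier: raw_storage_cost(capacity_gb, low, high)
    for tier, (low, high) in cost_per_gb.items()
}
for tier, (low_cost, high_cost) in estimates.items():
    print(f"{tier}: ${low_cost:,.0f} to ${high_cost:,.0f}")
```

At this capacity the gap between the two ends of the range is already tens of thousands of dollars, which is why the intermediate options deserve a careful look.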

An important component of the storage system is the backup technology. Decisions need to be made about how, and at what frequency, backups will occur. These decisions will be driven by the business requirement of how much data the business can afford to lose as well as how quickly it needs to be recovered. Common backup approaches are disk-to-disk and disk-to-tape.
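
The two business requirements named above, how much data can be lost and how quickly it must be recovered, can be turned into a simple check on a proposed backup schedule. The sketch below assumes periodic full backups, so a failure just before the next backup loses up to one full interval of changes. All of the numbers, including the restore times attached to each candidate, are hypothetical.

```python
def worst_case_loss_hours(backup_interval_hours: float) -> float:
    """With periodic backups, the worst case loses one full interval of changes."""
    return backup_interval_hours


def meets_requirements(
    backup_interval_hours: float,
    restore_hours: float,
    max_loss_hours: float,
    max_recovery_hours: float,
) -> bool:
    """Check a backup schedule against the tolerated data loss and recovery time."""
    return (
        worst_case_loss_hours(backup_interval_hours) <= max_loss_hours
        and restore_hours <= max_recovery_hours
    )


# Hypothetical business requirement: lose at most 8 hours of work,
# be back up within 12 hours.
MAX_LOSS_HOURS = 8.0
MAX_RECOVERY_HOURS = 12.0

# Hypothetical candidates: (description, backup interval, estimated restore time).
candidates = [
    ("nightly disk-to-tape", 24.0, 10.0),
    ("six-hourly disk-to-disk", 6.0, 2.0),
]

results = {
    name: meets_requirements(interval, restore, MAX_LOSS_HOURS, MAX_RECOVERY_HOURS)
    for name, interval, restore in candidates
}
for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'does not meet'} the requirement")
```

Framing the decision this way keeps the conversation with business users focused on the loss and recovery targets rather than on the backup technology itself.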

Careful Planning Required

This article has focused on how to approach the four major technology selection decisions that support an analytics technology infrastructure. As you may have realized, this is a complex and time-consuming exercise requiring significant technical expertise across multiple dimensions. Before you dive into this, you may want to evaluate whether the in-house staff has the expertise to make the necessary decisions or whether it would be more effective to bring in outside consultants. In both cases, incorporating the business users' perspective is critical to ensure that the environment is designed with their specific needs in mind.

 
Copyright © 2002 - 2008 Inductis Inc.