Data Quality Management for Better Big Data Analytics
Published on 31 July 15
For businesses that have begun using it, business analytics has become a valuable means of uncovering insights that aid decision making in many areas. For business analytics to produce reliable results, data has to be available in the right volumes (for statistically valid results) and at the right time. It must also be of the right quality.
There’s no doubt that most medium and large-scale businesses capture and store significant amounts of data, which is processed by their core business support systems as well as business intelligence (BI) systems, and maintained by their IT departments. Even so, there may be hesitation about how to start an analytics initiative. This may be because no clear data quality management system is in place: one that provides confidence that data is owned, managed, controlled, and reliable, and can be made available when required.
Although the whole exercise of setting up and implementing a data quality management system is too large to describe here, the following provides a high-level perspective on how to approach the problem.
Identify existing data assets and new requirements
The first phase in taking control of all the data in the organization is to inventory it and know more about it. Typical questions to answer at this stage are:
- What types of data are there? How rich is each one?
- Who owns (or maintains) the data? Who consumes it? Who makes it available?
- Where and how is the data stored? And for how long?
- Is the data quality level known?
- How and where is data captured?
- Why is this data maintained?
- Are new data requirements known, and what are they?
- Is metadata available?
- What types of data are secure, and how does this security work?
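One way to make these questions actionable is to record the answers per data asset and track which ones are still open. The following is only a minimal sketch under stated assumptions: the `DataAsset` class, its field names, and the sample values are hypothetical illustrations of the questions above, not part of any particular tool or standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataAsset:
    """One inventory entry. All field names are illustrative assumptions."""
    name: str
    data_type: Optional[str] = None           # What type of data is it?
    owner: Optional[str] = None               # Who owns or maintains it?
    consumers: Optional[str] = None           # Who consumes it?
    storage_location: Optional[str] = None    # Where and how is it stored?
    retention: Optional[str] = None           # For how long is it kept?
    quality_level: Optional[str] = None       # Is the quality level known?
    capture_point: Optional[str] = None       # How and where is it captured?
    purpose: Optional[str] = None             # Why is it maintained?
    metadata_available: Optional[bool] = None  # Is metadata available?
    security_controls: Optional[str] = None   # How is it secured?


def unanswered(asset: DataAsset) -> List[str]:
    """Return the names of inventory fields that are still None."""
    return [f.name for f in fields(asset) if getattr(asset, f.name) is None]


def coverage(asset: DataAsset) -> float:
    """Fraction of the inventory questions (all fields except name) answered."""
    questions = [f.name for f in fields(asset) if f.name != "name"]
    answered = [q for q in questions if getattr(asset, q) is not None]
    return len(answered) / len(questions)


# Hypothetical example entry; the values are made up for illustration.
orders = DataAsset(
    name="customer_orders",
    owner="Sales Operations",
    storage_location="ERP database",
    metadata_available=False,  # a known "no" still counts as answered
)

print(unanswered(orders))
print(coverage(orders))
```

A record like this keeps "not yet known" (`None`) distinct from a known negative answer (`False`), which helps show where the inventory itself is incomplete.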
Strategise and Plan
In the second phase, the objective is to define the strategy of a data management system, along with its highest-level components.
- Goals: these should be supportive of the business goals
- Organization: What is the data management organization structure? Who will own it, and who will be responsible for the various aspects of data management, such as security, stewardship, ageing and retention?
- Scope of the data management systems: Identify inclusions, i.e., what types of data fall within the scope of the data management system? Related processes and standards would also be included. What are the high-level objectives for each inclusion?
- Prepare a roadmap for implementation: Prioritise the various areas and types of work. Plan for implementation in terms of projects, teams and schedules.
Define and Implement New Data Quality Management System
- Governance: What is the governance organization structure? Who are the authorities, and what are the authorization processes? What are the touchpoints and controls with vendors? What standards does the data management system follow, and how is data quality assured?
- Processes and Technologies: The management of data necessarily comprises a holistic set of standard processes and guidelines that address how data should be sourced, handled, stored and accessed, such that any user may be confident that they always have the right version, and that it came from a single source. These processes must be defined with reference to the associated technologies that are used for these purposes. At a minimum, these processes and technologies should address the following:
Donal, thanks for your input! I do agree that if a formal data management system is not already in place, then the early stages of a Big Data project might well provide some good experience that can be used in setting up a data management system. But even if there is no immediate plan for Big Data, it's helpful to have a management system with formal controls in place for enterprise data. It would certainly be useful for analytics initiatives that are required to start off with whatever "non-Big" data is available.
Mario, these are valid questions to ask for any data quality management initiative you might undertake. Nothing specific here about Big Data. In my experience, the first Big Data project is often a discovery one, and by its very nature would follow a different set of processes. Note that over time, the process you outlined above might be introduced as trusted insights from the discovery phase are acted upon.