Using Varonis: Who Owns What?

December 13, 2012

(This is one entry in a series of posts about the Varonis Operational Plan – a clear path to data governance.  You can find the whole series here.)

All organizational data needs an owner. It’s that simple, right? I think most of us would be hard pressed to argue against that as a principle—the data itself is an organizational asset, so of course it’s not the Help Desk or AD Admin folks who own it, it’s the users or business units that should own it. Of course, that’s great in theory, but with 1, 5, 10, or even 20 years’ worth of shared, unstructured data, figuring out who owns data is far from simple, let alone involving those owners in any meaningful way.

Before we get into using Varonis to locate owners, I want to talk about why finding a single data owner can be such a problem. IT probably knows who owns the Finance folder.  It’s the CFO or a delegated steward. Same with HR, Marketing or Legal—these tend to be clearly-delineated departmental shares and it’s not hard to figure out whom to go to if we need an informed decision. (Regularly involving those owners in data governance is a different problem, and one I will cover in future posts.)  The identification for these folders is relatively straightforward.

But what happens if you need to find the owner of a folder that has a less obvious name? What if the folder’s name is a project ID, or an acronym of some kind? In my experience, a majority of unstructured data resides in folders that aren’t obviously owned by anyone.

What IT tends to do then is a few different things:

  • Check the ACL and see which groups have access. If it’s a single group with an obvious owner, that’s a likely candidate. If the ACL contains many different groups or a global access group like Domain Users, though, this tactic tends to fail.
  • Check the Windows owner under Special Permissions. This metadata can be helpful, but can also be a red herring since it’s often just set to the local Administrator of the server. Even if there’s actually a human user there (who likely created the folder), that value may be outdated or inaccurate.
Special Permissions Dialog
  • Check the owner of files within the folder. Same problems as above.
File Properties Dialog
  • Enable operating system auditing to identify the most active user. Anyone out there excited about turning on file level auditing in Windows? I have yet to talk to anyone who answers yes to this question because of the performance hit on the server as well as the storage required and expertise to parse the logs effectively.
  • Turn off access and see who complains. Not an optimal strategy when it comes to critical data.
  • Email the world and hope for a response. In general, people don’t want to take ownership of something without good reason, since it may mean more work. How confident are you that the proper owners (who may be at a management or director level) are going to know exactly which data sets their teams are using regularly? If they’re not sure, are they going to jump to take responsibility?

So finding owners is hard, let alone finding owners at scale. If you’ve got thousands of unique ACLs and you want owners for all of them (or at least the ones that make sense) you’re going to have to go through some version of this process for each one. It’s no wonder we haven’t done a good job of this over time. Thankfully, there’s a better way.

Step 4: Identify Data Owners

The key difference between attempting to solve this problem manually and attacking it intelligently with Varonis is the DatAdvantage audit trail. A normalized, continuous, non-intrusive audit record of all data access is a key piece of DatAdvantage, and it allows us to actually identify data owners at scale without having to hunt and peck. Once you start gathering usage data and rolling it up into high level stats you can start to see the likely owners of any data set, not just the obvious ones.

DatAdvantage gives you two straightforward ways to get this information: First, we can quickly take a look at a high-level view of a single folder within the Statistics pane of the DatAdvantage GUI. This will show us the most active users of a particular folder. We like to say that at most, you’re one phone call away, since if the most active user isn’t the data owner, they almost certainly know who is.

You can operationalize this process even further by creating a statistics report, which can be run on an entire tree or even a server. A single report can show the top users of every unique ACL, and it’s possible to set up advanced filters to make this even more useful—showing only users outside of IT or in a specific OU, for example. You can even add additional properties from AD to the report, showing each user’s department or line manager, if available. None of this is possible without constantly gathering access activity and providing an interface to combine it with other available metadata.
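To make the idea concrete, here is a minimal sketch of the roll-up described above: count access events per folder, drop accounts from excluded departments, and list the most active remaining users. This is not the DatAdvantage API or report format. The `AccessEvent` record, its fields, the `likely_owners` function, and the sample account and folder names are all hypothetical, and it assumes the audit events have already been collected and joined with a department attribute from the directory.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One audited access (hypothetical record layout, not a Varonis format)."""
    folder: str      # folder the access happened in
    user: str        # account that performed the access
    department: str  # assumed to come from a directory lookup


def likely_owners(events, exclude_departments=frozenset({"IT"}), top_n=3):
    """Return the top_n most active users per folder, skipping excluded departments."""
    counts = defaultdict(Counter)
    for event in events:
        if event.department in exclude_departments:
            continue  # admin and service activity says little about business ownership
        counts[event.folder][event.user] += 1
    return {folder: users.most_common(top_n) for folder, users in counts.items()}


# Made-up sample data for illustration only.
events = (
    [AccessEvent("PRJ-4471", "asmith", "Finance")] * 3
    + [AccessEvent("PRJ-4471", "bjones", "Finance")]
    + [AccessEvent("PRJ-4471", "svc_backup", "IT")] * 5
    + [AccessEvent("ACME", "clee", "Marketing")] * 2
)

report = likely_owners(events)
for folder, users in report.items():
    print(folder, users)
```

In this sample the backup service account is the busiest account on PRJ-4471, but it is filtered out, which leaves asmith as the first person to call. Real usage data is messier, so treat the ranking as a list of candidates to confirm rather than a final answer.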

Identifying owners is useful, but actually involving them is where IT can really start to make headway when it comes to ongoing governance. We’ll tackle that next.


The Formula for Analytics Success: Data Knowledge

November 12, 2012

Companies spend a small fortune continually investing and reinvesting in making their business analysts self-sufficient with the latest and greatest analytical tools. Most companies have multiple project teams focused on delivering tools to simplify and improve business decision making. There are likely several standard tools deployed to support the various data analysis functions required across the enterprise: canned/batch reports, desktop ad hoc data analysis, and advanced analytics. There’s never a shortage of new and improved tools that guarantee simplified data exploration, quick response time, and greater data visualization options. Projects inevitably include the creation of dozens of prebuilt screens along with a training workshop to ensure that the users understand all of the new whiz bang features associated with the latest analytic tool incarnation.  Unfortunately, the biggest challenge within any project isn’t getting users to master the various analytical functions; it’s ensuring the users understand the underlying data they’re analyzing.

The most prevalent issue with the adoption of a new business analysis tool is the users’ knowledge of the underlying data.  This issue becomes visible through a number of common problems:  the misuse of report data, the misunderstanding of business terminology, and/or the exaggeration of inaccurate data.  Once the credibility or usability of the data comes under scrutiny, the project typically goes into “red alert” and requires immediate attention. If ignored, the business tool quickly becomes shelfware because no one is willing to take a chance on making business decisions based on risky information.

All too often the focus of end user training is tool training, not data training. What typically happens is that an analyst is introduced to the company’s standard analytics tool through a “drink from a fire hose” training workshop.  All of the examples use generic sales or HR data to illustrate the tool’s strengths in folding, spindling, and manipulating the data.  And this is where the problem begins:  the vendor’s workshop data is perfect.  There’s no missing or inaccurate data and all of the data is clearly labeled and defined; classes run smoothly, but it just isn’t reality.  Somehow the person with no hands-on data experience is supposed to figure out how to use their own (imperfect) data. It’s like giving someone their first ski lesson on a cleanly groomed beginner hill and then taking them to the top of a black diamond (advanced) run with steep hills and moguls.  The person works hard but isn’t equipped to deal with the challenges of the real world.  So, they give up on the tool and tell others that the solution isn’t usable.

 

All of the advanced tools and manipulation capabilities don’t do any good if the users don’t understand the data. There are lots of approaches to educating users on data.  Some prefer to take a bottom-up approach (reviewing individual table and column names, meanings, and values) while others want to take a top-down approach (reviewing subject area details, the associated reports, and then getting into the data details).  There are certainly benefits of one approach over the other (depending on your audience); however, it’s important not to lose sight of the ultimate goal: giving the users the fundamental data knowledge they need to make decisions.  The fundamentals that most users need to understand their data include a review of

The above details may seem a bit overwhelming if you consider that most companies have mature reporting environments and multi-terabyte data warehouses.  However, we’re not talking about training someone to be an expert on 1000 data attributes contained within your data warehouse; we’re talking about ensuring someone’s ability to use an initial set of reports or a new tool without requiring 1-on-1 training.  It’s important to realize that the folks with the greatest need for support and data knowledge are the newbies, not the experienced folks.

There are lots of options for imparting data knowledge to business users:  a hands-on data workshop, a set of screen videos showing data usage examples, or a simple set of web pages containing definitions, textual descriptions, and screen shots. Don’t get wrapped up in the complexities of creating the perfect solution – keep it simple.  I worked with a client that deployed their information using a set of pages constructed with PowerPoint that folks could reference on the company’s intranet. If your users have nothing – don’t worry about the perfect solution – give them something to start with that’s easy to use.

Remember that the goal is to build users’ data knowledge that is sufficient to get them to adopt and use the company’s analysis tools.  We’re not attempting to convert everyone into data scientists; we just want them to use the tools without requiring 1-on-1 training to explain every report or data element.

Thanks to http://evanjlevy.wordpress.com/2012/11/12/the-formula-for-analytics-success-data-knowledge/


Frameworks for big data and business intelligence adoption

October 12, 2012

In the last post, Frameworks is #1, we discussed how checklists and their big brothers, frameworks, help you develop new solutions by providing a structure for identifying and closing gaps. In this installment, we’ll go into one of our preferred frameworks, TDWI’s MAD (Monitor, Analyze, Drill) Framework, and show you how we use it to ensure our clients can progress along a gently sloping curve to BI maturity, making their investments rationally but with the understanding that they are providing greater clarity and a stronger base on which to make business decisions.

First, some background information: The Data Warehousing Institute (TDWI) has used Geoffrey Moore’s chasm metaphor to describe the path to business intelligence maturity since 2004. Here is one of the representations they’ve published to convey this approach:

TDWI maturity model

Descriptions and examples are provided for each stage, as shown here in this excerpt from Interpreting Benchmark Scores Using TDWI’s Maturity Model for Stage 1 – The Infant Stage:

The Infant stage is the conglomeration of two stages from the original BI Maturity Model created in 2004: Prenatal and Infant. These stages are flip sides of the same coin and one leads directly to the other, as we shall see.
Operational Reporting: The Prenatal sub stage represents a pre–data warehousing environment where an organization relies entirely on operational reports for information. An operational report runs directly against an operational system and shows data for that system only. In some cases, it may contain data from multiple systems if an organization consolidates data into an operational data store. In general, however, operational reports are static and inflexible and show a limited range of data for a limited set of processes. If a user wants to view a slightly different set of data in a slightly different way, the IT department usually needs to code a new custom report, a process that may take days, weeks, or months, depending on the complexity of the report and the current backlog of requests.
Spreadmarts: The lack of flexibility of operational reports causes certain users to take matters into their own hands, which gives rise to the second half of the stage. These users create their own reports using whatever tools are handy—usually a spreadsheet or desktop database (e.g., Microsoft Access). They collect, clean, transform, aggregate, and format data for individual or group consumption, essentially performing all the functions of a data mart or data warehouse. The end result is something called a spreadmart—a spreadsheet or desktop database on steroids acting as a data mart or data warehouse. Other names for spreadmarts are data shadow systems, analytical silos, and human data warehouses.
While spreadmarts give business decision makers the data they crave, they have significant downsides. Spreadmart creators, who are typically high-priced business analysts, waste an incredible amount of time collecting and massaging data—tasks that a data mart or data warehouse is designed to do. Worse yet, the analysts define terms and metrics according to their own parochial views of the business, creating a kaleidoscope of misaligned data silos that aren’t easily reconciled. Without a single version of the truth, executives can’t gain an accurate view of business operations to help them make smart decisions, and they risk falling out of compliance with financial regulations regarding information transparency and accuracy. More than one executive has commissioned a data mart or data warehouse primarily to stem the proliferation of spreadmarts.

Excerpt from Interpreting Benchmark Scores Using TDWI’s Maturity Model © 2007 TDWI

To advance your organization’s Business Intelligence maturity, you must focus on more complex challenges. After you’ve dealt with the basics of establishing a reliable infrastructure, improving your data quality, and publishing high-value metadata, you are ready to focus on more strategy-focused approaches to solutions. This is where the MAD framework comes in.

Another of TDWI’s innovations, the MAD framework was first described in 2007. It takes its name from the three key abilities one should expect from BI in general, and dashboards in particular: Monitoring, Analysis, and Drill to detail.

MAD framework

As more vendors took to this approach and extended it, Wayne Eckerson, director of TDWI Research, extended the MAD framework to represent the current and future domains of Modeling, Advanced Analytics and Do (Collaborate and Act).

MAD framework2
So how does Chateaux use the MAD framework? The MAD framework enables a common understanding of where BI work falls and where you want it to go. What it doesn’t do is provide the checklist part of the framework. This is where concept maps come in. Using customer-specific concept maps, similar to TDWI’s reference example below, we are able to focus your organization on the key goals of your initiatives. Applying expected value and other qualitative measures to this selection enables us to focus the organization very effectively on where it should expect to mine the greatest value and what those results will look like.

Concept Map of Business Outcomes

If you’d like to learn more about TDWI, their Maturity Benchmarks, and membership, visit www.tdwi.org. You can also email or give us a call and we’ll schedule time to help you navigate the information and tools available from TDWI and other sources on this topic.

How is your company’s BI Maturity? Have you had a benchmark assessment? What about the MAD framework or others: what are your success stories, tips and cautionary tales?

Fantastic articles at  http://tdwi.org/


CIOs Need to Make Information Management a Real Priority

August 9, 2012

 

Fantastic article from Ventana Research

Our recent benchmark research on information management uncovered some startling facts about the level of technology adoption necessary for efficient information-centric organizations. Chief information officers (CIOs) are responsible for the availability of information to their businesses on a consistent and timely basis, but in most organizations, information management is seen as just a delegated set of tasks and is not the CIO’s top priority. This unfortunate outlook can have a lasting impact on the efficiency and profitability of a business.

Our business analytics benchmark of more than 2,800 organizations found that two-thirds spend the majority of their time on data-related tasks rather than analytic ones. Analysts spend too much time using tools such as Microsoft Excel to copy and paste the data they need to communicate to meet information requests. Lack of availability and lack of consistency in a company’s information has a severe negative impact on its business analytics.

Our benchmark research on information management found some opportunities for businesses to gain ground. Almost two-thirds (63%) have confidence in their organizations, saying they have the right team to improve information availability, but at the same time they admit that data spread across too many applications and systems (67%) and multiple versions of the truth (64%) are barriers to information management. Getting the right accurate information still plagues most organizations, and most do not have a plan to improve this situation. While organizations have complained for decades about lack of access or accuracy of the data, today the impact of these issues is better known. Only 19 percent of organizations indicate business and IT work well together, while 56 percent say they work fairly well together and are focused on improving. Most organizations still operate in silos and talk about working together more than they actually do.

Our research found a lack of adoption of key initiatives to help manage information assets more effectively. Even key initiatives that are completed, from master data management (MDM) (10%) to data virtualization, data quality, data integration and data governance (16%) are employed by just a fraction of organizations that should be mastering the science of information management. We saw some potential for improvement; initiatives in data integration and data quality are in effect at 28 percent of organizations, but in other areas the number was smaller. The majority of initiatives focus on customer-centric data: The number ranges from almost three-quarters of some organizations to much less in financial, employee, product and supplier data. These data-related initiatives are critical for organizations that need to deliver information management. An organization that does not have them completed and working together is taking significant business and financial risk by running at an unacceptably low level of efficiency and accuracy.

Information management in a distributed enterprise environment is no easy task when you need a common information warehouse. We found that too many incompatible tools (57%) and many unsynchronized metadata stores (42%) were the top two obstacles to having a common information warehouse.

IT management puts itself in a difficult situation when it fails to invest in resources and technology to improve information asset management. While projects are being initiated and planned, in many cases data availability is not being improved fast enough to meet business needs. We found the largest obstacles were insufficient staffing (68%), inadequate budget (63%) and insufficient training and skills (59%), which means that many organizations are ignoring this issue or operating with less than skilled resources that are already overstretched.

This has to change, and our benchmark found a lot of potential places for improvement. CIOs need to create a strategic plan for information management to ensure they are focused on the factors necessary to equip their information architecture to meet business needs. Unfortunately, even as organizations begin to see the importance of information management, we have the current fixation on handling big data, which takes away resources that could be devoted to getting information management efforts in order. The reality is that big data does not operate efficiently without an efficient information management environment. Just adding another data source that is not well-integrated inevitably increases costs and uses more resources.

Your next step should be to make information management a strategic top agenda item for your CIO. Other priorities, including business analytics, business applications and big data, will not reach their full potential without top-notch information management that integrates business and IT efforts.

 


Reflections on #ACM Webinar “2012 – #BigData: End of the World or End of #BI?”

July 3, 2012

Some notes on the ACM Webinar on 2012 – Big Data: End of the World or End of BI? by Dr. Barry Devlin of 9sight Consulting

The original Data Warehouse Architecture was conceived in 1988 as a single logical storehouse. This changed in the early 1990s into a layered model of an Enterprise Data Warehouse with Data Marts.

There are four (4) ancient postulates of data warehousing:

  • Postulate 1 (1970s): Operational and informational environments should be separated for both business and technical reasons.
  • Postulate 2 (1980s): A data warehouse is the only way to obtain a dependable, integrated view of the business.
  • Postulate 3 (1980s): The data warehouse is the only possible instantiation of the full enterprise data model.
  • Postulate 4 (1990s): A layered data warehouse is necessary for speedy and reliable query performance.

See: Devlin, B. “Business Integrated Insight (BI2): Reinventing enterprise information management”, (2009), www.9sight.com/resources.htm

Slide #8

This explication of the underlying assumptions (or postulates) helps to explain the evolution of the data warehouse architecture. It seems now that these decisions were made based on the available computing power at the time. The operational data stores were straining under the load at the time, and BI was seen as a luxury compared to the real business of making money. Now with the large computing resources of CPU, disk space, and networks, this constraint is no longer a barrier to integration of front-end and back-end business processes.

Devlin says that the explosion in the number of DW components from the mid 1990s onwards suggests that the data warehouse architecture is failing. From my perspective, this mess came about because some enterprises tried to do data warehousing on the cheap. Requirements were usually vague and the implemented solutions were ad hoc. I think Devlin is saying that this mess was inevitable given the ancient postulates listed above.

After reflecting on this mess, Devlin came up with five (5) modern postulates for highly evolved business:

  1. Modern business processes seamlessly combine action-taking and decision-making, and require an integrated continuum of consistent information.
  2. The new information architecture must be based on a comprehensive enterprise information model, spanning all types of information used in the business.
  3. The business information resource is best maintained as a single copy of each data item, with only the most minimal resort to transient layers or copies of specific subsets of data for specialized needs.
  4. An integrated, model-based and closed-loop process environment is needed to create, maintain and use both the business information and activities.
  5. An integrated, flexible and role-based user interface provides access to the entire business information.

Slide #10

What is a comprehensive enterprise information model? How is it different from a data model? Data model was mentioned in postulate #3 above. So, are we moving up the knowledge hierarchy from data to information? If so, I think the analysis is confused by the ambivalent meaning of data model—see my earlier notes at On the Logical Difference Between Model and Implementation.

Devlin goes on to propose a new architecture Business Integrated Insight (BI2)…covering all information and process:

  • People Personal Action Domain
  • Process Business Function Assembly
  • Information Business Information Resource

See: Devlin, B. “Business Integrated Insight (BI2): Reinventing enterprise information management”, (2009), http://bit.ly/BI2_White_Paper

Slide #11

Devlin introduces the Biz-Tech ecosystem. He does not think IT is dead despite what many analysts say. He says that IT has evolved into a Biz-Tech ecosystem, which is the fully symbiotic existence of business and IT. This has the following three (3) characteristics:

  1. Interdependence
    New technology enables business possibilities;
    new business opportunities drive technology advances
  2. Reintegration
    Silos in business and IT are obvious to Web-savvy customers;
    coherence becomes mandatory
  3. Cross-over
    Business people need IT skills to see how to recreate the business with new technology;
    IT people need business acumen to see how to satisfy business needs in new ways with emerging technology

Slide #14

This view flies in the face of the idea of computing (or IT) as a commodity. IT people need to be integrated into the business as much as sales, marketing, HR, production, and design. All of these people have to come together to create a coherent product for the customer. IT people are no longer resources simply to be bought on the open market. And IT people need to stop thinking of themselves as simply Java programmers or Oracle DBAs.

He gives three (3) examples of Biz-Tech ecosystems:

  1. Business Intelligence reinvents Retail (cf Walmart)
  2. The web recreates the library (cf Wikipedia)
  3. Big data redefines automobile insurance—Pay as you drive

Devlin sees evolution of BI2 occurring in three (3) parts:

  1. Removal of layers in BI2.
    • Introduction of the advanced information warehouse which has pillars rather than layers. Data, metadata, and models are shared across the pillars. EDW has evolved into Core Business Data. (See slide #21)
    • Data virtualisation becomes more important by enabling queries to be constructed across differing data stores.
  2. Dealing with new information types:
    • Big data challenges our fundamental beliefs about the relationship between data and knowledge.
    • The DIKW pyramid is no longer valid. (Data -> Information -> Knowledge -> Wisdom) (See slide #23)
  3. Introducing m3 – the modern meaning model (see slide #24)
    • Decision making moves from individual to collaboration
    • Decisions are not rational

Devlin gave the following picture of the Modern Meaning Model:

Devlin's Modern Meaning Model (m3)

I have not absorbed this model yet, but it does appear to be sensible. Whether or not it is useful remains to be seen.

Devlin sees mobile computing as important both as a producer and a consumer of information, and as decisive in team-based decision making (the iSight Model, see slide #29). He sees the informal interactions being recorded for future analysis.

Devlin’s conclusions are:

  1. Overall—simplify the BI environment
    • Less layers, less copies, less ETL
    • Recognise the emerging biz-tech ecosystem
  2. Big Data—forget the hype, but do evaluate
    • Business opportunities may exist in unexpected places
    • Recall that big data has very different characteristics
  3. Enable innovation through team working
    • Collaborative decisioning vs. collaborative BI
    • The emerging role of informal information

Slide #31


HP Preferred Partner Status achieved

February 1, 2011

C24 had some fantastic news yesterday: we have achieved HP Preferred Partner Status. As mentioned in earlier posts, we have recently had a systems integration push which has proved extremely successful and has enabled us to gain a number of accreditations. The business has also employed two highly successful sales people who have a fantastic track record within systems integration, and the drive to build a successful business is really gaining momentum.

The solutions the business now provides are:

Microsoft Dynamics Hosting, Hosted Managed Services, Application hosting and delivery and Systems Integration including hardware purchase, burn and soak test, virtualisation, LAN and WAN solutions.

