Data is not “Ta Da”

I have a good friend who is in the IT Project Management area.  He called the other day to say that his update was fully successful, having been distributed to over 1 million pieces of hardware without one call in.  I congratulated him and he felt totally satisfied that he had accomplished something pretty fantastic.  I want to believe that he did.

I did not want to tell him that his euphoria is probably misplaced.  You see, according to “the law of averages” (in other words, statistics), if you hear nothing at all from a distribution to 1 million users, there are several possible explanations:

  1.  The call center ignored the calls concerning any problems with the software update
  2.  There were no calls because people thought the problems would clear with time
  3.  The call center explained that there was a new software update and the client should give it 24 hours or so to clear up

Of course, these are only three of many possible alternatives, and here is why I think this way.  As a practicing statistician, I truly believe that outliers exist and should be noted and studied.  In the past, outliers have been ignored and, as a result, calamity occurred.  Just a few examples:

  1. Someone wins a major championship 7 times in a row (big outlier and never studied)
  2. The levees in New Orleans are not upgraded because the cost-benefit analysis number does not look good (never questioned, and never fixed)
  3. A million users (or pieces of hardware) are upgraded and NO ONE has a complaint
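
To put a rough number on that third example, here is a minimal sketch in Python.  It assumes every device independently generates a call with the same small probability p, which is my simplification and not a measured figure; the values of p below are hypothetical.

```python
import math


def prob_zero_complaints(n_devices: int, p_complaint: float) -> float:
    """Chance that none of n_devices generates a call.

    Assumes each device complains independently with probability
    p_complaint (a hypothetical, illustrative figure).
    """
    # (1 - p)^n, computed through log1p so a tiny p does not lose precision
    return math.exp(n_devices * math.log1p(-p_complaint))


n = 1_000_000
for p in (1e-7, 1e-6, 1e-5, 1e-4):  # hypothetical per-device complaint rates
    silent = prob_zero_complaints(n, p)
    print(f"p = {p:.0e}: expected calls = {n * p:g}, P(zero calls) = {silent:.3g}")
```

Under those assumptions, even a one-in-a-million chance of a complaint per device leaves only about a 37% chance of total silence, and at one in a hundred thousand the chance drops below one in ten thousand.  Silence is possible, but it is the outlier.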

What we fail to do, as people and especially as IT professionals, is to question good fortune.  I have heard many an IT person (myself included) say “if it ain’t broke, don’t fix it.”  Instead of rechecking the software, we are satisfied with knowing that no one has complained.  We do this because we are busy and on to the next project, upgrade, etc. and do not have time to rehash an old upgrade.  I understand those issues and do not blame the project team.  The one I do blame is the data analyst, who should be studying this anomaly and trying to figure out why this upgrade is so “good.”  By not studying these outliers, we are denying ourselves the real analytic questions and, consequently, not really exploring the very nature of data analysis.

What will happen as a result of not studying this issue?  Maybe nothing, or maybe the next upgrade will see a landslide of issues, some from the new upgrade itself and some from small problems with the previous upgrade that were ignored because there were “no complaints.”

When I was an IT specialist for the Federal Government, there were times when I had to predict when outages would occur.  One time, my data analysis showed that outages were likely during a particular holiday.  I briefed the executives on this; the holiday came and there were no outages.  None.  The executives came to me and challenged my analysis.  They said I did not do a good job of analyzing the data.  The senior executive came to my rescue and asked how many more people the executives had placed on call for that specific holiday.  All the executives were silent.  They had put more people on call to fix any problems before they became critical.  The result:  fewer outages and more continuous operations.  So it was not the data that lied; it was the ability of people to recognize the problems BEFORE they occur and, in effect, change the data before the event.

That is what must happen, even when the numbers are favorable.  A bit of skepticism is okay.  Check out the numbers and check the call center.  If there were truly a million pieces of hardware that did not have a problem, then start figuring out how to give that project team a raise.  Also, make them understand that no complaints is now the new standard which they must attain every time.  I guarantee they will WANT you to check the numbers in that case.

Nothing is perfect.  Nothing.  Perfection flies in the face of human endeavor and certainly of human history.  Imperfection is the proven, reliable rule, so when something appears perfect, it needs verification.  Or at least validation.  More on this in later articles.  Learn, Offer, Value, and Educate (LOVE).

Should We Certify Students PRIOR to Them Using Their Tech in Schools?

We, as law-abiding citizens and adults, would NEVER allow people to drive on our roads without being tested and certified.  And we set age limits on obtaining a driver’s license.

Why is that?  Cars can kill, and drivers are (at this point) solely responsible for that vehicle.  If you injure or kill a pedestrian or have a collision, the world is turned upside down for a long time.  Trauma, disruption of lives, hurt feelings, legal ramifications, etc.  So we would naturally take the precautions to ensure that the drivers would understand the rules of the road along with having the skill to drive defensively, understand the environment around them, etc.

What is the difference between that and using social networking while in school (or, for that matter, anywhere)?  If students were required to take a course and a test prior to using their technology in the school, there would be no more excuses like “I didn’t know that post would cause this?!” and similar phrases.

What?! Compare the driving of a multi-ton piece of steel (or plastic) to a social networking post?!  What kind of comparison is that?!

Let’s take just one example.  A student puts a very unflattering post about a fellow student on their social networking page.  It trends and ends up destroying the fellow student.  You can call it cyber bullying, cyber libel, anything you want.  The student at the center of this post is not only mortified, but decides either to retaliate or to self-harm.  Either way, we get back to hurt feelings, trauma, disruption of lives, legal ramifications, etc.  Sound familiar (see above)?  What about cybersecurity in all this?  What if I (Student A) decide that Student B is my friend?  Student B asks me for the password to my social networking site to “seal the deal” of friendship.  I, not wanting to ruin the friendship, give the password to Student B.  The next day, Student B tells Student C the password, since Student C is Student B’s friend but Student A’s enemy (my enemy).  Now someone who wants to do me harm has my password.  Bad news, but something the certification course can address.

Ladies and gentlemen, we are trying to close the barn door after the horse is long gone.  We have programs to keep students safe, but they are sometimes disjointed and address problems in a non-mandatory form and format.  I remember the “anti-marijuana” movies when I was in middle school and high school and used to laugh at them (most of us did, openly).  Half of them were presented by known drug users, so what was the message here?

Give the courses at the beginning of every school year and make them stick.  Get the School Board involved and establish ground rules for using technology in the school (whether it is after class or on school grounds).  Establish a curriculum and make the students and parents sign a certification statement.  I am not sure if any school districts do this now, but it would do two things: (1) it would set the standards, and (2) it would serve as an ethics foundation for the future.  In other words, it would teach the would-be “black hat” hackers of the future some basic ethics that would help them understand their accountability in the world of cyber and cybersecurity.

There are those who will probably disagree with this post and that is fine.  Disagreement and refinement are part of what life is about.  If this helps just one child to be a better cyber citizen, then it is worth it.  My philosophy is L.O.V.E. (learn, offer, value, and educate).  I want to offer my ideas and learn in the process.

Is Analysis Dead?

I have been reviewing the “cottage industry” of data analytics courses and have found that they all lack one element: analytics.  Sure, you get all the “buttonology” that goes along with these courses, but the true measure of any academic program is the critical thinking it teaches.  For instance, here is the curriculum in data analytics at one unnamed major university:

  • Orientation and Introduction
  • Data Querying and Reporting
  • Data Access and Management
  • Data Cleaning
  • Statistical Programming Tools
  • Data Mining Overview
  • Geospatial Data Analytics
  • Relational Databases and Data Warehouses
  • Statistical Analysis of Databases
  • Linear Algebra Overview
  • Data Visualization
  • Presentation Skills
  • Teamwork Skills
  • Problem Solving Skill

Notice the people skills are at the end, and the problem solving skill course is at the very end of the list (and why do you want to have a linear algebra review?).  These data analytics curricula are chock-full of “how” and not “why.”  I have taught undergraduate statistics for almost two decades and can tell you that everyone coming out of my classes knows what data are and what they are not.  Truthfully, anything can be data, depending on the requirements for the study, whether it be shoe size or just plain dates on a calendar.  The whole reason for math and statistics is to look for patterns, and that means that you HAVE to include data.  But more important, crucial in fact, is the organ between your ears: your brain.  It is the human element that makes the thinking essential to any type of data analysis.  It puts the analysis in data analysis.

An example is necessary at this juncture.  People get sick, they just do.  They get the flu or flu-like symptoms and they try to battle this with various remedies; but what about environment?  Sure, you can wash your hands and you can stay away from people who are sick, but what about the ones who are incubating the disease or are “carriers” without being aware?  We all react to sickness, but rarely change our environment to prevent the disease.  After reading about the flu, I found out that flu bugs, like many diseases, love dry climates, as well as a certain pH in the air.  So, what if you humidify the air?  According to one study, a reevaluation of a previous study on humidity and flu (http://www.webmd.com/cold-and-flu/news/20090213/influenza-linked-to-absolute-humidity), the rate of the disease goes down with absolute humidity (the actual amount of moisture in the air) rather than with “relative humidity” (the ratio of water vapor in the air to saturation, which varies with temperature).  What this means is that increasing the amount of water in the air can reduce the amount of flu in that air, since flu bugs hate high absolute humidity.  But, you ask, why is there flu in the tropics, which have high humidity?  Extrapolating from the absolute humidity finding, my guess is that “humid” usually describes relative humidity, and because temperature is a factor, a high relative humidity does not by itself tell you what the absolute humidity is.
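
To make the difference between the two humidity measures concrete, here is a minimal sketch in Python.  It uses a standard Magnus-type approximation for saturation vapor pressure together with the ideal gas law; the formula choice, the function names, and the sample temperatures are my assumptions for illustration and do not come from the study cited above.

```python
import math

R_V = 461.5  # specific gas constant of water vapor, J/(kg*K)


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Approximate saturation vapor pressure over water, in hPa.

    Magnus-type approximation; reasonable for everyday air temperatures.
    """
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def absolute_humidity_g_m3(temp_c: float, rel_humidity_pct: float) -> float:
    """Grams of water vapor per cubic meter of air."""
    # Actual vapor pressure in pascals: saturation pressure scaled by RH
    vapor_pressure_pa = (
        saturation_vapor_pressure_hpa(temp_c) * 100.0 * rel_humidity_pct / 100.0
    )
    # Ideal gas law for water vapor: density = e / (R_v * T), then kg -> g
    return 1000.0 * vapor_pressure_pa / (R_V * (temp_c + 273.15))


# Same relative humidity, different temperatures (illustrative values only)
for temp_c in (0.0, 10.0, 20.0, 30.0):
    ah = absolute_humidity_g_m3(temp_c, 50.0)
    print(f"{temp_c:4.0f} C at 50% relative humidity -> {ah:5.1f} g/m^3")
```

The same relative humidity reading corresponds to very different amounts of water in the air depending on temperature, which is why the two measures cannot be swapped for one another and why any guess about a particular climate should be checked against actual temperature and humidity readings.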

Taken one step further, remember those hot springs and the steam baths?  Well, if the absolute humidity is high, then the bugs don’t want any part of that, and disease is reduced.  The people who lived by taking steam baths did not understand the data portion of the answer; they just knew that when they took them they got sick less.  It works for that person, so maybe it will work for me.

So what did we learn from this little foray into the world of data analytics?  The analysis is 90% of the process, folks.  Whether you are talking about the flu or cybersecurity, it is all the same.

Did I just make a segue without using that tool?  Oops.  So let’s talk about data analysis and cybersecurity.  There are tools out there right now that are collecting an immense amount of data, usually to spot an outlier that will reveal the culprit trying to take corporate information and sell it to the highest bidder, or maybe an insider threat that is bringing down the network, or possibly someone taking 1 cent from every other employee and putting it in their own paycheck (see “Superman III”).  But what the data does is just the beginning, and you could actually prevent this stuff with some good old-fashioned analysis before you ever get the data.  This means that people have to be vigilant; all the employees have to be observant of their workplace environment.  Analysis is not necessarily sitting in front of a white board filled with formulas, nor in front of a computer staring at charts and graphs.  It is just being observant and using a hundred sets of eyes rather than one automated tool.  Yes, the tool can help to corral the numbers, but it is really the observation that adds the value.
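
For a sense of what “corralling the numbers” can look like, here is a minimal sketch in Python of one common outlier check, the modified z-score based on the median absolute deviation.  The technique, the threshold of 3.5, and the paycheck figures are my illustrative assumptions; they do not describe any particular security tool.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    Uses the median absolute deviation (MAD), which is less distorted
    by the outliers themselves than a mean and standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to measure against; this sketch simply flags nothing
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical paychecks: five look alike, one does not
paychecks = [2500, 2510, 2495, 2505, 2500, 3900]
print(mad_outliers(paychecks))
```

The check can point at the odd paycheck, but it cannot say whether that is fraud, a bonus, or a typo.  That part still takes a person who is paying attention.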

Once when I was at home talking with my Dad, I asked him when the neighbors got a trailer.  He looked at me and asked how I knew about that, since they had not picked it up and certainly had not told many people.  I told him that I saw their truck and it had an “extended” outside rear view mirror, which is used when hauling trailers.  He just looked at me and smiled.  I knew then to keep my eyes open for future possibilities.

One last thing about analysis: there is some guessing to this.  The more analysis you do, the more you look for information that will either confirm or deny that guess.  As the information accumulates, you become more certain that the guess is right, or that it is wrong.
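
The guess-then-check process described above can be sketched with Bayes’ rule.  This is a minimal illustration in Python, not a claim about how anyone actually reasons; the probabilities attached to the trailer story are numbers I made up for the example.

```python
def update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Bayes' rule: revised probability of a guess after one observation."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)


# Guess: "the neighbors bought a trailer" (all figures hypothetical)
belief = 0.05  # before noticing anything, the guess is unlikely

# Observation: the truck now has extended mirrors, which are common
# on trucks that tow and rare on trucks that do not
belief = update(belief, p_evidence_if_true=0.9, p_evidence_if_false=0.02)
print(f"after seeing the mirrors: {belief:.2f}")
```

Each new observation moves the number up or down in the same way, so the guess is never final until the evidence stops coming in.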

I will write more on this, but suffice it to say that analysis is something we do every day as parents, as adults, and certainly as members of the planet Earth.  Some people, such as Claude Shannon (see “Fortune’s Formula” by William Poundstone), analyze very complicated elements of human endeavor; others do simpler analysis, like writing a paper for school.  In all cases, it is brought to a conclusion with your brain, not just a chart or graph, which are tools but not the ultimate answer.  Look around you and listen; analysis starts there.