Getting to the Heart of the Matter – Data (Part 1)


February is American Heart Month, and it is important to realize that there are TONS of data available on heart disease and the consequences it has on our lives and the lives of others.  The Centers for Disease Control and Prevention (CDC) (www.cdc.gov) has data on how many deaths result from heart-related illness (the total has not changed much from year to year: approximately 610,000 deaths per year according to https://www.cdc.gov/dhdsp/data_statistics/fact_sheets/fs_heart_disease.htm).  The number of deaths from heart disease is greater than the number from suicides, unintentional accidents, influenza, diabetes, or chronic lower respiratory diseases (https://www.cdc.gov/nchs/data/nvsr/nvsr60/nvsr60_06.pdf).  What this means is that heart disease not only needs attention, but is in some ways preventable.  According to the CDC website, almost 50% of Americans have AT LEAST ONE of THREE key risk factors for heart disease: elevated blood pressure, elevated LDL cholesterol, or smoking (https://www.cdc.gov/dhdsp/data_statistics/fact_sheets/fs_heart_disease.htm).  This is troubling, and I felt it called for further “data diving” to see the association between heart disease and areas where I personally have knowledge, like diabetes or high blood pressure.

The CDC has so much data on the subject that I started there and found a survey called the Behavioral Risk Factor Surveillance System (BRFSS) (https://www.cdc.gov/brfss/).  The data are available to anyone, either for download or for analysis with the CDC’s web-based tools.  I went to the “Surveys and Documents” link and found “BRFSS Prevalence and Trends Data,” which lets the user enter risk factors and view the data by US state, gender, and a number of other characteristics.  This is much easier than downloading the data and doing the analysis yourself, and it also gives you an idea of the areas of the country where people are at more risk of heart disease than others.  It is a great resource for anyone who wants to look at the numbers behind the heart disease issue.  If nothing else, it offers an interesting look at how some regions of the country have populations that are more at risk for some diseases and less at risk for others.

I also looked at the BRFSS Web Enabled Analysis Tool (WEAT), which lets you look at the data from a cross-tabulation point of view.  Here you can arrange characteristics in a number of ways to compare several factors against the disease.  The tool is very easy to use, but it contains so many factors that it is hard to decide which ones to choose.  However, for the budding data analyst, this is a great way to learn about data analysis and the multi-factor approach to analysis.  A screen shot of the WEAT page is below (https://nccd.cdc.gov/s_broker/WEATSQL.exe/weat/index.hsql).

 

[Screenshot: BRFSS WEAT page]

You can see the “Cross Tabulation” link, where you can click and cross the survey’s numerous factors against one another.  Please do not get overwhelmed!  There is so much data here that I used it for a project I was required to do for one of my graduate statistics classes at Penn State.  The data were provided, already collected, and catalogued; all I had to do was run the various tests on them.  It amazes me that more people do not know about this data treasure trove.  I realize that this is a phone-based survey, but from what I can tell it is one of the most extensive and intensive surveys for getting a read on the different maladies affecting the United States, and it gives data analysts the tools to study them.
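
The kind of table the WEAT “Cross Tabulation” link produces can be sketched in a few lines of Python.  The records below are made-up toy responses, not BRFSS data, and this is not how WEAT works internally.  The code counts each combination of a risk factor (diabetes) and an outcome (heart disease), then reports what share of each row has the outcome.  Keep in mind that a cross-tab shows association, not cause.

```python
from collections import Counter

# Made-up toy responses, NOT real BRFSS data: (has_diabetes, has_heart_disease)
responses = (
    [(True, True)] * 3
    + [(True, False)] * 7
    + [(False, True)] * 4
    + [(False, False)] * 36
)


def crosstab(pairs):
    """Count how often each (row_value, column_value) combination occurs."""
    return Counter(pairs)


def row_total(table, row_value):
    """Number of respondents in one row of the cross-tab."""
    return sum(count for (row, _), count in table.items() if row == row_value)


def row_percent(table, row_value, col_value):
    """Percent of respondents in row_value who also have col_value."""
    return 100 * table[(row_value, col_value)] / row_total(table, row_value)


table = crosstab(responses)
for has_diabetes in (True, False):
    label = "diabetes" if has_diabetes else "no diabetes"
    with_hd = table[(has_diabetes, True)]
    total = row_total(table, has_diabetes)
    pct = row_percent(table, has_diabetes, True)
    print(f"{label:>11}: {with_hd} of {total} report heart disease ({pct:.0f}%)")
```

Reading across rows (percent within diabetes vs. no diabetes) rather than down columns is what lets you compare groups of very different sizes.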

Although this article is about gathering and understanding data on heart disease, the data take you far beyond that one malady.  But understanding the factors that heart disease entails will undoubtedly help you see heart disease as composed of factors, rather than as something that happens purely as a result of “genetics,” as some propose.

Enjoy the CDC site and the various ways of using data to clarify a disease that will be with us for a lifetime (hopefully a LONG lifetime).  To control it, we MUST understand it.

Learn, Offer, Value, Educate (LOVE)


Lies, Danged Lies, and…Percentages?!

As a person focused on the “truth to data” realm, I find it somewhat frustrating (sometimes amusing, but often frustrating) that some writers feel that throwing a statistic (small “s”) into their articles somehow brings the force of data to bear on their point.  Such was an article in the Baltimore Sun on 12 February 2017 that started with “An overwhelming 97 percent of scientists agree that climate change is real and that human activity is responsible” (“Fake news may be vulnerable to ‘vaccination’” by Sean Greene).  The irony is that the article was about “fake news.”  Although the writing was excellent, Mr. Greene missed the point entirely with his first sentence.  I have many questions about where he got the 97% figure.

  1.  How many is 97%?
  2.  How many are the 3% that are remaining?
  3.  What research is associated with the 97%?  When was their most current research concerning climate change published?

These are just three of the questions I asked myself while I tracked down the study Mr. Greene mentioned in his article (he attributed the figure to a Pew Research Center report).  I found a study on the Pew Research Center website (http://www.pewinternet.org/2016/10/04/public-views-on-climate-change-and-climate-scientists/) and looked through it, trying to find the 97% figure from the quote above.  I was unsuccessful.  I did find a passage saying that “a Pew Research Center survey of members of the American Association for the Advancement of Science (AAAS) found 93% of members with a Ph.D. in Earth sciences (and 87% of all members) say the Earth is warming mostly because of human behavior.”  Again, how many is 93%?  I looked for the membership of the AAAS and found no membership figures on their outward-facing site, so I turned to Wikipedia (https://en.wikipedia.org/wiki/American_Association_for_the_Advancement_of_Science), which lists 120,000 members.  Because the 93% figure covers only members with a Ph.D. in Earth sciences, the percentage that applies to all 120,000 members is 87%.  What this means is that roughly 15,600 members (13%) of the AAAS do not agree with the roughly 104,400 members who say that warming is mostly the result of human behavior.  This is something to consider in the long run, especially since 15,600 is not a small number of scientists.  The hilarious (read: frustrating) part of the newspaper article was that its opening quote had been used in a survey designed to show that people are easily manipulated by something as innocuous as a pie chart displaying that quote.  Why, sure, people are manipulated by pie charts!  They are also manipulated by percentages given in various articles.  (For instance, I read recently that software attacks on cell phones of a particular brand increased 163% in one month!  Would you buy that brand of cell phone?!)
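
A percentage only becomes a head count once you multiply it by the right base.  The sketch below does that arithmetic, taking the 120,000 membership figure from Wikipedia as given and applying both Pew percentages to it.  The 93% line is included only to show how much the count shifts when a subgroup’s percentage (Earth-science Ph.D. members) is applied to the whole membership.  The second half shows that a percent increase like the 163% above says nothing about the size of the base.  The before and after attack counts are hypothetical.

```python
def counts_from_percent(total: int, percent: int) -> tuple[int, int]:
    """Split a total into (in_group, rest) for a whole-number percentage."""
    in_group = total * percent // 100
    return in_group, total - in_group


def percent_change(before: float, after: float) -> float:
    """Percent increase (or decrease) going from before to after."""
    return 100 * (after - before) / before


AAAS_MEMBERS = 120_000  # membership figure from Wikipedia, as cited in the text

# 87% is Pew's figure for all members; 93% is for Earth-science Ph.D. members only
agree_all, rest_all = counts_from_percent(AAAS_MEMBERS, 87)
agree_sub, rest_sub = counts_from_percent(AAAS_MEMBERS, 93)
print(f"87% of {AAAS_MEMBERS:,}: {agree_all:,} agree, {rest_all:,} do not")
print(f"93% of {AAAS_MEMBERS:,}: {agree_sub:,} agree, {rest_sub:,} do not (subgroup % misapplied)")

# Hypothetical attack counts: nearly the same percent jump from a tiny base and a large one
print(f"8 -> 21 attacks: {percent_change(8, 21):.1f}% increase")
print(f"10,000 -> 26,300 attacks: {percent_change(10_000, 26_300):.1f}% increase")
```

Going from 8 to 21 attacks and from 10,000 to 26,300 both produce a headline of about 163%, which is exactly why the raw numbers matter.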

I think that we ALL need to be careful about how we use data and then try to rationalize that use with a reference that can be “pre-bunked” (to use a term from Mr. Greene’s article).  I actually agree with many of the suppositions in Mr. Greene’s article; I just think that using data without the raw numbers is like saying that 100% of the writers of this blog do not agree with using percentages without backing them up with raw numbers.  Does that make you want to get more numbers to understand the percentages?  Would you like a pie chart?  I thought not.

Learn, Offer, Value, Educate (LOVE)

The Job Is Not Done Until the Paperwork Is Finished!


We often look at a program and see the outward benefits, but fail to see the underlying costs associated with it.  Such are the complexities involved in any new program, especially state or federal government programs.

A great example of this is the Affordable Care Act (ACA), otherwise known as Obamacare.  According to the current data, approximately 20 million people now have health care coverage who did not have it in the past (even though approximately 6 million people who had coverage before no longer have it, putting the NET at 14 million rather than 20 million, but that is another article for another day).  The focus of this article is the paperwork that the ACA generates as a result of this new program.

Specifically, I would like to mention Form 1095-B (Health Coverage) and Form 1095-C (Employer-Provided Health Insurance Offer and Coverage).  Because of our various medical coverage, both my wife and I received a number of these forms.  Now, let’s extrapolate to the population of people in the US who currently receive these forms.

If the figures are correct that 20 million people get health care coverage, then (at least) 20 million pieces of paper are generated EACH YEAR to document that these people have health coverage.  That means there have to be printing devices to print these forms, postage to mail them, and of course a department to track their distribution.

Let’s assume for a moment that it costs 1 dollar to print one of these forms, 30 cents to mail them, and the department in question consists of 20 people each making 50,000 dollars per year.  That would mean that the costs are as follows (per year):

20 million dollars to print

6 million dollars to mail

1 million dollars in salary

TOTAL:  27 million dollars

And this figure covers only the 20 million newly insured people; it does not include the people who already had health care coverage and also receive these forms.  Counting them, the costs could be 5 to 10 times what I listed per year.  And this is just for the paperwork!
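
The back-of-the-envelope arithmetic above can be sketched in a few lines of Python.  The inputs are the author’s stated assumptions (20 million forms, $1.00 to print, 30 cents to mail, 20 staff at $50,000), not actual IRS or insurer costs.  Per-form costs are kept in whole cents to avoid floating-point rounding, and the rough 5x to 10x multiplier is applied to the total at the end.

```python
# The author's hypothetical inputs, not actual IRS or insurer figures
FORMS_PER_YEAR = 20_000_000
PRINT_CENTS_PER_FORM = 100  # $1.00
MAIL_CENTS_PER_FORM = 30    # $0.30
STAFF = 20
SALARY_DOLLARS = 50_000


def annual_paperwork_cost(forms: int, print_cents: int, mail_cents: int,
                          staff: int, salary: int) -> dict[str, int]:
    """Return yearly costs in whole dollars (per-form costs are given in cents)."""
    printing = forms * print_cents // 100
    mailing = forms * mail_cents // 100
    salaries = staff * salary
    return {
        "print": printing,
        "mail": mailing,
        "salary": salaries,
        "total": printing + mailing + salaries,
    }


costs = annual_paperwork_cost(FORMS_PER_YEAR, PRINT_CENTS_PER_FORM,
                              MAIL_CENTS_PER_FORM, STAFF, SALARY_DOLLARS)
for item, dollars in costs.items():
    print(f"{item:>6}: ${dollars:,}")

# The text's rough 5x to 10x multiplier for people who already had coverage
low, high = 5 * costs["total"], 10 * costs["total"]
print(f"if costs scale 5-10x: ${low:,} to ${high:,}")
```

Changing any single assumption (a cheaper print run, bulk postage) shows quickly which line item drives the total.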

Now, this is unbelievably low compared with the Congressional Budget Office’s original estimate of the cost of the ACA, which was over 700 BILLION DOLLARS for five years between 2014 and 2019 (https://www.cbo.gov/publication/44176).  However, it is important to note the “small” costs that are part of this, which will continue long after the large costs are mitigated (or just maintained, as is often the case).

In the meantime, if one looks at programs like Social Security, we often do not realize the cost of these types of programs, which approaches 1 TRILLION DOLLARS per year in benefits!  It is programs like these that carry TONS of paperwork.  Even though that paperwork is increasingly digital, the costs do not often decrease, since maintaining the documentation can itself add costs to the program.

How do we correct these paperwork nightmares?  One way might be to introduce legislation that institutes a default choice: everyone is assumed to have health care unless it is shown that they do not.  Of course, I am sure there are other ways to reduce or eliminate these paperwork overflows.  Until then, we will be faced with funding the paper that will be a central part of our lives.

Learn, Offer, Value, Educate (LOVE)

Numeric Hysterics – Ask The Right Questions!


 

I have seen so many numbers being thrown around the press lately with little explanation.  The numbers appear in headlines or headers accompanied by a narrative that draws incorrect conclusions about what they represent.  Worse, sometimes there is no narrative at all, leaving the uninformed observer to draw their own conclusions.  A few examples will illustrate this very concerning trend.

I was reading in a newspaper that Social Security was receiving a .3 percent increase.  After seeing that, I talked with a few people about the article in different venues, asking them the amount of the increase.  Their response — 3% increase.

I explained that this figure was wrong, and that it was in fact a POINT 3% increase.  They looked at me and said that it was the same thing.  I explained that a 3% increase meant that for every $1.00 there would be a 3¢ increase.  Again, they said that was the same as a POINT 3% increase.  I further explained that a POINT 3% increase meant that for every dollar there would be a .3¢ increase!  In other words, it would take 10 TIMES that increase to make the 3% increase that people think they are getting.  Remember that 3% is the same as .03, and a .3% increase is the same as .003.
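
The difference is easy to check by converting each percentage to a decimal and multiplying.  The sketch below uses Python’s decimal module so the cents come out exact.  The $1,200 monthly benefit is a hypothetical round number for illustration, not an actual Social Security figure.

```python
from decimal import Decimal


def increase(amount: Decimal, percent: Decimal) -> Decimal:
    """Dollar increase for a percentage raise, e.g. percent=Decimal("0.3") for 0.3%."""
    return amount * percent / Decimal(100)


POINT_THREE = Decimal("0.3")  # 0.3% is 0.003 as a decimal
THREE = Decimal("3")          # 3% is 0.03 as a decimal

# Per dollar, expressed in cents
per_dollar_small = increase(Decimal("1"), POINT_THREE) * 100
per_dollar_large = increase(Decimal("1"), THREE) * 100
print(f"0.3% of $1.00 = {per_dollar_small.normalize()} cents")
print(f"3%   of $1.00 = {per_dollar_large.normalize()} cents")

# On a hypothetical $1,200 monthly benefit
benefit = Decimal("1200")
small = increase(benefit, POINT_THREE)
large = increase(benefit, THREE)
print(f"0.3% of ${benefit}: ${small:.2f}")
print(f"3%   of ${benefit}: ${large:.2f}")
print(f"the 3% raise is {large / small:.0f} times larger")
```

On the hypothetical benefit, that is $3.60 a month versus $36.00 a month: the same tenfold gap as 0.3¢ versus 3¢ per dollar.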

Well, that is one simple case of incorrect conclusions, but the other one is much more serious.  It involves the percentage of police stops of minorities vs. non-minorities in Baltimore County, Maryland.  A televised segment showed a horizontal bar graph indicating that 56% of stops in Baltimore County were of minority drivers.  With just a slight explanation, and more editorial comment, the narrator stopped short of explaining in detail where this information originated or what it really meant.

In order to really understand the data, several questions must be asked:

  1.  Where were the stops done (area of the county)?
  2.  Why were the drivers being stopped (warrants, tail lights, speeding)?
  3.  What is the percentage of minorities in the area where the officer made the stop?

I list these questions because the horizontal bar graph presented just one perspective of the data: the number of stops made and who was stopped.  The questions of where and why are not answered by these data.
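
Question 3 is the one that turns a raw percentage into a comparison.  One common, admittedly imperfect, way to frame it is to divide each area’s share of stops involving minority drivers by the minority share of that area’s population.  The sketch below does this for two made-up areas.  The stop counts and population shares are hypothetical, not Baltimore County data.  They were chosen so the countywide figure comes out to 56%, showing how one headline number can hide very different local patterns.

```python
from dataclasses import dataclass


@dataclass
class AreaStops:
    """Stop counts for one (hypothetical) area of a county."""
    name: str
    total_stops: int
    minority_stops: int
    minority_population_share: float  # fraction of the area's residents who are minorities

    def stop_share(self) -> float:
        """Fraction of this area's stops that involved minority drivers."""
        return self.minority_stops / self.total_stops

    def disparity_ratio(self) -> float:
        """Above 1.0: minorities are stopped more often than their population share alone predicts."""
        return self.stop_share() / self.minority_population_share


areas = [
    AreaStops("Area A", total_stops=500, minority_stops=400, minority_population_share=0.80),
    AreaStops("Area B", total_stops=500, minority_stops=160, minority_population_share=0.20),
]

county_share = sum(a.minority_stops for a in areas) / sum(a.total_stops for a in areas)
print(f"countywide share of stops involving minorities: {county_share:.0%}")
for a in areas:
    print(f"{a.name}: {a.stop_share():.0%} of stops vs {a.minority_population_share:.0%} "
          f"of residents -> ratio {a.disparity_ratio():.2f}")
```

In this toy example the same 56% comes from one area where stops track the population (ratio 1.00) and one where they do not (ratio 1.60).  Even then, the ratio says nothing about why the stops were made, which is Question 2.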

A more telling data set might have shown whether the officer gave warnings to non-minorities but not to minorities, or whether the officer pulled the driver over after identifying the driver’s race, but I did not see any of these questions addressed in the bar graph on the screen.  I just saw a graph that (without further description) suggested that Baltimore County Police Officers treated minority drivers worse than non-minority drivers.  Without further explanation, or more specific data, that conclusion is not only unsupported, but potentially damaging (guilty before being proven guilty).

There are situations where statistics can help.  A study by three researchers at three prestigious universities examined jury pools from counties in two states and ran a series of statistical tests on these data.  Their study is both extremely informative and built around a number of developed hypotheses (questions) that were explored and tallied.  I will not go into the conclusion, since it is not the conclusion that is important here (although the study is well worth reading).  What matters is the lengths to which the researchers went to study the data, not just present it in its “naked” state.  You can see the study at: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1349&context=heinzworks

So what do I wish to achieve with this article?  I want to make two very important points:

  1. STOP presenting data without studying that data for spurious conclusions and indicators
  2. ASK the right questions concerning the data so that there is appreciation of what that data REALLY shows

In this day and age, we are prone to take extreme steps without a real understanding of what a graph means.  We rarely ask how its numbers affect not just us personally, but what they say about us as a collective.  We need to question each data set until we get to the truth.  Only then can we swerve away from numeric hysterics.

Learn, Offer, Value, Educate (LOVE)

“Moon Shot” For Cancer Cure? Not Possible Unless Requirements are Clear!

I, along with others, listened intently to President Obama and Vice President Biden talk emotionally about a “Moon Shot” for a cure for cancer.  Congress is on board, along with just about every American.  After all, the cause is noble and the life-saving potential is clear, so let’s all just move on and get it done, right?

I have been a project manager for decades, mainly in the Federal Government, and I can tell you that the very term “Moon Shot” is a misnomer for this very large (and expensive) venture on which we have embarked.

First, the original moon shot had a time frame.  President Kennedy, in his May 1961 address to a joint session of Congress, stated the time frame for landing on the Moon very specifically: “before this decade is out,” that is, by the end of 1969.  Okay, what is the time frame of the Cancer “Moon Shot”?  I see that there is a web site describing the Moon Shot 2020 initiative, which so far is split into 3 Phases, the last one implementing new immunotherapy by 2020.  In this instance, at least a time frame is explicitly stated, so something is set in (relative) stone, although the “intermediate” steps of these 3 Phases are still in flux.  This could be an issue in the future, since 2020 is less than 3 years away!

However, even President Obama does not believe that a cure can be achieved in that amount of time.  He told a group of school kids that “[cancer] probably won’t be cured in my lifetime, but I think it will be cured in yours.” (http://www.politico.com/story/2016/01/joe-biden-cancer-research-moonshot-217854).  This is one of the champions of the “Moon Shot.”  That would not be a real confidence builder for me if I were the project manager on this one.

Second, the physical goal of the Cancer “Moon Shot” is a moving target.  The Moon was predictable.  We could calculate the orbit of the Moon and make adjustments accordingly to plan the landing, even where to put the Lunar Excursion Module.  Cancer is a very unpredictable disease, since it adapts to individuals and at times progresses silently until the body reacts.  If the goal is to have an immunotherapy by 2020, how will the disease look then?  With the genome map, will the disease adapt to new environments?  Cancer is still a moving target.  The frustrations in this endeavor are those that have existed since we started this battle, and they will continue until we can get a step ahead of the cancer.  Maybe the REAL goal is not to provide immunotherapy, but to predict where it will strike and prevent it.  A vaccine might help, but this is certainly not smallpox or polio, although at the time those diseases were as elusive and cunning as cancer is now.

Finally, what is the number one killer of people in the United States?  I keep hearing that it is cancer, and that is false!  The number one killer in the US is the same one it has been for decades (that’s right, decades): heart disease.  If you do not believe me, I refer you to the Centers for Disease Control and Prevention (CDC), which publishes a yearly look at deaths from various causes (https://www.cdc.gov/nchs/data/nvsr/nvsr65/nvsr65_05.pdf).  We have yet to solve the heart disease problem.  We are making progress in the treatment and prevention of the disease, but it all comes down to taking small steps.

So, let’s review.  The three areas of a project that are vital are cost, time, and quality.  The cost of this project has yet to be specified.  The studies themselves could grow in cost, and there are also the costs of new research facilities, bureaucracies (like the National Institutes of Health departments that will be developed as a result of this initiative), and other as-yet-unspecified items.  The only time frame I see is for the 3 Phases and the 2020 end date.  It took close to 10 years to get two humans on the Moon.  It cost billions, and a number of lives in the process.  That was the 1960s!  And the most important thing is that there is a champion like Joe Biden who has taken on the initiative, but even scientists are concerned that his clout will wane after he leaves office (http://www.politico.com/story/2016/01/joe-biden-cancer-research-moonshot-217854).  At this point, there is a great start on the requirements (the quality end of project management), but the actual milestones are few.  That could lead to problems in the future, when the “lesser,” more attainable requirements are passed over for more visible results.

And please remember that this is all for a good cause, but the number of people who die from cancer is still smaller than the number who die from heart disease.  I did a little research on age vs. disease, and the results are below.  From this chart you can see that deaths from cancer occur at younger ages than deaths from heart disease.  If the Cancer 2020 effort is focusing on older study patients, is it really focusing on the proper target age group?  Just food for thought.

[Chart: age vs. disease, cancer and heart disease deaths by age]

I wish the study a great deal of good fortune.  Eliminating cancer will undoubtedly help us as a nation to build our future with our present population.  And it is easy to cheer on this effort.  Its nobility is indisputable.

Learn, Offer, Value, Educate (LOVE)

 

Screwy Presidential Election? – It’s Happened Before!!


Rutherford B. Hayes (Background)/Harry Truman (Foreground)

I hesitated a few weeks to post this article because the wound is still fresh for those who feel that the presidential election was so uniquely unfair that protesting was the only answer.  I have done some research and, along with some basic statistics, felt that sharing this information may at least give people food for thought.  I will never change individual minds on the outcome (some of them in my own family), but maybe (just maybe) we can all take a breath and realize that these types of elections have happened before.

Some of the things I have heard:

  1.  Get rid of the electoral college.  This type of thing didn’t happen in the past!
  2.  This President-Elect did not get the popular vote.  How can we do this when we are a democracy!
  3.  We will never heal, but be divided forever, thanks to the electorate that voted for this President
  4.  This is just unfair

The first point is very clearly something that may have to be readdressed given the nature of modern America.  One reason for the Electoral College was to ensure that populations in rural areas were counted.  According to one site (http://www.historycentral.com/elections/Electoralcollgewhy.html), it was meant to balance the small and large states so that manipulation of the citizenry would not result in the election of a President who would be a tyrant.  The source goes on to say that the small states wanted this compromise in order to approve the Constitution at the Convention in the 1780s.  From a statistical point of view, this is very smart.  In essence, the Electoral College acts as a weighting scheme in the election process, giving each state a set weight so that every state is treated in a fair way.  If we look at the populations of the states today, and the Presidential Election were decided by population alone, a candidate would need only a few states to take the election, California being one of them.  “Wait a minute!” I hear you say.  “If a person wins California NOW, they still take all of the Electoral Votes in that state.”  Yep, but the bottom line is that a candidate who barely wins smaller states, or larger states, gets those states’ electoral votes even without the popular vote.  You want to get rid of the Electoral College?  Contact your Congressional Representative and start a Constitutional Amendment to get rid of it (we have done this before as a country).  The problem is that the smaller states like the system, which could make it hard to get the required votes.  An amendment needs two-thirds of both houses of Congress to propose it and ratification by three-fourths of the states.  However, the system allows it, so why not?
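
The “barely wins” point can be made concrete with a toy tally.  The sketch assumes winner-take-all allocation, which most states use (Maine and Nebraska split their votes by district).  It uses three made-up states of equal electoral weight; none of the numbers come from a real election.

```python
from dataclasses import dataclass


@dataclass
class StateResult:
    """Hypothetical state-level result for two candidates, X and Y."""
    name: str
    electoral_votes: int
    votes_x: int
    votes_y: int


def tally(states):
    """Return (electoral X, electoral Y, popular X, popular Y) under winner-take-all."""
    ev_x = ev_y = pop_x = pop_y = 0
    for s in states:
        pop_x += s.votes_x
        pop_y += s.votes_y
        # Winner-take-all: the state's plurality winner gets every electoral vote.
        # (A tie is handed to Y here only to keep the sketch short.)
        if s.votes_x > s.votes_y:
            ev_x += s.electoral_votes
        else:
            ev_y += s.electoral_votes
    return ev_x, ev_y, pop_x, pop_y


states = [
    StateResult("State A", 10, votes_x=510_000, votes_y=490_000),  # X barely wins
    StateResult("State B", 10, votes_x=510_000, votes_y=490_000),  # X barely wins
    StateResult("State C", 10, votes_x=200_000, votes_y=800_000),  # Y wins big
]

ev_x, ev_y, pop_x, pop_y = tally(states)
print(f"Electoral votes: X {ev_x}, Y {ev_y}")
print(f"Popular vote:    X {pop_x:,}, Y {pop_y:,}")
```

X wins the electoral count 20 to 10 while losing the popular vote by 560,000.  Two narrow wins outweigh one landslide because each state’s margin is discarded once its winner is decided.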

Let’s review the fairness issue.  The fairness in the Electoral system is that EVERY state is considered for the Presidential Election, which really proves the US motto “Out of Many – One.”  Of course, one might argue that having a popular vote also proves that motto, since individuals will then vote for the president and, thereby, cut the boundaries that divide the states.  This whole argument comes down to whether an Electoral College is necessary in Modern America, given our ability to communicate worldwide in an instant, our research abilities, and our basic political system.  “After all,” you say, “this type of thing did not happen prior to our living history.”

I am here to tell you that this type of thing happened EXACTLY 140 years ago with the election of Rutherford B. Hayes (http://history1800s.about.com/od/presidentialcampaigns/a/electionof1876.htm).  This election, according to the cited source, was “intensely fought and had a controversial outcome.”  Sound familiar?  The winner, Hayes, did not get the popular vote AND did not have a majority of the undisputed electoral votes, but he still won!  The mechanics behind this are a great read (they seem summarized best in the reference above).  Suffice it to say that an “Electoral Commission” awarded all 20 disputed electoral votes to Hayes, and he won the election.  However, again according to the source above, his 4-year tenure was plagued by his perceived illegitimacy, even to the point of people calling him “Rutherfraud” B. Hayes.  (He had written a letter after the Republican Convention saying that he would serve only one term.)  Samuel Tilden, his opponent and the winner of the popular vote, still felt that he had won (according to the source).  He later fell ill and died, leaving part of his fortune to the New York Public Library.  A philanthropist to the very end.

This covers both points 1 and 2 above.  The reason we have an Electoral College is that we are supporting the Constitution.  This has led to some interestingly unique elections, but we are still here to talk about it, still calling it our America and still progressing as a nation.

But the important story is that some are saying that the country will forever be divided and we will never heal.  I am here to tell you that during my life I have seen the country split over Civil Rights, the Vietnam War, the Cold War, and Worldwide Terrorism.  I have seen the protests in this country on TV and in real life, and they sometimes made me wonder if we would ever heal.  The injustices that erupted, and that have been quelled under both good and bad Presidents, have not soured our national pride.  We still cheer our USA teams, we are proud of our military (something that was completely different than when I first started in the military), and the country is coming to grips with a variety of economic and social issues.  I remember a phrase about “it takes a village,” and I never agreed with that when it came to the US.  “It takes a nation.”  A nation of people who are comfortable enough to know that they can march in protest without fear of retribution; a nation of people who are not put down because of who they voted for in ANY election; a nation of people who respect each other’s opinions.  It is only then that we can heal this nation and move on.  We have done it before and we will do it again.

Finally, the idea that elections are always fair is like saying the stock market is always down (or up).  The bottom line is that everything in politics turns around.  If you look back more than 200 years, you find the election of 1800, another contested election (http://history1800s.about.com/od/presidentialcampaigns/a/electionof1876.htm).  That one ultimately ended with Jefferson taking the Presidency even though opponents had branded him an “Un-Christian Deist.”  More tragically, Alexander Hamilton threw his support behind Jefferson when the tie went to the House of Representatives.  Hamilton was later shot and killed in a duel by Aaron Burr (Thomas Jefferson’s Vice President).

Still think the election is unfair?  Take a look at Harry Truman, who was basically told by the media that he would lose (sound familiar?).  The outcome was very different.  He won, and was famously photographed holding up a newspaper that had “jumped the gun” with the headline “Dewey Defeats Truman.”  What happened?  Biased sampling leading to biased results (again, sound familiar?).  This biased sampling is summarized very well in this article: https://www.math.upenn.edu/~deturck/m170/wk4/lecture/case2.html.  The bottom line is this: human bias was unavoidable in the type of sampling done for the Truman/Dewey contest, where interviewers chose whom to survey.  If you place a human in the process of choosing who to survey, you introduce a bias that resembles the observer effect often associated with Heisenberg’s Uncertainty Principle.  That is the idea that when we attempt to observe an event, we actually influence it (http://science.howstuffworks.com/innovation/science-questions/quantum-suicide2.htm).  It is evident that humans were involved in the surveys of both the 1948 election and the current one.  The biases are there for the current one: why would some people tell a pollster they were going to vote for Trump, given the overall feelings about him?  That, in itself, biased the sampling.  The result is that he was elected, using the rules of the Electoral College to defeat his opponent.  The more we try to observe, the more we influence.
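
Selection bias of the Truman/Dewey kind can be demonstrated with a small simulation.  The sketch assumes a made-up electorate that splits exactly 50/50, in which the voters interviewers find easiest to approach lean against candidate A.  It compares a simple random sample with one in which interviewers pick 80% of their respondents from that easy-to-reach group.  The group sizes and support levels are hypothetical, not 1948 or 2016 polling data.  The simulation models who gets asked, not whether respondents answer honestly.

```python
import random

# Hypothetical electorate (not real polling data):
# 40% of voters are "easy to reach" and only 35% of them back candidate A;
# the other 60% are harder to reach and 60% of them back candidate A.
EASY_SHARE = 0.40
SUPPORT_EASY = 0.35
SUPPORT_HARD = 0.60
# True support: 0.40 * 0.35 + 0.60 * 0.60 = 0.50
TRUE_SUPPORT = EASY_SHARE * SUPPORT_EASY + (1 - EASY_SHARE) * SUPPORT_HARD


def poll(n: int, easy_fraction: float, rng: random.Random) -> float:
    """Estimate support for candidate A from n simulated interviews.

    easy_fraction is the chance that an interview lands on an easy-to-reach voter:
    EASY_SHARE for a simple random sample, higher when interviewers choose whom to ask.
    """
    supporters = 0
    for _ in range(n):
        easy = rng.random() < easy_fraction
        p_support = SUPPORT_EASY if easy else SUPPORT_HARD
        if rng.random() < p_support:
            supporters += 1
    return supporters / n


rng = random.Random(1948)  # fixed seed so the run is repeatable
random_estimate = poll(10_000, EASY_SHARE, rng)
biased_estimate = poll(10_000, 0.80, rng)  # interviewers favor the easy-to-reach group

print(f"true support:           {TRUE_SUPPORT:.1%}")
print(f"simple random sample:   {random_estimate:.1%}")
print(f"interviewer-chosen:     {biased_estimate:.1%}")
```

Under these assumptions the interviewer-chosen poll should land near 40% (0.80 × 0.35 + 0.20 × 0.60), understating candidate A by about ten points, while the random sample stays close to 50%.  A bigger sample does not close that gap, because the bias is in who gets asked, not in how many.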

I know this has been a long article, and I apologize.  I have been thinking about this for several months and wanted to finally get my feelings on paper.  The statistics, the data, the surveying: all of these are part of the overall look at this election, but it is people, history, and political uniqueness (or perceived uniqueness) that make this interesting.

Learn, Offer, Value, Educate (LOVE)

Grace HOOPER?! It’s Grace HOPPER! Get it right!!

Well, I have seen some disrespectful news, but I just saw a “ticker” from a major news network (I will not say the name, but the three initials start with “N”).  It said that a new Center for Cybersecurity was going to be inaugurated at the Naval Academy in Maryland under the name of Grace HOOPER.  The problem is that the name of this giant of computing is Grace HOPPER.  She joined the Naval Reserve during World War II, rose to become a “star” officer (rear admiral), and was instrumental in creating the COBOL computer language (look it up: big stuff here!).

I was so angry when I saw this misspelling of this great lady’s name.  A world that prides itself on possibly electing the first woman President forgets that there are women who paved the way for today’s women to make an even larger impact on the world.  The sad thing about the injustice this news agency did to this very important and famous woman is that it forgot that there is a SHIP NAMED AFTER HER (specifically, a guided missile destroyer)!  That’s right, a US Navy ship bears her last name (http://www.public.navy.mil/surfor/ddg70/Pages/default.aspx).  Heaven forbid that we should get it right on some news ticker!

It is astounding to me that no one is checking these tickers to make sure they get this name correct.  Computer scientists and computer enthusiasts should be offended by this, but to tell the truth I do not know how many of them know how important Grace Hopper was to their career field.  I am hoping I am wrong and plenty of computing enthusiasts will say they knew who she was.  If not, it is time everyone knew!

There are others that people forget.  For instance, who was the first one to popularize a “pie chart?”  You read this right — who was the first person to make the pie chart popular?

Give up? (Or did you look it up to make sure you got it right?)

It is Florence Nightingale, the nurse who took mortality data from British military hospitals during the Crimean War and visualized it (in a variant of the pie chart now called a polar area diagram) to press for better sanitation in those hospitals.

Let’s get this stuff correct, folks.  I have no idea who Grace HOOPER is, but Grace HOPPER is a great pioneer in computers and deserves at least a second look at the spelling of her name.

Learn, Offer, Value, Educate (LOVE)