Tuesday, May 31, 2011

Data vs. information

Today's post from Dr. Groves, Director of the U.S. Census Bureau, encapsulates what this blog, data insights, is all about.

What’s the difference between “data” and “information?”

We’re entering a world where data will be the cheapest commodity around, simply because the society has created systems that automatically track transactions of all sorts. For example, internet search engines build data sets with every entry, Twitter generates tweet data continuously, traffic cameras digitally count cars, scanners record purchases, RFID’s signal the presence of packages and equipment, and internet sites capture and store mouse clicks. Collectively, the society is assembling data on massive amounts of its behaviors. Indeed, if you think of these processes as an ecosystem, it is self-measuring in increasingly broad scope. Indeed, we might label these data as “organic,” a now-natural feature of this ecosystem.

Information is produced from data by uses. Data streams have no meaning until they are used. The user finds meaning in data by bringing questions to the data and finding their answers in the data...
To read the entire post, see: http://blogs.census.gov/2010census/2011/05/designed-data-and-organic-data.html

In an era when there are more sources of data available than ever before, we analysts are challenged to use that data well and in innovative ways. In recent years I have also found that simply because lots of data exist, and the public knows that lots of data exist, analysts are expected to HAVE everything and to KNOW everything with an immediacy that is often impractical and sometimes impossible. In other words, we are expected to synthesize "information" on any given topic simply because data exist.

The challenge for analysts has gone from turning tailored research data into research findings, to taking streams of sometimes incomplete, and clearly not "tailored," data and turning them into useful information. This change, in some ways, is like having an open fire hydrant and being asked to use the geyser to water an orchid. What you need is there, but certainly not in the form you need it.

The ability to use data well requires both strong traditional analytical training and a clever and creative streak. If anything, careful analysis is more important than ever before, but that alone is no longer sufficient. To be able to capitalize on these new waves of data, analysts will need to develop an ability to synthesize statistics from multiple sources, and also to be critical of the data available. (What pieces are missing? How has the data source changed over time? Are the data representative of a whole population or a selected subset? How can, or should, or shouldn't, the data be extrapolated to other groups?)

These questions and others will certainly keep us busy for a long time.

Saturday, May 28, 2011

Crushing obesity

In the early 1990s, half of the U.S. population could proudly state that they were neither overweight nor obese. Today only one third can make that claim. The rate of obesity* has nearly doubled from about 15 percent of the population to more than 26 percent in less than two decades.

But numbers alone seem inadequate to describe the magnitude of the problem. The U.S. Centers for Disease Control and Prevention (CDC) put together a series of maps showing obesity rates from 1985 to the present.

Obesity's rapid spread across America:
In the maps below the lightest blue represents obesity rates by state of less than 10 percent. The darkest red-orange represents obesity rates of 30 percent or higher. Note the rapid shift over 20 years from no states having reported obesity rates above 15 percent (in 1989) to no states having rates below 15 percent (in 2009).

So why are we getting fat?
There are a variety of factors at play in the rising obesity epidemic. An article in Slate this week, and a post on my blog last summer, describe the link between obesity and commuting. The Federal Reserve and the USDA suggest that increased food consumption, particularly of fast food, is the primary driver expanding America's waistlines.

New research from Pennington Biomedical Research Center at Louisiana State University digs into another trend - our economy - to understand the obesity epidemic. Using data on occupational patterns since the 1960s, coupled with weight data and energy expenditure (i.e., calories burned) per day per occupation, the researchers come to a startling conclusion: modern jobs make us fat. Specifically: "In the early 1960's almost half the jobs in private industry in the U.S. required at least moderate intensity physical activity whereas now less than 20% demand this level of energy expenditure. Since 1960 the estimated mean daily energy expenditure due to work related physical activity has dropped by more than 100 calories in both women and men."

Research from the Mayo Clinic, the American Cancer Society, and others has come to a similar conclusion - sedentary behavior leads to obesity. So the advent of desk jobs that require sitting for long periods of time may be a primary cause of rising rates of obesity.

One conclusion is certain: whatever the cause (or, more likely, causes) of obesity, the cost is too high to be ignored. Obesity increases a person's risk of heart disease, diabetes, high blood pressure, and certain types of cancer, among other ailments (source: CDC). In 2006 the increased rate of obesity resulted in an estimated $40 billion in healthcare costs.


*According to the CDC, obesity is “defined as a Body Mass Index (BMI) of 30 or greater. BMI is calculated from a person's weight and height and provides a reasonable indicator of body fatness and weight categories that may lead to health problems. Obesity is a major risk factor for cardiovascular disease, certain types of cancer, and type 2 diabetes.”
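
For readers who want to see the arithmetic behind that definition, here is a minimal sketch in Python. It assumes the standard metric formula (weight in kilograms divided by the square of height in meters), which the footnote does not spell out. The only threshold taken from the CDC text is the cutoff of 30. The function names and the sample person are made up for illustration.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Cutoff from the CDC definition quoted in the footnote above.
OBESITY_THRESHOLD = 30.0


def is_obese(weight_kg: float, height_m: float) -> bool:
    """True when BMI is 30 or greater."""
    return bmi(weight_kg, height_m) >= OBESITY_THRESHOLD


if __name__ == "__main__":
    # Hypothetical adult: 95 kg (about 209 lb), 1.75 m (about 5 ft 9 in).
    print(round(bmi(95, 1.75), 1), is_obese(95, 1.75))
```

Because height is squared, the same weight produces very different BMI values at different heights, which is why the measure is only a rough indicator of body fatness.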

Map images courtesy of the U.S. Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System.

Tuesday, May 24, 2011

Transportation by generation

Yesterday I participated in a demographics forum at the U.S. Department of Transportation to discuss the effects of an aging population on transportation infrastructure and generational differences in transportation use. The video and slides from the forum have been posted by the DOT.



To view the webcast and slides, see: http://mediasite.yorkcast.com/webcast/Viewer/?peid=17f98d9e1a2743a1a9c24b3de09a936e1d

Friday, May 20, 2011

Whadja say? I can't heer yooooo.

Thanks to The Economist for pointing me to this fascinating map from Rick Aschmann.

Having lived in each corner of the country, and traveled to more than half of the states, I find accents fascinating, but never considered mapping the data. I just knew, growing up in southern New England, that "systematic r-dropping" led the word "cart" to sound like "cot" (or, more unfortunately, "party" to sound like "potty"). I did not know, as Aschmann documents, that this accent type does not exist anywhere else in the world! Aschmann's work provides an interesting visual display of regional dialects, and also provides a wealth of qualitative data, including samples, on many of them.

Despite the incredible wealth of data on accents and dialects, I do think data for non-English-speaking areas are lacking. For example, the map includes Navajo, despite the fact that fewer than 400,000 people (0.13% of the nation's population) speak Navajo or another Native American language. However, the only areas labeled "Spanish Speaking" are in Mexico. Yet more than 12 percent of the U.S. population speaks Spanish. Spanish-speaking populations account for an even higher share in states along the border. Nearly 30 percent of the population age 5 and older speak Spanish as a primary language in California and Texas. Similarly, in California nearly 3 percent of the population speaks Chinese, another 2 percent speak Tagalog, and 1 percent each speak Vietnamese and Korean. (Source: U.S. Census Bureau, 2009 American Community Survey).

Click on the map, or follow the link below, to access the original.
Original map: http://aschmann.net/AmEng/#DialectDescriptionChart

Wednesday, May 18, 2011

Small (business) is beautiful

In honor of National Small Business Week...

Most U.S. companies are small businesses.

The smallest of the small businesses are known as "non-employers," meaning that the business is run by a single individual with no paid employees. "Most are self-employed persons operating unincorporated businesses, and may or may not be the owner's principal source of income," according to the U.S. Census Bureau. There are 21.7 million of these small businesses, contributing just under one trillion dollars to the U.S. economy.

In the next group are the traditional small businesses, with fewer than 5 employees. These account for 3.6 million firms and, with about 1.7 employees on average, employ 6.1 million workers.

However, most American workers work for very large companies. Nearly 40 million workers work for the fewer than 2,000 American firms that could be considered the nation's giants - with 5,000 or more employees.

(On the chart, the number of companies is shown in red, number of employees in grey, and average payroll per employee in green.)


Of all states, Florida seems to be the home of small business. Fewer than 10 percent of Florida companies have 20 or more employees, and nearly 70 percent have fewer than 5 employees. Delaware and D.C. have the largest concentrations of large (500+ workers) companies.

The largest firms tend to pay their employees more, on average. The largest companies have payroll per employee of $48,000 while the smallest are closer to $38,000. However, those payroll statistics are skewed greatly by the multi-million dollar salaries paid to the executives of the nation's largest companies. According to a study conducted for the New York Times, the median executive pay was $7.7 million in 2009 and $9.6 million in 2010. The Wall Street Journal, using its own survey, reports similar trends.

While executive pay has shown rapid growth since the 1980s, the trends in business size have remained relatively constant over the past decade.

Tuesday, May 10, 2011

Go West, Americans! (Or maybe South?)

Since the Census Bureau started keeping score in 1790, the nation's population has grown fastest in the western and southern regions, shifting the nation's "mean center of population" in a steady march across the continent.

According to the Census Bureau:




The center is determined as the place where an imaginary, flat, weightless and rigid map of the United States would balance perfectly if all residents were of identical weight.
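
That balance-point description is simply a population-weighted average of locations. Here is a minimal sketch of the idea in Python, assuming a flat map with plain x/y coordinates. The Bureau's actual calculation works with latitude and longitude, which this sketch ignores, and the two towns and their populations are invented for illustration.

```python
from typing import Iterable, Tuple

# Each place is (x, y, population) on an imaginary flat map.
Place = Tuple[float, float, int]


def mean_center(places: Iterable[Place]) -> Tuple[float, float]:
    """Population-weighted average position: the point where the map would balance."""
    places = list(places)
    total = sum(pop for _, _, pop in places)
    if total <= 0:
        raise ValueError("total population must be positive")
    x = sum(px * pop for px, _, pop in places) / total
    y = sum(py * pop for _, py, pop in places) / total
    return x, y


if __name__ == "__main__":
    # Hypothetical two-town country: a western town at x=0, an eastern town at x=10.
    early = [(0, 0, 100), (10, 0, 300)]   # most people live in the east
    later = [(0, 0, 500), (10, 0, 300)]   # the western town has boomed
    print(mean_center(early))
    print(mean_center(later))
```

In the made-up example the eastern town never shrinks, yet the center still slides west because the western town grows faster. The same logic applies to the real center: it tracks where growth is fastest, not where people are leaving.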

Tracking the mean center of population tells a story of the nation's growth, conflicts, and social change. This interactive map from the U.S. Census Bureau shows the shifting mean center of population over time:


Today's mean center is in Texas County, Missouri - more than 1,000 miles from the first recorded center in Kent County, Maryland (1790). Some of the biggest shifts over time show the nation's development, and at times, growing pains.

Major shifts over time:
1790: First mean center is calculated as falling about 23 miles east of Baltimore, MD.

1810: The Louisiana Purchase (1803) doubled the land area of the nation, and the mean center shifted into Virginia.

1860: The center shifted by more than 80 miles (biggest shift on record) thanks to rapid growth in the nation's western states, driven in large part by the Gold Rush.

1870: Just ten years later the mean center of population experienced its biggest shift to the north, as Northeastern and Midwestern cities experienced rapid post-Civil War growth as people fled the war-ravaged South. Also during this time Alaska became a U.S. territory (1867).

1920: The smallest shift on record was between 1910 and 1920. The nation's current territory had already been acquired, slowing the rate of westward expansion. The Northeast and Midwest saw large inflows of international migrants. And last, but certainly not least, there was substantial migration of black/African American population out of the South and into the Northeast and Midwest, precipitated by the intense racism that spawned the Jim Crow laws.

1950: After six decades in Indiana (the longest in any one state), the center finally crossed state lines into Illinois.

2010: The center has its biggest recorded shift to the south, as Georgia, Florida, North Carolina, South Carolina, and Texas record rapid population growth.

How American Productivity is like the Kentucky Derby

Throughout the first three quarters of the Kentucky Derby on May 7, Shackleford was way out front. However, something interesting happened in the final stretch. Other horses started to break away from the pack, and despite the jockey's strenuous whipping of Shackleford's flanks, the horse could go no faster.

This scene reminded me of the productivity news released by the Bureau of Labor Statistics two days earlier. (What can I say, I am an inveterate data geek.)

The past couple of years have seen incredible gains in the productivity of the average American worker, measured as output per hour worked. 2009 saw an increase of 3.7 percent over 2008. 2010 was even stronger, with productivity increasing by 3.9 percent over 2009 (one quarter saw a jump of 6.7 percent). The nation hasn't seen that level of productivity growth since 2002, when people were recovering from the shock of the prior September. In 2002 people were just learning how to use their iPods and Google was just getting its legs, so rapid productivity growth that year can, at least in part, be linked to technology gains.
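
For fellow data geeks, the measure itself is easy to sketch. The snippet below assumes nothing more than the definition given above (output divided by hours worked) and an ordinary percent change between two periods. The output and hours figures are invented for illustration and are not BLS data.

```python
def productivity(output: float, hours: float) -> float:
    """Labor productivity: output per hour worked."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return output / hours


def pct_change(new: float, old: float) -> float:
    """Percent change from old to new."""
    return (new / old - 1) * 100


if __name__ == "__main__":
    # Hypothetical base year: 1,000 units of output from 100 hours of work.
    base = productivity(1000, 100)

    # Scenario A: more output from the same hours.
    more_output = productivity(1037, 100)
    print(round(pct_change(more_output, base), 1))

    # Scenario B: the same output from fewer hours.
    fewer_hours = productivity(1000, 96)
    print(round(pct_change(fewer_hours, base), 1))
```

Both scenarios raise measured productivity, which is why the headline number alone cannot say whether the gain came from producing more or from working fewer hours.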

So, what is behind this recent productivity boom?
And more importantly:
Why did productivity growth start slowing toward the end of 2010, falling to only 1.6 percent in the first quarter of 2011?

While it is possible that companies merely cut the "dead weight" in their labor force with layoffs, I think the answer is more basic. This brings me back to the Derby analogy... With unemployment rates hovering between 9 and 10 percent in 2009 and 2010, I strongly suspect that American workers were working harder for fear of losing their jobs. Plus, with companies cutting their workforce, the remaining workers had to work harder (or smarter) to keep up with corporate demands for the same (or higher) level of output.

In addition, the majority of those who lost jobs and have since been re-employed report being overqualified for their current gig, according to Pew Research. This would undoubtedly give another, temporary, bump to productivity levels.

But with no major advances in technology, American workers are like Shackleford in the home stretch: there is only so much more performance to be squeezed out of a worker before he or she has nothing left to give.

So productivity growth is likely to tail off in the near term. This might be troublesome for the corporate bottom line, but may be an unexpected boon for the 14 million unemployed Americans. As the economy recovers and demand picks back up, companies that have already stretched their existing workforce as far as they can will have to begin hiring to keep up with demand.

Photo courtesy of TheRichBrooks: http://www.flickr.com/photos/therichbrooks/

Sunday, May 8, 2011

How many moms?

If you tried to go out to brunch today, you probably noticed that there are a LOT of moms.

In the United States there are about 4.1 million babies born each year, despite a decrease during the recession. About 40 percent of those births are first births, meaning an estimated 1.6 million women are celebrating their first Mother's Day this year.

And while many moms have 2 or 3 children, those 4.1 million babies each year add up to a lot of moms over time. Today there are more than 31.7 million families where kids under the age of 18 are still living at home with their mom (excludes father-only families and children being raised by grandparents or others). And that doesn’t count the moms of those of us who have gotten old enough to move out, but still want to treat Mom on Mother’s Day. Including them brings the total to more than 85 million moms in the United States.

Speaking of treating mom, the National Restaurant Association reports that Mother’s Day is the most popular day for dining out. More than a third of respondents planning to dine out said that they would go out for breakfast or brunch, and 20 percent report that they will take mom out for more than one meal that day. According to the National Retail Federation, that adds up to more than $3 billion spent on meals for moms today.

The NRF also reports that gift-givers will spend $140 on average for mom, and a survey by Ebates.com showed that residents of California, Oregon, New York, and North Dakota spend the most, while residents of Alabama spend the least. Nationwide total spending for Mother's Day (on any type of gift) is expected to exceed $16 billion.


For San Diego-specific stats, see "Moms are worth a lot this Mother's Day"

Photo courtesy of: Clevercupcakes - http://www.flickr.com/photos/clevercupcakes