Visualizing Bikeshare Data

Something I’m working on …


Posted in Uncategorized | Tagged , , | Leave a comment

Buffalo’s Neighborhood Cafés

Just for fun, and because my recent data/mapping energies have been channeled into a project for the Western New York Law Center instead of this blog, I thought I’d post a poster-map-gift I made for a holiday party with fellow neighborhood café employees. And if you haven’t been, you should visit the best one on this map… Sweet_ness 7 Parkside!

Neighborhood Cafe Map


Buffalo NY In Rem 47 Foreclosure Listing

For those interested in Buffalo’s 2013 tax foreclosure auction – or just housing in the area in general – here is a map of the city’s first publication of the In Rem 47 foreclosure listing (5,409 properties), available here.


More mapping soon … it’s been a busy past couple of months!


Great Reset

Richard Florida’s theories about economic development and the “creative class” don’t always resonate with me, but his latest book The Great Reset most certainly did. He makes some important points about the role of the service sector in our economy, as well as high-speed rail and urban development. My head was nodding vigorously at almost every page.

The thesis of the book is that we are on the brink of a “Great Reset”, or a massive reorganization of the old economic and social order that led to the 2008 recession. Just like the urban, industrial era following the Long Depression of 1873, and the suburban, mass-production boom after the Great Depression of the 1930s, this Reset will be a sweeping transformation of where we live and work, what our jobs are, and how we spend our money.

Megaregions as the new economic geography

Each Reset resolves in a “spatial fix,” or geographic resettlement: the industrial city of the late 1800s and the suburbs from the 1940s onwards are examples. Florida projects that the spatial fix to the current Reset is the Megaregion: clusters of major metro areas, secondary cities, and their suburbs. The largest in North America is the “Bos-Wash” corridor (encompassing Boston, New York, Philadelphia, Baltimore, and Washington, D.C., with over 50 million people and more than $2 trillion in economic activity). Megaregions, more than nations, run the global economy: the world’s 40 largest megaregions account for 67% of all economic activity and 85% of all technological innovation, yet only 18% of the population.

People have been concentrating in cities for a while (as of 2010, the majority of the world’s population is urban), and they will continue to do so. Megaregions serve as magnets for Florida’s “creative class” and have weathered the recession better than other areas, sustaining high economic and population growth while smaller cities have shrunk. A new theory of urban economics helps to explain why larger cities sustain higher “metabolisms” (a metaphor for the pace of innovation, economic growth and social life) without collapsing into congested and inefficient messes. The study’s authors describe this mechanism as “accelerated innovation cycles,” but I prefer Florida’s wording: “As globalization has increased the financial return on innovation (by widening the consumer market), the pull of innovative places, which are already dense with highly talented workers, has only grown stronger” (p. 152). Megaregions have a bright future.

The new economy: fulfilling jobs in the service sector and beyond.

While higher-paying knowledge/professional/creative jobs are growing and generate substantial wealth, catering solely to the educated “creative class” employed in this sector is an elitist, tunnel-visioned approach to economic development. Florida points out that the service economy, composed of routine, low-paying, and generally disdained jobs such as food service, hospitality, cleaning, and health aide work, is bigger than any other sector. It accounts for over 45% of U.S. jobs, and it isn’t going anywhere because, unlike manufacturing, the tasks of cleaning buildings, walking dogs, or cashing people out at the grocery store are impossible to outsource overseas.

Florida argues that the service sector is a huge untapped source of jobs that could be made better if companies paid front-line workers more liveable wages, offered promotion potential, and made better use of the analytical and social skills of all workers, not just the managers. Companies pioneering this approach have already shown that extending creative input to all workers yields innovations that help the bottom line, and makes for a more fulfilling work experience for employees, reducing costly turnover.

High-speed rail infrastructure is an investment in the new economy

If the highway and private automobile provided the framework for the suburban spatial fix, high-speed rail will be the backbone of the megaregion economy. High-speed rail is the fastest and most convenient ground transportation available – and is often quicker than air travel when you account for security procedures and wait times. Rail increases connectivity within and between major metropolises and their secondary cities. It facilitates the exchange of people, ideas, and economic functions, broadening labor markets and providing a framework for in-fill development along rail corridors. As I noted in a previous post, economists and politicians have argued that the cost of high-speed rail infrastructure is not justifiable. I was very pleased with Florida’s counterargument: it is less justifiable to use federal dollars to bail out the auto industries and banks that fueled the recession in the first place, by way of the sprawled, suburban landscape and the accompanying housing bubble. A new economic order, a Great Reset, calls for new infrastructure. He goes on to say:

“Infrastructure is always expensive, and there’s no clear way to measure the overall future return on the investment, whether it’s in the form of innovation, development, or new communities and jobs. Infrastructure provides a skeleton on which to grow a new economic model. The infrastructure investments we make now will determine the kind of economy we have in the future…In some ways, infrastructure is analogous to government support for basic research in medicine or the social sciences. Such investments, which are either too large or too risky for private companies to undertake, offer a significant social rate of return that can drive future invention, productivity, and growth” (p. 170).

High-speed rail is an example of such infrastructure, a critical adaptation and complement to a new economic order of megaregions, that could offer a high quality of life and fulfilling employment opportunities for those in the professional and service sectors alike.


New York’s Adult Tobacco Survey

I quit, and it feels so good! Not tobacco: a job. For the past 7 months, I’ve been humbled and also mortified by working the front lines of what is sometimes glorified as “primary data collection for health behavior research.” This is really just phone peddling of an unexpected sort: rather than sales, donations, or debt collection, the desired outcome is completed surveys. But instead of collecting data, telephone interviewers spend most of their shifts getting yelled at or hung up on. And who could blame the world at large for reacting this way to calls from strangers with stilted and over-eager introductions?

I’ve worked in data collection before, measuring things like traffic volume, pavement cracks, salt marsh redox potential, and all the species of bugs and grubs scraped from riverbed rocks. But this is the first time I’ve had to interface (intervoice?) with fellow humans, and it’s painful. A lot of data these days is collected by pained people like me in call centers like this, where teams of 100+ interviewers cold-call random households and sometimes businesses for research studies commissioned by all sorts of clients (e.g., universities and government health departments) about topics like tobacco use, physical activity, nutrition, and smoking policies in apartment buildings. One of the surveys the office does each year is the New York State Adult Tobacco Survey (ATS). When I discovered that this project’s data is published online, I jumped at the opportunity to examine the finished product of all those hours of tedious dialing. Below, I analyze the 2009 and 2010 data.

The purpose of ATS is to help the New York State Tobacco Control Program monitor how attitudes about smoking change over time. The program uses this information to better target their activities (smoking cessation services, media campaigns, and policy work promoting tobacco control), and also to brag that desired behavior/attitude changes reported through ATS are an indicator of their effective programming (FALLACY!). In any case, this is important to understand because tobacco use is currently the leading cause of preventable deaths in the U.S.

National Context:

The Centers for Disease Control and Prevention conducted a 2010 study of 33 Adult Tobacco Surveys administered in 19 states from 2003 to 2007. The sample size (or number of survey respondents) for each ATS ranged from 1,300 to 12,000 (NY’s was just over 4,000 in 2009 and 2010). However, all analysis is done after the data is weighted by each respondent’s probability of selection within the state, according to race, ethnicity, sex, and household size. Bottom line: the numbers presented here are in terms of projected state population rather than simply the number of survey respondents.
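
As a rough illustration of what weighting does, here is a minimal Ruby sketch. The respondent records and weight values are made up for the example (they are not ATS data), and the real survey’s weighting scheme is more involved. The only point is that each respondent counts for the number of people they are assumed to represent.

```ruby
# Hypothetical respondents: each weight is the number of state residents
# this respondent is assumed to represent (illustrative values, not ATS data).
Respondent = Struct.new(:smoker, :weight)

SAMPLE = [
  Respondent.new(true,  1_000.0),
  Respondent.new(false, 3_000.0),
  Respondent.new(false, 2_000.0),
  Respondent.new(true,  2_000.0)
].freeze

# Unweighted prevalence: share of respondents who smoke.
def unweighted_prevalence(respondents)
  respondents.count(&:smoker).to_f / respondents.size
end

# Weighted prevalence: share of the projected population who smoke.
def weighted_prevalence(respondents)
  total = respondents.sum(&:weight)
  smokers = respondents.select(&:smoker).sum(&:weight)
  smokers / total
end

puts format('Unweighted: %.1f%%', 100 * unweighted_prevalence(SAMPLE))
puts format('Weighted:   %.1f%%', 100 * weighted_prevalence(SAMPLE))
```

In this toy sample half of the respondents smoke, but because the smokers carry smaller weights, the projected prevalence comes out lower than the raw share.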

The prevalence of current smokers within New York State is somewhere around 17% (16.3% in 2009, 17.7% in 2010) – which is fairly low nationally, as the CDC study showed a median prevalence of 19.2%, with states ranging from 13.3% (Hawaii in 2006) to 25.4% (West Virginia in 2005). Tobacco use prevalence for cigarettes, cigars (including cigarillos and little filtered cigars), and smokeless tobacco (chew, dip, snuff) is graphed below, comparing the lowest, highest, and median rates from the CDC review to New York’s 2009/2010 ATS results.

Tobacco use prevalence comparison

Smokin’ Maps?

But how does cigarette smoking prevalence vary across the state?

It’s a tricky question to answer through the ATS. The most precise geographic information in the 2010 data-set is the postal/zip code of each respondent. However, only about half of New York State’s more than 4,000 zip codes are represented in the data, as shown below:

2010 geographic coverage

To avoid bias, I aggregated the zip code data to the county level and compared the projected prevalence of current smokers (shown below). Keep in mind that this map is imprecise: the estimates come from a sample comprising only 0.03% of the state’s adult population, inhabiting only half of its zip codes. It’s unlikely, for example, that over half of adults in Yates County (population of 19,000 over-18) are smokers. However, the map does illustrate some broader trends, such as a cluster of counties radiating south and west from Albany, but north of NYC, with prevalence rates above the state average (17%). That said, the coverage map above shows that these counties also happen to have a large proportion of their geographic area unrepresented in the data.

county smoking prevalence
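
For anyone curious how a zip-to-county roll-up like this might be computed, here is a minimal Ruby sketch. It is not the workflow actually used for the map: the rows, the weights, and the small zip-to-county lookup are all illustrative assumptions, and a real analysis would load both from files.

```ruby
# Hypothetical rows: zip code, smoker flag, survey weight (illustrative only).
ROWS = [
  { zip: '14201', smoker: true,  weight: 900.0 },
  { zip: '14214', smoker: false, weight: 2_100.0 },
  { zip: '14527', smoker: true,  weight: 500.0 },
  { zip: '14527', smoker: false, weight: 1_500.0 }
].freeze

# Illustrative zip-to-county lookup; a real one would come from a crosswalk file.
ZIP_TO_COUNTY = {
  '14201' => 'Erie',
  '14214' => 'Erie',
  '14527' => 'Yates'
}.freeze

# Sums weights per county, then divides smoker weight by total weight.
def county_prevalence(rows, zip_to_county)
  totals = Hash.new { |hash, county| hash[county] = { smokers: 0.0, all: 0.0 } }
  rows.each do |row|
    county = zip_to_county[row[:zip]]
    next if county.nil? # skip zips missing from the lookup

    totals[county][:all] += row[:weight]
    totals[county][:smokers] += row[:weight] if row[:smoker]
  end
  totals.transform_values { |t| t[:smokers] / t[:all] }
end

county_prevalence(ROWS, ZIP_TO_COUNTY).each do |county, rate|
  puts format('%-6s %.1f%%', county, 100 * rate)
end
```

With only a couple of respondents per county, a single answer swings the rate dramatically, which is the same small-sample problem behind the implausible Yates County figure.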

The Cost

The cost of addiction (to the addicted, but also to the rest of the population) is a subject that I find particularly interesting.

Making cigarettes more expensive is meant to discourage smoking, and New York has the highest cigarette excise tax in the nation: bumped up to $4.35 per pack in July 2010. But it’s important to keep in mind that this tax revenue comes disproportionately from the pockets of lower-income residents.

Smoking Status by Income Bracket

That’s because people with annual household incomes under $30,000 are more than twice as likely to be current smokers (42%) as the general population (20%), as shown above. Among smokers, those with lower incomes are also 20% more likely to be at least moderately concerned about the cost of cigarettes. But the rising costs are here to stay, as the link between increased cigarette prices and lower smoking prevalence has been vigorously proven (vigorous = meta-analysis of over 500 studies!) across low, middle, and high income groups.

Some health economists argue that the regressive nature of the excise tax (burdening the poor more than anyone else) is addressed by directing that tax revenue into cessation services that target low-income smokers. Indeed, NY’s ATS data shows that among smokers, those with lower incomes are most likely to be aware of and have used the state’s quit-line (which offers free counseling and Nicotine Replacement Therapy). So low-income smokers may be making the most use of cessation services, but they are also paying for them big-time, and it’s not helping much: the disparity in smoking status between income groups remains.

Health disparity by social class is nothing new. But when it comes to addiction, nutrition, and other lifestyle factors, discussion tends to gravitate strongly toward the responsibility of the individual. And I agree, individual responsibility is an important factor. But it’s also important to remember what we are collectively responsible for: the barriers to employment, childcare, transportation, and other societal circumstances beyond an individual’s control that may fuel chronic stress and drive them toward certain health behaviors.


Visual Display of Quantitative Information

I just finished Edward Tufte’s Visual Display of Quantitative Information (2nd ed), a classic modern text (or so I hear) on how to design data graphics, which “visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color”.

This is a great book that I’m sorry I didn’t read sooner. Some key points I took away:

  • Tables are better suited for displaying data-sets with 20 objects or fewer, while visual graphics are better suited for summarizing a lot of information.
  • Color often muddles rather than clarifies data graphics, as the human eye does not easily give visual ordering to colors. Gray-scale shading, however, does convey a natural visual hierarchy, and so better represents varying quantities than color does.
  • If color is used, avoid red/green contrasts in consideration of color-blind viewers (5-10% of the population). I am VERY guilty of this. Green/yellow/red scales are often my default. Contrasts with blue are a safer bet: color-blind people can generally differentiate blue from all other colors.
  • In regards to typography: the more that letters are differentiated, the easier the reading. This means that “serif” rather than “sans serif” fonts are preferable, and all-caps writing should be avoided (the more equal height/width/volume of capital letters makes for more difficult reading).
  • Do not vary the design of the graphic (i.e., the scale, symbology, etc.), because this distorts how the viewer perceives variation in the data. Variation in the data is after all what the graphic is there to illustrate – and truthfully so.
  • The number of dimensions in the graphic should not exceed the number of dimensions in the data. So for example, don’t use area (2-D, e.g., through differently-sized circles) to represent a 1-D measure, such as the value of a dollar over time.
  • Less is more, or, maximize the Data-Ink Ratio. This is the ratio of the ink used to represent the data (essential to the graphic) to the total ink used to print the graphic (which includes grid, frame, axis/scale bar ticks, etc.).
  • Maximize the “data density” of the graphic – or the number of entries in the data matrix within the area of the graphic.
  • “Small multiples”, or a series/block of small graphics indexed by changes in a particular variable, are an effective graphic format because they are inherently comparative and tend to have high data densities.
  • Pie charts should never be used, because of their low data-density, and because the human eye is not adept at detecting differences in angles.
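
The two ratios in the list above (data-ink ratio and data density) are simple enough to write down as a short Ruby sketch. The function names and the sample measurements are hypothetical, chosen only to show the arithmetic, and “ink” and “area” can be in any units as long as they are consistent.

```ruby
# Data-ink ratio: ink spent on the data divided by all ink in the graphic.
def data_ink_ratio(data_ink, total_ink)
  raise ArgumentError, 'total_ink must be positive' unless total_ink.positive?

  data_ink.to_f / total_ink
end

# Data density: entries in the data matrix divided by the graphic's area.
def data_density(entries, area)
  raise ArgumentError, 'area must be positive' unless area.positive?

  entries.to_f / area
end

# Hypothetical measurements for one chart.
puts data_ink_ratio(30, 40) # 0.75
puts data_density(120, 24)  # 5.0
```

Erasing gridlines and frames lowers the total ink without touching the data ink, so the first ratio moves toward 1. Shrinking the graphic while keeping the same entries raises the second.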

Maps in the History of Information Visualization:

One of the most interesting parts of the book outlines the history of data graphics. If you have any doubts that geography is awesome and always has been, consider this. Before charts, before graphs and plots, there were MAPS!

Geographic maps were the first form of data graphics, at least as far as historians can tell. While the first maps, found on clay tablets, date to before 3500 BC, thousands of years passed before precise cartographic maps with full grids were created (the 1100s AD in China, and 1550 AD in Western civilization), and it wasn’t until 1686 that cartography and statistics merged to create the first thematic map (which Tufte refers to as a “data map”, but this has come to mean something else in the IT world, so I avoid the term for clarity’s sake). This early thematic map, courtesy of Edmond Halley, shows the location of trade winds and monsoons. Geographic analysis really blossomed after John Snow’s famous, even mythologized 1854 map of cholera deaths and water pumps in London – some say this kick-started the fields of health geography and spatial epidemiology. Charles Joseph Minard’s multivariate map (published 1869, shown below) of Napoleon’s 1812 Russian campaign is another famous merger of cartography and data visualization that Tufte says “may well be the best statistical graph ever drawn.”



Geocoding in Ruby

Sans confidence, competence, I program! Computer languages have always seemed prohibitively complex and very boring. Given my interest in working with data and ‘information’, taking the leap to explore Information Technology is a predictable and necessary next step…that I’ve avoided for years. Having I.T. bigshots for both a boyfriend and a brother has done little to alleviate my dread of the terminal and its command lines.

To bulk up my GIS muscles, I tried and failed to teach myself Python from a book this past fall. The failure was purely one of attention span…how to trudge through mundane code rules, vaguely applicable to matters of interest, when much friendlier reading material was calling to me from the library shelves? Then in January I started to learn Ruby at awesome (& free!) Learning-to-Code classes held every other Monday night at the Co-Work Buffalo office, which use Chris Pine’s Learn to Program book/website as a guide. This learning attempt was much more successful, in that I actually accomplished stuff – geocoding being the biggest deal to me and my mapping.

Geocoding is one of the most important processes in spatial analysis: matching a place-descriptive data field (like an address or city name) to a precise location in terms of latitude/longitude. To create a geocoder in Ruby I first installed RubyGems, Ruby’s package manager for installing and managing other packages/libraries of code (“gems”). I then used it to install a geocoding gem aptly named Geocoder, which uses Google’s Geocoding API by default to look up addresses and/or geographic coordinates.

I used these packages to write the program below, which reads in a list of addresses (the file Addresses.txt), and writes a new file (LatLong.txt) listing the latitude and longitude coordinates for each address.
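
The listing itself is not reproduced here. A minimal sketch of a program along these lines might look like the following. It is a reconstruction under assumptions, not the original code: the `geocode_file` helper and the tab-separated output format are hypothetical, and the lookup is passed in as a callable so that the file handling can be tried without the gem or a network connection. The commented-out lines at the bottom show how the Geocoder gem’s `Geocoder.coordinates` method could be plugged in.

```ruby
# Reads one address per line from input_path and writes one tab-separated
# line per address to output_path: "address<TAB>latitude<TAB>longitude".
# `lookup` is any callable that takes an address string and returns
# [latitude, longitude], or nil when no match is found.
def geocode_file(input_path, output_path, lookup)
  File.open(output_path, 'w') do |out|
    File.foreach(input_path) do |line|
      address = line.strip
      next if address.empty? # ignore blank lines

      lat, lng = lookup.call(address)
      if lat && lng
        out.puts [address, lat, lng].join("\t")
      else
        out.puts [address, 'NOT FOUND'].join("\t")
      end
    end
  end
end

# With the Geocoder gem installed (gem install geocoder), the lookup could be:
#   require 'geocoder'
#   lookup = ->(address) { Geocoder.coordinates(address) }
#   geocode_file('Addresses.txt', 'LatLong.txt', lookup)
```

Keeping the lookup separate from the file loop also makes it easy to swap geocoding services or to add a pause between requests if the service limits how fast you can query it.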


My next programming goal is to venture into the endless possibilities of web-scraping, using the Ruby Mechanize package. Inspired by this hilarious map, I want to mine Craigslist for all the rich social data it has to offer.
