Transcript of the September 21, 2012 NCVHS Working Group on Data Access
and Use Meeting
[This Transcript is Unedited]
DEPARTMENT OF HEALTH AND HUMAN SERVICES
The Working Group on Data Access and Use
September 21, 2012
Hubert H. Humphrey Building
200 Independence Ave., SW
CASET Associates, Ltd.
Fairfax, Virginia 22030
TABLE OF CONTENTS
- Introductions – Review Agenda
- HHS Chief Technology Officer Perspectives
- Overview of NCHS Data Products and Services
- Overview of CMS Data Products and Services
- HHS Approaches to Dissemination
P R O C E E D I N G S (2:07 p.m.)
DR. CARR: We now open the Data Access and Use Working Group. Keeping with
our tradition, we want to start on time and move forward efficiently. What we
would like to do is go around the room.
(Intro around table)
DR. CARR: Is there anybody who called in on the line?
MR. SIVAK: Thank you very much for having me today. I just wanted to say,
first of all, that I have been hearing some stories all day long about some
crazy barbecue that happened last night. All I have to say is that I want an
invite next time, so put me on the list.
MR. SIVAK: Again, thank you for having me. I unfortunately have to keep it
relatively short as I have a meeting with the deputy secretary at 2:30 today that I
need to be at. What I did want to do today is kind of go through a brief
overview of what we are doing with Healthdata.gov, kind of what some of the
directions we are heading in are, and kind of get a sense from you guys about
how that fits in with the work that you are doing, suggestions, comments,
thoughts, all of that kind of stuff.
I will just dive right in. As I am assuming most of you guys know,
Healthdata.gov is sort of the version two of the platform that was launched
with Data.gov, sort of way back in the day. We launched Healthdata.gov in June
of this year, coincidentally with the third edition of the Health Data
Initiative Forum, or Datapalooza as it is affectionately known.
Actually, since about a month or six weeks ago, we have been working on sort
of phase two of version two. We are developing this with an agile team, using
two week sprints to build new features and things like that. We are going to be
releasing this version, this phase, on October 8th, next month.
There are going to be a number of primarily back-end improvements to the
system. I think the one that is most interesting for the folks here is that we
are going to be improving the publishing process and the workflow associated
with that, for the metadata associated with the datasets that people put out.
I think a message went out, actually, I know a message went out relatively
recently to all of the health data leads within HHS. We are basically creating
categories of folks who can go and author metadata entries and then edit
metadata entries. It is actually kind of going back to the discussion that was
just happening, it is relevant in many ways to that. We need to make sure that
the datasets that we are putting out, now that we sort of have the control of
Healthdata.gov and what datasets we put out there, we need to make sure that
the datasets are both of high quality and sort of conform to the issues of the
mosaic effect and re-identification and some of those challenges.
It is actually very important, but I think the new workflow that we are
putting in place here will allow us to do that much more effectively. That is
just one of the improvements. Keep that in mind.
For future iterations of Healthdata.gov, I am personally very interested in
the feedback of both internal and external users, because I think we should use
that to guide our development. Today might not be the right forum for this, but
I would love to get input from you guys about the direction that you think this
thing should take.
I really consider this a marketing tool for us. We have thousands of
datasets across this agency or across the department that I think are
incredibly valuable. In fact, it was funny, I got an email from somebody
yesterday who had gotten the email about the new Healthdata.gov process, and
said something to me about a couple of datasets that they have that they don’t
think are really that important. They are not really sure if it matters.
I looked at it and it is great data. It is data that you can think of a
million different use cases for. I think there is an education process still
that we need to go through internally. This site is primarily a tool for us to
get this information out. The more usable and user friendly it is, the better,
the better the descriptions of the data are, et cetera.
Really, I think over the course of the last few years, this has been, as you
guys all know, a very intense, ongoing effort. These are some of the high level
things that have happened with Healthdata.gov over the past really two and a
half years, but primarily over the past, say, six to eight months as we have
kind of built this out.
There is the metadata enhancement, or the publishing workflow efforts that
we have been undergoing. If you guys have noticed, there’s a blog on
Healthdata.gov, which we’ve used pretty extensively to start to publish some
good material on how data is being used, what some of these datasets are. I am
a big fan of stories, so if any of you guys have stories that you want to
promote, things that you want to tell, I think this is the way to engage
people. We are happy to use this as a platform for anybody who wants to promote
I should also say that we are looking at ways of enhancing the design. The
look and feel, I think, can be brought up to date a little bit. I don’t want to
put anything negative on the team, because I don’t think any of us are really
designers. We kind of put this together, without having the expert services of
somebody who is very familiar with design, and look and feel and user experience.
We are looking at ways now of incorporating that into future versions. We
might even run a design challenge on Healthdata.gov, which I think would be
kind of fun. That is all to say, marketing tool, if you guys have stories,
bring them on. We are happy to use the platform to kind of publish those stories.
I think a big step forward was providing programmatic access to the metadata
catalog through APIs. That is really a first step. I think the real holy grail
is providing an API for all of the data that we have, and that is obviously a
much tougher lift, but something that is obviously right in line with the
digital strategy, and everything that the entire federal government is kind of
moving towards. I think we are seen, rightly so, as leaders in this space, and
so we can continue this by moving aggressively down this path.
This next bullet point is a bit of a misnomer. One of the things that we are
working on doing is enabling the ability to actually store data within
Healthdata.gov. Right now, it is just a metadata catalog. That is great for the
larger agencies and the folks who have some resources to actually host their
datasets. There are lots and lots and lots of examples of smaller entities in
HHS that don’t have this ability. We think that it would be very useful to be
able to provide them a place to maintain some of their content and information.
At the same time, we want to use features of the platform to actually provide these
APIs automatically on the data. This is something that we are working towards.
With this new release, we will have at least the sort of basics of that
capability. That is something to kind of keep in mind, going forward. If it is
a heavy lift to create an API for a specific dataset, it might even be easier
to bring that dataset into Healthdata.gov, and then use the tool itself to
provide the API and the functionality that we need.
Then, finally, we are big fans of open source. It is the whole reason that
we are building this on CKAN and Drupal and SOLR. We are working now towards
releasing the entire code base as a project under GET(?) in the next few weeks.
We want to encourage development of this. Much like the data, we don’t want to
lock up the code within the walls of HHS, so we are trying to release this and
get this out there as much as we possibly can.
One thing that has really kind of struck me about the whole data movement
over the past three years, and I have been working on this stuff since I was
with the DC government a few years ago, is that we have had this big push to
get data out. We haven’t done a ton in terms of helping people understand what
the data is all about.
I did this before, I will do this here because I think it is kind of fun.
Just as a really simple example of a dataset that we have, that I think is very
valuable, but it is not accessible to normal people, is one called the National
Plan and Provider Enumeration System Downloadable File. You guys all know what
this is? I figured everybody here would know what that is.
Pretend that you don’t know what it is for a second, and listen to the first
sentence of the description of this dataset. The National Plan and Provider
Enumeration System, NPPES, Downloadable File, also referred to as the NPI
Downloadable File, contains FOIA-disclosable NPPES health care provider data
for health care providers who have been assigned national provider identifiers,
I will admit, I read that and actually one of my guys that you all probably
know, Arnum Chatagy(?), was showing me how we had included this dataset as part
of our health data initiative starter kit, and how it is a great dataset. I
read the description and I was like, I have no idea what that means. I said,
what is that, and he said, oh, well, it’s not totally complete, but it’s pretty
much a list of all of the doctors in the United States. Why didn’t we say that?
That’s a useful dataset, right?
Every time I have said this to an audience that is not primarily composed of
subject matter experts, their reaction is, wow, that is useful. We could use a
list of all of the doctors in the United States, along with their location and
specialty and all of that kind of stuff.
To me, that’s a perfect example of some of the work we need to do, in terms
of really broadening the tent and talking to a much wider group of people. I
think there is a lot of talent out there that we can take advantage of, that we
are missing right now. One of the efforts that we are going to be working on
is, for lack of a better term at the moment, call it data education. We want to
basically use the skills that we have internally, to translate things and make
them available to people on a broader basis.
That is kind of one part of that. Another part of that is we have got people
here who have been working on these datasets for a long, long time. Up until
now, they really haven’t had much contact, in some cases, with the people who
actually could use the data, or maybe who aren’t using the data, but would like
to. I think that there is some value in that two-way conversation, both from
the perspective of the users of the data, who might not know what it means.
Right, I mean, if we could have the person who creates the NPPES or maintains
it say, hey, it is a list of all of the doctors, and here is how we collect it
and here is how you use it. That would be helpful to people. At the same time,
I think it would be helpful to the people inside, who are working on this
stuff, to see how people on the outside want to use it, how they can use it,
how they are using it. I think that will help influence how we develop things
going forward. We are making an effort now to connect people, both virtually
online, but also in person, and I will talk about that in a second, to kind of
help spur some of these developments.
Finally, something else that really struck me as being a little strange was
that we have our Healthdata.gov project, which is based on Drupal and CKAN
and SOLR, going on. Simultaneously, other federal government entities
have this thing called the OGPL going on, which is based on CKAN and Drupal
and SOLR. We are both doing it because they are open source projects, and
because we want to be able to kind of control the development of these things,
or at least control which features and functionality we think are important and
put our resources toward them. We want to own the code. We want to build communities
around this stuff.
Right now, we are building two separate communities, which really doesn’t
make a lot of sense to me, because it is on the same stuff. We have recently
started a process to get everybody together and start working on the same code
base, to kind of put these projects together. Obviously, we will have different
features and functionality that we prioritize, but that is fine. That is how
open source works, right?
I just feel like if we put everybody together and get a single code base
going, that is this project, we will have much, much greater buy-in from the
rest of the world. We will have a virtually unlimited pool of developers to tap
into, that hopefully can expand the scope of this rapidly. So far, this process
has actually been working pretty well.
It looks like there is a lot of international interest from the folks in
India and the U.K. and Canada. The U.S.’s been really psyched about this. The
guys in the Healthdata.gov team are pretty excited about this, so I think this
can go somewhere. For any of you guys who want to pitch in on this effort, I
think it would be great. The more support we can show for a unified band of
people, kind of working together on this thing, would be really helpful.
Let’s see, I mentioned the memo that we sent out to the health data leads
already. If anybody has any questions about that, feel free to come talk to us
at any time. The basic timeline is pretty straightforward. We want the leads to
identify their internal teams, the authors and the editors, by October
1st. We want those leads to then compile a catalog of just what
datasets we are talking about by the 10th.
Then, we are going to do a training webinar to kind of walk through how the
system actually works and what the new functionality looks like, on the
10th, as well. Then, we will hold office hours until December
1st, to kind of answer any questions that come up. This is the
general idea, that we want to get across.
Now, in that webinar on the 10th, one of the things we are going
to be doing is refreshing people’s familiarity with the privacy issues and the
data quality issues. I think that previous discussion, a lot of that stuff is
probably relevant. It is probably worth getting our guys in touch with some of
the folks here, before that happens, in order to make sure we have the current
thinking kind of in place.
I think the other thing I should say is that over the past few years, we
have spent a lot of time focusing on the idea of changing the culture at HHS
from this default setting of closed data, to a default setting of open, with
thinking about what to do after, as opposed to a blanket restriction and then
trying to open it. I think that has been remarkably successful. I just showed
up here a couple of months ago, and I can tell you, just based on my
conversations, it is almost a presumption of openness now among most people. I
think we have done a great job there.
There are sort of pockets of resistance, I would say, within the department
that we still want to talk to and get them sort of onboard with this whole
thing. We are undertaking this process of meeting with the data owners at the
agencies, with everybody at pretty much every level, to kind of again explain
the importance of this, walk them through what we have done over the past few
years, talk about some of the challenges, some of the successes, make sure they
understand that this isn’t really a terrible thing and there are lots of good
things that can happen from it. Then, also, get feedback from them about what
some of their issues are, how we can help resolve them, what we need to do in
order to make this work better.
We started with the Administration for Children and Families. We met with
them in July. That meeting actually led directly to a whole bunch of data being
released just a few weeks ago, which is incredibly useful actually for many,
many, many purposes. As part of this effort of getting the people who know the
data together with the people who can use the data, we partnered up with some
folks at the Greater Baltimore Technology Council, who are running a set of
events called Unwired and then Groundwork.
Unwired was the first one, and the idea was to basically define the problems
that existed in the city of Baltimore, figure out what assets were available to
solve those problems, have everybody form into teams around specific projects
designed to solve those problems. Then, the Groundwork event, which happens
next week on the 28th and 29th, is to actually build
those solutions, with the time being spent in those intervening weeks to
actually kind of add more assets to the projects and things like that.
It turns out a lot of the ACF data is useful for some of those projects. We
had people at the first event, at Unwired, from ACF and from SAMHSA, who were
there to actually provide their expertise. They are going to come back to the
events on Groundwork, be part of the teams and help explain what data is out
there, how it works, all of that kind of stuff. We are looking forward to
seeing how that goes.
It is an experiment for us. Even if it is not successful, I think we will
learn from it. I would like to see us try to replicate this in other places. We
have already had some interest expressed from folks in Philly, from some folks
in Boston, some West Coast cities. I think there is this really interesting
idea of connecting people who understand the information with people on the
ground who know the problems, and have the access to other assets, to try to
mash it all up and do some interesting things with it. This is another thing
that we are working on.
Finally, we are having our biannual meeting for the HDI data leads in
November. We will send out more information on that, so stay tuned for that.
Almost finally, we are working on a bunch of different things externally.
We have this regional affiliates program for the health data initiative, the
idea being that there are lots of interested parties around the country, who
want to be a part of this and kind of do their own things in their regions. We
are encouraging as much of that as we possibly can. We have some great examples
here. In Cincinnati, two weeks ago, there was an event focused around a health
company accelerator that had some really interesting applications come out of it.
There is an event, the first week in October, in Boston, a Massachusetts
regional affiliate, kind of along the same lines. There has been a bunch of
activity out there, and we are encouraging more and more folks to kind of sign
up for this program, that are interested. Our job partly is to support those
efforts, and to help those people kind of do as much as they possibly can. I
should say you can find more details on this at the URL at the bottom of the
slide, hdiforum.org. Again, if you have any questions, feel free to contact me.
Finally, the event that you all have been waiting for, the Fourth Edition of
the Health Data Initiative Forum, or Datapalooza. Save the dates were just
announced, June 3rd and 4th, next year. Mark your
calendars, don’t plan anything else. Be there the whole time, it will be fun.
We are in the planning process now, so I would, over the next few months, just
keep an eye out for lots of different pieces of information coming through
around schedules and agendas and suggested topics. If you guys have any
thoughts, let us have it. We are definitely open to suggestions. That is about
it. I am happy to answer any questions. I think I can stay for one minute.
DR. COHEN: If data are going to be housed in Healthdata.gov, how is that
going to interact with the Health Indicators Warehouse?
MR. SIVAK: That is a good question. My personal opinion is that the datasets
are slightly different, or they can have slightly different purposes. It is an
alternative, in the sense that there is a different use case for it, maybe. The
Health Indicators Warehouse, there is a lot of analysis that goes in there, and
a lot of sort of aggregated data and things like that. I am imagining that, if
we want to provide programmatic access to raw datasets that don’t currently
have a home, it is not a bad place to do it. These are all things that can be
worked out. This is an initial stab.
DR. COHEN: Is the focus going to be on individual level or aggregate level
or whatever is around?
MR. SIVAK: I think it remains to be seen. I would say that kind of whatever
is around would be my answer right now, but that is something that we can talk about.
DR. WARREN: I love the fact that you use the NPPES as an example, because
you described it wrong. When we are using these datasets, we need to be very
careful, if we start changing the wording, that we have accurate descriptions.
It is more than just physicians, it is any clinician that needs an identifier
in order to be out there.
MR. SIVAK: I want to ask you a question about this. That is a great point,
and this is something that I have seen kind of throughout my career, when
talking to any subject matter expert in any field. Actually, my best friend in
the world, actually he is a philosopher, right, I used him as an example of
I went to his thesis defense because I wanted to be a good friend and show
support. I sat down and it was a four-hour long affair, with him and the
committee that was sort of interrogating him. They spent four hours talking about
his thesis, using words that I understood, I had heard before, they were part
of my vocabulary. They were using these words in combinations that literally
made no sense to me. I understood maybe 15 percent of his defense. I like to
think I am a pretty sharp guy.
What I realized at that moment was that, in technical fields, we have a
language that we use because we have to. We have to be precise and we have to
use very specific words, in order to communicate what we are trying to say. At
the same time, if we are trying to involve a broader group of people than we
typically talk to in our technical fields, I think we have to give up some of
that precision, in order to be more approachable. I think this is a potentially
interesting example of that, right?
DR. WARREN: As long as we don’t disenfranchise other people.
MR. SIVAK: Sure, and absolutely right, but I think there is a way to phrase
things in sort of lay language that might not be as precise as using the word,
provider, right, but might be more accessible to folks. My gut tells me that we
can involve a lot more people if we do that. It is something that I am happy to
have the conversation about, because I am interested in this quite a bit.
MR. SCANLON: Bryan, just before you go, this is the group we have just
formed. We have folks who know the data area, the public health area, the
community health area and the technology area, and folks who have helped plan
previous Datapaloozas. We will be using them, and it is available to your
office, as well, as resource experts on how we are doing with HHS data. Do you
have ideas about how we can reach out better to the developer community, and
what kinds of datasets do you think would be more useful?
The other thing is, as a FACA, you can hold meetings with the public, and
with anyone really, any group, and be covered under agency consultation. Again,
it provides the opportunity for open meetings with these communities for the
department, as well.
MR. SIVAK: That is great. I am sorry I have to run. If anybody has any
questions, please feel free to just drop me an email. It is just
Bryan.Sivak@HHS.gov and I am happy to help.
DR. CARR: Thank you so much, this is great. It was clear and we understood
it. All right. Now, Jim, I keep preempting you, but back to you to review the charge.
MR. SCANLON: To review, we invited you, you were all recommended highly to
serve on this working group. The focus really is, as Bryan said, we are not
developers. We have a lot of folks who are experts in data and surveys and
research and programs, but we collect a lot of data. Some of it is intended to
be public, and they have extensive dissemination programs. Other data is more
administrative and you sort of have to turn it into a data product.
What we are going to try to do, and we really look to you for help, what
would be the best way. We would like to expose you to the kind of data that
HHS has. To be honest, what you see on Healthdata.gov is probably the tip of the
iceberg. That is the data that can be put on an open health data website with
no restrictions. I think you know, that is probably not what we can do most of
the time, other than directory data or location-based data, public lead data.
We have a lot of other data. We have all of the CMS claims data, for
example, that can be made available for research, analysis, quality and so on. We
can’t put that, for the reasons you heard at the previous discussion. We can’t
just put that on a public website and cross our fingers. We will all be in jail
before the day ends.
What we are asking is your help on what data do we have. We will talk about
how we make it available now. If you could advise us on how we would reach out
even further, I think you particularly could develop a community. How would we
even interact more, because again, our folks sort of stop when they publish or
move on with the next study. You really know how that data is taken and
applied. Many of you make a living doing that actually, so that is the kind of
advice that we would need.
All the while, where we are making it available, how do we make sure we
protect the confidentiality? And generally give us advice. I
think we will start, as we will today, with two of our biggest data holder
producer organizations, the National Center for Health Statistics and CMS. We
will take you through some of the others, as well, at Public Health Data, and
we will be looking for more advice.
DR. CARR: I would just add, I realize this is our second meeting, and Todd
really asked us to work quickly, and we took the summer off. It is our
intention to be nimble, quick, focused and really take this. In fact, Bruce
brought forward an excellent application where we may begin to say, here is
what is out there, and here would be a way to use it. We will talk about that
later. I think we are going to hit the ground, working hard, before we leave
here today. With that, Jim, can we turn it over to you?
MR. CRAVER: Thank you. I was hoping to squeeze another 10 minutes out of
your agenda, but that has been taken from me.
DR. CARR: No, we are flexible. We are here to learn basically.
MR. CRAVER: I am here on behalf of the National Center for Health
Statistics. Really, my objective is to give you a whirlwind tour of some of the
pieces that are available to you. I am going to try to jump back and forth,
between the Web and the presentation. If I am going too quickly, stop me.
DR. CARR: Did you say the Web?
MR. CRAVER: Yes.
DR. CARR: Are you familiar with the HHS building?
MR. CRAVER: Yes, I am, so I can stay within the presentation, if I have to.
Thank you for the reminder. These are some of the areas that I am going to
touch on in the presentation, just a brief overview of some of the data systems
and how we think of them, how we organize them. Then, ask the question or
answer the question, why might we approach a single health topic from multiple
perspectives with multiple surveys. Then, dive in to some of the tools and some
of the resources that are available to you and to the public really.
Again, my objective is that your familiarity with NCHS and its resources
is increased by the end of this talk. Also, for you to be empowered and able on
your own, to go even more in-depth into some of those resources.
Just broadly speaking, the types of data sources that we usually refer to
within NCHS are the vital statistics system, the births and deaths, mortality
and natality datasets. We have surveys of individual people, and that includes
person to person interviews, computer-assisted interviews, knocking on doors or
telephone surveys, as well as bringing people through our Mobile Examination
Center, the MEC, which not too long ago was sitting out in front of this building.
MS. GREENBERG: It still is.
MR. CRAVER: It is still here? I came in the back way.
MS. GREENBERG: We had hoped to visit it during this meeting, but there was
just too much going on.
MR. CRAVER: That is a lost opportunity or missed opportunity, too bad. It is
maybe four and a half trailers, I know they say five, but the last one is a
small, short trailer. It is being used for our new survey, the National Youth Fitness Survey,
which is bringing in young children and adolescents, and actually
measuring their fitness, and their height and weight. It is really the gold
standard of clinical measures of individuals.
The provocative statement that I always make is: imagine you
walk up to someone and ask them what their weight is and you will get a number.
If you put a scale on the ground in front of them and ask them to step on it,
you will get a different number. You get a sense of kind of the reasons why we
like to approach topics from multiple perspectives.
We also have the National Health Care Survey. It is a family of surveys
which do survey providers, doctors, clinicians and others, who are in the
health care arena. That is really quite a broad area of surveys that are
coordinated and ongoing. Those tend to be of administrative records or records
that are abstracted from hospitals and clinics and other places where people
interact with the health care system.
I have sort of hinted at or touched on some of the reasons why we have
multiple sources. You probably have guessed already: to really capture all
aspects or all facets or all sides of the health care industry, the health care
system, health care as it is used, and health care as it is not used. We even have
estimates of undiagnosed disease prevalence and incidence rates.
We also use our data to look at some of the methodological issues regarding
collecting the data and analysis of the data, how best to understand the data
that we have, how best to combine that data with other datasets. Also, to look
at extending a data system that we have through linking it at the record level
to other datasets. We have an ongoing program, we call it the Linkage Program,
that takes Social Security Administration data, and takes our survey data, and
it takes CMS data and census data, and we merge that together at the individual
level, at the record level. We essentially come up with another dataset that
really has a full range or a fuller range of information at the record level.
Your discussion about disclosure and re-identification starts to make some
of the people at NCHS very nervous, because we do have this kind of data that
is tapping into data sources from multiple agencies, and really intentionally
doing that. Then, we have to release that data in an aggregate form, in a form
that is perturbed in some way, or guards against that disclosure. Or provide
access to researchers to the raw data that we have, in a very secure format,
with high levels of assurance from them that they won’t do anything that they
shouldn’t do with those data.
These are the tools. I am going to get up and grab my water, because I am
starting to get dry. It is warm in here for me. These are some of the tools
that I will touch on. I will try to jump over and take a look at those. It will
actually take me a little bit longer if I do that, but if I don’t succeed, I will
stay within the presentation.
Just looking at the homepage for NCHS, I will give you a little bit of the
geography. Our website was revamped maybe a month and a half ago. We are very
proud of the work that went into this. I have to tell you, though, as a
longtime user of the NCHS website, it took me about three days and a phone call
to find out where FastStats was. There is a big red arrow to help you. It was
right there in front of me, and I couldn’t find it. The person I called was very
nice and didn’t say anything bad.
Scrolling down the screen, you will get to the data access areas, and then
you will get to the additional resource areas. Obviously, this page has a lot
of information and a lot of links that go off into different parts of NCHS.
These five red arrows point to the five areas that I am going to look at today.
The first tool that I am going to feature is the Health Indicators Warehouse.
I appreciate the question earlier, Bruce. It is one that I think about often.
Bryan and I have talked, and we will have ongoing conversations about it, and
my thoughts are not inconsistent with some of his.
The Health Indicators Warehouse is all aggregate, public use data. There are
no individual records; it is all data that anyone can come and use, download,
and access through an API. The system has the ability to graph data. There is
also some linking to interventions. Right now, for some of those, we use the
Guide to Community Preventive Services and the Guide to Clinical Preventive
Services. It was an idea proposed very early on that we not just put out
indicators like the number of people who have diabetes. If someone in a
community wants to do something about that, how can we point them in a
direction where they might be able to act?
Because that data is available through our API, it can then be used by
third-party developers. That is really one of the things this project is
trying to drive: not just open data, but also seeding a community of
third-party developers who aren’t necessarily in public health or population
health circles, but are developers who want access to high quality data.
We have about 1100, I think it’s 1170, 1169, indicators on the system right
now. They come from a variety of data sources. I am slightly embarrassed that
CMS is not called out specifically here. It is included, of course, under HHS.
We have maybe 150 to 170 indicators from CMS, and many of those are only
available through the Health Indicators Warehouse.
Currently, those indicators only have a single year of data, 2008. We have,
and are about to make available to the public, four years of data, and then,
very soon after that, five years of data. There is going to be a little bit of
a tweak to the interface, but that will happen quite shortly.
Let me get to the homepage and see if I can come over. Here, I am in the
browser, on our homepage at NCHS, and I am going to dive down into the
Health Indicators Warehouse. Just a quick geography: we have three main ways
into the warehouse, by topic, by geography, or by initiative. I like to say
that these are basically three doors into the same very large room.
What a user will do is select a topic and jump over to what we call our
filter page. Immediately, you are presented with a subset of the 1100
indicators, related to the topic that you are interested in. I am just going to
pick something. By the end of the day, this might be relevant. I am looking at
binge drinking in adults.
We don’t force you to read the metadata, but we do force you to click
through the page that has the metadata on it. At NCHS, we do think that that is
very important. You don’t really know what indicator it is you are looking at,
just by the title of that indicator. You need to know something about the
methodology, something about the numerator, something about the denominator.
We, at least, put that in front of you.
We also tell you the data source, what years might be available for these
data, and some of the dimensions or variables that we have. It is a quick click
over to the data display, which starts with a chart. In three or four clicks,
you are looking at the data that we have on a particular indicator. This
interface is meant to help you get familiar with the content of the warehouse.
We then let you download these data on an indicator-by-indicator basis. We also
expose the database through an API, so you can gather as much data on all of
the indicators as you want.
DR. TANG: Who determined the 1100 indicators in the first place, and who
determines that HHS will actually maintain it?
MR. CRAVER: A really good question, and it is one that we struggle with. We
started with what we refer to as an initiative. We made the assumption that
there are groups out there, within HHS and closely aligned with HHS, that have
already done that heavy lifting. The obvious one for us is Healthy People. The
data stewards for Healthy People happen to be in NCHS. They happen to be about
50 feet down the hall from my office. It has been a very nice collaboration.
Now, ODPHP and the Assistant Secretary for Health have their interests, their
agenda, and their audiences for Healthy People. They also have a heavy respect
for the data that go into that. Right now, well over 50 percent of the
indicators in the warehouse are associated with that initiative.
We also have a subset, not a complete set, but a subset of the County Health
Rankings and of the Community Health Status Indicators, as well as, as I
mentioned earlier, the CMS indicators. Together, that adds up to 1100, 1200.
Moving forward, it is a very good question, because what we like to do is
receive data already processed for us. We do some quality control and quality
assurance checks on that. Within this project, we do some data
We try to increase efficiencies with the other projects that are happening
at NCHS, including Health United States, which I will talk about in a second,
which programs a lot of data to make available to the world, and Health Data
Interactive, which I will also talk about, which programs a lot of data and
makes it available to the world, but in a different way. Both of
those projects and products have their own history and their audience and their
We think it is reasonable, and we do have a governance body. The overseers
come from the NCHS Board of Scientific Counselors. Then, we have our
indicator advisory group, which is made up of representatives of different HHS
agencies. We have a statistical standards group, which is made up of NCHS staff
and representatives of some of the data sources.
Just to make sure, we have a couple of levels of criteria before we put any
indicator and any data in here. For example, if you have a survey, we want to
know what the response rate is, and we want to publish that. We don’t want to
just say it has a good response rate. We want to include confidence intervals
where relevant, or standard errors where available, those sorts of things.
We make sure that that is true.
Each year, when a survey repopulates its publicly available dataset, we go
through and process it as quickly as we can. It is not a small task, as you can
imagine.
I have jumped back to my presentation. Just to move along a little bit here,
a couple of screenshots: metadata from a different indicator, stroke deaths.
Here is just an example of a different way that you can display the data. We
saw the chart format for binge drinking. Those data are also available in map
format, state by state. Where available, we have county level data, and that is
mappable, as well. As you can guess, most people want the lowest level of
geography possible, for as much data as possible. We provide that where we can.
Unfortunately, that mainly means county level.
I am going to move over to Health United States, and I see that I am going
long here. I am going to stay within the presentation here. Health United
States, I actually meant to bring the volume. Who has seen or who owns a copy
of Health United States? Everyone here should have it.
MS. GREENBERG: It is routinely sent to the members of the National
Committee, actually for comment. We will also include the workgroup the
next time it goes out.
MR. CRAVER: I think the first time the workgroup met, we distributed the
Health US in brief.
MS. GREENBERG: If anyone doesn’t have it, let us know.
MR. CRAVER: You are very familiar with Health United States, the reason for
its being, the fact that it is legislatively mandated, and that it covers a
broad range of sources and topics. It provides, for example, a window onto
how you might start to describe selected trends in health care use.
There are tables on preventive services, prescription drug use, and inpatient
surgery. Here, they are associated with the different surveys that we use as
the sources. I just have some example slides, pulled from the text of Health
One of the interesting developments just around the corner for Health United
States is a collaboration that we have with elaborate(?) medicine, to make an
interactive version of the Health United States available to the world. You can
dive deeper and deeper, and deeper still, into these tables, deeper than what
is in the published version, deeper than what is on the Excel sheets that you
can download for the content of the warehouse. That might deserve its own
presentation at some point.
I mentioned Healthy People earlier. We have recently transitioned from
Healthy People 2010 to Healthy People 2020. That allowed an opportunity for
people to review the indicators that are there, and the number of objectives has
increased. That, I think, just reflects the additional feedback and visibility
for that project. Moving forward, their challenge will be to track that data and
to keep it updated. One of the things that they have done is to hone in on
what they are calling their 10 Leading Health Indicators. They are producing
reports about those 10 Leading Health Indicators for Healthy People.
The example that I have pulled out here has to do with heart disease and
stroke. For Healthy People 2010, there were 17 objectives tracked. Now, there
are 18. There were five data sources, and the same data sources are used, but
there are an additional 31 developmental indicators. Developmental indicators
are, if you want to think about it, sort of aspirational indicators. They are
indicators or objectives that ought to be collected and ought to be available
in order to more thoroughly understand the area or the topic.
This is a capture taken of the final review for Healthy People 2010. I think
it is really worth looking at that PDF. The URL is up here, and I apologize
that I did not have the slides to distribute. If that is necessary or desired,
I would be happy to do that.
MS. GREENBERG: We will post them. Is the working group on the SharePoint,
too? We will post them publicly, because it is part of the public meeting. We
can also put them on the SharePoint.
MR. CRAVER: I am sorry you don’t have them in front of you. This is, I
think, a really creative way to display a lot of information and to look at the
trends on several indicators or objectives for a topic area, all at once on one
page. This really does capture a lot of complexity. You can stay at that
surface level, or you can look at the details and really gather a lot of
Health Data Interactive is another project that NCHS has. It also has
aggregate data. It also has indicators, or in this project, we talk about
tables. They are tabular views of the data. I am going to hold this up as I am
talking about this, because it really has an interactivity that I think is
worth understanding. Once you understand that interactivity, then you can
really see the value of this site and why it exists separate from the
When you drop into HDI, as we call it, and it was the original HDI, I will have
you know; it is not the Health Data Initiative, we had the name first. You
come to our splash page with the table topics. When you click on a table, this
screenshot shows the results if you had searched for the word, asthma, you get
the list of tables inside the application. From there, you can look at charts,
you can look at tables.
It is much more interesting if we look at, let’s say, okay, ED visits in the
United States. What I want to show you is this is our opening view for this
table. We think that this probably meets most of the needs for most of the
people who come to this page. However, these little shaded parts allow you to
click and drag, and customize the view of this table. Suddenly, you have taken
this multidimensional hypercube of data and twisted it, and shown a different face
that you might be interested in.
Perhaps, you want to look at diagnosis by regions of the country. If that is
not how you want to display it, then maybe this is how you want to display it.
HDI, Health Data Interactive, lets you take these 50-odd tables, and not just
look at the data that is there, but start to manipulate it and start to come up
with a way of changing what you see, being able to focus on something that you
are interested in exclusively.
Now, this is really your table. This is your customized view of the table.
Of course, this can be looked at in chart format and, where available,
geographically, in a map display. That is Health Data Interactive. That process is
the same process that was used to create this customized chart. It really is an
interactive tool for manipulating the data.
Lastly, I am going to mention FastStats. FastStats is the place to go, if
you don’t know the number, but you know the descriptor of the number. Say you
want to look up the number of people who have diabetes and what is the current
value of that. FastStats is the place to go. That is in the upper right
quadrant of the homepage. It provides quick access to a long laundry list of
topics that we maintain and update. Whenever there is new data on a topic, we
go in and we update that FastStats page. Here is an example page for diabetes.
We cover that for, in this case, morbidity and health care use and so on.
That is the end of my talk. I encourage you to take a look at each of these
tools, to explore the NCHS webpage. That is really your portal to these tools.
Each of them has an approach and an audience and a reason for being, that is
slightly different from the other. We are, at the backend, really trying to
increase our production efficiencies.
One of the other things that the Health Indicators Warehouse has certainly
spurred us to do, even more than we already did, is to pay attention to
harmonization and standardization. That is flowing back and forth, as we
speak, with the Healthy People project, with Health Data Interactive, with CMS.
It is an exciting set of tools that we have. I hope I have exposed you to some
of those, and that you are able to take them and run with them. Any questions?
DR. CARR: I just want to say, it is great. It really is well thought
through. It is sort of user friendly, because it makes sense where you want to
go and where you find it. I really like it.
DR. ROSENTHAL: What has been the reaction of the external developers that Bryan
and company are trying to target? Speaking as one of those,
one thing that springs immediately to mind would be an entity diagram. My question
is, in terms of target groups, you mentioned it was for different audiences,
One of my questions was, how do you reconcile this with what Bryan was
saying, in terms of what is the nature of the reaction of external developers.
From an external development kind of perspective, when I am looking at things
outside health data, the very first thing I would expect to see is a big ole
entity diagram up in the middle of it.
I am looking at interactive tables. If I want to develop, it is not enough
to know this metadata is defined in sentences. I need to see what is this
thing, and this goes for CMS, too, parent, county, org, et cetera. What is this
thing, this piece of metadata, where does it exist as a piece of data or as a
summary. It is very, very important for someone coming in, who doesn’t
understand the nature of the difference between physician, provider, clinician,
et cetera, et cetera.
Show me physically what the nature of that relationship is. Typically, you
would call it an entity diagram. We have spoken in other committees with some of
the NORC and IMPACT people. They said there is no reason not to share that.
If I am coming in and trying to put on HHS’s glasses, and take a look at
what the world looks like through your eyes, when you’re looking at your data,
what is the single quickest thing I can see, I immediately go to that. My
question is, this is all fantastic and wonderful, but from a development point
of view, that would be the absolute first thing.
MR. CRAVER: I don’t disagree. The warehouse is, to date, the tool whose data is
available through an API, and that is the place where there is the most
critical need for an ERD. One of the issues that we have as a federal project
is security. We have some people who are at one end of the continuum of locking
down everything, and we have other people on the other end who say, it is
public use data and there are no secrets.
For systems like that, we have to make sure that we are not handing the
tools to someone who wants to crack it and open it up, and do something
nefarious. I know that same continuum of people who anticipate that and worry
about that, and make me as a project officer and project manager behave a
certain way because of their fears. Whether they are real or not, they are what I
am compelled to follow.
We do have plans to put up an ERD that should be sufficient. I do want to
engage on a one-to-one basis with people who are trying to use our API, so that
I can learn from them how to improve it and what other resources they need. We do have
user’s guides and we have data dictionaries and those sorts of things.
There is at least one piece missing, and it may be just the ERD, but I think
there is another piece missing, too. I actually have a person on staff, who I
am working with, I basically said, you go use the API and you build me an
example that I can post, an example app. I am circling back around with him in
the next couple of weeks.
DR. ROSENTHAL: I have actually prepared some slides, and when we can get
into what we could do, showing tangible examples of how to do this, especially
with things that are very concrete, specific examples that have very little to
do with security, in terms of nature of relationship between parent or
contract. I can share those with the committee.
MR. SCANLON: Jim, if I remember correctly, for the warehouse, anyway, the
backup data that supports the graphics and others, isn’t it machine readable?
Am I thinking of Health Data? So that if you had the metadata and wanted to
build an application, you would have access to the data, as well, right?
MR. CRAVER: Absolutely, that is correct. What people have struggled with is,
I mean, it is an obvious problem. You have a series of tables that are related
to each other, through foreign keys, and you don’t have the Rosetta Stone of
how those relationships are built. If you know the data, you can sort of guess.
If you don’t know the data, you are lost in the wilderness. What we want to do
is we want to encourage use, not discourage.
MR. DAVENHALL: Is it possible, Jim, that you can start to provide for this
working group visitor statistics, site metrics, some sort of sense of usage,
and anything else you can tell us about that?
MR. CRAVER: I can give you fairly up-to-date statistics on the warehouse.
We are generally around 1000 unique users in a given week. That has been pretty
steady, it falls off in the summer, as you would expect. It is starting to ramp
up again now that schools are back in session, people are back from vacation,
I haven’t looked at HDI recently. That, a while ago, was more around the
5000. I am guessing that has tailed off a little bit with some of these other
tools. Frankly, Health US, I don’t really know. I could probably find that out,
but I would have to look at that, and FastStats, also.
DR. CARR: The users are identified by their email, is that how?
MR. CRAVER: Again, there is something called C&A, Certification and
Accreditation, which any IT project in the government is supposed to go
through. There are levels of oversight involved. If you collect PII information
from visitors, such as their email address and their name, first name, last
name, location, phone numbers, those sorts of things, you have a much, much
higher hurdle to jump over, just to get your project out the door.
Most projects, including these, basically it is an open door and anyone can
come and go as they please, and we don’t track who you are. We might track
whether someone has walked through the turnstile, but we don’t take the
fingerprint when they go through. We can give numbers. For the warehouse, we
are currently using Google Analytics, which allows you to take a look at unique
visitors during that time period that you are looking at. You can say, in a
given week, this person only came once. Or if they came again, I didn’t count
them a second time.
The other tools, I am less familiar with how they track their users. Unless
you see a site where you have to register or deposit your ID, your email
address or something like that, they are not likely to track you. Now, I am
on the brink of making a decision, and having a discussion about asking for more
information about users of the API, so that we can have more of a one-on-one
relationship. I think it is worth the cost for me, as a project officer, to get
that through. I think the audience will give that stuff away. They are not
going to care about it.
MR. SCANLON: But you do accept suggestions or complaints?
MR. CRAVER: Absolutely.
DR. VAUGHAN: Are you thinking along the lines of what they are asking for at
Labor, in terms of applying for developer keys?
MR. CRAVER: Yes, just so that we can then go back to them and get the kinds
of stories that Bryan is asking for. Or say, we have this new resource, or we
are thinking of doing this, can you give us some more targeted feedback.
DR. VAUGHAN: I was really interested to know to what extent there exists now,
or is anticipated, alignment of these same initiatives with what is going on in
the states and counties, many of whom are also looking to open up their data,
kind of piggybacking on what Bryan is saying. They are using Drupal, blah blah
blah blah. What does that alignment landscape look like?
MR. CRAVER: Well, the warehouse is not open source. For Health United States,
the data is available through Excel. HDI, Health Data Interactive, is also in a
proprietary system, and FastStats is HTML, up on the web. We have not had
much focused interaction with representatives from the states, although we are
also on the brink of doing an evaluation study and trying to get input from
state directors of public health and their deputies: feedback and a baseline
awareness survey. That should be happening soon.
I am hoping that, in addition to getting that data, that will prime the pump a
little bit for that back-and-forth communication, so we can have those
exchanges. Now, in terms of open source, it was a decision to go down the road
of not having open source. It was a decision that was made, but that doesn’t
mean that it couldn’t be changed at some point in the future.
DR. COHEN: Can I respond a little to your question? There are, I would say,
between 20 and 30 state-based, web-based query systems, either directly from
state health departments or in conjunction with partners. Some of them are open
source code, some of them are in-house development, and some of them use
essentially commercial off-the-shelf solutions. Then, there are a couple of
vendors who sell products to create that functionality at the state level.
Obviously, the states in general go down to county, and some go down below
county to large communities, in a couple of cases to ZIP Codes, for some of the
data that they include. They are equivalent to the Indicators Warehouse. They
draw from a variety of data sources, the analogous data sources that are
available at the state level.
I hope that, as we move down the road here thinking about development, part
of the conversation will be stimulating state and local developments. There
are a bunch of counties; San Francisco is a leader and is always held up to
the light. There are a variety of these initiatives at the county and community
level, as well. Some of them have very interesting properties. Seattle actually
allows folks to enter their own data in a framework, so they can choose
secondary data and marry that to primary data. There are lots of different
threads going on.
DR. ROSENTHAL: What might be really helpful, in terms of assessing where to
allocate resources, in terms of bumping up depth or frequency outside of the
privacy conversation, is just a basic kind of utilization or usage by the
individual dataset: a count of hits, downloads, or something like that. If
there are three of these that are accounting for 90 percent of the usage, that
would be very informative in terms of being able to make an assessment.
DR. CARR: Jim, are you able to stay around?
DR. SONDIK: Can I say something? Let me just add something to Jim’s really
excellent description of this. There is another level here of data that I don’t
want the committee to lose sight of. Really, these tools focus on the
aggregated, the indicators. A lot of this comes from raw stuff.
There are two kinds of raw stuff. This is my view, it is a technical term,
raw stuff. There is the vital statistics, okay, which really is the listing of
births and the listing of deaths, and it is as complete as it can be. Then,
there is what we get from the surveys. The surveys are the really tough nut to
crack here, because what you asked for is really hard to describe, when you
asked for the diagram.
DR. ROSENTHAL: It is a specific technical entity relationship diagram. If
you have a database, you do have an ERD somewhere.
DR. SONDIK: I don’t know how to do that for a survey. I am sure if we sat
down, we could do it in two minutes or so. The thing is that the survey, you
get the data because you have used let’s say a statistical technology beyond
methodology really, to identify who you are going to bring in. Each one of
those people, if you are surveying people, you can describe that individual in
terms of what they represent, in terms of that diagram that you were getting
at. They are a county, they are a block, they are a census, you know what I am
DR. ROSENTHAL: Yes, it is actually not a many-to-many. It could be as simple
as saying what states belong to a region; that is the type of thing I am
talking about. If you are coming into this site for the first time, you don’t
know anything about the data. You see region up there, and my first question is
which states belong to a region. That, in terms of a diagram, is what I had
expected to find as a developer. Say, what the West Coast region includes.
DR. SONDIK: I am just saying that a lot of that thinking, when you look at
an indicator, like the indicators in Healthy People, that thinking has already
taken place, and that can be done. When I look at a survey, it is not so easy
to do that. It is there, but it is many things. It depends on how I combine
that raw data, the 5000 individual, very, very long records; there are only
5000 of them in one year of HANES. There are many ways to combine them. Do you
agree with what I am saying? There are many ways. You can disagree if you want.
There are many ways to combine that. The same holds true for the 125,000 in
HIS. That is a tremendous source of data.
Actually, if we could get a few apps that start to use this, it will become
clearer to the applications community as to what can be done. There is a
challenge with this. I don’t know of any apps at this point that go into any of
the survey data.
DR. VAUGHAN: What would be your top three most important ones that you would
want to have apps developed for?
DR. SONDIK: Well, I mean, there is the HIS, which is the core dataset for
the department, a wide variety of information on everything from prevention to
services that people are receiving. It is a nationally representative piece.
That differs, in a sense, from what you get from CMS, because CMS says this is
exactly what is going on, although there are surveys in there, too, that say
what is going on. It is that survey thing that I think is really a challenge.
There is one other thing that I wanted to put in front of the committee,
that is interesting. We did a briefing yesterday for Hill staffers. Somebody
came up and said, I think this thing you got is really terrific, so it is very
much on my mind. It is the tutorial that NHANES has. The tutorial is extensive,
very extensive. It is not a two-minute deal, it is an extensive tutorial on how
you get the data, how you analyze the data. It is actually award winning, and
people can get credit for this.
Because it is so extensive, it is not as accessible as perhaps it could be.
There is a possible application. It is not the usual app, what I would think of
as the usual app. I put this not just in front of you, per se, but in front of
the committee, because this is part of what we all do, we all being the people
who produce this core data. We aren’t as facile and as creative as we could be
if we liberated this. The stuff I think we need to liberate is not only the data
that has been massaged, and what I would call indicators, but what leads into that.
DR. CARR: I think it illustrates exactly why we have this meeting, because
we have questions that we didn’t know existed. He can use it, but you can’t use
it; what you need, he might need to make.
DR. ROSENTHAL: I think we might be speaking at cross purposes. Just so you
know, my Fulbright postdoc was on qualitative data, so I have a reasonable
approach to that. From a development point of view, it is kind of 80/20. For
every database, there exists an entity relationship diagram. I absolutely
guarantee it. That is cold, hard fact. You may debate the nature of the
relationships, but there is something behind that.
If you are wondering why there are no applications being developed, first
thing to look at is market utility. Is any developer able to create anything
valuable out of that, which is an interesting question. The second is, how easy
is the access. The first thing a developer is going to look for is some form
of it. It can be sifted, it can be only 20 percent of it, it can be 80 percent
of it: any basic metadata relationships, before I am even going to bother,
because otherwise, I click on that and I don’t go any further.
DR. CARR: I think this is exactly the kinds of stuff that we will begin to
do. Let’s hear from Allison.
DR. SONDIK: I want to give just a rejoinder to that. If you look at
genetics, there are a lot of tools that have been produced that nobody around
this table is going to use. Those tools are absolutely essential to the people
who are doing genetics research.
In a sense, I have that kind of thing in mind, I think, when I talk about
these core survey activities, whether they are what we do or what SAMHSA is
doing or the many surveys that CDC has underway. I don’t know how ready they
are for the public.
DR. COHEN: I think they are. I disagree with you. I can see many uses, and
this is a much longer conversation. If I am a woman and I just got pregnant, I
am going to want to know what the C-section rates are for a 35-year-old woman
with my particular risk profile. I would love to be able to go to an app that
would combine the hospital discharge data and the birth data for my city or my
community, and it would tell me, this place is good because, this place is bad
because. All of those data exist in different places. I think the surveys can
provide that kind of information, too.
DR. CARR: I want to make sure that we have time to get a scenario, to do
just that. We have talked about that this morning, and want to hear what Josh
has to say, too. To say, here is an issue. How would we populate this with the
information, where would we go? Hold your thoughts. Allison, please.
MS. OELSCHLAEGER: I am Allison Oelschlaeger. I am from the Centers for
Medicare and Medicaid Services, from our new Office of Information Products and Data
Analysis that was launched, as I am sure many of you know, back at the
Datapalooza in the spring, late spring, early summer.
It has been a whirlwind in the past couple of months, getting up to speed as
the new office and really starting to think about the charter of our office, which is
supporting both internal and external CMS data users. Really,
having a focus on data and the people who are using data out there in the
world, and inside of CMS. There really hasn’t been a focus at that level, until
we announced this new office.
I am going to walk through a couple of the various data tools that we have
out, starting with the Health Indicators Warehouse. I am actually going to
stick to the web and show you some of the various things that we have on
Healthdata.gov, look at the Blue Button Initiative, which is a cool thing that
we are working on right now. Then, talk a little bit about how we are updating
and improving our process for sharing actual claim level data with researchers,
with states, that kind of thing.
Starting with the Health Indicators Warehouse, and for the people in the
audience who aren’t here, I am at HealthIndicators.gov. CMS has a specific kind
of way that we present our data in the Health Indicators Warehouse that is a
little bit different from how the other data is presented. We like to look at
our data in specific reports, so taking indicators that are related and having a
report that shows all of them across a geographic area.
I am on the resources tab, and down at the bottom is the CMS indicators. We
have our methodology paper that you can click here and kind of read about our
various populations and the things that we are doing to clean the data up. Then,
we have our various reports. I am going to jump into the hospital and patient
report, just to give you an example.
This is Alabama. One of the things that people are always interested in is
inpatient admissions per thousand beneficiaries, and there will be more
interesting data once we have trends in here, which will be loaded very soon. Alabama is
obviously above the national average. When you click in and look at the HRRs,
let’s pick Tuscaloosa, it’s even higher, 436.8 admissions.
What if we want to say, well, how does Tuscaloosa look compared to the areas
around it? You can click through here and look at the data itself for just this
indicator and jump to a map. Obviously, one of the things that you always have
to think about when you are looking at individual indicators is the various
interactions that it has with different populations. For example, maybe Alabama
has a lot of dual-eligible beneficiaries who have higher resource needs than
the general Medicare population. This is just one way of looking at indicators.
It is not comparing across various sets.
Looking at the hospital referral regions, you will see down here that, here
is Tuscaloosa, and it is surrounded by HRRs that also have higher inpatient
admission rates per thousand beneficiaries. It helps you start thinking about
targeting resources, or looking at causes for that in this area. We think that
the Health Indicators Warehouse is a great tool, and we are really excited to
see our trend data get up here, because it is even more information for people
to start using, and putting into apps and that kind of thing.
I am going to jump over to Healthdata.gov. You can get to the Health
Indicators Warehouse from Healthdata.gov. One of the interesting things that
we’re just starting to come out with is called Basic Standalone PUFs. I heard
you guys talking a little bit earlier about kind of privacy and HIPAA, and
thinking about how do we release claim level data, while worrying about privacy
and making sure that we are not releasing beneficiary-identifiable information.
This is kind of CMS’ first crack at doing that. These are public use files.
They are available for download on the CMS website. What we have done is we
have worked with a contractor to strip them of all identifiable information. It
is still actual claims. It is not aggregated information. It takes things like
age and puts it into buckets. It reduces ICD9 codes to three digits instead of
five. It is keeping hopefully a lot of the information that you need, while
allowing you to download claims data, but also protecting beneficiary privacy.
If you click into these and you are not impressed, which was my initial
perspective, I think you have to remember that HIPAA has a lot of really
strict rules around cleaning data, to get rid of identifiable information. We
really have to be careful and make sure we are not releasing beneficiary-identifiable information.
DR. COHEN: Allison, does this have the three-digit ZIP, or what geographic level?
MS. OELSCHLAEGER: I am actually not sure what the answer to that is, but if
we click, I think it will tell us. There is some geographic information, I
think. Each of the basic standalone claims files has its own rules around
cleaning things. This one doesn’t look like it has geography, but it does
have the age category, sex, and ICD9 code. Inpatient may have some of that.
DR. FRANCIS: Could I just ask you what the de-identification methodology is
that you are using? Is it the Safe Harbor?
MS. OELSCHLAEGER: It is not the Safe Harbor. We are actually going through
statistical review with our contractor. They are both masking, kind of creating
groups and categories to mask things, and then also making sure that there
aren’t any cells where you could aggregate and get a count of less than 11
beneficiaries or providers.
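The masking steps described in this exchange (age grouped into buckets, ICD9 codes cut from five digits to three, and no aggregate cell with fewer than 11 beneficiaries) can be illustrated with a minimal Python sketch. The age bands, field names, and function names are hypothetical, chosen only for illustration; this is not the NORC or CMS statistical review, whose details were not presented here.

```python
from collections import Counter

# Hypothetical age bands; the actual PUF categories are not given in this discussion.
AGE_BANDS = [(0, 64, "<65"), (65, 69, "65-69"), (70, 74, "70-74"),
             (75, 79, "75-79"), (80, 84, "80-84"), (85, 200, "85+")]
MIN_CELL = 11  # no published cell may count fewer than 11 beneficiaries


def age_band(age: int) -> str:
    """Replace an exact age with its (hypothetical) bucket label."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


def truncate_icd9(code: str) -> str:
    """Keep only the first three characters, e.g. '428.31' -> '428'.

    Simplification: E codes actually use four-character categories.
    """
    return code.replace(".", "")[:3]


def mask_claim(claim: dict) -> dict:
    """Coarsen one claim record; identifying fields are simply not copied."""
    return {"age_band": age_band(claim["age"]),
            "sex": claim["sex"],
            "icd9": truncate_icd9(claim["icd9"])}


def suppress_small_cells(rows: list, keys: list) -> dict:
    """Count rows per cell and drop any cell below MIN_CELL."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {cell: n for cell, n in counts.items() if n >= MIN_CELL}
```

Here suppression just drops small cells; a real review may instead coarsen or recode values further so that small cells never arise.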
DR. FRANCIS: Who is the contractor?
MS. OELSCHLAEGER: NORC, yes. It looks like we are not doing geography right now.
MR. SCANLON: Is that the national file, all of the national level claims?
There’s no geography.
DR. FRANCIS: You can’t get it by state or by region?
MS. OELSCHLAEGER: Right, it is a 5 percent file. The other thing that I
wanted to show you on Healthdata.gov is our Medicare and Medicaid statistical
supplement. That is in the Medicare box down at the bottom, and it is the
fourth one down here. This is something that I don’t think all that many people
know about, but is a really great resource if you are looking for basic
statistical information on the Medicare/Medicaid program. It has things like
enrollment by state, enrollment by eligibility type, all in individual Excel
tables that you can go in and look at.
One of the things that our office is trying to do is improve the way that we
start sharing data. Right now, this is just Excel tables that you have to
download individually, but more and more, we are trying to think about how do
we move into the iTools or other ways to share this data that gives people more
access and makes it easier to use things.
Then, the final thing I wanted to show you on Healthdata.gov is the compare
tools. Medicare has a great set of compare tools. Probably the one that is the
best known is Hospital Compare, and that has been around for a while. Now, we
have Nursing Home Compare, Dialysis Compare, and that data actually sits on
Medicaredata.gov. Let’s go over here.
DR. VAUGHAN: Allison, what is the most recent year for this?
MS. OELSCHLAEGER: For compare data? I actually don’t know the answer to
that. It might be ’10, but I can check and get back to you. Hospital Compare,
Nursing Home Compare, Home Health Compare, and these are tools that allow
people to come in and actually look at providers and see how they are doing,
and compare across. In their geographic area, for example, what are the
different providers that they could use and what are the comparisons on various
indicators of quality. Medicare.gov has a setup that allows you to
automatically do things like filter and visualize the data yourself.
I am going to jump over to MyMedicare.gov. Another initiative that we are
working on is the Blue Button Initiative. This is a way to share claims data
with the beneficiaries. One of the most recent changes that we have made to the
Blue Button is to expand the amount of data available to beneficiaries. They
used to only be able to get one year of Part A and B claims data. Now, they are
getting three years of Part A and B data and one year of Part D data, which is a great improvement.
Scrolling down, it is right here on the main page. You can click Blue
Button, download my data, and it opens up a website that allows you to read
about the Blue Button program. You can actually go ahead and download your
data. For example, at Datapalooza, we saw a developer who is working with Blue
Button data and helping patients start to use their data in more interactive
ways on their iPhone or Android or whatever, and actually share the data with
their providers, and then have the providers share the data back with them.
There are a lot of tools, and this is not only a Medicare program, but also a VA
program, which is where it started, and also some of the FEHB plans are starting
to do this, as well.
Then, the final thing that I wanted to talk about with you guys was the
Research Data Assistance Center. This is our external facing group to help
people who are interested in actually getting claims data, or Medicare Current
Beneficiary Survey data, identifiable data. They are going to sign a DUA, they
are going to promise us that they are not going to share the data or sell it,
or use it in incorrect ways.
ResDAC just redesigned their website. It used to be kind of an old format. It
looked like you were going into a university webpage from the early 2000s. We
are really excited about this new website, and all of the kind of improvements
that they have made in communicating with researchers and communicating with
states, and making sure states can come in and get Medicare data and can share
Medicare data within the various agencies in the state.
Going back to your earlier point on having a program for people to go look
at quality of providers, in the winter, we announced the Medicare Qualified
Entity Program. In that program, qualified entities come in through ResDAC to
request the data. It is a program where you can get Medicare data specifically
for public provider performance reporting. We are really excited about that
opportunity. It gives us a new way to share our data with people who are going
to use it for good purposes. They have to combine it with other claims data, so
you are going to have one report that kind of covers the provider’s practice,
Medicare, Medicaid, private plans. That is an exciting thing, too.
I want to give you guys a chance to ask me any questions.
DR. FRANCIS: I have a question about your data use agreements. Actually,
first of all, I would love for us to be able to get a copy of it. Secondly, I
am curious about whether you do anything with respect to following up on
whether people comply, any spot-checking, anything of that sort. And what the
penalty would be if somebody broke one.
MS. OELSCHLAEGER: That is a great question. Our data use agreement is
available publicly, but I can also make sure that the working group gets a
copy of it. There are penalties in the DUA for violations of the agreement with
CMS. In a lot of cases, I think we have found that most of the people who
violate the DUA don’t mean to. They are not selling the data or whatever, so we
try to work with people to make sure that we are not penalizing them for doing
something that they are not necessarily at fault for.
In terms of follow-up, I think that is something that CMS is kind of still
trying to figure out the best way to make sure that we are not. It takes a lot
of resources to follow up and make sure that you are tracking people. We, so
far, have only shared our data with kind of trusted academic researchers.
As we move towards more data sharing, one of the things that we are
considering is a data enclave model, so giving people virtual access to
Medicare data in an enclave. That gives us a little more control over the data,
making sure that they are only using it for purposes that we kind of think are
valid, and making sure that they are not running off and doing something else
with it. I think that is a good solution, moving forward, and we have already
started piloting an enclave with 200 users.
MR. SCANLON: Allison, on the follow-up question, ASPE, we actually did a
joint project with CMS privacy group, and designed a program for sample audit
follow-up. It is just a pilot basis to see if it works. They might have
actually tried that, Allison, I am not sure. It was basically not just
complaint-driven, but to actually follow-up proactively on a sample of some of
the data holders.
As Allison said, it is usually something like the researcher added two
graduate students to the access list and didn’t tell us. It is usually not more
serious than that, but it would have come to our attention.
MS. OELSCHLAEGER: If you are really interested in how we are approaching
privacy, there is a whole privacy group at CMS that knows a lot more about this
than I do. It would be great to have them come in and talk to you.
DR. MAYS: Is there a tutorial or a place that you can go to? What I am
thinking about is, in terms of other academics and students, et cetera, getting
access, is there some place you go and it would give you a kind of a sense of
all of the datasets, or would I have to kind of know the different things?
MS. OELSCHLAEGER: In terms of beneficiary level information, if you want to
do your own research, or in terms of the various data and information
products that CMS makes available?
DR. MAYS: Yes.
MS. OELSCHLAEGER: There isn’t. We are announcing very, very soon, there will
be a data navigator function on the CMS website. It is pretty much ready to
roll out, but we are just finishing up a couple of things. Our website is
notoriously difficult to use and to find things on, so the data navigator will
be a great tool to help you find the specific, if you are looking for Medicare
MR. SCANLON: ResDAC actually has gone a little way toward that. It shows
you all of the datasets they have.
MR. CROWLEY: Do you have any mechanisms in place to understand how the users
that are coming to you are making use of the data, or cataloging their needs
and wants from the data?
MS. OELSCHLAEGER: I don’t know the answer to that question. When you guys
asked Jim, I was thinking we need to figure it out if we don’t have it. The CMS
website is not something that our office manages, but the Office of Communications,
which does manage it, probably does collect that information. If they don’t, they
MS. QUEEN: There is a CMS privacy board, and so when things are requested,
restricted data is requested, there is a whole protocol that gets submitted,
so that you know exactly how the data are going to be used, by whom, and for what purpose.
MS. OELSCHLAEGER: When researchers come and request underlying claims data,
we do collect that information. The statistical supplement, how many people are
going on there and downloading tables on Medicare enrollment, I don’t know if
we actually are collecting that kind of information.
MR. CROWLEY: As a researcher, it is useful to understand how other people
are making use of certain datasets, from techniques to methodologies, to the
extent that they are willing to share.
MS. OELSCHLAEGER: ResDAC is doing a great job of starting to think about
ways to share that information. They are going to soon have a page up that will
allow researchers to come in and say, here is what I am doing with the data.
Here is where I have published, and then that will be available to other
researchers, as they come in and think about, okay, here is what other people
have been doing with this data. What can I do to add to that?
DR. KAUSHAL: What is the time lag between updating the data? For example,
claims data, as an example.
MS. OELSCHLAEGER: Meaning for access to researchers or public products,
DR. KAUSHAL: Yes, I’m sure there is a range.
MS. OELSCHLAEGER: Yes, there is a range. Historically, CMS has been almost
at like a two-year lag, in terms of getting data out publicly. Right now, we
are at probably a year lag for public data, so for kind of stuff that we are
getting out and sharing, maybe a little bit more than that. As Jim said, we are
updating the health indicators warehouse with 2011 data now.
Six months to a year is when we kind of feel like we can start to get data
out publicly. For researchers and for qualified entities, we are starting to
move more towards a six-month lag. Between five and six months is when we
really see the data starting to settle down, and we are doing more and more
work, trying to figure out when can we actually share data, and what is the
earliest point that we can get data out there.
DR. KAUSHAL: The aspiration is to cut that six months down to even less?
MS. OELSCHLAEGER: It is, except that we are really relying on providers
submitting claims. When they don’t submit the claims and the data is not
complete, we really can’t do anything about that. I guess the other option is
to tell people the data is not complete, and it is up to you to figure out.
DR. CARR: That is what we are doing with the Pioneer program, getting
MS. OELSCHLAEGER: Yes, as we are starting to communicate with providers, we
are giving them even more up to date data.
MR. DAVENHALL: I want to make a comment. I wish Walter were here, because
this is a standards issue. Why I want to point this out to you is that, one of
the other problems we have to address is that there are no standards in how people
are setting up these websites. We saw the National Center, and they have this
look and feel. If you want people to really start to use your data, you have to
start to worry about being consistent across the board, as to what these things
look like. Otherwise, people go to these sites, spend most of their time trying
to figure out where this stuff is.
If you look at this site, this is, in my personal opinion, the richest, best
site of all of the sites that we have been talking about here today. I would
ask you to go to that one and play with it, and download that file she has out
there, the provider number. Now, this is one of the most difficult crosswalk
files to find in Medicare. Right, Allison?
This has every hospital and provider, and has their Medicare provider
number. If you have that Medicare provider number, you can link that to a whole
bunch of interesting statistics from intensity rates in hospitals to DRG
adjustment rates and so forth. You have got to have this crosswalk file
available. You can download that.
If you look across this tab, right here, you can’t read it from where you
are here. It is the most fully functioning set of tabs for anybody to get ahold
of data out of CMS. I am just proposing that, as we go down this road, we think
about using something like this as a model. It was immediately obvious what you
are going to be able to get from that site, where the other places, it was
always, can I download from this site, can I export, can I share, and that kind of thing.
The other thing that I wanted to mention while Jim is still here, tell Ed
there are people out there who are using his data in the way he said he has
never seen it. We need to find a way to bring him up to speed on that. I would
say 50 percent of the hospitals in this country are using that survey data in a
way that he would be shocked.
MR. SCANLON: Can I ask Allison, so this is institutional provider? It is not
individual professional provider?
MS. OELSCHLAEGER: We don’t make that available, although the qualified
entity program is moving in that direction. As we start to announce QEs, that
is pretty much what they will be doing.
MR. DAVENHALL: There is a file called the hospital service area file, by the
way, which now you put data out in six months. That file has 2011 data in it,
of every dollar that Medicare paid into a ZIP Code. It tells you how many days,
how many cases and so forth. It provides a provider ID. It is totally worthless
to you, unless you have this file, to tell you who that provider is.
I would say some of these things, as we think about, as we do our work, how
do we make these crosswalk files? Could there be another file there, when you
go to that other hospital service area file, that says, oh, just link here and
you will get this crosswalk file, kind of thing. I really want to compliment
you on this site. I would say that if more of the sites had that kind of look
and feel, we would all get to it easier.
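The linkage Mr. Davenhall describes (a file keyed only by provider ID is hard to use until it is joined to a crosswalk that names each provider) amounts to a left join on the provider number. A minimal Python sketch follows; the field names and sample values are hypothetical and do not reflect the actual layout of the hospital service area file or any CMS crosswalk.

```python
def attach_provider_names(service_area_rows: list, crosswalk_rows: list) -> list:
    """Left-join provider names onto service-area rows by provider number.

    Field names ("provider_number", "provider_name") are hypothetical.
    Rows whose provider number is missing from the crosswalk get None.
    """
    # Build a lookup table once: provider number -> provider name.
    names = {row["provider_number"]: row["provider_name"] for row in crosswalk_rows}
    return [
        {**row, "provider_name": names.get(row["provider_number"])}
        for row in service_area_rows
    ]


if __name__ == "__main__":
    # Made-up rows; provider numbers are kept as strings to preserve leading zeros.
    hsa = [
        {"zip": "35401", "provider_number": "010001", "cases": 120},
        {"zip": "35401", "provider_number": "999999", "cases": 4},
    ]
    crosswalk = [{"provider_number": "010001", "provider_name": "Example Hospital"}]
    for row in attach_provider_names(hsa, crosswalk):
        print(row)
```

Keeping unmatched rows (with a None name) rather than dropping them makes gaps in the crosswalk visible instead of silently shrinking the data.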
DR. ROSENTHAL: If you do that taxonomy work or see the entity relationship
diagram, that stuff will pop out. You will say, oh, we don’t have that, and
that will be your roadmap for your metadata development, if you will, to say,
here is where we need to build something, as Bill said.
DR. CARR: Josh, did you want to come forward and go through?
MS. GREENBERG: I know Allison mentioned about the data use agreements, et
cetera. You were pretty much talking about public use data tapes. I don’t know
if everyone on the group knows about our research data center, where we make
more access to more restricted data. I just thought maybe you could just say a
few words about that, and then maybe, in a future meeting, they might want to
hear more about that.
MR. CRAVER: I mentioned our RDC, our Research Data Center. It might not be a
bad idea to have Peter Meyer give a quick tour of the concept. The Research
Data Center is a facility really that is under lock and key, and restricted
access part of NCHS as a physical space that researchers can submit proposals
to, and if accepted, come to our place and have the access to restricted
datasets and datasets that are non-public, and do their research on it.
There are a couple of caveats to that. One, it costs money, because you are
using our facility and our staff to do that. We have to set up the files for
you, so that you can have access to it, and we also do disclosure review of
your resulting files, so that we are not releasing into the wild something that
legislation says we should not. There are non-trivial reasons why we care about that. One
of them is if we do disclose data, we, as individuals, are on the hook for
hundreds of thousands of dollars, and many years of our lives in jail.
DR. COHEN: It is important to know that, not only NCHS data, but we actually
are going to be using the RDC as a data enclave, to store data from another
part of CDC that we are linking with state data. There is no way that we could
get it out of CDC, because of the agreement that CDC made when they actually
collected the data. You don’t have to go to lovely downtown Hyattsville to use
this. Remote access is a key for using the data at the RDC. It is a really
powerful option that really hasn’t been explored fully for using confidential data.
MS. QUEEN: Will the CMS data enclave be like that, be like an RDC?
DR. COHEN: Yes. The census uses, as well, a portion of data enclaves.
MR. CRAVER: At NCHS, we have ongoing and developing relationships, new
relationships with census. NCHS, RDC now installed in Atlanta, there were
recent discussions at a university in a neighboring state to install a new RDC
that would have census data, as well as NCHS data. It might be the time for us
to discuss with CMS.
DR. VAUGHAN: Jim, is it now possible to align CDRC data with RDC, or is that
a special super security? Is it possible to unify that under one secure roof at
MR. CRAVER: I think that that is happening. I think that the efforts that
need to take place are in process.
DR. CARR: We have an hour and four minutes remaining. This has been very,
very helpful, very rich. I would like to hear from Josh, and then hopefully
have 45 minutes or so to talk about what would be the next thing that we might
want to do, to begin kind of getting a feel for what is available, how it
works, what the issues are, et cetera.
DR. ROSENTHAL: I had from a previous meeting and this meeting, I just
decided to put together some slides instead of putting it in the Wiki. I had
roughly speaking, and these were some suggestions and conversations I had with
Todd way back when.
I had about seven very basic recommendations. Some of them are crazy and
challenging, some of them are kind of common sense. I know Greg and company and
Allison and John are already working on some of them. The specific
recommendations, where to start, would be taxonomy, communal learning center,
baking in business or market value into the challenges and contests, files
accessible, the NORC IMPACT stuff you are talking about, I would also add
synthetic files, completely synthetic for priming and loading systems, as well
as just getting rid. I know you might already be doing that. Just leave it out
there for the whole committee, so everyone is aware of it.
Data browsers, which are very different: Google Public Data, Tableau data,
where you don’t have data available to people. You just allow them to do
analysis on the fly. This is common in the web world. Some of your data was
actually used by Tableau and ReadWriteWeb at a contest, half a million people
used the thing. Someone won, a young girl, for diabetes, comorbidity. It was
fantastic, your data was being used. It was in this browser, they could
instantly ask questions.
Then, partnerships and product development, we got into some of that. That
will take me into basic web utilization. This is Google analytics stuff, so you
get a sense of who is doing what. Opt-in, which is completely crazy, but
consider it like green button for people who want to have their data shared.
Don’t underestimate that, as everyone knows from Blue Button.
What is taxonomy, just really quickly so everyone knows what we are talking
about. It is not just metadata or talking about it or putting out sentences
about it. It is showing the relationships between the data. Before you do
anything else, before we get into it, you have to have a taxonomy, and you have
it one way or the other. It is there. I can reference internal or external
Let me give you an example. I am looking at this file from CMS. I download
the thing and this is what I see. I am like, all right. I am a developer with
no health care experience, now what am I going to do. What are these things?
Well, I can go and maybe I can find it in a metadata catalog, but that
doesn’t tell me anything I need to know to develop an application. I have to
sift, bit through bit, and be able to reconstruct the relationships. It turns
out that plan belongs to contract. I mean, you get into kind of many-to-many,
it belongs to organization, belongs to parent organizations.
Taxonomy defines business entities and the relationships between them.
Parent orgs have orgs which have contracts which cover plans. Then, there are
attributes, right? Contract number, legal entity, this little cell on the
spreadsheet, what does that belong to? That will kill a developer coming in,
who has no familiarity with it. They expect to see that sort of thing.
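The taxonomy Dr. Rosenthal describes (parent organizations have organizations, which have contracts, which cover plans, with attributes such as contract number and legal entity attached to a specific entity) can be sketched as a small Python data model. The class names, field names, and the lineage() helper are hypothetical; real CMS files may encode these relationships differently, and since some relationships are many-to-many, this strict tree is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Plan:
    plan_id: str


@dataclass
class Contract:
    contract_number: str  # an attribute that belongs to the contract, not the plan
    legal_entity: str
    plans: list[Plan] = field(default_factory=list)


@dataclass
class Organization:
    name: str
    contracts: list[Contract] = field(default_factory=list)


@dataclass
class ParentOrganization:
    name: str
    organizations: list[Organization] = field(default_factory=list)


def lineage(parents: list[ParentOrganization], plan_id: str) -> Optional[Tuple[str, str, str]]:
    """Answer "what does this plan belong to?" by walking the hierarchy.

    Returns (parent org name, org name, contract number), or None if the
    plan is not found. Hypothetical helper for illustration only.
    """
    for parent in parents:
        for org in parent.organizations:
            for contract in org.contracts:
                if any(plan.plan_id == plan_id for plan in contract.plans):
                    return (parent.name, org.name, contract.contract_number)
    return None
```

With the relationships written down like this, a developer no longer has to reconstruct from a spreadsheet which entity a given cell belongs to.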
Anywhere else outside of health care, you see it. You see it in government
geolocation, you see in weather. All of the great success stories we have
talked about have this. If we are wondering why it hadn’t picked up, if we are
only hitting 1000 of user downloads, this is really, really important stuff.
There is sort of an expectation in it. It doesn’t have to be. I understand the
privacy thing. It can literally be what states belong to a region. It can be
that level, same thing applies, 80/20.
If you create all of this, you have CMS doing their data products and tools,
Niles’ office, you have researchers doing their things. You have people in the
commercial space, who will build applications, if they can create market value,
myself being one of them.
Inside your extract system, call it nonclave, you need a public learning
center, I will get to you. The center of all of that is taxonomy, the business
relationships of the entities. If you want anyone to build something cool up
here, it is very, very difficult to ask them to kind of knit by hand all of
those relationships. I just gave you one example of a file. Thousands of them,
to be able to pull together a reasonable application. That is taxonomy.
Learning center, I will let you read on your own time. I want to be able to
see what other developers are doing, and actually have some access. Bryan talked
about speaking with the data experts. We could do that at little roundtables and
committees, but actually having an interactive community to do that would be
really, really helpful.
You know what would be really helpful? Sharing the web analytics, not only
so you have them, but so that when I walk into it and I see 500 files, I say,
what is the most important one? What file is everyone going to, right? I know
that Bill is a super user because he has 5000 posts and five stars and five
recommendations. When he says that particular crosswalk file is the most
important thing, I am going to listen to him. That sort of information, doing
that in a scalable way.
I might suggest, and even Bryan mentioned it, this doesn’t have to be kind
of big budget, cost-built internally. You are hosting various challenges and
spending money for this or that. Point it towards the core infrastructure, and
you will get the developers themselves actually doing this. Learning center,
blah blah blah, all of your videos, blah blah blah, okay, here is what it might
look like. I know they are working on this. You have different things in there,
For the challenges, you track an internal database, or you look at
CrunchBase, at the health startups. This is some of my work and I have had a couple
of successful ones myself. They fail at an astronomical rate, far outside pure-play
technology. Why is that? Legacy guys don’t want to retool, fine. The young
MIT and Harvard kids come in, and they can’t navigate perverse incentives, they
can’t figure out the market layout. They don’t know the business case of the
data they are looking at or how to use it. They built WheatCracker 2000. If you
guys want to steal this app, you can. How fat am I today, a pig, a hippo or a
Jabba the Hutt, right? This is what they come up with and they try to sell it at DTC
and it doesn’t work.
If you are building a challenge and you are saying, I am going to spend
money and reward someone for building an application, one of the criteria for
building that application or product should be having viable market value. I am
not talking about a whole business case. Todd referenced the HDI analytics and
data session I put on at HDI, and where I had a little tiny Mad Lib template,
where people had to fill it out, and very well reviewed. That was just
literally just forcing them to say, what is the market, what is the product,
why would they buy this, what are the challenges, what are the opportunities.
Doing a little bit of kind of business thinking around the data, I would
humbly submit would astronomically increase the success rate in these
challenges. By the way, if you have a learning center, you can put all that
information up there so I can see. I see that he says this and it is working
well for him. I say, oh, that is a really interesting use of that data. I never
thought about using it that way. Then, we can build on one another, and that is
sort of what you try.
DR. KAUSHAL: What do you see as our role in teaching that business case?
DR. ROSENTHAL: I am going to humbly refrain from answering that question
now. What I am going to say is that I think, however we do it, if you want to
crowdsource it, if you want to use experts, personally, I think multiple
perspectives, I would love to have James, I would love to have the West
Wireless fellow talking about some of that. I would love to have some of the
challenge winners talking about that.
By the way, if you are spending dollars internally to kind of track this, I
might look at the companies that have successfully done this and had exits,
which are few and far between. You could have a multiplicity of perspectives and uses.
MR. CROWLEY: Essentially, you can weight the scoring of the challenge to have
a business model as one of those factors. We do this when we run challenges at
the business school. We bring in some engineering students, business school
students, public health students, and it is really helpful. We will bring in
either a resident or the business expert residents, sort of throw them
DR. ROSENTHAL: If you capture that and keep it as part of it, then I go to
the public website and I can not only see the data, I can see the
relationships of the data and this taxonomy, the business relationships. I can
see what people have submitted and how they are trying to use the data for
different business issues. That is how you develop and create in internet time.
DR. KAUSHAL: There is a supply side of data and then there is the demand
side. My assumption was that our role in this working group was a little more
on the supply side, but I would love clarification. I think that you need both.
DR. ROSENTHAL: I am just throwing out that it would be good to get both.
Basically, before we even start saying what is more important and where we
should focus, there is a demand issue. I might humbly suggest a little tiny
form, in terms of demand and what you would like, with some basic
categorization: the stuff you consider, like the iterative quick form, as well
as just the basic demand issue of who is using what. Let me just get through
this and then we can go on, because I have a couple more.
If I actually win this thing, I am talking about it, and by the way, this
has worked really well in other instances. You are doing public data, blah blah
blah; perhaps offer some synthetic or fully synthetic data. That is really nice for
reasons you will get into. A lot of the big folks, if I want to build an
application, which I did, and prime the system, I need something. I don’t need
security clearance, I just need fake, made up data in your structures. You will
hear the big analytic vendors and big payers say that sort of stuff. You have
already created it.
If you did that in a very specific way, you might even be able to kind of
walk around a good deal of the privacy concerns. Pure synthetic files can work,
if they keep a specific type of meaning intact. Here is someone at Caltech. She
isn’t in health care whatsoever, no affiliation with what you were hearing
earlier. They use this stuff for privacy, for credit cards: pure synthetic
creation that retains a specific analytic usage. It is done in industries that are out
Push the ERD, here is the data browser. If I go to Google Public Data, I
will actually be able to see all of the wonderful data and I will be able to
move things around, and click and ask questions. This gets tremendous usage. If
you want to engage the broader community, who are not even data savvy, such as
the tech and business students who might want to get in and ask, what might I
want to solve in health care, starting from the business side, before I hit
the data. This is tremendously valuable. These are data explorers, and they are
free. Google has one, Tableau has a public one. I mentioned they did a
ReadWriteWeb one, there is a Google one.
It got like half a million hits, half a million uses in their contest,
right? The contest was they put up a little bit of data, some of your stuff,
publicly available, and got about a half a million hits. A young girl wins it
with comorbidity of diabetes. Why does it generate that usage? She doesn’t have
to understand the data structures. One of the people actually put it together,
and then it allows everyone to play and ask the questions. No data leaves the
environment. It is all de-identified, aggregate data. That is very good partner
usage. Basic web analytics, demand and supply is obviously a question, but just
somewhere to start with.
Partner, publicize, monitor, and then you can actually allocate resources and
build based on the most popular stuff. I did that with Todd, so it’s kind of a
joke. You do a photo gallery of Todd and it is the most popular thing.
Finally, Opt-In. Blue Button was crazy, no one will ever use Blue Button,
right? You guys remember all of the talk before it went out, 10,000 uses or
something like that. What is it, 10 million now? How about just a crazy idea of
allowing individuals, like myself, there might be some others, who want to
share their data for specific usage. I want to opt in and say, please use my
data for research. Call it a Green Button and see what happens.
Obviously, you have to work out all of that, but this is another kind of
creative way to attempt to get around the privacy issue. If I say I want opt-in
for limited usage in a crafted way, and you get even a very small percentage,
and if it goes like Blue Button, you might be surprised. All of a
sudden, you are creating another very rich source of data. That is it, so thank
you very much for the few minutes.
DR. CARR: Jim, did you want to say anything?
MR. SCANLON: No. I think we are at the point now where we just do some
brainstorming here about what would be a reasonable next step.
DR. VAUGHAN: One thing I would also suggest is that there are other use
cases, looking at some of the things we just talked about over the last couple
of days. The products that you are looking for don’t just have to do with
building private businesses; part of the audience and part of the customer
base is public health departments and non-profits. Their use cases, their
outcomes, their products, their needs and their engagement issues are often
going to be different from a developer’s. They may be far more data literate
than somebody who is approaching it from the point of view of, I don’t have to
know anything about the data.
I think that also we shouldn’t lose sight of what is a very rich public
health heritage, and that people have been donating their data for many, many
decades, in the Washington County study, the Alameda County study, Framingham,
and the national nurses study. Much of that is with privacy protection, but it
has offered a very, very rich template from which we have learned a great deal.
Moving those forward in ways that make them more accessible is, I think, not
going back to reinvent the wheel, but using what we know works really, really well.
MR. SCANLON: I think, in terms of developers, the motive is not the issue.
If there are public health apps, that is fine. If they are for profit, or
whatever they are, that is fine. I think we have experience at reaching some of
those audiences, and we have formal ways of reaching those audiences. This
community, we don’t. We don’t have much experience, and obviously we need your
ideas about how to get those two communities together. We have the auspices of
a federal advisory committee. If we bring people in to say whatever they would
be able to say publicly, we could do that. If there are some questions we should
pursue, we could do an environmental scan a little more. I think you are right.
I think what we will do is look at the usage statistics. It might be embarrassing, because you have
DR. VAUGHAN: There might be an opportunity to say, well, given what we know,
we have usage statistics now. What would be helpful and useful moving forward,
if we found those weren’t quite what we needed?
MR. SCANLON: If you could give us ideas about the metadata: here we are, we
just renewed and revised the metadata for Healthdata.gov. If we are missing a
couple more items, tell us, but obviously you can’t have infinite metadata, or
no one would ever put up a dataset. If there is a tendency toward a standard,
or at least a core that would make it available, not only for what we have to
do in a government agency, but what would make it easier for developers, that would be helpful.
DR. VAUGHAN: I guess I would not discount that the potential community of
developers is far broader than you might imagine. Try to think outside the
box, so that we are being as inclusive as possible, because I hate to think of
these very rich domain experts, who are in struggling health departments all
around the country. What are those people doing, when they really should be
tapped and brought to the table?
MR. SCANLON: I would think all 3,000 counties shouldn’t have to independently
put together a decent set of community health indicators or population
analysis. That is more than just the data. That is what we do in the U.S.;
everybody does it, that is an obvious one. Not everyone should have to do it,
even if someone makes money doing it, too. Otherwise, you have to be an
aficionado. You literally have to know who does this, how do I find this, how
do I pull it together. There are aficionados.
DR. ROSENTHAL: Even if you are an aficionado, and I can speak to decades’ worth
of the Dartmouth Atlas, I still, as an aficionado, say, where is the ERD? It is
not mutually exclusive; it benefits everyone.
MR. SCANLON: That raises more questions than it answers.
DR. ROSENTHAL: Even for the super savvy aficionados, there is basic community
sharing. Government might not want to share; you might say, actually, we don’t
want to share this, or there are security reasons, or we don’t have an official
taxonomy, which we don’t; there are multiple ones floating around. If you take
the community development perspective, I share mine with other folks in the
community, and they share theirs with me. It would be really nice if we could
post that somewhere, where other people can share that as well.
DR. COHEN: I think the sharing issue, we can certainly figure out how to get
past that. I don’t see that as an impediment to doing what we want to do. I see
the impediment from my point of view is, we are the data holders, we understand
this really incredible set of very complex data that we think is useful for
certain things. We don’t know how it resonates with the real world, because we
are these geeks who focus our entire lives on making the distinction between 37
weeks and 38 weeks gestation.
Not only do I need to learn to speak your language about the kinds of stuff
that you need to develop the apps, I need another perspective to ground my
distorted reality about how these data can be used and applied in ways that the
folks that are actually developing these applications know resonate with
communities who might want to use this. I am asking just as much for guidance
in that area.
I am having trouble here. I could name a million business applications that I
think would be cool, but I don’t know whether anybody else would care. That is
the kind of feedback that would help me figure out how we can best get you the
information that you need to do your development. That is where I am coming from.
MR. CROWLEY: Basically, to that question that you just asked, Bruce, there is
not enough staff or time in the world to curate and catalog all of the
potential uses of this data and find out people’s needs. As has been discussed
in different ways, a learning community with certain social features allows
people to have that conversation, and allows you to engage in that conversation
with them. As they bring their questions, you bring your expertise, and you
have it in a community-mediated way that is accessible to others. That creates
the answers to those questions.
DR. COHEN: That is a great idea. Just say, here is all the stuff we have.
What can you do with it and what would be valuable for you, or people like you,
and begin that conversation. Perhaps priorities would emerge.
We have some basic usage statistics for our web-based query system.
We see what reports people like to generate and stuff like that. It is the
people we don’t reach. We have a cult community of data users, because our data
has never been free enough for people to get access to it. It is the people who
don’t use the data who we want to be using the data, not the usual suspects.
I think learning community is the way to engage folks. Show them what we
have, in a way that they can understand it, and get the feedback so you can
develop the tools that they can use to meet their needs.
MR. CROWLEY: You might want to consider incentivizing in some way. There is
a community willingness to learn an approach, but maybe run a couple of pilots.
Some of the challenges have already been used for leveraging those mechanisms
to build that into the community, so there is some additional quid pro quo for
participating.
DR. CARR: Okay. Now, it is time to land the meeting, at least for our next
steps. I am trying to keep up with you all here. We talked a bit, and there are
clearly infinite numbers of ways to approach it. There is a need for us to
articulate what it is we are going to do. Is it that we are going to talk
about, as you said, the uses of the data, the usability of the data, the
applications? I think all are valid, but we have got to pick one.
I will just bring you back to what we talked about today. Because we are
linked into NCVHS, we have an interest in getting more expert in what is
available. One of our themes for this year is thinking more about our
communities, the data available to them, what they do with it. It brought us
back to this slide that came out as part of the NCVHS report Shaping a Health
Statistics Vision for the 21st Century, 10 years ago.
As we look at it today, there is a way we could make it come alive, and even
test the validity of it, with the data that is out there. At the center is the
population’s health, disease, functional status, well-being, the incidence and
the distribution. Beyond that are community attributes, so some of these we
could get. The biological characteristics, community age distribution, gender,
genetic makeup, health services, number of personnel that are available, cost
and financing. These are things that I think that we have been seeing today,
that are out there, and population-based health programs and so on.
Then, beyond that, context, natural environment, and this goes beyond HHS,
but air quality, water, climate, weather, cultural context, political context,
and place and time. Bruce, you may want to speak to this a little bit to say,
would this be something that we would land on, to say if we use this to kind of
walk through, do we have this kind of data. Does it help us, what are the
complications that we encountered that we were not anticipating.
I just use this as an example. I know it highlights Massachusetts because
that is where I am from. This is something AHRQ has put together, and it
combines a whole host of factors, and comes back and tells you what every state
looks like. I just want to point out New York, as opposed to Massachusetts. It
is just something I happened to be looking into this week. This would tie in
very well with our ongoing focus on our community. This is Bruce’s idea.
DR. COHEN: This is the space where I want to be, but again, I don’t know
where other people want to be and how they see what we can provide. I am very
interested in pursuing this from the national committee’s point of view. I
think this can add value to us understanding community, the health of
communities and how people relate to that by essentially breaking down the
traditional silos of defining public health very narrowly. I think this is an
incredibly important space for the quality of life.
Listening to what Josh had to say and some of the folks today, I don’t know
whether we can develop applications from this that people will want to buy, if
that is the bottom line. Again, I am very ambivalent. I really want to be here
and explore the possibility, because I think we can add value from my
perspective. Again, I don’t know, once we liberate our data, what folks are
independently going to want to do with it. I have these two competing notions
in my mind.
DR. CARR: Datapalooza and cool apps are sort of the enticements, but do we
want to take it to a level of a little more sophistication, beyond a cool app,
to a relevant, sophisticated, important question? I am not
saying this is the only one, but I really like Bruce’s idea that it ties in
with the work of the committee. You have an audience, and it would take us
through our paces if we did Healthy People 2020 and put that in there. I don’t
know if it is simplistic to think about rolling it up like that, that would be
a really cool app.
Ed, you have been saying everybody is looking for the one number, right,
that tells you about a community. I think also, as we look at this, we don’t
know what of these things, in context or community, drive health. We have been
talking about it for 10 years, for a decade, but no one has ever put it all
together to see if we could get one number.
DR. FRANCIS: Mostly what I have heard addressed is that the current data
that are up there could be more user friendly in many important ways. What I
haven’t heard is, and this goes to what you were saying at the end, and it also
goes to the choice question that Heritage put out, really an underlying problem
is that the datasets that are currently out there just aren’t rich enough, or
interesting enough.
What do we then start to run up against with respect to stewardship
questions, protection questions, and so on, if we are going to go richer? I
don’t know, and I think that is partly my role here. I just want to tell you
that there is some work that the committee is doing, if you weren’t here
earlier, about data stewardship for uses of community health data. This group
is going to get a copy of what we are looking at, before we do anything with it.
DR. CARR: I think, of all the array of things that we have seen, there are a
huge number of things that do not touch on that sensitivity and, when
juxtaposed and integrated, can tell a very powerful story. I think
that we have got to walk before we run. If we jump into de-identification, who
has the right to know what, we are really going to have the same experience of
DR. GREEN: I would like to make an observation and nominate something for
one of your PowerPoints. I know there are going to be next steps or goals or
whatever you are going to do up there. I think I am safe doing this on behalf
of the NCVHS committee.
First observation is, I have been back there listening to you guys all
afternoon. So far, I have not heard anything that cannot fit comfortably into
the three themes that NCVHS is going to use to guide its work going forward. If
you disagree with that, I really want to hear that dissent and understand what
that dissent is about. This looks very promising, from my perspective.
The thing I want to put somewhere on your list, if you will let me, goes
like this. Regardless of what type of sharing we have, when we stay focused on
helping the people in the United States have better, longer lives, and to get
the health care they deserve and need, we keep re-exposing, over and over
again, that this nation is missing an infrastructure. It happened again this
Infrastructure probably means different things in everyone’s brain, sitting
around the table. I am using that word to put a placeholder out there for
systematic proper use of new knowledge and new technology. We have these
infrastructures for other human enterprises, but we do not have the
infrastructure in place to either turn the health care delivery system into a
learning system, or to turn communities into learning systems. What NCVHS has
been seeing now for a decade is a silent cry for that infrastructure.
A minor, but I think important, example for NCVHS, I usually just try to
beat this to death. I have annoyed Ed at least three different times about the
workforce. We heard in our hearings that the analytic capacity of this nation’s
public health system is weak. In some places, it is not weak, it is absent.
We have heard that we don’t have a fellowship program anywhere that is
preparing people to do the sorts of work that I hear you guys wanting to do
here. Then, share it, disseminate it, teach it, augment it, scale it. Where is
that national infrastructure that prepares the workforce to work within a new
infrastructure, to just totally change health care and health status in this
I feel passionate about this now for several reasons, but the main one is, I
will state this negatively, there is a risk that we are going to get a lot of
cool stuff done and it isn’t going to matter, because we didn’t bother to think
what it was going to take to move data into information into stories that
change the world. I want to get that on our list somewhere.
DR. CARR: I am going to push back a little bit, because I think we can
either come at it from here down to here, or when I look around the room and I
look at the expertise, I am thinking if we do one thing, we will learn many
things. I am wondering if we try to land on walking through one thing, it doesn’t
have to be a long time thing. Let’s see how it goes and that will obviously
I would agree with you 100 percent about the workforce. I have been saying
it as long as you have, even longer. Maybe it is too hard, so maybe this is the
interface. Maybe we create information that is so easily accessible, that you
don’t need a PhD in these various things.
DR. GREEN: I don’t hear that as a push back. I hear that as helping us with
our question about how are we going to adjudicate and coordinate our work. If
this group could get down and dirty, and while we are thinking about changing
the world, if you could just change something. If you can get a couple of apps
that just knock your socks off, and finally you say, now, I get it. I think the
full committee is well-positioned to grab that and then to provide advice at a
broader level, to do broader policy work that helps get that done.
DR. CARR: Which gets back to Bruce’s idea, which is if we landed on this,
the long-standing NCVHS kind of vision, and tried to say, okay, now we have
made all this data available, let’s fill in the blanks, what is it telling us.
If we take the healthiest community and find out it actually doesn’t matter at
all what the streets and the roads and the air quality are. Or take the worst
community and we see, oh, actually, they have the best of everything. I don’t
know what the answer is. I don’t know if that is a simplistic way.
DR. ROSENTHAL: Real quickly to interject, just pick something. Do whatever
you are comfortable with. Do what you know to start with. You are not going to
find something else you don’t know.
DR. VAUGHAN: I would say maybe a lot of those questions are answered with
GIS. What happens where and why, and how is it different. I would also say
that, for all of the infrastructure we lack, we also have a lot of
infrastructure to work with, that we are not using. It is not so much that that
is legacy or old school, and this is bright, shiny and new, and a killer app.
It is taking the best practices of both and making it possible for those to
come together.
MR. SCANLON: There are a couple of specific things I would like to get out of
the group, though not today. Number one, you all have ideas, I wouldn’t call
them principles or guidelines, but something like that, off the top of your
head, about what HHS could do better, with examples from what you heard today.
Very practical or theoretical, but again, things that we could begin to do. You
made some very practical suggestions today. This is to improve what we are
doing already. Maybe what we could do is ask people to think about them, and we
will start getting them on our worksite.
Secondly, I see a lot of student-level applications, I will call them
student level, and a lot of enthusiasm. I am an optimist. I think by generation
three, those will turn into serious ones, not that there aren’t serious ones now.
Again, I don’t want to dampen the enthusiasm, but if you have ideas about how
you get to that stage, how we could get there a little quicker in the health
area, that would be helpful to us, including the idea of down the road demos or
incentives or challenges. Again, I don’t want to dampen enthusiasm. We want to
be careful of how we use your time. I think we are looking at sort of beyond
Then, third, do we want to pick, and we are not ready to do this, an area? An
area where there seems to be a want for community-level data, and there are a
lot of examples of that available now. I am guessing that many of these tools
are ignored. You can assess the use of the various tools now. We could focus on
this at a hearing: what differentiates use and application from just putting it
up there, and trying to get some best practices from some actual examples.
Then, knowing what is useful and what differentiates, we probably have a
dozen, at least, community-level indicator type systems, and probably the
states have even more. We don’t need to build another one, we need to find out
sort of how to get the best out of those, I think. I would like to start that
way, with recommendations, advice and principles about the job we were first
asked to do, which is to help HHS, one, get the data up, get it out there and
then we will see what happens later. Get it out in a way that can promote and
accelerate applications. Some of the ideas today, I think, were fine.
Then, two, and maybe this is just the way things have to be, how do you accelerate
the serious applications. I don’t know. Again, maybe we don’t know, maybe it
just has to happen. Maybe there is no theory, maybe it is atheoretical, it is
Then, the other one, do we want to pick an area. Again, I would ask the
committee to come back with an idea. Is it community health tools, is it
quality tools, which you probably don’t want to go there. Just stay in the
health area, but otherwise, is it social determinants, is it environmental?
Again, this is the information that communities put up, that organizations put
up. You are looking for a house in a neighborhood. Some of these apps will tell
you the quality of the schools and probably the environment in there. Is there
an area that we could look at to kind of specifically apply some of these
MR. DAVENHALL: Jim, I prepared a one-pager. I don’t want to discuss it
today, but I want you to read it on the airplane. Part of it is me trying to
figure out what you have asked me to do, what job you really want us to do. I
actually think we ought to spend some time thinking about how we enrich the
ecosystem that we are talking about. Good medical advice would be to make a
proper diagnosis before we start to fix what we don’t understand. I offer that to try
to figure out what it is you really want to do, and at the same time, give you
some ideas of what I think would bring the developers to the table.
DR. COHEN: Thank you so much, Bill. That was going to be essentially my ask
for our joint homework assignment. I need to understand more what space you
live in and what makes sense, and how you use words and language, so that we
can provide you with the tools that you need to succeed. What I would like for
you to do is understand what information we have in the space we will operate
in, and see if that leads to a fruitful relationship.
DR. VAUGHAN: I would also say, though, that part of the answer to what you
have asked, about when we get from the cool to something that is useful, is to
start out with a different question. That is, what would be useful, such that
easier access to the data would empower communities and push this thing
forward?
Same thing as any health survey, the same thing as any other epi study, any
other public health intervention. To start with, what are your goals and
objectives, to sound like somebody who used to beat me over the head with it.
DR. ROSENTHAL: I would respectfully disagree. When I use the words business
and usage, I also mean non-profit. I am just using them broadly. I think things
have been done for quite a while by saying what would be good water to drink;
actually asking the market where the horses want to drink is at least how I
understood Todd and Bryan to frame the problem. I will share my slides around
that.
Bruce, to your point, I can share my two cents, and I can provide resources
that you guys may want to look at. I think the beauty of what we have learned
in the consumer web world, whatever you want to call it, is that the learning
center, as a way of crowdsourcing that question, is a really powerful way to go
about it.
MS. GREENBERG: I think this has been a really useful and interesting
discussion. I think we have gotten some substantive reports from the
department, but also some really great little reports from the members. It is
Friday afternoon, and I, for one, am exhausted for various reasons.
I think that what Josh presented, and then what Larry said, and I am sure
what is on here, Bill, and what Jim said and everything, there is an
opportunity to try to connect these things, because we are talking about at
least two different things, which would be very helpful to someone like Bruce, who is
running a state data operation. That is, what are some really top priority
things that would make these data more helpful to developers.
Also, that is a different question from, how could we make this data more
accessible to people who are not developers, but have all of these questions.
We know that health websites are among the most visited websites. Increasingly,
when you ask people where did they get their information, whether they have a
health problem or whatever, they are all going to the web, we are all going to
That touches on the workforce issue. I agree with somebody, I don’t know who
it was, who said it is going to take years to quadruple, and it won’t happen
probably, but the people who are getting MPHs, and you don’t necessarily need a
PhD in epidemiology. That is where some of Josh’s suggestions, I think, and we
need to explore them more, are ways to make the data easier to use. Whether it
be these tutorials or these examples of these learning centers, not just for
applications or for people who really are data savvy, but for the rest of us.
At NCHS, years ago, we had an applied statistics learning program or
something, and it was one of many things in the Cooperative Health Statistics
System. Some were kept and given new names, others were lost. There was a
reason why we had that, and nothing ever filled the gap, frankly.
I think that yes, we want to increase the really technical and well-trained
workforce. We also want to just push this data out to people in a way that
makes it much more accessible. They don’t have to be statisticians, they don’t
have to be graphics gurus. I seem to be resonating with Josh here, so I feel
like I am not completely on the wrong track.
MR. SCANLON: The unemployment rate is almost 8 percent. I think there are a
lot of folks who have the skills who can actually help here. I don’t think you
necessarily need a lot of new folks. I think you have to repurpose.
MS. GREENBERG: We don’t need what?
MR. SCANLON: Well, I think there are a lot of folks, including MPHs, who are
unemployed. I think there is probably an excess.
MS. GREENBERG: People who are unemployed, who could get up to speed, because
they have other skills or they could be repurposed or whatever. Our focus is
not just the people who do the apps, yes. It is also several of the things that
Josh suggested, I think, are really for leveling the playing field.
I think we will have a transcript from this meeting. We will get your
slides, we will have this. Then, those of you who are members of both groups,
the full committee and the working group, think about how to integrate it, possibly with the
project that you have suggested. I think both the full committee and this
working group will be more effective and more productive if they can support
MR. DAVENHALL: Marjorie, here is something you can do to help us with some of
the things that you are mentioning. I look at the conference you had a month
ago. Most of those sessions with the data geniuses were filled. That tells us
there’s something that is really working there, but there were no sessions for
CMS at that. They didn’t have a data guru. Allison wasn’t present talking about
it or the data guru behind all of that. I am saying that was well-attended.
This is the place where you go and meet a person who has the diagram you want.
MS. GREENBERG: We have to realize the environment that we are in. We have
been told that conferences like that, which are biennial, should be no more
frequent than triennial or quadrennial. We have got to look at all of the
DR. CARR: That is why we are here. We each speak different languages. I
understood the first part, the second part, not so much. Likewise, I know what
an MPI is, but I think it is about convening these kinds of groups, in this small
setting and in larger settings. Josh and Ed having a conversation, coming at it
from completely different environments. That is what we are here to do. I think
that again, our next NCVHS meeting is in November. It would be nice to sort of
have a thought about what is the next thing. I mean, I think you are right, we
need to reflect on what we heard today. I would like to hear from you guys
what would feel right to you. We have done some presentations of what there is;
what would be the next thing?
MS. QUEEN: Who is the audience for this working group? Is the audience for
the working group developers?
MR. SCANLON: It is HHS. You heard Bryan say today, and he really did it sort
of without substance. It was, how do you just promote this generally? I think we
are looking at the data that we have, and we can go into a lot of detail, and
at the data we have already posted on Healthdata.gov and other media, too: how
can we promote the use of the data for applications of all kinds? I think he is
specifically looking for applications, third-party developers.
Number two, are there other datasets that should be made available there? If
not there, should they be available somewhere else? This is clearly an HHS
working group. It is not necessarily for developing reports. This is really to
give us fairly practical advice on bridging the gap between our data producers
here and other government agencies, and the apps.
Again, we could, if we wanted to, go into the various user communities like
public health and research, but I don’t think that is what Bryan is actually
asking for. There are a lot of ways to do that. I think you have been selected
because you know technology. You have developed apps yourselves, you know the
community, you know them. You know what we are not doing.
I think we are looking for not a report. I think we are looking for advice
from you, what are the principles, what are the recommendations that you would
provide us, I would say just after today, for example. I would like this to be
an actual group where, in two weeks, you give us your comments and we take them
to Healthdata.gov. I don’t want us to revise Healthdata.gov again and get the
metadata wrong, so there we are for another three years.
I think I would like us to keep this fairly practical, agile and oriented.
What is your sense of advice after today, after you heard from Bryan, after you
took a look at Healthdata.gov and the Health Indicators Warehouse? Give us
some advice that we could take back to the department. That is number
one, because you are already good; you are experts at this already.
Then, beyond that, is there a longer term? Are there other issues? Do you want
to think of ways to bring the communities together? It may not be the
conference, but it may be other ways. It might be a challenge. We’re FACA now,
so we could probably host the meeting where we literally bring in, not
thousands, but folks that we think would represent the points of view and
expand our own.
I guess what bothers me about Healthdata.gov is this. I love it, I support it.
But it is such a small part of the data we have, because it is only what we are
making available without restriction. I guess what I would really like you to do
is think about how we could take these other datasets that are not available
without restriction. You can get to them now, but you have to sort of have a
data use agreement.
Then, look at those in the sense of, is there technology that can help us,
are there apps that can help us. I don’t think we are going to be able to just
put those on the website without restriction. Maybe it doesn’t have to be quite
the way we do it now. Maybe there are platforms and technologies.
We have probably six of these research data centers within HHS, and we can
describe those to you. Maybe there is a much more facile, agile platform that
we could aim for, that would do a lot of the work for the analyst. Then, you
get to the point where now you need the data, like the 5 percent sample or the
partially synthetic data. I think that is really what Bryan and the leadership here
are looking for. Not that we don’t have broader issues and needs, but I think
that is really what this workgroup was set up to do.
DR. KAUSHAL: Can I make a suggestion? Thank you, first of all, that helps
clarify a bunch of things for me. I think we need to divide the work into
the framework I started with, beginning with the supply side. Joshua had some
great ideas on best practices. That is not everything, so how do we increase the
datasets? I think we should spend some time thinking about how we improve
the actual supply of datasets and how we just make them easier to access.
Then, I feel the next piece of our work, and with our collective networks I
think we could help with that, is to work on the demand side. As an example, I
mentor at Blueprint Health in New York. There is Rock Health in San Francisco.
All of these young engineers are crying out for data to make their business
models and apps better. I also mentor them on business models. I think we could
bring them to the table once we have our ducks in a row on the supply side,
iterate with them, get their feedback and then maybe go through another cycle.
DR. ROSENTHAL: There is a course at Harvard we are doing on entrepreneurship
around that with the young developers this January.
MS. GREENBERG: Could I just suggest that, since we will have a transcript, we
are going to have a draft summary. We also have these other documents, but they
will maybe be integrated since they were part of the record. We have the
SharePoint site; we could use that to start teasing out some principles, as
you said, and some priorities, both for applications and for more general use,
for really increasing data literacy across the board. Use that as a way to
be communicating. And do you want to have a teleconference before the November
meeting? It is two months away.
As I was saying before, I think there is a lot of content in what we have
discussed this afternoon and before, and in what has been presented, that we
could start organizing somewhat in the ways that Jim suggested and that all of
you have said.
MR. SCANLON: I would think by November, we would have a rough draft. We
would just accumulate your ideas for principles, and sort of how HHS should do this.
MS. GREENBERG: Priorities, if you can do these four things.
MR. SCANLON: This is what would be most useful.
UNIDENTIFIED SPEAKER: Part of it goes back to the goal to increase the
number of apps, the goal to expand the universe of users so that the data is
DR. VAUGHAN: That is why we do everything, to improve health and healthcare.
MS. GREENBERG: Many people as possible, able to make some use of.
MR. SCANLON: There is a proximate kind of goal, and that is to get our data
out, so that would provide the conditions for these.
DR. VAUGHAN: You will not forbid me to tell CMS and Ed Sondik and
everybody else how they can improve theirs as well, because ultimately, they
will do that. You see how it all started with the open government focus and
with these ideas of Healthdata.gov. By the way, it is health and environment
data. Healthdata.gov includes other agencies.
PARTICIPANT: I think if you want to get beyond a cool app, you have to ask
for something beyond a cool app. You don’t have to know that it has this button
here and that button there. What is the question you are trying to answer?
MR. SCANLON: What is the health or other personal issues? By the way, it is
tools, as well. For example, I think NIH just put up a body mass index (BMI) calculator.
We have had others. Those are tools. You decide, you give us a sense of, are
they valuable? Are the datasets more valuable? Are they both valuable in just
separate things? Then, those kinds of apps get put on all kinds.
DR. VAUGHAN: Does that make a difference in people’s health? They get made,
but you have to ask, so what? Again, pull back to a more epi
framework: what is the question you are trying to answer?
DR. COHEN: The question that I am trying to ask and answer is, how can we
use all of the data we have locked up here, to create information to help
individuals and the communities improve their health.
MR. SCANLON: Through this particular technology for now.
DR. CARR: What I am hearing is we are going to get all of the information
compiled. We will have our transcript; perhaps we could get an executive
summary out of that transcript. I have tried to capture and revise, as we
speak, what we said here. We will meet again on November 14th, so between
now and November 14th, is there a sense that we would want to communicate,
develop, revise, have a call?
DR. KAUSHAL: I think it would be a great idea. Do we have three hours again at
the next meeting? I think we can realistically have time to maybe jump into one
of the big issues, however you want to define it. Again, I am open to whatever
framework. If it is the supply side, we can focus maybe the next one just on
supply; if it is demand, we could do that. If it is supply, I would ask us all
to think through.
Joshua has already started with his stack, as well, around what are some of
the best practices we have from our combined experiences. Why don’t we just lay
that all out, maybe on the short side. Hopefully by the next meeting, we would
have our compiled thoughts and a couple of pages probably around what we think
are the best practices. Then, we can prioritize, okay, which of these have the
highest impact, and use the next meeting to debate that.
Hopefully, we will whittle that down to maybe 10 or 15 different things. Then,
hopefully, we can maybe even implement a couple of those. That is just a
suggestion. I am not biased either way. How do we feel about that? We have
DR. SONDIK: Let me throw one other thing out that I didn’t hear. First of
all, I think the focus on Healthdata.gov, and is this a good tool for marketing
the data, I think is really important. It seems to me sort of a two-dimensional
kind of thing. I don’t know, it seems like we need something more dynamic.
MR. SCANLON: I think that we start with that, but I think we immediately get
DR. SONDIK: Let me throw out one other thing that comes back to the people on
the supply side, I suppose, which is do we have a responsibility that goes
beyond the metadata in saying, here it is. Is there something else there,
because we know the ins and outs. We know what it is not saying. Or maybe we
don’t, we think we know what it is not saying. Is there a responsibility here?
Is there a liability here that goes with putting this stuff up, that goes
beyond the mosaic issue which is a concern?
The other thing, though, is that a point that came up many times, you have
raised it and Bruce raised it, as well, which is the use side. I think before
that first Datapalooza, we had the supply people, one strange group, and then
we had this other strange group, which was the developers. They got together at
the IOM, at the National Academy of Sciences, and I tell you something really
There were a few questions, like somebody said something about, well, you
know we know the difference by race of the number of people who get screened
for something. Somebody said, yes, you really know that? Well, of course we
know that. It is clear to us that we know that. The other side had no idea,
really no idea, of what was there.
Of course, they weren’t thinking about it from the standpoint of what the
questions were and who the users were. I think another component to this is the
user side. When I said before, well, you know, it’s not clear exactly to me how
people would actually use the survey data, Bruce had a semi-infinite number of
ideas immediately about that. I think it is really important in this, to crank
that in. I feel we are doing a lousy job of marketing the data, and it is not
clear to me that Healthdata.gov, a website, is what we need. I think it is
terrific to build on, but I think we need something more dynamic.
MR. SCANLON: It will become obvious to all of you that that is necessary,
but not sufficient for some of these, and that is why that will be the next.
DR. COHEN: One other comment: I think this endeavor is based on an
assumption and a shared belief that data will lead to better decisions and
improve the lives of individuals and entities. That is our underlying assumption. We
have got lots of numbers, we have got lots of things out there, and we want to
leverage that to improve people’s lives. We are looking for strategies to do
that. I think we just need to keep in mind that that is our assumption.
MR. SCANLON: I would say that information is necessary, but not sufficient.
Other than New York City, you can’t force people.
DR. SONDIK: We can prove that it is true.
DR. VAUGHAN: It is not always true in the same way, in the same place, at
the same time. You have to measure it. It doesn’t mean you have to have perfect
measure, but you have to measure.
DR. SONDIK: We know how many people in a county don’t know that they have
high blood pressure. We know that because we have it really solid on a national
level. We can model that, that is an app. Now, how good is that model? That is
an issue. That is the kind of thing where we use market forces to try to tell
us that. Right now, nobody is marketing that. Nobody is marketing the number of
people who have undiagnosed diabetes at this point. Part of that is we don’t
have the workforce over there to use that information.
DR. COHEN: I am comfortable. I have drunk the Kool-Aid.
DR. CARR: The table has been set for a robust and exciting discussion, based
on the entries on the SharePoint site. Thank you all very much. I look forward to
our next meeting.
(Whereupon, at 5:04 p.m., the meeting was adjourned.)