[This Transcript is Unedited]

Department of Health and Human Services

National Committee on Vital and Health Statistics

Working Group on Data Access and Use

November 14, 2012

National Center for Health Statistics
3311 Toledo Road
Hyattsville, MD 20782

Proceedings by:
CASET Associates, Ltd.
Fairfax, Virginia 22030
caset@caset.net

P R O C E E D I N G S

Agenda Item: Introductions – Review Agenda

DR. CARR: Thank you for being here. I would like to convene the working group on HHS data access and use. What we do is we go around the table and say who you are and where you're from and whether you're a member of the working group. So I'm Justine Carr, Steward Health Care System, and chair of the working group.

DR. SUAREZ: Good afternoon everyone. My name is Walter Suarez, I'm with Kaiser Permanente and I'm a member of the working group.

DR. FRANCIS: I'm Leslie Francis. I'm at the University of Utah in Law and Philosophy and I'm on NCVHS's Privacy Confidentiality and Security subcommittee and I'm a member of the working group.

DR. GIBBONS: I am Chris Gibbons. I'm from Johns Hopkins in Baltimore and I am a part of the working group.

MR. CROWLEY: I'm Kenyon Crowley with the University of Maryland and a member of the working group.

DR. ROSENTHAL: I'm Josh Rosenthal with RowdMap and a member of the working group.

MS. QUEEN: I'm Susan Queen with ASPE, and staff to the working group.

MS. KANAAN: Susan Kanaan, writer for the committee and the working group.

DR. VAUGHAN: Leah Vaughan, member of the working group.

DR. MAYS: Vickie Mays, I'm a member of the full committee and I'm a groupie of the working group.

DR. COHEN: I'm Bruce Cohen, a member of the working group, a member of the full committee from the Massachusetts Department of Public Health and co-chair of the Populations Health subcommittee.

MR. SCANLON: Jim Scanlon, I am with HHS Office of Planning and Evaluation and I'm Executive Staff Director for the full committee.

MS. GREENBERG: I'm Marjorie Greenberg, I'm from the National Center for Health Statistics, CDC, welcome to NCHS, and I'm Executive Secretary to the committee and I'm also a groupie to the working group.

MS. JACKSON: Debbie Jackson, National Center for Health Statistics, committee staff.

MS. SEEGER: Rachel Seeger, Office for Civil Rights.

MS. JOHN PAUL: Tammara John Paul, NCHS, CDC.

MR. SOONTHORNSIMA: Ob Soonthornsima, member of the NCVHS committee.

MS. BEBEE: Suzie Bebee, ASPE.

MS. JONES: Katherine Jones, CDC, NCHS, and committee staff.

DR. CARR: Great to see everyone. We have had the opportunity to integrate the excellent input that we had at the first couple of meetings. And I think that we'll have an opportunity today to really have a working group with some deliverables or suggestions for HHS at the end of today. So what I wanted to do was just go through where we are and who we are.

As you can see we have here listed the membership of the committee, and as you heard, Bruce Cohen, co-chair of Populations, Leslie, co-chair of Privacy, Walter, co-chair of Standards, and Paul Tang couldn't be with us today but he's co-chair of Quality.

And then what was interesting to me is just the richness of our other members with great expertise in IT. Kenyon Crowley, Bill Davenhall, Josh, and Peter Hudson, not here. Mo couldn't be here, but their tremendous expertise in IT.

And then our other expertise. Wonderful that Chris is here today in his expertise in community health informatics. Patrick Remington has called in but not today I guess, and Kevin just had a baby, and Leah with her experience in social media.

And Chris, you weren't here for the first two meetings but it's been a learning process for these disparate groups to come together and get to a common ground, but I think we've made great progress. Our liaisons of course are Susan Queen, Ed Sondik, Niall Brennan, Jim, and Marjorie.

So our initial charge is that we monitor and identify issues and opportunities to make recommendations to HHS on improving data access and innovative use, including content, technology, media, and audiences. Second, advise HHS on promoting and facilitating communication to the public about HHS data. And finally, facilitating HHS access to expert opinion and public input regarding policies, procedures, infrastructure, to improve data use. So that's our big charge.

In the details I've just highlighted seven things that we are asked to do, beginning with reviewing the current portfolio of HHS data; monitoring trends in new information dissemination and social media; identifying and monitoring the types of data and information that are needed by all participants; improving data access; promoting and facilitating creative communication; facilitating HHS access to expert opinion and public input; and advising HHS in understanding and evaluating how HHS data is being applied and the value it is generating.

So where we are on this ambitious agenda is starting with we had two webinars reviewing some of the data. We're up to half of the data that HHS has. And I mentioned, Katherine, if we could put links on the Working Group SharePoint site, it would be helpful to be able to go back and look at some of these data that we've seen already, and we'll be arranging additional webinars to go over the rest. And actually at the end of our meeting today, 4:30 to 5:00, we have a CDC presentation on some of their data.

But we talked last time about the fact that we wanted to use the opportunity of being face to face to really work and save the presentations for outside this meeting time.

So when we last met we talked about thinking about the two sides of the house, the supply side and the demand side. And so when we think about the supply side what we want to do is take some of the data that we've seen, and as said, identify and study areas of opportunity to improve data access and application. And Mo couldn't be here today, but we talked about let's come away with five tangible recommendations or next steps coming out of today that Jim can take back.

So I asked Josh to take us through from the supply side what does the data look like, what's available out there, what can we do for it to make it better. We'll look at some of the front-end platforms, and then we'd also like to go through some of the issues that Josh raised – taxonomy, hierarchy, and so-on, that we who use the data every day take for granted but not so clear to a developer who looks at this terminology for the first time.

So Josh is going to take us through that and then actually building on what Bill Davenhall also had recommended, that we identify candidate data sets to sort of pilot. But I think actually that's a bit of what we're going to do today. And then Susan, we also want to talk about the special considerations related to survey data availability and usability.

Once we kind of get through this side we'll then begin to think about the demand side. And as we said in the concluding comments at the last meeting what do individuals and communities need to make better decisions and improve health.

So let me stop there and open it up for any comments or suggestions, additions, directions. Are we good with that, Jim?

MR. SCANLON: A little more framing, Justine. And remember now, HHS has really worked with several audiences over the years: the public health community, the research community, the health care community, and that's really where our efforts have been focused.

So it's almost a business to business kind of an arrangement. You need a fair amount of analytic capacity to take the data, use the data, interpret the data, and apply it to the levels.

With this initiative what we're trying to do is – all of you have heard this before – liberate, democratize, all of the data that we have and probably other places, to make it available to a new set of audiences, and using more customer friendly technology, and certainly the latest technology.

So we put together the group – all of you were hand-picked by the way – because you know sort of both sides of the equation. What we've been doing at HHS, for example, goes beyond the usual outlets and ways of disseminating data. The policy now is to put our data holdings into HealthData.gov – and so do EPA and the other health agencies in the federal government – and to put it there hopefully in a format that developers, application specialists, and others could take, and really, without us having to intervene, could just go from there and use the data properly.

And what we're really asking you to do is help us think of additional ways of getting data to HealthData.gov, but really, besides HealthData.gov, there are a number of other publicly available platforms – we're just beginning to learn about them in HHS – where we can make the data available, where you don't necessarily have to know how to program SAS or SPSS or all the others.

So again, these are different audiences. We're not trying to dumb down the research side or the public health side or anything like that, those dissemination outlets will continue in their full glory. But what we are looking for is applications for communities, consumers, patient groups, community coalitions, so they can take data and really use it to improve health and health care sort of in their own settings.

And you're sort of a unique group because we've kind of put together both sides of the equation, and the first couple meetings I think we've had a translation, everyone getting on the same page in terms of technology and data. But we're actually moving along very nicely and we're already beginning to get some very good ideas and we're really asking you to give us additional thoughts and help us think this through.

We've already looked through the portfolio of data from NCHS, as well as SAMHSA, our mental health substance abuse agency, and from the Agency for Healthcare Research and Quality, which many of you are aware of. So we're about halfway through.

I think at the end of the day, later in the meeting we're going to have CDC talk about its portfolio as well. And then we have a couple more agencies that I think we'd like to brief you on. So then you'll have an idea what it is we have. CMS was one of the first.

And these are all different kinds of data. Some of them are survey data where identity has to be protected, research data, surveillance data. But in CMS's case it's claims data, administrative data, literally from all over the country, probably the most complete set of local data that we have. And clearly that data is being used for Hospital Compare, to compare hospitals with other places as well.

But again I think everyone would agree that we are not experts in HHS in the many current and evolving technologies and platforms and really ways of getting the data out, and that's what we're looking to you for.

DR. CARR: Okay. We have a couple of people who just joined the table, if you could introduce yourselves. And then Josh, you want to come up and start your presentation?

MR. QUINN: Matt Quinn from NIST, staff to the committee.

DR. GREEN: Larry Green, member of the NCVHS committee.

Agenda Item: Discussion of best practices and practical suggestions for release of "open" HHS data

DR. ROSENTHAL: Good afternoon. Hi. So three or four days ago I spoke at an industry thing with Brian Seebach and Niall Brennan and some other folks who were on my panel, and we came up with the ultimate device for displaying data securely.

And you're going to see Hopkins professor from MPH and a bunch of other folks, at BCDSCO trying to figure out exactly how to use this, but we put our slides and our analytics on this and this went over like gangbusters. You guys remember the old view masters you used when you were kids? So that's what they're doing. The Hopkins guys trying to figure it out.

But that's not what we're going to be talking about today, although that is funny, that tickles me ever so greatly. At the last HDI thing we had huge foam cowboy hats and you can see all the HHS guys running around in those as well, some industry folks.

So I'm just going to take 15 minutes and be done, in and out very quickly, but I just want to set up where I think we've been and where we're going and what we're doing. So this is our third meeting. The first meeting we got our charge from everyone including Todd, and he was very fiery and inspiring. And then out of that we all sort of left, or at least I left, trying to figure out what was going on. And we talked a lot of meta talk about what we could do.

And so I just went at it in sort of true startup fashion and slapped a bunch of stuff together. So I put together seven immediate things that I thought would be pretty good in terms of recommendations. They're probably wrong, they're probably backwards, but at least these were seven very specific things. If you walked out of here with nothing else at least we've done that part of the charge.

And then there's various specifics talking about what taxonomy is, network through file, showing how to go about doing that; learning centers and why those are important; business value, baking that into the challenges; semi and synthetic data sets – we have an IT person talking about that; data browsers, meaning the dissemination; partnerships and products – and I kind of mocked Todd there as well; then a green button with an opt-in, kind of like the Blue Button.

So that's what we did, and then Mo and company took it off and said let's actually try to make this reasonable. But this was the first cut as far as that. And so then what happened was over the past few weeks we took a tour. So those were just initial points. And a few of those were around taxonomy, and engaging other people and business contacts as part of the questions, so we weren't just doing cool fluffy stuff. And then also disseminating the data to different sorts of audiences.

And as you remember my background is also in kind of public health data from doing the Dartmouth algorithms and whatever, so I have a firm foot on that side of the house, but also on the startup side with Harvard and MIT and Hopkins, some of the other places doing some of that sort of stuff.

And so we walked through these tours of the data that the government agencies have, and we'd talk about data.gov, and basically there we kind of dumped files and we dumped datasets and maybe we structure it and maybe there's some meta data. And then we took tours of two different things, either a big file or a database like a claims identifiable database that has restricted usage, or kind of a small file that is kind of all over the place and it's really tough to figure out what the pieces are in it, or these kinds of secure enclaves.

And those are kind of supposedly user-facing because you can create your own tables, but to do that, to be honest, you really have to know what you're doing to go in there, and most of those were restricted access anyway.

So if the question is how do we engage other sorts of people to use this data? I suggested that there might be a couple things we might want to think about and that's what we're going to talk about for the next ten minutes or so. So a couple things. Does that sound like a fair, accurate summarization?

So I'm going to walk you through some real life resources as best I can, but I'm also going to frame it up. And this file will be available for you as well. And again this is just kind of first draft, putting it out, I'll send that on to you guys. And this is talking about data browsers and really building community exploration and creation of intelligence and market value around information, not data. And so I'll walk you through the examples.

But if we're going to frame up the choice, what can a user – and I'm using that very broadly – do with government data? Users can do a couple different things. If you're very sophisticated and you have specialized expertise and you have access, you can go in and pull the files down from .gov and figure out what they mean, or maybe you're granted restricted access into an enclave when those fire up and you can play around and create your tables, and that's one thing. The number of users who do that – I tried to poke around – is no more than thousands at the very upside.

The other thing is you can build apps, and that's what data liberation and Datapalooza have been about up until this point. We put the data up there and then we allow the app builders to come and do their thing. And theoretically they create market value, just like what happened when we put weather data up and when we put geolocation data up. In terms of creating market value, that really hasn't happened, as people will know from looking at CrunchBase and the internal databases that HHS uses to evaluate.

Part of that is because they're mostly based by tech people who really don't understand health care, and partially it's because it's DTC, which historically hasn't worked very well, and partially because in order to do that I can't really do that, and I can program pretty well but I have to have a decent amount of money and go into the app store, and maybe I build a team, I mean it's not easy to do.

So I always point to iTriage, and that has about seven million dollars of venture funding behind it, and that's one of the few successes around that. So that's a pretty high bar.

So when Todd talks about putting the data out there and letting the communities build the interfaces that's still a pretty select group of people who are going to do that, not just because of the technical skills which a lot of healthcare folks don't have those technical skills, but also because on the tech side to understand the healthcare data and the nuance you're looking at, those tech folks do not have those skills.

There's a whole informational educational conversation we can have, and that's kind of the last session we did at HDI where we did that sort of information for folks. But that's very difficult to do. The third option is what you can call a data browser, and I'll talk about that, or kind of an interactive front end. And in this scenario the data is already defined, all the meta data is done for you, the taxonomy is done for you. These are wildly popular in the tech world.

In fact one of the more recent contests, which was one of the bigger ones, was comorbidities of diabetes, actually using some of your data, and it was wild and no one really knew about it. And the beauty of that is you play with the information rather than the data. So you don't need to do anything except drag and drop things. So we're going to talk about that.

Just to give you a sense of scale, one of the things I've been trying to figure out is who uses this data and how many people actually use it. And we've tried to get some internal stats from CMS and other folks and that's sort of been processed and what have you. So I am going to roughly, roughly 80/20 proxy it. When I go to Alexa, and this is a good place to go if you want to figure out how popular a site is, just FYI, I know you guys know this but it's a good place to go.

So this is up here, and go to Alexa.com, and this whole slide there will be available to browse at your leisure. So what you see is this is where you type in a site, you basically see how much traffic it has, what its rank is, how many people are linking to it, and you can look at different metrics. Data.gov has about 3000 sites linking into it, and that's a pretty good metric. That tends to be better than hits, but you can use your own stuff.

We're going to look at Google Public Data, and most estimates say only about one to five percent of the people using Google are using Google Public Data. But that's still a lot of people. There are five million sites linking into it so you have millions and millions of page views every day, every hour.

MS. QUEEN: May I ask another question? What do you mean by sites linking into it?

DR. ROSENTHAL: So if I link to something – if I have my little blog, or CHIDS put out a little post on the economic forum you're talking about – they link to Data.gov. So they put up on their website or their blog or a tweet or anything on the web a link that says hey, there's this cool thing called Data.gov, go here. Or it's anything specific, anything that belongs to that domain. It may be check out the Hospital Compare site inside. So this is just ballparking it.

What I'm trying to say is Data.gov has a couple thousand sites linking to it. Google has millions and millions, and only a fraction of those are using this Google Public thing I'm going to talk about, but my point is that's still a lot of people, that dwarfs the amount of people. I'm just trying to get a sense of scale of what we're talking about.

And then there's these big tech sites, just to give you a sense, when they have these data explorer contests like ReadWriteWeb, they have about 40,000 sites linking in when they do that. And so you're talking more than ten times as many as Data.Gov, and that's just one instance. I'm just trying to get a sense of scale of when we talk about users we try to get some internal metrics and that's still really good to do and to hunt down but I'm just using some external metrics, back of the envelope, right?

This is kind of like the Google Consumer Surveys example I was talking about, you can run a consumer survey using their algorithm for a couple hundred bucks instantly, and it doesn't give you what you'll find out about it but I'm just trying to get a sense of scale.

And so Google Public Data Explorer, let me hop out of here and show you. Actually let me show you the slides before we hop into the live environment. Basically you just search for this thing and you go in and it has these dynamic graphs, this is like the TED talk where you click the play button and it does all the stuff for you, it animates it.

DR. CARR: I am going to take Susan's lead and just ask like the dumb questions to just bring everyone along. This is the browser. So you went to public data, and what's the data that's populating this?

DR. ROSENTHAL: So there's tons of public data and you can search by data types. The World Bank, almost every government institution has had their data taken out and sucked into this thing. Private data, public data, individual data sets, the World Health Organization; a lot of it is international data or it's US data that's not health. There's a ton of census data in here, there's a ton of economic data.

DR. CARR: This is a site you go to and you can pull in the data set that you want, or do they give you here the things you can choose from?

DR. ROSENTHAL: They do this all for you. So if I were to go to this probably the best way to show you is just give me 30 seconds to show you.

DR. COHEN: You said they don't have any health data?

DR. ROSENTHAL: Very little. They have some, I'll show you what they have. But not nearly as much as what you could have. So I go here, I go to Google, I'm going to show you live how to do this.

MR. QUINN: How does Google populate this? Do they have a team of people there that search for this data? Or is it done by computers?

DR. ROSENTHAL: Both. So they have some automated things which Google tends to do, and they scrape a bunch of stuff, and then they have a team of people, and then they have users contributing this. Say if I'm at the World Health Organization, there's someone at the World Health Organization who puts the thing in a CSV file and dumps the stuff. But they have to do three things. Typically they have to give you the data and they have to define it in this taxonomy.

Taxonomy is a fancy way of saying table of contents, and I'll show you what that looks like. And the beautiful part is it's not just a data set. Once you put your data up there I can pull in economic data from World Health, I can pull in US census stuff. Once you put your data up there that mix and mashing, right now if I want to mix and mash health data I as a developer have to take it and I have to go and find census data and I have to go do some Google Survey data and I have to scrape it with some other stuff. Google does it all for you as do all these public data browsers. And so once you put your data up there anyone can mix and match it with anything else that's in there, and the stuff that comes out of that is crazy.
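[Editor's note: the mix-and-match idea described here – datasets become combinable once they are defined against a shared taxonomy – can be sketched in a few lines of Python. The datasets, keys, and values below are hypothetical illustrations, not drawn from Google Public Data.]

```python
# Two hypothetical datasets keyed by the same taxonomy dimension ("country").
# In a public data browser the provider supplies the raw data plus a taxonomy
# definition; here a plain dictionary join stands in for that machinery.
fertility = {"US": 1.9, "FR": 2.0, "JP": 1.4}            # births per woman
life_expectancy = {"US": 78.7, "FR": 81.5, "JP": 83.1}   # years

def mix_and_match(*datasets):
    """Join datasets on the keys they all share (the common taxonomy)."""
    shared = set.intersection(*(set(d) for d in datasets))
    return {key: tuple(d[key] for d in datasets) for key in sorted(shared)}

combined = mix_and_match(fertility, life_expectancy)
print(combined)  # {'FR': (2.0, 81.5), 'JP': (1.4, 83.1), 'US': (1.9, 78.7)}
```

Once every contributed dataset is expressed against the same table of contents, any new set can be joined against everything already in the system without per-pair integration work.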

MR. QUINN: How can you make sure that Google finds it or Google gets it?

DR. ROSENTHAL: You'd want to contribute to it. World Health Organization contributes a lot of stuff to people because they want people to use their data and they think it's good to do.

DR. CARR: Just to bring it back to us, if with the HHS stuff some of it is there but there's more that could be there, then we could initiate and push that to that site.

DR. ROSENTHAL: Oh, yes, they would love it.

MS. QUEEN: What was the third thing? You said give the data, define the data.

DR. ROSENTHAL: That's about it. You give them the data, you have to define it in the taxonomy and you kind of have to label it. So they basically ask you to tell people what the data means. So when you say morbidity – here, I'll show you. So I log on here, and here's these different graphs that people have put together. And I'm going to look at this one. I'm going to hit explore the data. And on this computer it will take a long time.

DR. CARR: What you are looking at it looks like it says fertility rates, but that happens to be one data set that you pulled up.

DR. ROSENTHAL: Yes. So here's all the data. So here's the data that's going into this particular view. So this is world development indicators and I can search these data sets in different views and mix them and match them. This is going to take a long time to pull up.

DR. FRANCIS: Is there any way to tag the data so you know the data contributor?

DR. ROSENTHAL: Yes. It's funny you mentioned that.

DR. FRANCIS: So you can tell interrelationships between sources?

DR. ROSENTHAL: Yes. That's exactly what this is over here. So these indicators, that shows the data set contributor. That's different than the person who put the view together. The person who interprets the data and basically says you know what I think – I'll flash over to this one here if the computer fires up. So she won this little contest and she put together this thing. And so what she looked at was diabetes and these things she thought were the greatest comorbidities. And so she pulled essentially from Google Data and other places and tried to figure out what she thought had the greatest correlation, and now she has her name on it as well. And so she kind of became an expert and people tend to look at her and et cetera et cetera.

So there are two levels of transparency. There's one: where's the data coming from, so show me just the stuff from World Health Organization, or from US Census or another source. Or show me by type what am I interested in, or show me by the person who actually pulled the stuff into view.

This is Google doing all the stuff for you. The next layer on this is individuals using it and creating their own stuff. And here I can search by country. So here I'm looking at fertility rate and life expectancy, and I want to subset that by different lending types. I might want to pull in what I'm interested in; I'm interested in CO2 emissions, to see if that has anything to do with fertility rate. And so on and so forth. I can pull in any piece of data in the system.

DR. CARR: Can you show us?

DR. ROSENTHAL: Yes. This is going to fry this thing but let's see.

DR. GIBBONS: What level of data is this? Is this county, country, what level is this?

DR. ROSENTHAL: Yes. This is the beautiful part about this. When you have a taxonomy – and this is actually far more helpful, I think, actually showing people than me just rattling on like last time – you're sort of defining your data, and one of the attributes is geographic region or unit. And so you have a separate part of the taxonomy, the table of contents, that talks about that. And so you link up country to region to HRR to whatever you want to look at, CMS contract, country, et cetera, et cetera.

All of that is built in as a dimensional taxonomy. And so the answer is different data sets have different grains, and the view only shows you the coarsest minimum grain, so you can't go down below what you shouldn't see. So before, when we talked about how do we do grains, do we want to do zip code – this is how they do it in the non-healthcare world and it works pretty nicely.

If you want to submit something you can say I want to submit that, but I don't want to do zip plus four, I don't even want to do zip, I don't even want to do county; maybe let's do CMS contract, maybe we'll do HRR – you can do that. And it works like a charm. Once you figure out that master table of contents then it's very snazzy.
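[Editor's note: the grain rule described here – when datasets are mixed, a view can only drill down to the coarsest minimum grain that any contributing dataset declared – can be sketched in Python. The grain ordering and function names below are illustrative assumptions, not Google's actual mechanism.]

```python
# Hypothetical ordering of geographic grains, finest to coarsest.
GRAIN_ORDER = ["zip+4", "zip", "county", "HRR", "state", "country"]

def coarsest_allowed(*dataset_min_grains):
    """When mixing datasets, the displayable grain is the coarsest of the
    minimum grains the contributing datasets declared at submission time."""
    return max(dataset_min_grains, key=GRAIN_ORDER.index)

# A health dataset submitted at county level, mixed with zip-level census data,
# can only be viewed at county level or above.
print(coarsest_allowed("county", "zip"))  # county
```

The point of submitting a grain alongside the data is exactly this: the enforcement becomes a mechanical rule applied to every combination, rather than a per-release judgment call.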

DR. COHEN: You don't hide the data from the public, you just submit the level you are comfortable with sharing?

DR. ROSENTHAL: Yes. That is absolutely well said. And so what you'll find is that this may be a public set. If you wanted to be really snazzy you could do a synthetic set on it. If there's something that you think is very meaningful that you want people to wrestle with but they don't understand the concepts but you're scared of sensitivity or mosaic effect you can use a synthetic set, or you can just use the public sets you already have.

But the beautiful part about this is it doesn't require me downloading an Excel file and dabbling around in SAS or whatever I'm going to do. It doesn't require me going on and even trying to figure out how to generate tables on my own. And there's this kind of compounding learning: when I put it up there, the next thing, I can pull in any of these other sources. And you can see by gender, by urban, by country, I can play the thing. I think what I'd like to show you – let me just take you on a little tour and then we can ask any questions, because I want to put all the stuff in your head before we take specific questions.

So here we can look at metrics. Let me look at say health. So I have my data sets I'm saving here, let me say I want to look at world development indicators, maybe I don't want to look at that, maybe I want to look at other elements.

MS. QUEEN: I just wanted to add something to what Bruce was saying. It would be my assumption that something like BLS or Census that have made their data available, they already have determined the lowest level.

DR. FRANCIS: Actually, that goes to my question. If I submitted at this level because I don't want it going any lower, but somebody could mix it with another data set and get it lower.

DR. ROSENTHAL: They are going to typically enforce that maximum. So Google is pretty good at security stuff. I know the government is good too but Google is pretty good. And one of the ways they typically handle this is that maximum grain. You can basically set that as a rule in the business rules engine, and say anyone who uses my data set in combination with anything else, the minimum geographic setting is either the maximum of the data I submitted or a separate setting. So here I just searched and I found all these different things, out of pocket expenditure; when we look at, to your point, the data provider – who do I want to look at? World Economic Forum, Human Development, et cetera.

And here's what they have on health which looks like a lot, but I'm looking right here and this is global obviously and you can click on that. Condom usage, contraceptive prevalence, depth of hunger, diarrhea treatment, HIV, et cetera. The beautiful part is you can mix it with the other stuff in there: economics, lending, social media, et cetera, any of that sort of stuff.

And what would be really interesting is if I'm a researcher and I have a question, hey is condom usage and Twitter usage related on Friday night, I can get an answer to that question pretty quickly without having to write a grant and do that sort of stuff. That's probably not the best example but it gives you a sense of the sort of things you can do.

So that's Google. Google sort of does that stuff for you and you have to work with one of their reps, but they love the Government.

DR. GIBBONS: Can I ask a question? So are we assuming that data integrity is all very good?

DR. ROSENTHAL: They do a pretty good job, and usually they'll rely on – if you think you can do better, if they don't like it, they won't post it. They're pretty quick about taking stuff down if you don't like it.

DR. CARR: Is there a checklist that they have that assesses the data integrity?

DR. ROSENTHAL: They used to – yes, you would probably have to talk to them right now. There are folks that run that sort of thing. And historically there are kind of forms or templates you submit. It's not just a manual checklist, it rejects it if it doesn't do certain things. And then they run things. Not to get too much into the weeds but maybe the best example for the NCVHS people is to think about Google Consumer Surveys.

If you go under Google Consumer Surveys you can ask anyone anything and basically you pay a couple hundred bucks or something stupid like that and at the end of the day they enforce all of your survey stuff using Google's pre-existing algorithms of specificity. So they're leveraging Google's intelligence on top of a survey platform. And that sounds like those things don't go together but they're really good about doing that sort of stuff. And they sort of do the same stuff with the data as well.

Google is really good about data integrity, they're much better than probably anyone you've ever met, so when they pull your stuff up there it's going to be in the best shape it can be. But at least historically as I said if you don't like it they do it in partnership and they take it down. But this is a private entity doing this stuff for their own users, and they're creating these views and this platform. That's very different from the next level which is folks like these.

Here's ReadWriteWeb, and this isn't anything special but it's just one of a dozen highly read technology blogs that talk about data, and healthcare is on there pretty prominently. And so they're not health 2.0, they're kind of bigger web 2.0. And what they said is they'll give someone a pass for free to their version of ATI and $500 to get some tacos and beer if they win this prize.

It wasn't an application challenge where someone had to develop iTriage or something like that. It was just we're going to put some data up there, we're going to label it and put it in a meaningful table of contents i.e. taxonomy, and then we're going to allow people to play with other types of data and see what insights there are. The prize isn't can you build a cool app that does something that people may not use, the prize isn't can you take data at a granular level and come up with a better algorithm, the prize is can you come up with some interesting insights and literally do it in a way that lets other people explore your work and build on it, disagree with it, et cetera, et cetera.

And so they put up a little form and they partnered with these other guys, not Google but Tableau – although these were the guys from Stanford who taught Larry and Sergey how to do the search stuff before building this company, so they're pretty good. And they basically step one, two, three, four, you submit, and then the person who won – so you can use any of this data that we looked at – so this got a lot of traffic, and what won out of everything, and this was like baseball stats, anything you can think of, crazier stuff than I would dare mention in this company.

And what won was an interpretation meaning a view or a pulling together of data that allows you to explore it of US obesity and comorbidities. And she won, and she was a student at DePaul I believe. And there she is and she talks about it, and her tagline was supersized caskets which is pretty snazzy.

She's interesting because she has no technological skills, she couldn't develop iTriage, nor is she really MP nor could she go through and say hey this is unwarranted variation in this HR and we think that's a comorbidity. And she put together this little thing. And so this is an explorer and she pulled the data and she said here are the things I want to look at.

And so here is the US and color density indicates rate, and so the beautiful part is this little thing went viral, went all the way around the web. I can take her view and put it on my blog, I put it on my Facebook account, I put her interpretation of analysis of this public data everywhere. They don't have to go to Data.Gov, it floats all around.

DR. CARR: What does this tell us?

DR. ROSENTHAL: She tells us the things she thought were most correlated with obesity and diabetes. And so these are the factors that she likes.

MS. QUEEN: How did she determine which factors were most related?

DR. ROSENTHAL: She went through these and she probably did a regression analysis one by one. So she dragged, just like in Google I showed you the little dot bubbles, and did the regression analysis with the flick of a wrist. And so the academics will obviously say I disagree with this or I'd do this, but here's the interesting thing: here's an 18-year-old kid talking about comorbidities in diabetes.
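The "one factor at a time" screening described here – correlating each candidate factor against an outcome and ranking them – can be sketched as follows. All data values are invented for illustration; this is not her actual analysis or dataset.

```python
# Illustrative sketch of one-by-one factor screening: compute the
# Pearson correlation of each candidate factor with an outcome
# (e.g. county obesity rate) and rank factors by strength.
# All numbers are made up for the example.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

obesity = [22.1, 27.4, 30.2, 25.0, 33.5]            # outcome, by county
factors = {
    "poverty_rate":   [10.0, 14.5, 18.0, 12.1, 20.3],
    "milk_price":     [3.5, 3.1, 2.8, 3.3, 2.6],
    "miles_to_store": [1.2, 3.4, 5.1, 2.0, 6.7],
}

# Rank factors by absolute correlation with the outcome.
ranked = sorted(factors, key=lambda f: -abs(pearson(factors[f], obesity)))
for name in ranked:
    print(f"{name}: r = {pearson(factors[name], obesity):+.2f}")
```

The point, as in the transcript, is that the tools do this "with the flick of a wrist" – the statistics themselves are elementary.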

And actually if you look at what she did she actually had some fairly high-falutin folks, some of whom are in government, judge her work as part of the criteria, which is interesting. So she talks about fast food restaurants, which would typically take a five-year study to do. I mean I know we have the food atlas and food deserts and we've played around with that.

So she pulled from that set and flipped it up there real quick. Income, mileage to a store, a fast food restaurant, poverty rate, convenience stores with gas she was using as a proxy for fast food, low income receiving SNAP, price of low-fat milk. So she was looking at stuff that you might not look at typically and she's able to do it.

DR. COHEN: She used existing public data sets at the state level probably?

DR. ROSENTHAL: I think it goes down to county I believe. Yes, county. And in fact there's low obesity, price of sweetened drinks. So whoever asked the question, and I'm assuming, and this is typically how they do this, so she looked at probably 50 things and she said with low obesity price of sweetened drinks, if it's a lot of money for a sweetened drink, consumption of fruit and vegetables – and she's doing this from various sources – adults meeting activity guidelines, median household income, full service restaurants, et cetera et cetera.

And the point is not to look at the specific example but just to look at someone doing this sort of stuff. And the beautiful part is she's using different sort of data. She doesn't have to code, she doesn't have to program, she doesn't have to build an app. And she's just interpreting information, some of which is public and is out there and some of which should be out there. And that's her.

And then various folks from kind of the New York Times, they have their news departments who do just this sort of stuff. So it's not just on the fringes of the tech community, even in the mainstream media community this is big. So if I go into the gallery I can look at different things. If I look at health you can browse this on your own. Geography of diabetes, contributors to obesity, health care cost, tooth decay, and I can go on. I encourage you to play with this on your own.

So when we're talking about making our data available and thinking that someone has to build iTriage there's this whole other movement going on. And so very quickly I can filter. Here's what these guys did. This is Annette Griner and she does very good work actually.

So if I were going to hire a consultant or if I wanted to identify who was good at doing something I would look. It's just very much like doing a piece of code in the tech world. I can see how good they are and see what people think about them and review them. And let's filter it by poverty rate and obesity rate and ethnicity and see what happens.

MR. QUINN: Can you identify the underlying data sets?

DR. ROSENTHAL: Yes. So with Google you can and with these guys you have to download the client install rather than just the web-based one, and so you can absolutely do that. So what I tried to show you was two different approaches to this. Not to say that you have to do it one way, I tried to show you the two ends of the spectrum. Other people do this too. You can think of it as YouTube or Kickstarter for data and analysis. So Google takes an approach where they say everything has to fit in my table of contents if you give it to me.

And the benefit of that is they don't create and define the views, they allow you to pull anything else you want to at the drop of a hat. Tableau takes a different approach which is ironic considering their shared Google heritage. They basically say it's scattered. The data that you give us is a freestanding little unit with referential integrity only between the sets that you submit as a block.

And so if I were to look at this geography of diabetes versus tooth decay or versus healthcare cost I can't link back and forth between these sets. And so that means that the individual who sets it up has more flexibility in terms of the metadata structure in what they want to do. The downside means someone else can't come along and ask about baseball scores, do they have a correlation, which believe it or not there is some interesting stuff in there.

MR. QUINN: I was going to ask, Google publishes its taxonomy, correct?

DR. ROSENTHAL: Yes.

MR. QUINN: I didn't see any nations, I mean Google being worldwide, have any nations or any public health entities in the world said we are going to shift our vital and health statistics or other national statistics to accommodate Google's taxonomy?

DR. ROSENTHAL: I don't know the answer to that. Historically, their approach has been to do it collaboratively. So what they say is if a subject matter expert or a producer of the data knows something and feels very strongly about doing it in one way they typically accommodate that or don't show it.

MR. QUINN: In the world of building standards, you start with something that everybody can agree with you hope, and adoption results in standardization.

DR. ROSENTHAL: Back to my slides, I said kind of taxonomy, and we said what's taxonomy, it's really whacky, how do we do it, et cetera. And I said well actually people have your data in a taxonomy right now. You might disagree with it, but there's a starting point, there's a rough draft you can start from.

And what I'm trying to show you is there's different ways of putting this data out there that has different types of adoption. So this user is not a hardcore tech person. And by the way, as you do your own public health initiatives you should really look up there, there's some really interesting stuff believe it or not. Nor are they necessarily a data analysis person.

So in terms of kind of doing contests and incentivizing people I used that little ReadWriteWeb, I'm only using it as an illustrative example. Here they're giving away $500 and a free pass to their own conference so it basically costs them very little, and they partnered with a technology platform who was all too happy to partner with them. So outside of having a dedicated person in-house do it they weren't spending a lot with that and they got massive traffic per Alexa and some of the other analyses.

So what I was trying and hope to show you was you can do this sort of stuff with very little investment. I mean the users or the partnerships do it for you. And it reaches a very different sort of audience. And you can do that in two ways. You can say I start supply or demand side with the data I feel safest with, put it out there and see what happens, or you can say actually we're really interested in this subject and maybe we do a synthetic file or something we feel secure with at a particular grain and see what happens, what conclusions do they have in a particular topic of interest.

And what you can really do since there's only five or six kind of secure access passes granted to different enclaves you can say whoever wins the contest using that synthetic file judged by HHS - and the people in this room if you want to do it, so you can have your hand in seeing if it's worth doing or not - actually gets access to the real data. So there are all sorts of fantastic creative things you can do. So that's it.

DR. FRANCIS: On that slide, there was a place to click download, and what I'm interested in is what can be downloaded, and if data are downloaded is there a way to continue to enforce the levels thing when it is downloaded.

DR. ROSENTHAL: What you can do is if you are working in the Tableau world or people like them you can do whatever you want to do. You can basically say we don't want this downloaded, or you can say we're not comfortable with it downloaded, do something synthetic. Or you can say the only grain we're going to do is whatever you want to do. And they typically download extracts.

So a good example of this is run a free Google Consumer Survey and just see what happens, because the download you'll get – and this is a good analog, or actually look at this – is a PDF of insights, so they do all the correlation for you. A survey I run rather frequently is what people think about Medicare Advantage. And I do it by open users, I do it by enrollees, and I do it by current users of Medicare Advantage. We do that for our private thing. And it does it all for me.

And so in this state women with this education and this income actually have – it does it all for me. And I can download the PDF of those informatics or it gives me just Excel sheet basically, and their grain recently went from state to county.

MR. SCANLON: Rather than peer-review processes, this relies on the community of users to judge the quality, accuracy, or usefulness, right? An open peer-review process, I hope.

DR. ROSENTHAL: Yes. And they do it in different ways. So Google has kind of – not surprisingly, this is why they get criticized for kind of being like Microsoft. So they're much less peer-review based than say Tableau. So they actually are much less open. They actually, because they're a closed system, their experts internally review it. But Tableau takes a different view and they basically open it up.

DR. CARR: This really is remarkable for how disruptive it is for our traditional ways. And I think about the meeting that we just had over the last day and a half, we think a lot about privacy, that was a lot of our discussion, and it's interesting that in some ways they've solved it. Like we're not going to allow you to get to too small of a level. So it's not debatable, it's just done.

And we're not going to allow you to merge. And I think the thing that's interesting is this is hypothesis generation, in a way, more than a peer-reviewed article. And I think it speaks to how we learn in this new environment. We talk a lot about how many journal articles there are versus 10 or 20 years ago. Now we say how much information is available minute to minute; it's not possible to wait for a world of peer-reviewed articles given the 18 months, or however long it takes, to publish an article.

So this is extremely informative as we think about the data and now where it goes and how it's used. I mean the examples you've shown, the diabetes one is just powerful. We've had lots of articles about growing obesity, but when you see that and you juxtapose it with all of the community factors, again it goes back, for NCVHS, because we have our diagram of the influence of not just the person but the community, et cetera – this is the embodiment of that kind of diagram.

DR. ROSENTHAL: They don't have your expertise and they would obviously value that very highly. The other thing I should say is that hypothesis generation or education. So rather than how are you going to field the knowledge workers and et cetera, you can get a traditional MPH and that's interesting, or you can go to Coursera or edX – MIT and Harvard put their courses online – Stanford has a data mining certificate; for like $10,000 you can get a certificate from Stanford and you're certified as being able to mine data, and you play with some of the stuff.

And when I've been hiring in our product development I've always taken the latter candidate rather than the former. So the point of the story is there's a whole bunch of different things in here that are worth considering.

So I hope this is helpful because the first time I was talking about stuff it wasn't terribly helpful because this was the mental model I had. So feel free to dismiss it or change it or do whatever, but I just wanted to give a sense of the type of stuff which is out there, and I think that can be a real credit or something that the committee can contribute. Because right now health data initiative is about putting data out there, letting people use enclaves, and helping the app developers do it.

And then you know what happens? The guy building the where's my parking app also just takes the hospital compare, that doesn't really solve a business need. By business need I mean also public health, public social need, reasonable value, however you're going to do it, where this is kind of a different way of going about it.

DR. COHEN: This is great. Thank you for letting us see how your mind works. I see this really as evolutionary rather than revolutionary. From the public health perspective first we generated huge stacks of reports of numbers that sat on a shelf. There was probably something that preceded it, probably just filing statistics and books, and then we generated the reports when we learned how to bind and when Xerox invented photocopying.

And then a lot of folks from a variety of perspectives – my perspective is government – have produced web-based query systems at the state level, at the local level, and at the national level. And the focus there was nobody reads the reports, so let's put them online somehow through these web-based query systems, and that's been perking along for about 15 or 20 years.

Some of the state WDQSs are more sophisticated than others, and I see this as the next version in that progression of disseminating information. This is just, to me, an easier-to-use, easier-to-access web-based query system that gives the user a lot more flexibility, because the contents and the direction aren't predetermined by the data holders.

So we're saying we've got all this neat information, let's get it out there, some in more combined formats and some – why don't we put all the vital statistics at the county level from the US, give it to Google, and put it up there and let people work on it?

Or if we wanted to focus on a project that's more directed, why don't we take heart disease mortality and behavioral risk data on heart disease risk factors and what we know about the social determinants related to heart disease and work with somebody you think and put a data set out there and see what folks do with it.

DR. ROSENTHAL: That was sort of when we were doing the webinars. It was kind of asking the questions, are they thinking about this sort of stuff in these ways, it didn't go very well so I tried to back off. That's exactly what I was trying to articulate, much better said.

DR. FRANCIS: I want to go back to the point Justine made about privacy. From that privacy perspective the interesting questions are what level, if I'm a data contributor, what are the levels I want to put in?

And another sort of version of that is: is there a way that I can make sure that there isn't a proxy for something lower than the level that I'm comfortable with when you aggregate data sets. And I simply don't know the answer to that, but you're shaking your head; maybe there isn't any way to do it. But I'm assuming that there must be, so you don't get – I don't know, I'll call it GIS coordinates replacing zip codes or something like that.

DR. COHEN: We started with the basic building block. Let's say county, which most people accept, people of New England really relate to. Then you can essentially build robust data sets and where the data are too sparse there are a variety of ways, and you can have some blank counties or there are techniques to impute county level values. If you get aggregate county level values from a variety of indicators depending on the cross-county classifications, we can throw as many different data sets on top of those for people to look at simultaneously without increasing the chance of identifying any one individual. So I think looking at aggregate data overlaid is a different strategy than linking in individual level data that increases the probability.
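Dr. Cohen's "overlay aggregates" strategy can be sketched as a join of pre-aggregated county indicators on a shared county code, with sparse cells suppressed rather than published. Everything here is invented for illustration: the FIPS codes, the values, and the suppression threshold are hypothetical, not Massachusetts' actual rules.

```python
# Minimal sketch of overlaying aggregate county-level indicators:
# several pre-aggregated datasets joined on a county (FIPS) code.
# Small counts are blanked, not disclosed. All values are invented.

MIN_CELL = 11  # illustrative small-cell suppression threshold

mortality = {"25001": 210, "25003": 8, "25005": 540}      # deaths by county
poverty   = {"25001": 12.4, "25003": 9.8, "25005": 15.1}  # percent

def overlay(counts, rates):
    merged = {}
    for fips in sorted(set(counts) | set(rates)):
        n = counts.get(fips)
        merged[fips] = {
            # counties with sparse counts stay blank
            "deaths": n if n is not None and n >= MIN_CELL else None,
            "poverty_pct": rates.get(fips),
        }
    return merged

table = overlay(mortality, poverty)
print(table["25003"])  # small count suppressed: deaths is None
```

Because only aggregates are ever joined, stacking more indicator layers on the same counties doesn't raise the re-identification risk the way individual-level record linkage would – which is the distinction Dr. Cohen is drawing.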

DR. ROSENTHAL: That question came up in a previous meeting, and so in a slide, not this one I gave you but a previous one, you'll see privacy comp side, personnel side, health care, I have her quote up there, a professor on that specific thing at one of the conferences I spoke at, and so she's addressing that and essentially saying the same thing in much more complex language.

And number two, here's a crazy idea: You say we don't have that download button. It's literally a closed environment. The only reason you need to download data is to be able to pull it down, link it with other things. But if you can do it all in a browser you don't need to download it. So those views I was showing you, one of them had download and one of them didn't. That's what I was trying to say earlier, if you can do it all in the web you don't need to download it.

DR. FRANCIS: That is where there are going to be really hard questions about what you allow there, too. For example the SAMHSA example that's in the minutes raises exactly that question.

DR. ROSENTHAL: That is why I started with the view master, that was sort of tongue in cheek.

MR. SCANLON: This is not for everybody. Obviously the research community, the public health community, they're going to continue to get the data they want for their own work and then run analyses, hopefully in better ways. But this is another attempt to reach a whole new set of people, it's not so much business to business, it's more business to person.

DR. ROSENTHAL: Some of the stuff planetRE(?) does is a version of this, and Tableau, they do a lot of B to B. So Fair Isaac, your FICO score comes from using this. So this is the public or kind of dumb version of the thing that actually supports the very high end B to B stuff right now.

MR. QUINN: The thing that really struck me about this, and this has got my brain going, is this taxonomy. That is the key to all of this and anybody who's ever said, hey, let's develop a taxonomy and get everybody to use it, you can just hear the whooshing sound of time, and it's 20 years later and you're no closer. And this is the key to putting all the data together.

DR. ROSENTHAL: And notice their taxonomy isn't a metadata taxonomy – it's a data taxonomy, the business taxonomy. In your original files you'll see an example of a business taxonomy of our data. What is this, how do I use it to answer business or performance questions.

MR. QUINN: It is more normalized.

DR. ROSENTHAL: Yes. So it's not just this is the ICD-9 code, this is answering specific questions.

MR. QUINN: You can always build it out for that. That's an area where I would love to see communities in specific areas – for example public health, or specific areas where there is the need for metadata or semantic detail.

DR. ROSENTHAL: The community kind of has three approaches to it. Typically the academics' approach is to say let's get in a room, let's form a standards committee, we know what's best, and that's going to be our taxonomy. And that tends, historically, not to be the most efficient way to do it. And there are also some questions about the integrity of that as well.

The other option is just to do a purely open source community which is what Google and these guys have done already. The other option is build a community of subject matter experts. Every MPH, grad, et cetera, with peer reviewed et cetera. You have different options.

MS. QUEEN: Josh, on your previous slide I think it had the contract number. This was where at the last meeting when Ed Sondik was here, there was a definite disconnect for those of us in the survey world who don't even know who in the agency or where in the agency you have the information that's needed. You knew what you were talking about, I don't think we understood. Where do you get that kind of information?

DR. COHEN: I think the best solution is to take a data set, take NHANES or take mortality data or take a Medicare claims file and sit down.

MS. QUEEN: But NHANES has a data dictionary. What you're talking about is something different when you're talking about contract number.

DR. ROSENTHAL: This is payer compare data with Medicare and MA contract.

MS. QUEEN: The surveys all have the detailed variables.

DR. COHEN: It is not going to be rocket science to create this taxonomy, the file formats and structures exist, they're just not arranged in a way that developers think about putting data into applications.

DR. ROSENTHAL: The beautiful part about this is once the developer does that then – this is what I was trying to say at the other meeting, if I want to do this right now I have to as a developer do it myself, and then the next developer, and then the next developer. In this scenario, I do it once and then I put it in the browser, and then everyone is living off the fruits of my whatever. You can have an official standard or you can have multiple people using different things.

MS. QUEEN: But there is one big difference with the surveys, and I know we'll talk about this later, the survey data has to be run to generate. So you have to run it, you have to weight it, when you download these data files you're not going to pull out a record from that, you're going to run it to get whatever it is that you're looking at. If it's health insurance you may have six different health insurance questions.
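Ms. Queen's point – that survey data has to be run and weighted, not read record by record – can be sketched with a toy weighted estimate. The records and weights below are invented; a real survey estimate would also need the design's variance adjustments.

```python
# Toy sketch of why survey microdata must be "run": an estimate like
# the insured share is a weighted computation over records, not a
# value you can pull out of any single record. Records are invented.

records = [
    {"insured": 1, "weight": 1200.0},
    {"insured": 0, "weight":  800.0},
    {"insured": 1, "weight": 2000.0},
]

def weighted_share(rows, var):
    """Population share with rows[var] == 1, using survey weights."""
    total = sum(r["weight"] for r in rows)
    hits = sum(r["weight"] for r in rows if r[var] == 1)
    return hits / total

print(f"{weighted_share(records, 'insured'):.1%}")  # -> 80.0%
```

This is also why Dr. Rosenthal's answer – give them an analytic extract – makes sense: the weighting is done once, upstream, and the published table is already an estimate.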

DR. ROSENTHAL: You are essentially giving them an analytic extract, and in that Google public data there are results of kind of international survey data in there.

DR. CARR: If we were going to then – because we have our charge of coming up with specific recommendations – what do we want to take away from what we see here today?

DR. ROSENTHAL: Can I say one more thing and then I'll be quiet for the rest of it?

DR. CARR: Don't be quiet, that would be foolish.

DR. ROSENTHAL: Don't tempt me. What I was trying to show with this was originally in the recommendations we talked about kind of taxonomy and learning center, all this stuff, and the question was what is the unit around the learning. Is it around the data? What's the definition of contract versus HR? No, that's not so helpful. Is it around this, that, or the other thing? Originally when I made the screen cap in your other slides of doing kind of HHS and CMS data driven product development around seeing what people use, this was a piece of a broader learning center.

So I didn't originally envision this as its own thing, free-standing, but to your point whoever asked about authorship and community, you have this functioning as a layer in that. Someone working with the data and saying how do you do it, someone working with information and saying how do you do it, and then someone on top of it saying how do we actually build applications for public/social good, business value, et cetera. So I just wanted to throw that out there, I don't want to overly narrowly focus on this. This was just kind of a piece of the puzzle.

DR. COHEN: I understand the context for what you're doing, but we have to start somewhere. And to the extent that we start small, the first goal of our charge is to liberate the data. How do you start liberating the data? You start by getting it out there and seeing how it resonates and then working up from there. I think it will create its own context, rather than trying to think about the grand scheme and work down.

So in terms of a first step these tools exist and they're well used by everybody apparently except folks in the public health field. And representing the feds here, we're not familiar with these, we haven't used the value of these in getting our data out for people to do the things they want to do with it. So in terms of a first step I think this is a wonderful strategy.

MR. QUINN: The thing that strikes me with this is how about we choose a data set that we're comfortable sharing with Google, with the world, and see if it fits the taxonomy and work through the process of putting it out there, see what happens.

MS. QUEEN: I think with the Google one, because I was playing with it, aside from Census I think something from AHRQ is already up.

DR. ROSENTHAL: They pulled it.

DR. CARR: They pulled it meaning they took it down?

DR. ROSENTHAL: They scraped it.

DR. CARR: I need taxonomy. What does scraping mean?

DR. ROSENTHAL: They have these little automated things and they kind of crawl through the web and they pull information down in structured ways and then they can post the data.
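Dr. Rosenthal's description of scraping – automated crawlers that pull information down in structured ways – can be sketched with Python's standard-library HTML parser. To keep the example self-contained the "page" is an inline string with invented figures; a real scraper would fetch it over HTTP first.

```python
# Rough sketch of a scraper's core step: parse a page's markup and
# pull its values out in structured form. The HTML and its numbers
# are invented; a real crawler would first download the page.
from html.parser import HTMLParser

PAGE = """
<table>
  <tr><td>Diabetes rate</td><td>9.3</td></tr>
  <tr><td>Obesity rate</td><td>27.1</td></tr>
</table>
"""

class TableScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.cells.append(data.strip())

scraper = TableScraper()
scraper.feed(PAGE)
# pair up label/value cells into structured rows
rows = dict(zip(scraper.cells[0::2], scraper.cells[1::2]))
print(rows)  # -> {'Diabetes rate': '9.3', 'Obesity rate': '27.1'}
```

This is the sense in which data "already on the web" can be pulled into an application, as Dr. Carr summarizes next.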

DR. CARR: It is data that is already on the web and they pull it into their application?

DR. ROSENTHAL: Yes.

MS. QUEEN: It was something from AHRQ. It was either MEPS or HCUP or whatever's been made public. So with HCUP you don't have a public file.

DR. COHEN: Wearing my community hat, the problem is with surveys because of level of granularity. I would choose a more surveillance type data set like births or deaths or cancer where you can populate data from every county.

MS. QUEEN: It may have been the HCUP.

DR. COHEN: I think that would resonate and reach a broader audience.

DR. ROSENTHAL: I would only say, just while we're thinking about it, is that ReadWriteWeb example is very interesting. You have people at HHS who are pretty good at business liaising, who know some of the people at places like ReadWriteWeb. If you're going to do it you have a lot of cachet, don't kind of post it on your own.

Depending on how big you want to go, if you want to have a million people using this data set if you do the PR and marketing around it – in the first meeting we talked about kind of PR and marketing around it, I just wanted to throw that up there as a lever. You can post it up there kind of in the night and then it'll be up there and no one will really know about it, or you can go big and say the winner of this gets the HDI thing, et cetera.

DR. CARR: Wait. I didn't really understand what you said. Let me break it down. We can put data up on the web, we could put it on Google or we could put it on that other site, and we'd be better than we are today because more people would have access to it. And then you're saying we could draw attention to it as part of a prize or something like that?

DR. ROSENTHAL: Yes. If you approach in a public/private thing, and there's people at HHS who do this. There are challenges, and it's worth thinking about the marketing, do you want to do it stealthy and kind of put it up quietly or do you want to make it --

DR. CARR: Well probably all of the above. But do the one thing. I mean if we could make it available to these sites, Google and what's the other one called? Tableau. We could do that, and that would be light-years ahead. That would open that data up to orders of magnitude more people, so that would be huge. Then we could be better than that by drawing attention to it.

DR. ROSENTHAL: Relatively easily. For 2013 there are the application challenges; those are ongoing, they're planning them right now.

DR. CARR: You are talking about a Datapalooza?

DR. ROSENTHAL: Yes, as we speak. And the challenge is going to be who develops the best application. You already have your data up there. My point is there are very simple things you can do to broaden the users, not just the people who already want to use the data, and to introduce challenges for the applications.

DR. CARR: They are not exclusive, though, putting it up there is number one and then drawing attention to it is number two.

MR. CROWLEY: Part of the value in this too is being able to mix and match different data types. So rather than thinking about one data file, now they want to think about a couple of different data files that are being used in different ways, and then see what works and what doesn't, and perhaps find different opportunities within this data structure.

MR. QUINN: I just think the process of selecting a data set or two or three or four or whatever, and actually walking through the steps of posting it and seeing how it works and the reaction, just the government process of doing that, and then the process of seeing if people find it and the feedback that we get, that's valuable in itself.

MS. QUEEN: I am wondering, with the Google site for example, for those data that have already been made available, public use data files or public data sets that are already up there on HealthData.Gov or Data.Gov, Google presumably could go and download them as part of their site, is that correct?

DR. ROSENTHAL: Verify everything I am saying, because things have changed in the time since I looked at this.

MS. QUEEN: If they are publicly available they've already been through all the proper disclosures so it can be downloaded.

DR. CARR: I am going to be the person asking the really simple questions, but I think that's a great point. So if we have already made this data available on the web, then Google could go and take what's there, right?

DR. ROSENTHAL: Yes. This will get into a business discussion really quickly, so you're going to have to have someone representing the agency talking to someone at Google representing it.

DR. CARR: We don't need to get to that, but just conceptually.

DR. ROSENTHAL: Conceptually if you're going to do it, or if you're going to spend a dollar, you could hire a third party to do the scraping and formatting for you.
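To make Dr. Rosenthal's point concrete, the "scraping and formatting" step is mostly mechanical: pull down a public-use file and map agency-specific column names onto whatever common schema the target platform expects. A minimal sketch in Python, with made-up column names and values standing in for a real public-use extract:

```python
import csv
import io

def reformat_rows(csv_text, column_map):
    """Map agency-specific column headers onto a common schema,
    dropping any columns the target platform does not recognize."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {column_map[k]: v for k, v in row.items() if k in column_map}
        for row in rows
    ]

# Illustrative data only; a real extract would be downloaded first.
raw = "CNTY_NAME,DTH_RATE\nMiddlesex,7.2\nSuffolk,8.1\n"
schema = {"CNTY_NAME": "county", "DTH_RATE": "death_rate"}
print(reformat_rows(raw, schema))
```

A third party doing this at scale would add a download step and validation on top, but the core transformation is no more than this.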

DR. CARR: In other words HHS will be part of the process of that data going onto that website, whether it's proactive or reactive?

DR. COHEN: All the mortality data at the county level by age, race, sex, and cause are already up online via WONDER, but WONDER is pretty much a cult product because it's difficult to use and people don't know where it is and how to see it.

MS. QUEEN: Isn't it a tool? That's not the raw data, correct?

DR. COHEN: The raw data sits behind the system, because it's got all the death information at the county level.

MS. QUEEN: The distinction I am making is from the viewpoint of somebody who is trying to download something. Like when I query BRFSS, I do the online tool. I don't download the data, I just get the results of my query.

DR. COHEN: That is right. I was responding to the notion that there are data that are publicly available via these tools. The underlying data exists there, and I'd say they're all on parole, they haven't been liberated.

DR. ROSENTHAL: In terms of feasibility, of what's possible or not, I think that's a pretty good place to kind of get a subject matter expert to come in and talk about it, or I'd talk to Google directly.

But if the question is do we want to pick a set and try to do something with it, before you go into that I would at least do a feasibility analysis and say is it actually easier, can they do that without us doing it already? Do we want to do it to maintain control in certain ways? Or instead of doing that should we take an alternative path like some of the private scrapers I was talking about?

DR. CARR: Let's try to map out a roadmap here. We've seen about a dozen data sets, 10 or so, that are available only in these structured sites of HHS. So now we would want to move some of them, maybe all of them, to a Google data platform. So one, to do that we have to have the right people in the room from HHS, and two, they would define the parameters of protection, sample size, et cetera.

If we made a recommendation back to Jim that the data leads in each of the agencies and in HHS do exactly that, that they meet up with Google and report back, which ones are a go, and if any are holding back, if so, why? I think that would be a very valuable next step for them to do. I like the idea of piloting the data but I don't see a need to pilot anything.

If that's all out there then let's move that forward. If we thought about a pilot, I think then we would be in kind of a different domain of either responding to here's a need and this data could be put together in a particular way. Or maybe, as you said, cardiovascular or an app or a business case, or something like that.

So they're kind of two things. One is simply to continue, to complete the liberation by making it more digitally usable.

MS. GREENBERG: In this brave new world I'm this old-school executive secretary of this committee. Not of this working group, but of the committee.

So I guess we talked a little bit before lunch about what we are talking about, and from the point of view of how this working group would convey information, I still think recommendations have to go through the National Committee on Vital and Health Statistics. If you want to make some suggestions, like, we heard this presentation, this sounds kind of interesting, we might want to explore this or something, that's fine.

But I really, just being sort of steeped in all the policies, et cetera, of several advisory committees, I really don't think that this working group can make recommendations on its own to the department.

DR. CARR: Sorry. I am sensitive to that. As a reactor panel we could perhaps have the reaction that this front-end browser could add value to the datasets that are already out there.

MS. GREENBERG: That might be worth exploring. I'm from a really data for dummies point of view, so I'm more of a data policy person than a data user person. But if you take data that are already on the web, people can download them, maybe it's not so easy, but the intent is not to make it difficult, the intent is to make it available. I recognize that it may not be that user friendly to the uninitiated.

So if it's already there I'm trying to think what's the downside of putting it up. Let's just start with the Google rather than the Tableau. What's the downside? Well, one is the question you asked: what do you have to do?

I mean, on the question of whether Google can just come and take it, it sounds like they couldn't, because they would need this taxonomy. They're not in a position to write out a taxonomy, are they? I mean they have a taxonomy, but how do they put data that they don't know anything about into this taxonomy?

MS. QUEEN: Marjorie, for all of the data that are downloadable from HHS that are publicly available, along with the data there are links to the documentation, so the information is there. To me the difference is, if Google has done this with data we've made available, they've done it with data we've made available.

For HHS to intentionally ask them to do it, or sanction it or whatever, it's sort of like HHS would have to be approving what Google does. There is a distinction between having it out there and letting anybody use it, and actively encouraging that use.

DR. ROSENTHAL: I will just say three things. They can absolutely do that right now if they have the intention, and in fact they've already done it and they do it with a bunch of different bodies and there's very little you can do about it unless you want to get very feisty with them.

MS. GREENBERG: Why aren't they doing it, is it too much work?

DR. ROSENTHAL: They have done it, with some data sets. There's more to be done for sure. That's why I say there's almost like a business context to think about. They can do this on their own, do you want to encourage them to do that as you redesign Data.Gov and do the next generation of it?

Do you want to just encourage them to do that? Do you want to have some sort of working partnership with them where there's some sort of intelligence or security you're putting on top of that? Or do you want to kind of do it yourself and take the more active role?

That's kind of the decision you probably want to think about, or one way to frame it up. They don't need your help to do that, they can do that on their own quite well.

MS. QUEEN: Isn't that what NIH is doing with Amazon? Something with the genome data, the big data?

DR. ROSENTHAL: Yes. That is just one way to think about it. You can pretend you never saw this and just kind of let them go on their own, and they'll put more up and five years from now you'll look and a bunch of your stuff will be there, or you can say we want to actively encourage them to do this, we want to promote it and we want to get users to do that, and that means some very specific 101 marketing and PR stuff that we're thinking about. Or you could think about becoming even more actively involved in terms of saying we have research and public health agendas, we want to put certain things forward in terms of et cetera.

MS. GREENBERG: I'm asking those of you, like Susan, who are more familiar with Data.Gov: do we have high confidence that by putting it there, or it being there, and we must allow it, because they can do it now, that no matter what other data they mash it up with or whatever, it just isn't going to be a problem?

MS. QUEEN: Well, we have all of these privacy and compliance requirements that the agencies go through. A lot of them are very similar, but for some of them there may be suppression of three to a cell or five to a cell; it may vary depending on the agency. But they have a lot of different statistical disclosure techniques that they apply to the data before they are released.

And that was one of the documents that I put up on SharePoint, something that we had to put together for OMB when HHS took over HealthData.Gov from GSA. It described all the different techniques that the agencies use to protect the data and to minimize disclosure of anyone's identity. So that's out there.
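The cell-suppression rule Ms. Queen mentions, blanking counts below three or five to a cell, is simple to state in code. A sketch with a hypothetical threshold and invented counts; actual agency disclosure review also applies complementary suppression and other techniques beyond this:

```python
def suppress_small_cells(table, threshold=5):
    """Primary suppression: blank out any cell whose count falls
    below the threshold, so small cells cannot aid re-identification."""
    return {cell: (n if n >= threshold else None)
            for cell, n in table.items()}

# Invented counts for illustration.
counts = {"county_A": 124, "county_B": 3, "county_C": 57}
print(suppress_small_cells(counts))  # county_B's count of 3 becomes None
```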

DR. COHEN: I agree with Josh. I mean, what's the business case? The business case is we want to proactively liberate the data to expand its use and to make it easier for folks who haven't traditionally used our data to be able to get to it because they use these tools, they know these places, they don't know the indicator warehouse.

All the data at the community level in our first presentation in the indicator warehouse, all those data have already been reviewed and populated pretty much mainly at the county level. There's a huge amount of information out there.

If we push that into Google as another platform for release, or if we want to create patterns using Tableau, combining those data with other data that traditionally don't fall in the public health sector, I think that's the goal of data liberation, really.

So it's not protecting our enclaves, and there's already a lot of information that is quasi-public, but people just don't know how to get to it because it's not in a place where they go to look for information.

MS. GREENBERG: What is the downside?

DR. COHEN: My answer is there is no downside for the agencies or for the community, and particularly I think this will encourage agencies to think more creatively about how to disseminate information in their data.

MS. GREENBERG: Is the only reason we haven't done it is because we don't know about it, or we don't have the resources?

DR. COHEN: It is not a priority. We spend 99 percent of our time making sure the data we collect are of high quality. We spend one half of one percent of our time thinking about dissemination and getting it to the people that need it. And that's just what we initially thought – I'm speaking now as a government person – what our charge and responsibilities are.

We fulfill our obligations by being the stewards and making sure the data is collected and protected, and once we're comfortable with that we need to flip the switch and send our children away.

DR. CARR: I just want to make a comment. I think Josh said it in the beginning, you have a finite amount of time to spend and you can spend it trying to figure out how to manage the data, or you can outsource that to Google and now spend your time thinking about the content and the value.

DR. GREEN: I want to join the Susan and Justine line of dumb questions. In this arrangement, what is a data analyst?

DR. ROSENTHAL: I don't do very well on philosophy so I have difficulty answering what strikes me as metaphysical questions.

DR. CARR: Where are you going with that Larry? Maybe elaborate a little bit more.

DR. GREEN: It is not a philosophical question, it's a job description. What's the job description of an analyst once you get here? In the old world we'd whine and complain about public health departments having very little analytic capacity. And we paid premium bucks to someone who can develop mastery of a data set, understand it in depth and breadth, understand what can be linked and what cannot be linked, et cetera. So in this world I'm just asking what is an analyst going to be.

DR. CARR: I think if we get back to the life cycle of data or the supply chain of data, I think we have moved the nidus of control from one venue to another. And I'm not sure that an 18 year old can tell us about diabetes. So that's already happening whether we planned it, liked it, or whatever.

But I think we have to think about then what is after that hypothesis generation, what happens to that hypothesis. It looked pretty right to me, it looked good and there's a lot of stuff behind it, but someone else could juxtapose a lot of data that was not as sophisticated and that too will be out in the public domain.

DR. ROSENTHAL: In this sort of interactive business intelligence world outside of health care, and even in health care, though not necessarily in the MHP world, typically there are business analysts, there are business intelligence analysts, and there are metadata specialists, sometimes called metricians.

So one way to think about this is that it's become more specialized. The person who sets up that taxonomy has arguably a much higher degree of skill. If I wanted to do that five years ago I had to run thousands of lines of SAS. That's all I did, I sat there and did that. I was an analyst, but I was running SAS all day; I had to run a regression.

The person who sets that up is largely doing the same thing, they're just not doing it by hand. There's a greater degree of specialization. There's not so much a generic analyst; there are different types of analysts nowadays, is the easiest way to describe it.

In terms of the fundamentals, the tasks are very similar. One way to think about it is that the speed and rate have increased. So if I'm running an analytics group, which I have, and one of the analysts comes to me and I don't like his factor analysis, basically, I think his correlation is off, I don't think he's correctly constructed a feature set and come up with reasonable comorbidities for diabetes, I'm essentially doing the same thing that took me three months to do in two days now, but the functions are the same.
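The kind of spot-check Dr. Rosenthal describes, rerunning a correlation that once took pages of SAS, now fits in a few lines. A sketch with invented figures; this is the standard Pearson formula, not any particular shop's tooling:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, the sanity check an analytics
    lead might rerun when reviewing an analyst's feature set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented prevalence figures for illustration; perfectly correlated here.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # prints 1.0
```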

DR. CARR: I think Larry's point is that you have a group of people under your control, and you supervise them. But unsupervised people have access to these data as well. And that's a reality, and that is the disruptive piece around this.

DR. COHEN: Perhaps disruptive, but essentially we're moving the data upstream to put it in the hands of the decision makers rather than having to create this artificial interface between the people responsible for decisions and the data, who we used to call data analysts. I mean that's the goal of data liberation.

Decision makers make decisions whether they have data or not. Our underlying assumption is by providing more data directly to decision makers they'll be able to make better decisions. It might not all be decisions that we agree with, it might not all be decisions that we would have made given the data.

But our goal is to provide decision makers, whether it's individuals choosing health plans or whether it's community groups deciding what priorities to operate on, to give them the opportunity to use the information that we've generated to make decisions.

So who is the analyst? The analyst now becomes the decision maker, and the information is not filtered through any lens other than the lens that they want to use to make decisions.

DR. ROSENTHAL: I apologize. I misunderstood the question, I thought we were asking what in terms of the job description is an analyst working for an analytics shop. There you have a group working.

But in terms of the open sourcing, absolutely. The specialization is you don't need to require everyone to understand the taxonomy. One person can do that and then you can open it up and everyone else can build on that rather than hierarchical. And that tends to work pretty well, at least that's the charge as I understand it.

DR. CARR: I think your point is excellent that it gets into the hands of the decision makers. So if you get information, let's say by your community, and you say this doesn't resonate with my reality, you're going to be motivated to enhance, refine, or improve that data set so that you haven't missed anything. So it's kind of crowdsourcing or something, it's a lot of people weighing in on something.

DR. FRANCIS: What I want to actually push on, which I just don't understand, is in a way what's all the hullabaloo about. Because if the only question on the table is you've got a public use data set, and it's on the HHS website or it's over on the Google website with the Google taxonomy, hey, no problem. There are no more privacy issues raised than there were when it was on the HHS website.

Now, if I go to what Bruce said a minute ago, more data, and ask what that means without knowing it, or I look at something like the webinar summary from the 23rd where there's some tools to allow public access to do analytics behind a firewall, but you don't get the data because of confidentiality restrictions. Now if what we were doing was essentially taking that data and putting it on Google with the firewall allowing the analytics --

MS. QUEEN: I can never see the restricted data being made available publicly. The restricted access even has a lot of strange rules. Certain SAS code you can't run, there are limited proc lists, there are a number of things they won't even let you do, and you can't download it.

DR. CARR: At the continuum of data there is a tremendous amount of data that as Leslie said, is hugely valuable, especially when juxtaposed with other available data that accounts for decades of learning. And they can still be the people who have the restricted access and expertise, et cetera, to use that other data.

DR. ROSENTHAL: Not to cloud the issue, and that's one thing for James to think about, but when we were taking those surveys of websites and what they were doing: if you were in the private world and looking at, say, credit card data, instead of a portal that lets you create a table very clunkily, you'd have a professional version of one of these browsers up and running, and you'd be able to do that very quickly and very meaningfully, and all the researchers could ask questions and get answers and do things intuitively.

So that's a separate subject, how do you get people out there, but it is worth thinking about. When I walk in with my glasses on and see that this is the generation of tools we've spent a lot more money on in the commercial world, the cost to put one of these things up behind a firewall for researchers like yourselves to use, that's just something else to think about, but it's eminently feasible and being done really easily.

DR. CARR: We're at 4:30, and CDC is going to do a presentation. Do we have 45 minutes now to kind of land the plane a little bit on what we'd like to put forward for consideration, as the reactor panel seeing this information? I just want to make sure we have the right language, Marjorie, that we've fulfilled the charge, that we're not making recommendations and we're not sending a letter; we are reacting to information that has been presented to us.

MS. GREENBERG: I am seeing a frown on Leah's face. Leah hasn't said anything, do you mind if I call on her? Your reaction to this discussion.

DR. VAUGHAN: I guess I am trying to understand what's most helpful for me to help you. One thing that does strike me as potentially helpful is to actually get you to drive data sets on a number of the platforms outside of your area of expertise, see what you understand, and then have to feed it back to somebody whose area of expertise it was. That way you get a sense of a range of tools, which go far beyond Google, which has its own challenges and problems around privacy, and you actually get a sense of what it is that can be done.

And there have been some wonderful examples in the non-profit sector of doing that; there have been some wonderful instances within the challenges of taking specific data sets and putting them forward, including a million parts. But maybe not just have you drive it, but have you drive some data, not scraping it but driving it, and just have you have the direct personal experience of going through it.

DR. CARR: I am hearing you say two things. One is that we shouldn't just focus on the two browsers that we saw today and we ought to broaden that to say we ought to leverage public browsers.

DR. VAUGHAN: There is that piece, because alongside private companies there's a very strong open access movement, both here and in the UK, which is very much about not limiting it to only open access, but ensuring the integrity and the continued, to the extent we understand it, perpetual availability of the data sets to the public without them becoming proprietary. And there's been some excellent work done in both; there was just a conference last week. But my strongest impression is that rather than talk about it and make a recommendation, we need to actually do it.

DR. CARR: Again, we are a reactor group, and the doing, I mean, we're doing a bit of traveling through this today, but I think that HHS is configured in a way now that there are individuals accountable for the utility of their data.

DR. ROSENTHAL: What are some other public data browsers that you're thinking of, and could you clarify the difference between driving and scraping for me just so I understand what you're talking about?

DR. VAUGHAN: There is a very large number of browsing products available, and my suggestion had to do with data domain experts using some of those visualization tools that are available right now, to see what data looks like and feels like by using them for this public facing analysis. Maybe not to try it in your own area, but to try it outside of your area, so you're perhaps having more of the consumer experience.

DR. CARR: With regard to that part of your suggestion, you were saying for us to have an experience of what it was like. I mean, Josh was like, let's pull down this, let's pull down that --

DR. VAUGHAN: That is not the same as you doing it.

DR. CARR: You doing it yourself?

DR. VAUGHAN: Yes.

DR. CARR: I mean actually I did it during that last webinar, and so at the end of that experience each person will see what was hard, what was easy, what was unexpected. And then what? Are you saying that would influence which browser?

DR. VAUGHAN: I think I am hearing some misunderstandings, and I think the best way, rather than just talking about it, is to just do it. I think the notion of doing it outside of your particular area of expertise and then presenting that to someone whose domain expertise it is gives you, again, a little bit more of the consumer experience, even though everybody here is an exquisite expert, to see what you're learning and understanding outside of your particular day-in and day-out.

DR. CARR: But again that information would then inform the direction of which browser.

DR. VAUGHAN: Not so much which browser, but the ways in which they're useful, the ways in which they're limited, the ways in which there are still privacy issues, some real ones. It's not that the data sets aren't hugely refined to ensure privacy; it's that even though it's aggregated to a large area, there are still people who will find a specific address to hold that meeting, even though that's not statistically appropriate or policy appropriate. It still will happen, and it does happen.

And to a certain extent there is nothing you can do about that, but you should understand that it does happen, and understand what some of those consequences might be. Vaccine myths are certainly a huge example of that; no amount of truth telling seems to quite dispel them.

DR. CARR: We have a lot of conversation at the meeting about the accountability across the food-chain or the supply chain or the lifecycle or whatever you want to call it of data. But I think it is true that we are now in a world where data is out there and responsible people use it and irresponsible people use it.

And so we talked about the stewardship: when you might be showing data that's alarming or disturbing, that impacts housing values and so on, you have a responsibility not to suppress the data but to deliver it in a way that is constructive. But I'm not sure that that's where this group is. The fact that that could happen, we've already made the data public, and I think the question that we're trying to address is how do we make it more usable.

And with that we're all on the same page that leveraging a browser, maybe not any of these but whatever browsers are out there, to enable a user to focus on the content, not the configuration, would create greater opportunity to learn from the data. So are you disagreeing with that?

DR. VAUGHAN: Not in general, I'm saying that I think in terms of strengths and limitations of many of these initiatives that I think you would come away with a better idea, better able to make those recommendations with an experience of a number of them, and just actually hands-on doing it.

DR. CARR: I agree with that. But I'm not sure that it's the role of this group that meets a couple of times a year to do that homework to advise HHS. I think that highlighting that the data can be manipulated more readily, not manipulated in a bad way, but moved around more readily if you had a browser. And to go to that point to then leave it to HHS whose day job is to make this data available.

DR. VAUGHAN: I think a lot of those things have already happened. I wish Patrick Remington was here today, I don't know if you've had the great pleasure of using his site, County Health Rankings, which I think does an exquisite job of doing just that already. Certainly American Academy of Family Physicians has been a leader in all of this.

I think there are a number of awesome really finely done examples of that already, I think there's great instances in some of the challenges already. Actually I did not read your cross-agencies, but to put forward specific data sets.

MS. QUEEN: Actually, I was thinking that one thing we probably should do, whether it's staff or not, is look at what challenges have been done or are ongoing, and what data they're using.

MR. QUINN: We could have ONC or someone at HHS do that.

DR. VAUGHAN: It's not just ONC. It's everybody.

MS. QUEEN: The other thing is, I didn't have enough time, and I was just playing around with the Google site, for example, trying to figure out what Census data and what BLS data are there. I was kind of in a hurry, and I could see AHRQ. So getting a better sense of what's already being used on these other platforms, I mean, I would want to know; it's just something I want to know.

DR. VAUGHAN: In terms of the international data, the World Bank has really done an amazing turnabout in how they open up their data and how they make it accessible. I certainly commend their initiative to your consideration.

MS. QUEEN: I think the HHS Chief Technology Officer and his office would be involved, at least with all the challenges at the Datapalooza. I mean, we have a source at least for the things we know about related to challenges, and this other stuff we could be looking at ourselves, at least to some extent, just to find out what's already being used.

DR. VAUGHAN: As to the specifics of Google, it's a multi-tiered process; they're directly involved in the Hurricane Sandy initiative right now, and that data and that set are handled in other parts of the company. So there are many people, I'm sure, who would be delighted to talk with you at Google, but it's a fairly complex process and a large company, and it's hard to generalize about any single thing.

MS. GREENBERG: I sort of have mixed feelings because I don't want to be a stick in the mud at all, and I want to encourage us to be thinking creatively and not spending a year studying things. At the same time I do think that even this working group, although it's formulated or established somewhat differently than the National Committee, does need to have sort of a deliberative process in which you gather information.

And if not doing the work yourself, because I have to agree with Justine that I'm not sure that's the role. I wouldn't stop anybody from playing around with it, but I don't think that's solely the work of this group. So if you just say, well, we heard about this and we think it'd be interesting, we really have to question how far that would go, I mean, what kind of influence it would have.

On the other hand, if you sort of write up this discussion and shop it around the entire working group, because we don't have half of them here I think, maybe we'll be able to stop hearing about data sets and actually hear about some of these different activities that people are doing, kind of enrich the discussion, find out about challenges, some of these other things.

And then taking it, and you can stop there, but taking it to the National Committee and having a discussion there could result in something that people would actually listen to. And I know what Jim was saying, the department might want to just bring something to this group and say, what do you think. Fine, that's a reaction. But if you're the initiators, I still say it gets very close to recommendations.

And if it isn't recommendations, if it's really at a low level, oh, we just heard about this and we think it might be interesting, I don't know if it would have much impact. So I'm thinking both about whether we've been deliberative enough, or understand enough to say something meaningful, and also about who the audience would be.

As Bill said, you have to think about the audience, how it is going to be received. I know that you want to come out with something and not just be talking heads, but I really am wondering what would be most useful.

And as I said, if it's at a really informal, just-kind-of-suggestion level, which is really all that I think would be appropriate, given that you are part of a federal advisory committee and should not be making recommendations on your own without going to the full committee, then it seems to me that it might not be overly useful.

But you can try it and see I guess. We've heard about these activities that obviously have potential, they're of interest, we hear there are other activities, there are people who might be able to tell us more, some of the pitfalls, some of the challenges, some of the issues. I think it may be that there needs to be a little bit more deliberation frankly.

DR. CARR: I do want to get your feedback but I want to hear from the Working Group members. Why don't we start with Josh?

DR. ROSENTHAL: I am not so savvy in all of this, I'm just working from the charge and I just want to know what the charge is. I thought the charge was to actually make these specific suggestions. So if we're just here to kind of review, then that's very different than what I had thought.

MS. GREENBERG: No. I didn't say that. What I was saying was there are two ways that this group can be used. One is that the Department can bring you some things and just ask for reactions. You can do that, you don't need to ask the National Committee on Vital and Health Statistics to approve your reactions, your reactions are your reactions, and your input, et cetera.

And the Department has an interest in using the group that way. And to learn really from you because you have expertise that the people in the department and even in this advisory committee don't necessarily have.

But if you're going to make recommendations, then it's just like any subcommittee. The subcommittee on privacy and confidentiality just spent six months putting together recommendations related to a stewardship framework for community health data.

They had hearings, it was very deliberative, but they can't just make those recommendations to the department, they have to go through the full committee, it has to be deliberated through the full committee. So I am saying if this working group wants to make recommendations, it also has to do it through the full committee. That's the rules, that's the guidelines, otherwise you're basically violating FACA.

DR. CARR: We have reactions, we heard a number of things, and one reaction is that the data is hard to use. But I don't even want to speak for that. What I'd like to do is go around the room and ask each person to provide a reaction just from what we've seen so far, and then also what else should we be thinking of.

And we've heard about a couple of things today. We've talked a lot about big picture. But I think what Bruce was saying and I think actually what Leah was saying as well was actually getting to what is the demand side, what's a big population issue, what do we want to put together, maybe a data set that then users could use to develop. That's how I kind of think about it.

But let's start over. Josh, a reaction, and then to Marjorie's point, if we want to go in greater depth we can have other presentations, hearings, webinars, whatever else. So did you want to offer something, a reaction, and a next step?

DR. ROSENTHAL: Yes. I'd like to know more. The first set of slides I made available, this set of slides I will make available. I think it would be very helpful, if we're talking about things, to have specific instances of them brought to the group. So I heard several things in your comments and I have to admit I was pretty confused. If there's a number of different public browsers, I want to make sure we're talking about the same thing because we're using different terms.

DR. CARR: That will be for the discussion afterwards. What I want to do is just get a hearing around the room. I think you guys are seeing things from different perspectives and I get that.

DR. ROSENTHAL: I heard that security in a browser is a different risk than actually having the AHRQ file up on the website in its own browser, its own instance, and I don't agree with that from a technical standpoint. And I want to make sure that A) that was what was claimed and B) that's what we want to really dig into.

DR. CARR: Let's hold that for a discussion. Kenyon, thoughts, reactions on what we saw so far on the HHS data and thoughts about a next step?

MR. CROWLEY: Broadly, in terms of HHS data, I have been quite pleased with the different innovative programs. I mean I think the webinar on the behavioral health data was quite instructive. That being typically a data source that's so hard to get to and so hard to use, having found solutions to that I think is great, and the other sites as well.

But it generates a lot of questions too, and I think some of the questions to think about are these: as that new behavioral resource is being put forward, and other resources are made available through the health indicators warehouse and through HealthData.Gov, and as these are growing, I started to have some questions, including what can I answer with this data, how can I make use of it, how can I share the data with colleagues, where can I go to make more sense of the data.

And I started thinking about that and interestingly enough when I looked through the HCUP data they actually had a full set of these are the full questions you can answer with the data, these are the things you can do, so obviously there have been a lot of smart people thinking a lot about these issues.

Now, in terms of the charge of looking at how we can make this more broadly available and create this learning environment, I think the committee should continue to think closely about how we can make sure that the learning is captured as people are using the new data source and existing data sources, and that it is made available to others, in that sense.

So maybe something as people are looking for data: if they do find the data they need, if they don't find the data they need, how do we know whether they do or they don't? And if they don't, what mechanisms are in place to point them to everyone who is very familiar with the data, maybe within HHS or more broadly within the community, to say this is your data research question, this is what you were trying to do, I know how to do this. I've used this, this is one way to do that.

But creating an infrastructure, using sort of social media or other architectures that have been used for other open source communities, that allows those questions to be moderated by a community, and to have that learning and those answers within the community being fed back into the community, so that as others come to use the data they can more readily create the most value from it. So those were some things that struck me.

And as a reaction to the discussion today, I think that data browsers provide an additional channel for people to more readily use data. As it was discussed earlier, it's not for everybody, but for many people who are visual learners and may not be experts in SAS or other techniques, having the ability to mix and match data and view those results somewhat instantaneously can, back to the analyst question, sort of accelerate the analytical abilities of a much larger population base.

So essentially you're taking what was a public health analyst or other type of analyst with a very specific set of skills and training which is still very valuable but you're using technology and decision systems to enable those same types of results and findings to be accomplished by a different set and a wider set of people. So I think that's important, something that should continue to be looked at closely.

MR. QUINN: To bring this back – the charge and the focus of the broader NCVHS effort is around the community as a learning system, using local data to improve local health. So this report talks about what we're missing. We heard from a variety of communities that are doing various things to improve health and use data in their communities, but by and large we're not building the infrastructure, the technical infrastructure but also the data analysis infrastructure, in communities that's needed to improve health.

And the broader NCVHS is looking at solutions for that, strategies and recommendations for HHS to address that. I see this as directly involved with that, and ways of reducing the burden on local folks who don't have millions of dollars to spend, who don't have the time and effort and resources to reinvent the wheel.

And in this context I think what we could do is we can both understand what's going on today, so are any of these folks who we've talked to or others using resources like this or those provided by the government to address this? Would they like to, is it in the cards?

But also to understand what needs, what infrastructure could be provided centrally or on a distributed basis like this to address those needs and to inform the broader NCVHS on that to incorporate what we learn here through those recommendations for some of our other activities.

For example, not through the lens of subcommittees, are there other standards issues for combining data or making it available? Are there privacy issues, are there quality issues? To view it through that lens as well and to say our goal is to build the infrastructure to support communities as the learning system, can this get us closer to that, or is it an unrelated thing and we should look at it as something else?

DR. GIBBONS: I am still getting my feet wet here trying to understand fully what's going on. But just on the little that I've heard today, sort of thinking about it, it strikes me that there are sort of three issues, and Matt touched on a couple of them.

One of them is the issue that the data is hard to use, and I guess we've been talking about that a little bit. The other is in terms of the same sort of thing, and I don't know the answer to this, perhaps you do. What is the sort of general knowledge in the country about the availability of the data sets just in general?

Because if that's widespread then it's a non-issue, but if it's very small that's different than the data is hard to use, and therefore the solution to that is education or something else. I'm just thinking about what types of problem might be preventing widespread use of this data. One could be that people just don't know the breadth of data that is available, whether it's hard to use or not.

And the last one, maybe it's getting a little bit on the demand side, but before even seeing this and hearing Matt my question was how are you defining community. Are you meaning some geographic area? It has many definitions.

But really what I'm getting at is, what do we understand the community capacity to use this data to be, and do communities, however you define them, really have that level of capacity and desire? Because in one sense we may be assuming that if we get the data to a certain level, then wow, 300 million people in this country are going to use it.

But maybe not, maybe it's only going to be a small subset of people who would ever have the interest, capacity, and desire to use this. I'm not saying that's bad or good but I'm saying that may be a reality. And if we have a better understanding of what's the universe of potential people really who would use this level of data then we could target or think more appropriately in terms of what is good and what is best, and where our efforts need to go in the future.

DR. FRANCIS: As I am trying to get my hands around this, as I understand it there are at least three different questions and I'm sure there are many more but there are three that I will outline that I think HHS is facing as it releases data.

One question is what data. Another question is in what format. And a third question is to whom. And some of what we've been talking about right now is in what format, and some of it is to whom. For example, when Susan answered my earlier question about the SAMHSA data, it was: we're not changing the data that we release, we're just changing the vehicle and maybe who gets it. But I'm not sure about that.

So what I see the role of this working group as is helping HHS to understand the questions, the options, the benefits, and risks of different options so that HHS makes as informed a decision as it can about what's a really important question.

I'm as deeply committed to having usable data out there as anyone in the room, but what I want to be really sure about is, and obviously from my questions and how I introduced myself, I'm not a tech person, but I want to feel comfortable that if somebody read in the headlines "HHS gives its data to Google," I can say to them there aren't any risks here, that's a real misunderstanding.

But I think the role of this subcommittee or working group is to make sure first that we at HHS, and I'll go back to what Leah said, aren't acting under misconceptions, because if we are, something could happen that would be really scary, and that's the worst case scenario that I don't think anybody wants.

MS. QUEEN: Can I just say one clarifying thing, because I didn't mean to give the impression that it's always the same. The public-use data files are definitely not the same files as the restricted access files.

DR. SUAREZ: I was trying to get my arms around this as well. I came up actually with a list of six. Some of them are overlapping, but let me go through those very quickly. I think the very first question is availability of data, what is the data available out there. The second is a question about reliability, validity, and completeness of the data. Another dimension really is the limitations of the data; there are some intrinsic limitations in terms of the characteristics of the data and then some external limitations like privacy-related policy constraints.

So limitations of the data is the third. Barriers to access, I think that's a very significant one. That might be technical, that might be policy. And then the last two are really more about once there is data, once people have data: the tools to improve usability of the data and then the ability to aggregate that data. And the last item is really a mechanism to improve the analytical capabilities, sort of the data analytics and the resources that exist for this data analysis.

Now, organizations like ours, Kaiser and others, are dealing with this big question of the big data issue and the whole data analytics side. And I think the same types of challenges are going to apply to communities that are going to be trying to use big data in some way.

In this case virtualized perhaps, but still dealing with the same challenges of the three V's, as they are called: the velocity, the variety, and the volume of data. So those were the areas, I guess, where I think there could be some additional work that this workgroup should do.

MS. GREENBERG: I have already spoken, but I'm really concerned Josh. I don't want to sound like I'm putting a wet blanket at all on what you said, because I think we've all learned some things today, and I encourage that. I'm just trying to think through how you can be most helpful not only to the Department, but really more broadly to the public and to communities who, as we all said, need to use data and want to use data. I think a lot of the issues have been raised. So I think that's all I'll say right now.

DR. COHEN: Where to begin? I actually like Leslie's framework for what it is that we need to do. So the question for me is what are the next steps. Essentially the most specific question is what are the options for releasing HHS data at the county level to maximize its utility, to make it as visible as possible, and to let it breathe and have as much value as we can.

And the three questions that I think Leslie raised are, what are the data we're talking about releasing. We began a review but essentially that's pretty easy to answer, we can go dataset by dataset to decide what data and what level.

The second question is how, and I heard some suggestions from Josh today about channels for release, and I think Leah brought up some good points. We need to review other possible channels that exist because we don't want to reinvent the wheel and there may be other ways to get the data out there.

So I think the next step for us is to review what other mechanisms exist that are actively liberating the data. And the third question is to whom, and my answer is to as many people as possible and as many venues as possible and as many ways as possible as long as we're comfortable about the parameters of that release.

I think Chris raised a good question: what is community? Traditionally we've focused on geographic communities, but certainly we can shape the data to target other communities, whether they're race/ethnicity communities or gender or age specific or whatever affinity groups there are; it's always possible to re-aggregate and reorganize the data in a comfortable way where we can address a variety of definitions of community.

So I guess next steps are: I think we're moving along the path of identifying what data we're talking about; we need to really review what channels and options exist for releasing the data; and then the next step would be really kicking the tires, trying it out to see how it works, and then making suggestions or recommendations or providing advice to the agencies for what we think would be the best way to liberate their data. And it might require us as the national committee to make formal recommendations to the Secretary about how to move this process along.

MS. GREENBERG: Let me just ask, when you said the goal is releasing county level data, are you talking about data beyond that which is currently being released?

DR. COHEN: I just wanted to have a specific target. The lowest level of aggregation that's viable at this point in time for most of the HHS data holdings that aren't address-specific like where homeless shelters are or where halfway houses are is at the county level. There might be other configurations like hospital market areas that people use, it could be MSAs, there are a variety of others, but basically the geographic configuration is the county level.

Some of the data are, as public use data, available at the county level. Other data, vitals for instance, are not available at the county level as public use data; it would require review to release that data. But I think that's possible. So, again, we would need to go data set by data set to see what data can be released. But that's a technicality, that's not an impediment.

MS. GREENBERG: It was my understanding that what this discussion was supposed to be about today was given that the Department has already made the decision to release a lot of data through HealthData.Gov, is there a way to make it more accessible and usable? Not even going to that next step which is also I think within the scope of this group to say are there data that the department has decided not to release or hasn't yet released that maybe could be or how could it be.

But let's just take the stuff that's already out there, and then there are some of the questions that you've all raised. Is it there but it's difficult to use? Are there tools that could make it easier to use? Are there approaches that would allow people to do more things with it?

I have a feeling that there is a very limited number of people who are going to be using this, even though there are a lot compared to how many there used to be. I mean we go out from NCHS, we go to universities and everything, and they say, oh, you collect that data? I mean in schools of public health. So never assume anything, I would say.

So that's a different question but I think we need to figure out, or you need to figure out, which questions you are asking. And so I would start with the data that the Department has already decided to release and take it from there.

DR. COHEN: We can discuss that in more detail later. My response is that the indicators warehouse and HealthData.Gov are good channels and they reach a target audience. It makes more sense to me to bring the data to where we know people are rather than to try to move people to where we put the data, and that's essentially -

MS. GREENBERG: That is what Josh was talking about.

DR. COHEN: That is exactly it. We need to maximize the visibility of the Department's data release efforts, but there are existing channels that people use to seek information and those are the ones that we should be promoting and providing the data through.

DR. CARR: Great.

DR. VAUGHAN: I think there have been a lot of really great comments. I love the idea of trying to understand who the community is and does it make a difference to them or how can we help it make a difference to them and make it more easy.

And to really expand the user-base of the public use data sets, let alone give more texture and bring a finer grain of data. I think that certainly making it easier to visualize the data is one way, and there are lots of good choices in that.

But I also think, honestly, that we miss a lot of opportunities by not going to the data analysts that exist and asking them: if you could, if you weren't constrained, how would you want people to use your data? I think we don't ask that question, and there's a lot of wisdom there and a lot of really great ideas. So we should understand that we have a lot more great ideas out there that we should also be trying to use.

DR. GREEN: I want to make two sets of comments, one about process and one about the work. The process thing reminds me of earlier discussions in the last 36 hours. We have such uneven experiences with the technology that's now being used with already liberated data, and I really heard you saying that one thing that could help us work together in a better process is if we had a shared understanding of how this goes, and then you proposed a tactic for doing that.

And this connected back to something you said, Justine, earlier in the last couple of days, about the group still being in search of a common language so you can just talk to each other. This keeps surfacing in the process stuff. It looks to me like you're making good progress, and I understand what you meant better now than I used to, but it's still an issue.

My comments about the work go like this. It's going to be an echo of Matt and Chris. Where this anchors in the work of NCVHS is: how can information technologies help communities to be learning health systems? I think that's still the overarching question, the river running through it all. And I would remind you that in that report that Susan wrote, one of the key things is we have a missing infrastructure, and what we're observing is that there are new infrastructures that are emergent, that are known to a few but not known to many, and that they're sort of nascent and unclear.

We're noticing that standards are often missing; they're sort of making up the standards as they go along, and we're not sure whether they're good standards, whether they're useful standards, whether they enhance the product, what's the deal? We just don't understand that much.

But you can see how it cuts across the groups of NCVHS pretty readily. Another thing: this stewardship framework. We do not have data stewards, and your conversation just called it out in spades. Who's the data steward for Hartford, Connecticut? Particularly given the liberated data and the different people who are using it, where we don't know who they are or what they're doing with it or what they're going to mash it up with, and that sort of stuff.

And this is where I'm probably just going to stake out a little personal territory. We're missing a workforce for this. The workforce we've got has a job on their hands to transition to the workforce we need. And that looks like very fertile territory to me in terms of advising the department.

I want to end with two metaphors that you might not find very useful. A lot of the work that I saw going on here today calls out the difference between knowing the map and knowing the terrain. You can map the terrain, and you can know everything about the map, and you can tell someone to go up here and turn right on road 13013, but if you know the terrain, you say when you get to this farmhouse that has a mailbox on it that looks like it should have been torn down 35 years ago, don't go any further. That's knowing the terrain. And we're further on in making the maps than we are at understanding the terrain.

Another one goes back to being a doctor again. My life as a doctor since the internet was invented has changed. People walk in all the time with data. They walk in with maps. They walk in with comparisons. They walk in with tables. You know what they don't have a clue about? They have no idea what they mean, particularly to them. What these data mean to a particular community requires contextualization and local knowledge.

So if this group can help understand what the technology can and can't do, the full committee, I think, is looking to help move that forward into stewardship frameworks: what's needed to make it work, what can be done to enable it, so that people aren't just reading the map but are actually making a difference in the terrain by doing it.

MS. QUEEN: I have a whole bunch of conflicting thoughts, but the first one is that all of the agencies have health data leads for the health data initiative. So there is a source, a listing, of what's currently out there that has been made available from HHS.

So I've just been going to the HHS website and trying to figure it out from there. And the health data initiative also has an indication of the granularity of the publicly available released data in terms of geography, whether it's county – most of them aren't county – but there is an available listing that we could use to give us an idea of what's already out there.

I personally am going to be compelled to go to a couple of the things that we've looked at today to get more information on what is being used there, and then also just check with our chief technology officer regarding some of the current challenges, or the ones that have already passed, and what's been done so far.

DR. CARR: Our discovery continues. I think that everything that was said today, this is a time of enormous change with an asynchrony of available data information and skills of knowing how to use it. And I think that if it feels like it's hard to come up with a simple answer it's because of the enormity of it.

MS. QUEEN: I think we may also want to hear from NIH about their public/private partnership on their big data project, since that is an initiative that was announced earlier this year, the White House initiative with big data. They've managed to put their stuff in the cloud; we should find out.

DR. CARR: I guess it is harder than we thought to have five next steps, deliverables. But I think we still have a little bit of confusion with regard to what the boundaries are between reaction and recommendation, and it's just that it's new territory. So with all that's been said today, I hope that we can actually get the transcript, or Susan Kanaan is taking some notes, and get it out and try to frame it.

There are multiple directions here. Do we want to choose one direction, kind of go a little deeper on that and come out with something, while planning for the next direction? Clearly the intersection of the work of the full committee on communities and the opportunity that is in front of us here with the data liberation, is important to marry up and we can perhaps take this to the Executive Subcommittee. I know that we have some data, are we going to get a presentation at 4:30 about the CDC data? Is someone on the other line?

MR. BUELLER: Hello. Hi, this is Jim Bueller from CDC, I just joined the call a few minutes ago.

DR. CARR: Thank you for joining us, Jim. We have mapped out about a half hour, does that work for you? 4:30 to 5:00 to take us through it?

MR. BUELLER: Sure.

DR. CARR: Are there slides?

MR. BUELLER: No.

DR. CARR: The floor is yours.

Agenda Item: CDC Data

MR. BUELLER: Thank you. I'm Jim Bueller, I direct the public health surveillance informatics program at the CDC. Just briefly, we run several of the large surveillance systems that are associated with the CDC, including the notifiable disease system, which is based on reporting that occurs within states. States volunteer to share that with the CDC; certain conditions are by law reportable in states, and the states agree on which subsets of those they'll all share with us. Those data are updated weekly.

We run the BioSense system, BioSense 2.0, which is a large syndromic surveillance system that keeps track of patterns of disease seen largely in hospital emergency departments on a daily basis, and we run the Behavioral Risk Factor Surveillance System, which is a telephone survey that all states conduct to track trends in a variety of health risk behaviors or other health care use behaviors. And again, we aggregate that at a national level; the survey is conducted by the states, and we support them in doing that.

In addition we provide a variety of services and informatics services that support the infrastructure of public health surveillance. But broadly beyond that, we are a place at CDC for addressing cross-cutting issues. So while we run three surveillance systems, that's a small fraction of the over 100 or so different surveillance systems that are managed by CDC programs, and the vast majority of those are run by different programs.

In essence, if you think of your patient as a population within your jurisdiction, whether that is at the local or state or national level, surveillance simply is what we do to keep track of the health of our patient, that population. And the ways that that gets done are as varied as the spectrum of things that are of concern to public health, from traditional infectious diseases to injuries, to chronic diseases, to maternal and child health, occupational health, et cetera.

Whatever the types of issues that CDC programs are addressing, there is typically some form of surveillance to go with it, which means that they are run by experts in particular diseases or conditions, as part of individual programs and geared to meet those needs.

And I thought I would just give a few examples of different types of surveillance activities to give you a sense of the flavor of that. So for example, with notifiable disease surveillance, as I mentioned, all states require that doctors or laboratories and others report certain diseases. Most of those are infectious diseases, although they also use that reporting authority for things like cancer registries or birth defects registries or maternal mortality. But the national system is focused on infectious diseases. The states get together and agree which ones should be nationally notifiable, and then they agree to share that data with CDC. They come in weekly.

But in addition to that, there are a variety of other systems that complement that. So for example, a system called PulseNet: for certain infectious diseases, the states may require, or give clinical laboratories the option of, submitting isolates of certain bacterial infections to the state public health laboratories.

State public health laboratories then do a DNA fingerprint via electrophoresis that characterizes the particular strains at a very specific level, and then those data are reviewed at the state level and also shared with CDC. And that provides a way of finding disease outbreaks associated with products, typically food products, that are sold in multiple states and might be resulting in disease that at a given state or locality level would never, at least initially, be recognized as enough of a change to be out of the ordinary. But when you pull that data together from multiple states and look at the very highly specific DNA fingerprint, you start to see that something is unusual. So that's how a number of the fairly high profile multi-state outbreaks have been detected in recent years.

Another complementary system is something called the Emerging Infections Program, an even more detailed project working in about 10 different states, where different jurisdictions, at either a local or state level, are funded to make a very comprehensive effort to find all cases of certain infections within a specific geographic boundary.

Trained abstractors go in to perform a detailed record abstraction and get information much more detailed than would be possible through routine reports: information on antibiotic sensitivity, information on the specific anatomic site, information on other clinical aspects of the illness that would be beyond the scope of what's routinely collected.

And I could give a few other examples, but the point is that for a given condition there may actually be a mosaic approach: one approach operating at a broad national level that provides routine data, and then a supplemental approach that goes beyond that, perhaps on a subset of cases, based on whether laboratory specimens are available or based on a specific project that's funded in a few localities to dig more deeply.

And I think you'll see, if you look across a number of diseases, that different surveillance systems operate at different levels. Another example might be tracking the impact of influenza. Influenza is not necessarily a reportable condition, and yet states keep track, during flu seasons, of the number of visits for influenza-like illness.

It's a relatively nonspecific definition that might capture people with some other diseases, but that serves the practical need of tracking when the flu season has started and whether it is worse or more severe than in other years. And then deaths attributed to influenza and pneumonia are tracked at another level.

At yet another level, efforts are made to collect specimens, not comprehensively but at enough of a level to get a sense of whether circulating flu strains line up with the vaccine in a given year, whether they're sensitive or resistant to different antiviral drugs, et cetera. You see a similar approach in the chronic disease arena, where there may be a need for data that is updated less frequently, even on an annual basis, maybe a decision to keep track from one -- to another.

And more and more you see that many systems don't involve the primary collection of data but draw on data that are collected by others. So for example, diabetes surveillance tracks diagnoses and different outcomes of diabetes, and it draws on a number of the NCHS surveys or information systems (vital records, national health surveys, NHANES, et cetera) to do its mosaic.

I've been talking so far about information that is generated or arises because people have sought health care in one form or another. Another approach is to do surveys. And as I mentioned, many programs at CDC draw on some of the NCHS surveys.

There are also some surveys that are sponsored directly by other parts of CDC. The Behavioral Risk Factor Surveillance System, for example, has been running for about 25 years. We fund the states to conduct it; each state conducts the survey itself, using a standardized set of questions, and then has the option of adding any of a number of different modules that can be employed to look at different issues. States also have the option of adding some questions of their own individual interest.

But it differs from the NCHS surveys, which draw upon a national sampling frame. These surveys are conducted by each state, many times the states are actually able to look at sub-state levels, and the data are then used extensively by states to manage their chronic disease prevention activities.

So there are lots of different approaches to surveillance, lots of different systems. They operate on a variety of timeframes, at different levels of detail, with different levels of geographic coverage. You can collect a little bit of information about a lot of people or you can collect a lot of information about a few people, and you have trade-offs: whether you want it to be timely or complete, whether you have a lot of money to spend or just a little, et cetera.

But perhaps I can just stop there and see if there are questions and make sure that I'm giving you the perspective that you're looking for.

DR. CARR: Thank you. Susan?

MS. QUEEN: Hi. This is Susan Queen. I was just wondering, are any of the surveillance data made available to the public?

MR. BUELLER: Yes, it varies. There are a variety of considerations. So for example, with the BRFSS, you can go (teleconference operator interruption). Some programs at CDC have the resources to invest in the preparation of a public access database. It varies in terms of what's available. Part of the process of surveillance is providing information back to people, so they're all producing reports in a variety of formats and ways, but they vary in terms of whether they offer public access databases.

And within the public access databases, we have to be mindful of the level of detail that's provided, to minimize the likelihood that an individual patient could be identified.
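One common way of limiting that level of detail is small-cell suppression: counts below some threshold are withheld before publication. The sketch below is a deliberately simplified, hypothetical version of that idea; the threshold of 5 and the table layout are assumptions for illustration, not any specific CDC rule.

```python
def suppress_small_cells(table, threshold=5):
    """Replace counts below a threshold with None before release,
    a simplified small-cell suppression rule used to reduce the
    chance that an individual could be identified from rare cells."""
    return {cell: (count if count >= threshold else None)
            for cell, count in table.items()}

# Hypothetical county-level case counts
print(suppress_small_cells({"county_X": 42, "county_Y": 2}))
# county_Y's count of 2 is below the threshold and is withheld
```

Real disclosure-limitation practice goes further (complementary suppression, rounding, aggregation to larger geographies), but the trade-off is the same one described above: more detail means more analytic value and more re-identification risk.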

We also have to be respectful of whatever concerns the states may have, and of when they may prefer that people go to them rather than to us for information about a particular state. There are some instances where there may be data use agreements within a state, between a health department and a hospital that provides data. There is just a variety of considerations that go into that.

DR. COHEN: Does CDC, itself, provide de-identified individual level data for research or public use?

MR. BUELLER: In some instances, yes.

DR. COHEN: Would the ILI surveillance data be available?

MR. BUELLER: That is an example where we don't even get it at the individual level. We get from individual providers the percentage of patients they're seeing that have ILI. It varies from program to program what level of access or availability they have.

I think one of the issues is really what resources it takes to prepare and document a public access database. That is going to vary from program to program. There is a fair amount that is available, but certainly not all of it.

I mean, there are also instances when a public access database may be insufficient for a particular researcher, and there are precedents of researchers working directly with an individual program under various agreements about what would or wouldn't be done with the data. I think you can appreciate the importance of the sensitivities about confidentiality that surround many, not all, systems.

DR. COHEN: Is there a summary, by surveillance set, of what's available, what variables, and at what geographic level?

MR. BUELLER: We actually have an inventory of surveillance activities at CDC. Right now that's an internal resource, most immediately available to people within state and local health departments, but we are working on making it more broadly available.

And that does include information about the URL on the CDC webpage where you can go and get more information about each activity. But it's going to be highly variable from one system to another.

PARTICIPANT: Just one quick question. Just curious, Jim, what are your most popular, most used data sets?

MR. BUELLER: That is a good question. We run something called WONDER, which is Wide-ranging Online Data for Epidemiologic Research. There's a fair amount of NCHS data there, there's any number of different systems from CDC, and you can access BRFSS data there.

I think some of the notifiable disease data are there, and there's census data that you can use. I would venture a bet that the BRFSS is probably one of the most heavily used. If you ever want to dig into it, just go to CDC.gov/BRFSS and there's a tremendous amount of information that you can get.

I know that the AIDS program has maintained a public access database; I know that from having worked in AIDS in the past. I don't work there now, but I would presume that it's still maintained. There was a lot of interest in that for a number of years.

Obviously the NCHS systems are very, very heavily used, and I think it's fair to say NCHS would say that they don't operate surveillance systems, but their data are used for surveillance and for many other purposes as well. They're really geared and built up to provide public access databases, so they're very heavily used.

DR. FRANCIS: I have only seen proposals for surveillance systems that would be operated on a distributed query basis. I just wanted to ask whether the standard model is that you collect the data, or whether there are considerations or discussions of distributed query surveillance structures?

MR. BUELLER: So just in case others aren't familiar with that concept, the notion of a distributed system is that the data sit behind each owner's firewall, and when you have a question you develop a query. The data owners are asked to hold the data in an agreed format so that you can craft the query, send it out, run it against each of the data owners' holdings, and bring back an aggregate report.
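The pattern being described can be sketched in miniature: each data owner keeps its records behind its own "firewall," runs the query locally, and only an aggregate count leaves. The class, field names, and toy records below are all invented for illustration and are not any actual system's interface.

```python
from typing import Callable

class DataOwner:
    """Holds records behind its own 'firewall'; only aggregates leave."""
    def __init__(self, name, records):
        self.name = name
        self._records = records  # individual-level data, never shared

    def run_query(self, predicate: Callable) -> int:
        # The query executes locally; only a count is returned.
        return sum(1 for r in self._records if predicate(r))

def distributed_count(owners, predicate):
    """Send the same query to every owner and pool the aggregate results."""
    return {owner.name: owner.run_query(predicate) for owner in owners}

# Two hypothetical state systems holding influenza-like-illness flags
owners = [
    DataOwner("state_A", [{"ili": True}, {"ili": False}]),
    DataOwner("state_B", [{"ili": True}, {"ili": True}]),
]
print(distributed_count(owners, lambda r: r["ili"]))
```

The sketch also shows the limitation raised later in the discussion: if a follow-up question needs a variable the query didn't ask about, you must craft and redistribute a new query rather than simply re-examining individual-level data you already hold.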

We tried something like that. There is actually a system called Distribute that was a grass-roots system, developed in the 1990s. When H1N1 hit, we put a lot of effort into scaling it up. Basically, rather than getting individual-level data, we asked the states that had syndromic systems to provide aggregate counts on a handful of variables.

And that worked reasonably well up to a point. I know that it parallels, in a small way, many of the discussions I've heard at FDA around something they've got called Sentinel, or things like the HMO Research Network. It takes a lot of work to understand what you're getting when you can't get it at the individual level, and it's much less flexible, particularly if something happens and you need to query the data in a way that you hadn't done before.

It's much easier if you have that individual-level data, particularly when you see something and the first thing that happens is you start to get five more questions; answering those questions is much easier at an individual level.

But with that said, there is a lot of precedent at CDC. Many of these things, like the HMO Research Network, which is the forebear of the FDA Sentinel, were based on a project that really came out of CDC, the Vaccine Safety Datalink, which is based on this notion of a distributed approach. The VSD, Vaccine Safety Datalink, is a very successful project, but like any approach, it has advantages and disadvantages.

I know it's one that ONC, the Office of the National Coordinator for Health Information Technology, is very keen on supporting; they've got a Query Health project, and members of our staff have been involved in helping them think about that. So it's one approach, we do think about it, and it has strengths and limitations.

DR. CARR: Thanks. That was very helpful, very informative, we appreciate it very much. And you're welcome to stay on the line, we're just going to have a couple of concluding comments from today's meeting. Thank you for joining us.

Agenda Item: Wrap up for Future Meeting Next Steps

DR. CARR: I want to try to pull together what we've covered today, in the last three hours, and put it out there for your consideration and ask for any suggestions or other issues.

I would say where we are today is that we can say it is good that HHS has liberated the data. We all agree that that availability is a tremendous opportunity. Second, we have seen interesting examples from challenges and from the Datapalooza, demonstrating new observations that can be drawn from the data alone or from merging multiple data sets.

We also observe that the use of the data, at least based on the hits on the websites that we've seen today, seems modest and could be higher, and that the barriers to higher use are three. One is knowledge of the data's availability. The second may be the usability of the data, the ease of using it. And the third might also take into account the formulation of priority issues that ought to be addressed and that would drive someone to those data sets.

One other observation is that there is a very strong intersection between the issues we've been addressing in the working group and the full committee's focus on empowering communities to access and use data, and we're going to make sure that everybody has a copy of this report.

But just to read briefly from the report and the executive summary of what, based on the hearings that we had, the communities said they needed: a key need was infrastructure to provide support, facilitate shared learning, and create economies of scale.

And specifically, they felt important components of that infrastructure include: a privacy and security framework to guide communities in using local data; a standardized set of community health indicators; training and technical assistance to improve data access, management, and analysis methods and competency; better data visualization tools and skills, something we talked about today; support and external facilitation to strengthen local financial and human resources, including those for coalition development; guidance on achieving data-informed improvement through effective leadership; and mechanisms to enable communities to share knowledge and information and stay abreast of federal and state resources and activities.

And then in the section on envisioning the federal role, there are a number of recommendations. I'll just read six of them. NCVHS has identified ways in which the federal government can support the development and functioning of community-based or community-oriented learning systems.

One is to facilitate and provide resources to strengthen communities' capacity to collect data. Two, drawing on the Health Indicators Warehouse, continue to identify and encourage adoption of standardized community health indicators.

Three, provide local communities with local data on environmental and resource factors, including economic, housing, transportation, and education data that are routinely generated by state and federal entities. Four, promote development and use of federal and state web-based query systems to provide small-area data, easy analytics, and visualization capabilities.

Five, expand technical assistance, mentoring communities in survey design, data collection, data analysis, et cetera. And six, convene a summit of local communities to share what they're doing and enumerate a set of barriers that affect all communities working to improve local health.

So clearly there's a tremendous intersection between what we've been talking about today, how to address some of these issues, and the issues that we heard in the hearing. So I'd like to suggest, again, that we meet again, whether in person or by phone; we will certainly be meeting at the next NCVHS meeting in February – March 1st, okay.

And at that time, in order for us to be a reactor panel, I think it might be helpful to do two things: one is to get an update from HHS on what they are already contemplating, what they have considered and accomplished or considered and rejected, and why. And then, actually, to have some of those data folks from HHS come to our next meeting so we can have a dialogue. Because I think that's the way we can communicate quickly, as we did when Todd convened the committee and we just had a meeting; there were no recommendations, we gave reactions and went from there. So I think that's a data reactor kind of venue.

And I think some of the things that we've covered today, and more of what Lee was suggesting that we'll explore, we could bring to that group. And then I think we need to marry up the work that could be done by this group and the work of the full committee on empowering communities; as we said, take a copy of this report, because I think it very much addresses the issues we have.

So with that summary I'd like to invite any additional comments, suggestions, recommendations.

DR. COHEN: The one additional piece would be that I'd like us to explore additional web access technologies, to learn more about those that aren't necessarily HHS- or government-oriented that use health data, and other vehicles that haven't used health data yet.

DR. CARR: Now, I think what I am going to do is confer with Marjorie and Jim – oh, Vickie.

DR. MAYS: I was just going to ask, is the committee interested in other vehicles besides the web? Because there are partnerships that people are doing with Wal-Mart and Walgreens, where they have video screens and they're looking for health messages. There is a big movement right now in the drug stores; they're looking to get health information into them, so I just want to put it on your radar.

DR. CARR: My thought would be that we stay grounded in what our charge was, and I think that what is prominent in it, as I read it today and as I read it before, is the feedback to HHS on what they can do to get this out. As I heard it, the charge is grounded in the data, the usability of the data, knowledge about the data, and the kind of priority issues.

So what you suggest might be something that we take up more in the community data initiative. But I think for now we're still trying to get our hands around getting the feedback configured properly to HHS.

DR. VAUGHAN: Susan just whispered in my ear, and it amplifies what Vickie was just saying, but it's an excellent point: looking at that as maybe part of a larger set of what's called mHealth, or mobile health, kind of reaching folks with good health information, including data, through these alternative systems.

DR. CARR: Is it pushing out data?

DR. VAUGHAN: Yes. One of the most interesting, for me, is Text4baby, which pushes out pregnancy wellness information based on gestation. That has been immensely successful, very low cost, and is being replicated across the agency and in other instances. So it doesn't always have to be something that's expensive or fancy; sometimes it's just going to where the community is and what they can use, and putting it in a framework in which they can use it.

DR. GREEN: Justine, a question about scope again, and staying in the charge. Where do devices like asthma inhalers with geospatial sensors fit into the charge, or is it really just the internet?

DR. CARR: I know Jim's priority is getting the feedback on the HHS data. I think that Asthmapolis and so on, I mean, obviously we're going to intersect; in fact the call about the Datapalooza is going on right now.

But I think this committee, even though we're asynchronous, we're not quite where we want to be; we're still on the learning curve. I think the amount of learning that we've had these last couple of sessions can inform that. And perhaps we ought to be thinking, for the Datapalooza, back to Josh's point, about how to incentivize specifically community data or something like that for that intersection.

I think what I'm going to do, with Susan Queen, is arrange a conference call and/or webinar, certainly before February, probably early January. The other thing is we'll get the summary of this meeting out to everyone and invite input from the folks who were unable to be here today.

So I think with that we'll conclude and adjourn. I thank you all for coming and wish you safe travel home. Thank you.

(Whereupon, the meeting adjourned at 5:10)