[This Transcript is Unedited]
National Center for Health Statistics
3311 Toledo Road, Auditorium A
Hyattsville, MD 20782
DR. CARR: Good morning. Welcome to NCVHS. NCVHS is a statutory public advisory body to the Secretary of Health and Human Services in the area of health data and statistics. In this capacity, the Committee provides advice and assistance to the Department and serves as a forum for interaction with private sector groups on a variety of key health data issues. Within the NCVHS Charter there are a number of roles and responsibilities. The Subcommittee on Quality focuses primarily on the first function outlined in the Charter, which is to monitor the nation's health data needs and current approaches to meeting those needs, and to identify emerging health data issues, including methodologies and technologies of information systems, databases and networking that could improve the ability to meet those needs.
A second area of focus is to identify strategies and opportunities for evolution from single purpose, narrowly focused categorical health data collection strategies to more multi-purpose integrated shared data collection strategies.
So I would like to pause here and ask the members and guests to introduce themselves and then I will give an overview for the next two days. When you introduce yourselves, could you also add a line about your role in your organization or field as it relates to today's subject? Please also state if you have any conflicts.
I am Dr. Justine Carr, co-chair of the Quality Subcommittee and member of the Full Committee. I am an internist and hematologist and chief medical officer at Caritas Health Care System, the largest community based health care delivery system in New England. We are in year two of our EHR implementation across six hospitals and 1200 physicians. Other than that, I have no conflicts.
DR. TANG: Paul Tang, chief medical information officer of Palo Alto Medical Foundation. We have had an EHR implemented for ten years. I am co-chair of this Subcommittee, a member of the Full Committee, and have no conflicts.
MR. QUINN: I am Matt Quinn from Agency for Health Care Research and Quality. I am staff to the Subcommittee.
DR. MIDDLETON: My name is Blackford Middleton. I am from Partners HealthCare and the Brigham and Women's Hospital. At Partners, I am the corporate director for Clinical Informatics Research and Development and I chair the Center for IT Leadership, and in both roles I have an abiding interest in quality measurement and the measurement of IT impact on quality. No conflicts; I am a member of the Population Subcommittee and the Quality Subcommittee.
MS. JACKSON: Debbie Jackson, National Center for Health Statistics, Committee staff.
DR. FITZMAURICE: Michael Fitzmaurice, Agency for Health Care Research and Quality, senior science advisor for Information Technology to the Director, liaison to the Full Committee and staff to the Subcommittee on Quality.
MR. REYNOLDS: Harry Reynolds, Blue Cross Blue Shield of North Carolina, Chair of NCVHS, a visitor to this Committee. I spend a lot of my time on Health IT policy and Health IT in general, both at my company in North Carolina and throughout the country.
DR. CARR: So the ability to measure our nation's health care system and health will be crucial to addressing such national priorities as improving care coordination and health statistics. Today's hearing is about meaningful measurement, the supply chain of functions of choosing and building a measure, and the gaps that exist in the measurement of our four major national priorities: coordination of care, health care disparities, health care value and efficiency, and population health.
The ability to measure key aspects of our health care system will be crucial to addressing such national priorities as improving care and reducing disparities, driving efficiency and value, and fostering population health. Too often, however, ease of measurement has taken precedence over measuring what matters. The availability of new data sources including electronic health records, as well as the heavy reliance on measures of meaningful use to allocate distribution of funds under the HITECH provisions of ARRA, underscores the relevance of understanding the supply chain for measure development and improvement.
So NCVHS Quality Subcommittee seeks to gain feedback from measure developers, endorsers, system developers and reporters to articulate the process whereby meaningful measures of quality are created, introduced to the field and refined. In addition, we seek input and recommendations on the state of meaningful measures in national priority areas.
So the goal of our hearing over the next two days is to address the following four questions. One is, How do we approach building meaningful measures? Two, What is the current process for developing measures and does it adequately address measure development for key national priorities and sub-populations? Three, How do we introduce new data sources, clinical data from EHRs, user generated data, et cetera and how do we exchange them for old measures based on administrative data? Four, How do we maintain and update measures and what are the health IT system implications?
So based on the testimony, the NCVHS Quality Subcommittee will summarize the findings, identify gaps and develop a set of written recommendations for consideration by the Full Committee for transmittal to the Secretary.
I am grateful to our speakers for sharing their time and thoughts with us. We are also very grateful to Matt for putting this all together behind the scenes. So I encourage the Subcommittee members to listen carefully to the speakers and engage in dialog that will bring meaningful recommendations on meaningful measures.
With that, I will turn to my co-chair, Dr. Paul Tang, to share his unique perspective. Since I read everyone's bio, I will give yours. Paul is an internist and chief medical information officer at Palo Alto Medical Foundation, consulting associate professor of medicine at Stanford, vice-chair of the federal Health Information and Technology Policy Committee and chair of the Meaningful Use Group. He also chairs the National Quality Forum's Health Information and Technology Expert Panel and is a member of NQF Standards Approval Committee.
DR. TANG: Thanks Justine. I thought what I would do is go into a little bit more detail on the HITECH context for our hearing today.
As everyone knows, the Recovery Act provides an estimated maybe up to $46 billion in incentives to accelerate the adoption of HIT, in particular EHRs. And there are four criteria to meet in order to earn the incentive. One is that you use a certified EHR. Two, that you use it in a meaningful manner. Three, that you exchange health information among EHRs, and four, that you report clinical quality measures as opposed to quality measures that are defined based on administrative and claims data. So the HIT Policy Committee that was set up in the statute is to recommend to the national coordinator criteria against which we would evaluate whether hospitals or eligible professionals would qualify.
So we have a number of options. One is that we could take a structural approach, meaning does the EHR have these features? Two, we could take a process approach; are you using these features? And three, we could have more of an outcomes oriented approach; are you getting any benefit from use of this technology?
Because one of the criteria says that you need to use a certified EHR, we left to the certification program to decide the structural approach. That is, are you using a product that can qualify you for meaningful use? And we focused, in the Meaningful Use Work Group of the HIT Policy Committee, on two and three; that is, process wise, are you using these features and, more importantly, outcomes wise, are you getting any benefit from the use?
We focused on these clinical quality measures. We found that there is a paucity of clinical quality measures meaning most of the measures, and there are over 500 that are endorsed by NQF at this point, are defined using administrative or claims data. So in fact, they are not using clinical data that might come out of an EHR.
Another point is that most of the measures apply primarily to primary care and do not cover specialty care very well.
Third, although the statute asks us to measure quality and stratify by characteristics of an individual that would allow us to assess disparities in care, often times we do not collect that information or properly report on using that information.
Another problem is that there are national health priorities such as coordination of care, or assessing and improving population health, yet we have very few measures that focus on those aspects of care.
Finally, efficiency is very important, certainly in the context of health reform, and there we also have a lack of quality measures or a way of assessing that. So in short, we lack meaningful measures. So that was sort of the motivation for having this Hearing and hearing from the very stakeholders in this quality measure supply chain. What is the current state of the practice? How do we encourage more measures to be developed that use clinical data from EHRs, and how can we assess the things that are so important for health reform or for the current health priorities?
So that is sort of a setup for how we invited folks to come and testify on these matters.
While Helen is getting set up, let me introduce her. Dr. Burstin is the Senior Vice President for Performance Measures at the National Quality Forum and I think all of us recognize that NQF is the endorser of quality measures. Currently they have, as I said, over 500 measures that are endorsed and I am sure Helen is going to cover the criteria by which measures are endorsed and how they are trying to encourage measures that pertain to the health priorities of the country. Prior to joining NQF, she was the Director of the Center for Primary Care, Prevention, and Clinical Partnerships at AHRQ and she oversaw the HIT portfolio, which invested over $166 million on research at the intersection of HIT and Quality.
I think most of us know Helen quite well and appreciate her taking the time to present to us. Thanks.
Agenda Item: Setting Priorities For Measurement
DR. BURSTIN: My pleasure. You guys were early. You are throwing me off. It is so unusual to start five minutes earlier than usual -- apologies for the delay.
So I think I am actually doing two talks. I will talk about criteria for measurement and our endorsement criteria second. I think what we want to do with setting this up is talk a little bit about the national priorities at a broad level: where we would like the measurement field to go. And as you have already seen, obviously, from the Meaningful Use documents that came out of the Policy Committee, there is a lot of emphasis already on at least several of the big national priorities. So I will just go through the process and leave a lot of time for questions.
So for those of you who do not know NQF, in addition to endorsing national consensus standards, which has been a traditional part of our role for 10 years -- actually this is our 10th anniversary -- and publicly reporting on performance, NQF over the last couple of years has added to our mission statement the goal of setting national priorities and goals for the nation, thinking that if we had a set of aligned goals, we would perhaps make more progress than we do with the 500 measures, for example, that continue to grow each year.
And lastly, trying to think through the piece around how we would actually get these measures used. And the fourth piece, and you will hear from Floyd Eisenberg, Senior V.P. for Health IT, this afternoon, is that we really see ourselves also playing an important role as the bridge between the quality measurement community and the HIT community and thinking through how EHRs can really be brought to bear to get us closer towards useful HIT based measures.
So about a year ago, NQF updated our evaluation criteria and the first one here is the one that I just want to highlight. I will talk more about this in the second session with David specifically focused on what makes a measure meaningful? But one of the important things we did was really make the case that not all measures are equal. We really want to get at measures that are, in fact, truly important to measure and report. Some I would say meaningful, or perhaps that term is used at the moment. But really trying to get at the fact that we only want to measure things for which we can actually make a significant improvement.
So what is the level of evidence for the measure that has to obviously be clear? Is there an opportunity for improvement? We do not want to be measuring things that are tapped out or fairly close to tapped out. We really want to be measuring things where the act of measurement can actually be an important force in terms of improvement. And that would either be in an absolute gap or a significant gap across the providers who are being measured by the measure.
And then lastly, is it related to a priority area, and the NPP areas I will talk about in a moment are high impact areas of care. And the high impact area of care, we are getting far more structured on in the short term with some work we are doing under our HHS contract, which is evaluating all the top 20 conditions that are currently considered the highest priority for Medicare and aligning them across nine different criteria to begin understanding issues of cost, prevalence, impact, morbidity, mortality, complications, the whole gamut. So from our starting point, that has got to be an absolute and, in fact, this is an absolute criterion: if you do not pass this criterion the other NQF endorsement criteria do not even apply, and we will just stop evaluating the measure.
So that was a lot of the basis of the relation to the national priorities. It is that same thinking of only measuring what is important. So I will come back to these other criteria in the next session.
We have been trying to move the field also towards higher performance and Paul knows this well from sitting on our ultimate approver of endorsed measures, our Consensus Approval Committee, that we have been trying to push the field away from very narrow process measures that feel like interim steps to an outcome, often very distal from the ultimate intermediate or final outcome and drive towards higher performance. Really what is more proximate, if it is a process measure, proximate to the ultimate outcome? And I will talk more about that.
Definitely a shift towards composites trying to think about a comprehensive view of measurement, measuring disparities in all we do. Paul mentioned that in his opening remarks and I will talk more about that towards the end.
Harmonizing measures across sites and providers - we continue to have measures that are not harmonized between the physician level or the hospital level, plan level and we really see that as an incredibly important role for NQF and the field in general.
Then promoting this concept of shared accountability: the right measures oftentimes cannot be assigned to a given entity. Readmissions are perhaps the best example here. Hospitals argue they cannot be held solely accountable for reducing readmissions and yet community providers will also say the same thing: I cannot be held accountable, I do not get any information from the hospital that my patient, first of all, was even there, which I can attest to for most of my patients, or even if I got it, I do not have adequate information for a really effective transition and hand off.
So measures that really get at what is the right measure for the population, regardless of how it ultimately gets attributed or accountable to an individual provider, have been an important focus for us, as well as beginning to think across patient focused episodes. This is not necessarily episodes in terms of groupers of, you know, you stopped having billing for this particular event, but actually from a patient's point of view the longitudinal view of where we are going.
And so we are increasingly moving towards seeing measurement being, in a sort of two dimensional framework, across these high priority conditions across episodes to get the most comprehensive view as well as across the national priorities and goals as I will mention.
These episodes are increasingly moving towards getting us closer to measuring outcomes, measures of appropriateness, and then in the next six months or so, also adding to that cost and resource use measures coupled with quality measures.
This is a schematic of one of the initial forays that we did into this episode framework of beginning to understand how, if we just focus on the way we have done measurement to date, all of our measures are in Phase II. They are all, for the most part, in the acute phase bubble. Did you get PCI within a certain period of time? -- Things like that, but we kind of miss the larger picture.
And so it begins with the concept of a population at risk, so the population health goal that has also been embraced by the Meaningful Use listings. A population at risk for whom an MI could have been prevented, and then moving across the concepts of, well you know there is an acute phase, a post acute phase, secondary prevention, but then also recognizing that not all patients will wind up equal at the end of an MI.
Some will have had an early intervention and wind up needing quality measures that would look quite different than a patient who winds up with congestive heart failure or who may have multiple co-morbidities that complicate it, in which case you would want to be sure to include measures that get at functional status, quality of life, advance care planning, appropriate for all, but especially important for the patients who have that trajectory, as well as certainly for the relatively healthy person at the end of the day -- a strong emphasis on those secondary prevention population health kind of oriented measures.
We would also like to begin seeing care that is longitudinally assessed. So a patient with an MI perhaps you would see their care in those first critical 30 days to get at that acute phase in acute rehab as well as perhaps looking one year out, to really begin getting a really robust picture of the care we are providing rather than these very narrow slices that we have been doing to date.
So that has been the background for the way we have been thinking about the conditions piece, but there has also been a strong emphasis on seeing that the cacophony of hundreds and hundreds of measures is not getting us where we want to go. And so NQF served as the convener of a group called the National Priorities Partnership. And the logic here was that if we could agree that everybody would focus on the high leverage areas, then this harmonization across the multiple groups, these effector arms as we like to think of them, around common goals for improvement, could actually significantly and fundamentally drive improvement in a more rapid way.
And so the goal of this group was to establish national priorities and goals for public reporting, to focus measurement improvement efforts on those goals, and it was led, obviously quite ably, by Don Berwick from IHI and Peggy O'Kane from NCQA. There are now 32 leadership organizations who sit around the table.
The initial set of priorities and goals has been completed, as I will show you, and the next steps are also thinking about what they could collectively do across a set of drivers, of which measurement is truly just one driver.
We often talk about it because it is the easy one to kind of grasp. But in fact, regulation, accreditation, pay for performance, the various ways we could drive improvement, are what these groups are trying to do by coming together.
So as they set the national priorities, the goal was saying, what are those high impact areas? They focused on the areas for which we would achieve these four key aims: providing effective care, eliminating harm, removing waste, and eradicating disparities. And although disparities are not one of the six national priorities and goals, it is considered the fundamental cross-cutting area that we want to ensure across all the national priorities and goals.
So let me just run through the six national priorities here. So the first is patient and family engagement, one that has also been included in the Health IT Policy Committee list of important areas to move forward on. And the idea here is we cannot make significant progress until we engage patients and their families in managing their own healthcare and making better decisions about care.
And so the specific areas of focus, the goals underneath this priority area would be: patient experience of care assessment in all settings of care, with feedback to the providers of how they are doing. I think we have made significant progress on the measurement side here and have endorsed patient experience of care measures in almost every setting.
The second is what is really involved in patient self management, and some of the measures should really get at patients having the tools they need to better manage their care. It is a whole lot easier to manage your diabetes or your hypertension if you in fact know what your readings were from your clinic and when you were seen. So huge opportunities there, I think, for a bi-directional flow of information to dramatically, hopefully, improve patient chronic care management.
And the last one that, I think, is going to be in some ways the most challenging, because the measurement field is probably at the earliest stage, is the idea of making sure every patient has an opportunity for shared decision making before they make a decision for a treatment or a procedure.
And the field of decision science is still evolving into what that looks like. There are some very nice condition specific or procedure specific, ways of doing decision support for patients. But this is probably the area where I think we are going to see the greatest growth over the next couple of years because we just have so little to date.
The second one is population health, which so much of our measurement focus has been in the silos of our healthcare providers and entities and here it is really taking a more global view. How do we improve the health of the population? Another really important area, I think, that Health IT can just do dramatic things for, in a way we have not been able to do.
The three specific foci here are: improving healthy lifestyle behaviors, really focusing in on those behaviors that we know have a significant impact. Ensuring that all Americans get, at least as a starting point, all the evidence-based preventive services that are indicated for them by age, gender, and risk factors. And the third one, which is, I think, probably the most innovative and the one I find most exciting in terms of thinking about the potential for Health IT, is this concept of a community index. Such an index would allow us to assess the health of a community, which would likely be a composite of many different kinds of indicators of community health. But the kind of thing very few of us -- even as a practicing doc in a community health center for decades, I have never seen, for example, the health of a community that I serve in although I know it is pretty poor -- it would be very helpful to be able to get that.
And the ability, for example, of using geographic information systems, GIS, and other indirect methods, to in fact even be able to target in and say it is this particular sub-part of this community for which A1Cs are alarmingly high, and figure out what the issues are; that is why I think the capacity of bridging some of the community registry data and the population health data to the personal healthcare system quality measurement data is very exciting.
Safety, obviously one of the foundational elements of the healthcare system, has to be there. A lot of focus specifically here on healthcare associated infections. We have been trying to think about what that looks like across sites of care. The ability, for example, to have Health IT connected facilities here is especially exciting.
If you think about the example of surgical site infections and how hard it is to be able to do the 30 day assessment, or if you have a device, the one year assessment to see if there is an infection, can really be driven by this ability -- again focusing in on some of the key issues, like serious adverse events that we are beginning to look at, as well as mortality and then a broader view of the different causes of mortality.
Care coordination is probably the one, I think, could be most significantly impacted by a really interoperable Health IT system. Ensuring patients receive well coordinated care across all provider settings and levels of care, with specific foci here on ensuring that medication reconciliation can be done in a way that is logical at the appropriate transitions in care and not a huge burden without impact.
I am still the kind of doc who likes the brown bag at every visit filled with every pill at home because I still do not trust what I get most of the time. But hopefully the systems will catch up so that in fact the systems will allow us to do what I can only currently do with a -- actually they are almost always Target plastic bags now because so many patients get the $4.00 generic. So emptying them out all on the table - but boy to be able to do that in a way that is IT enabled and sharable across sites is really the goal here; preventing hospital readmissions and preventable ED visits of trying to think about how to use care better.
Palliative care is probably the one, I think, that may have been the most surprising for folks who saw the national priorities and goals. This was the idea of looking and ensuring that we have appropriate and compassionate care for all patients with life limiting illnesses. And that is not just end of life, that is not just hospice care at the end of life, but ensuring that patients who need relief of physical symptoms, whether it is from COPD or dementia, are getting the appropriate care they need as well as help with the psychological, social and spiritual needs that patients and families will face.
And then finally, thinking about what is access to high quality palliative care and hospice services and how we ensure and measure the quality of those services.
Overuse is a critical one, I think, that all of us agree and anybody following the healthcare debate, it is hard not to focus in on this one - we want to ensure that we are eliminating waste, while ensuring the appropriate delivery of care - the delivery of appropriate care.
So there are specific areas of focus that the National Priorities Partnership identified, including inappropriate medication use and unnecessary lab tests, and the data is just replete with evidence of repeating laboratory tests, repeating diagnostic tests, because we cannot get access to what was done at another setting; again an opportunity, hopefully, for some of the IT to come to bear.
Unwarranted maternity care interventions, diagnostic procedures, other procedures, unnecessary consultations, preventable ED visits and hospitalizations, inappropriate non-palliative care services at the end of life, and then potentially harmful preventive services with no benefit.
And this is the classic example of the D-list from the U.S. Preventive Services Task Force that I used to oversee at AHRQ, where we know the risks exceed the benefits. These should be the easy ones to try, to at least as a starting point, move forward.
We do have a lot of work to do on convincing patients in the communities that more is not always better. Certainly not something I have ever convinced my mom of, not for lack of trying for years and years and years, who views if she comes home with several referrals and procedures to be followed up, that that is a really good thing.
Lastly, those are the six national priorities and goals. I also just want to at least emphasize, I think, it is really important, although disparities was not listed, as I mentioned, as one of the six national priorities and goals - there is a huge opportunity for us especially in an IT enabled environment to ensure that disparities measurement is not an afterthought, is not something we do after the fact to see if there were disparities. But in fact, we should be routinely assessing our quality of care by race, ethnicity, language, and socio-economic status as part of routine measurement. And we are obviously exploring both the direct methods for collecting these data from patients in a way that is patient centered and effective, but also some of those indirect methods I mentioned earlier using GIS or geocoding.
And finally we understand we probably cannot stratify everything, but we at least want to ensure that measures for which we know there are known disparities get stratified.
So we have come up with an initial set of measures that we have classified as disparity sensitive in the ambulatory care setting, where a set of criteria were applied to say, if you are going to stratify any measures, at least start by stratifying these with a focus on the prevalence of the condition, the impact of the condition for the disparity population, the impact of the quality process - are there known interventions we can do to, in fact, reduce these disparities and the size of the quality gap?
Our plan moving forward is to, in fact, go through the entire portfolio of measures, at least the ones we think are important, to say which of these across all settings of care should always be stratified for disparities.
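The stratification described here can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the record fields (`group`, `in_denominator`, `met_measure`) and the sample data are hypothetical and do not come from any NQF specification.

```python
from collections import defaultdict

# Hypothetical patient-level records for a single quality measure.
# "group" stands in for any stratifier (race, ethnicity, language, SES).
records = [
    {"group": "A", "in_denominator": True, "met_measure": True},
    {"group": "A", "in_denominator": True, "met_measure": True},
    {"group": "A", "in_denominator": True, "met_measure": True},
    {"group": "A", "in_denominator": True, "met_measure": False},
    {"group": "B", "in_denominator": True, "met_measure": True},
    {"group": "B", "in_denominator": True, "met_measure": False},
    {"group": "B", "in_denominator": True, "met_measure": False},
    {"group": "B", "in_denominator": True, "met_measure": False},
    {"group": "B", "in_denominator": False, "met_measure": False},
]


def stratified_rates(rows, key):
    """Return the measure rate (numerator / denominator) for each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [numerator, denominator]
    for row in rows:
        if not row["in_denominator"]:
            continue  # patient is not eligible for this measure
        counts[row[key]][1] += 1
        if row["met_measure"]:
            counts[row[key]][0] += 1
    return {stratum: num / den for stratum, (num, den) in counts.items()}


rates = stratified_rates(records, "group")
# The difference between the best and worst stratum is one simple view of the gap.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

Reporting the per-group rates alongside the overall rate is what makes a disparity visible as part of routine measurement rather than as an after-the-fact analysis.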
And so just putting it together very briefly, this is the last slide, just a visual for us of where we think we are going here. It is that same slide of the acute MI across an episode, but with the overlay of the cross-cutting national priorities here in gold.
And so you would, for example, assess those patient preferences at the acute phase of what they would like done. You want to ensure care coordination across the entire arrows that cross our various sites of care. You want to get at issues of overuse, in terms of cardiac imaging or procedures; you want to get at population health at the starting point to ensure that your population, who is potentially at risk or could be at risk, reduces that risk for an MI. You want to understand and ensure we have got the palliative care for the patients coping with end of life, or if nothing else, just coping with the need for relief of symptoms. And then you want to have safety, obviously, as a system property that goes across the entire thing.
So it is this vision of where we are hoping measurement will go, that allows us to get that comprehensive view across conditions and multiple conditions for many of our patients, which I think will be one of the challenges in a measurement way of trying to think about how you take these episodes across patients who have multiple conditions. And then finally, overlaying these national priorities and that is, at least our vision, of where we are hoping we can move forward in the field. And I will stop there and take questions. Thanks.
DR. CARR: Thank you, that was terrific. Could I start off with one question? In terms of socio-economic disparities, I am wondering if incorporated into that, is the factors that have to do with low income folks who cannot take time off from work. In fact, let me refer in particular to the Massachusetts experience, where health insurance is held by most folks, but that health insurance may have a very high deductible or co-pay. So have we found a way to think about the kinds of disparities that result from -- I want to get preventive care, but I cannot take time off, I cannot afford the co-pay, I have no one to babysit my kids, things like that?
DR. BURSTIN: It is an excellent question. I think some of this is the limitations of the dataset, if we have the capacity to include those kinds of data and I think those are probably difficult to incorporate into some of our IT systems, for example. But patients' ability to self report, some of this might be particularly important.
I think the other thing is that this is one of those areas where that might be an appropriate way to at least be able to pull out -- and I will talk about that in the next session -- understand the issues around exclusions. And there has been a lot of discomfort, for example, about excluding patients from the denominator of a measure for which there were financial issues.
This is a classic example for me, it is very difficult to get ARBs although I can get ACE inhibitors for $4.00 from Target for my community health center practice. Getting an ARB is pretty difficult and if somebody has a reaction to ACE inhibitors I cannot get an ARB unless I try to go through the prescription assistance program. Should I have an exclusion, for example, that allows me to say I cannot get my patient this drug?
I think this is a philosophical issue and I think it is one of those issues where I think, from our perspective, the key here is transparency. So if you are going to be excluding patients like that, at the end of the day we should be able to see the percent of times I am excluding patients based on patients' ability to pay, whatever the case may be.
So first of all, we can begin to understand where the issues really are and second of all, the last thing you want it to do is become sort of an easy way out. So I view myself as a safety net provider. When I see patients, I should be kind of going that extra mile to try to make sure I can get those things done. But we also want to have the transparency to allow it, so that if I can routinely not get my patients in for colonoscopies, which is quite difficult to do in D.C. if you are uninsured, there is at least a way to track it and see it over time.
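[Editorial illustration, not part of the spoken record: the transparency idea described above, reporting how often patients are excluded and for what reason alongside the measure rate, could be sketched roughly as follows. The record layout, the reason labels and the function name are hypothetical and are not drawn from any NQF specification.]

```python
from collections import Counter

# Hypothetical records for one measure. "exclusion" is None when the
# patient stays in the denominator; otherwise it names the reason.
patients = (
    [{"met": True, "exclusion": None}] * 5
    + [{"met": False, "exclusion": None}] * 2
    + [{"met": False, "exclusion": "cannot_afford_drug"}] * 2
    + [{"met": False, "exclusion": "patient_preference"}] * 1
)


def measure_report(records):
    """Return the measure rate plus exclusion rates, overall and by reason."""
    eligible = len(records)
    denominator = [r for r in records if r["exclusion"] is None]
    excluded = [r for r in records if r["exclusion"] is not None]
    numerator = sum(1 for r in denominator if r["met"])
    by_reason = Counter(r["exclusion"] for r in excluded)
    return {
        "eligible": eligible,
        "denominator": len(denominator),
        "rate": numerator / len(denominator) if denominator else None,
        "exclusion_rate": len(excluded) / eligible if eligible else None,
        # Share of all eligible patients removed for each stated reason.
        "exclusion_rate_by_reason": {
            reason: count / eligible for reason, count in by_reason.items()
        },
    }


report = measure_report(patients)
print(report)
```

[Tracked over time, the by-reason shares would show whether an exclusion such as inability to pay is being used occasionally or routinely.]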
DR. CARR: Right, yes. I think your point is well taken. Not necessarily excluding, but even - and actually we were speaking last night just about saying -- what is someone's deductible. If you have a $50 deductible and you are in one cohort and you have a $3,000 deductible and you are in another cohort, that that represents two populations.
DR. TANG: Thanks, Helen, for really an articulate description of the National Priorities Partnership and the way it is folding into the measurement developing is really very nice.
Do you have a sense of the timeline of getting from where we are with this big bulk measures to the kinds of measures that you are talking about? What is the timeline for the development and what is the timeline for adoption? How long will it take to get there?
DR. BURSTIN: That is an excellent question. I tend to be a fairly impatient person, so I hope not very long. We are fortunate in that we did get a sizeable HHS contract that allows us to do some very wide-ranging projects. We have just launched one with the Steering Committee next week on outcomes across 20 conditions.
I do not know that we are going to get all the outcomes that we want, but we see it is something iterative that at least we can identify where those gaps are. What are the most important measures and kind of provide a menu of what needs to be developed going forward.
We are going to do the same thing shortly around resource use. I think some of the measures around care coordination are in development. I know you will hear more from some measure developers later, who have those in the pipeline, at NCQA for example.
So I think we will fill out those national priorities within the next one to two years. I do not think it is going to be a very long period of time. I think the challenge is going to be if you overlay that with what we are trying to do in Health IT, it is not clear how many of those can be jump started, by developing them from scratch in an EHR enabled environment.
So we often have this concept of developing a measure and at least as you will hear later on from Floyd, we have been trying to think about how we then retool them to make them work in an IT enabled environment to get at issues of meaningful use for example. I think the challenge is going to be to have measure developers kind of reframe it in their mind and say stop, if you are going to build for the future here, build a measure built off of these interoperable HIT systems, even if we do not have them in our hands yet, so that we do not go through that secondary step of retooling. That, to me, is most exciting.
Somebody just told me about a very exciting safety measure and he said, Well the only problem is you can only do it if you have an EHR. I was like, great, bring it in. That is the way to go. We do not need to just wait and say, do this on pen and paper. Bring the right measures in and I think, hopefully, if the other incentives move us forward we will be able to do that.
But the capacity to pull in registry data and things like that, I think, especially you asked the question earlier about specialty measures, I think, one of the biggest incentives to move the specialty measures forward is going to be the capacity to pull in registry data, good clinical data that clinicians view as being critically important to understanding outcomes with the data from an EHR. If we can do that we can really move on specialty measures in a way that we are not going to be able to do quite as easily with the typical data that is within the EHRs we have now.
DR. FITZMAURICE: Helen, that was a great presentation that really gives a good grounding to what NQF is doing and why. I noticed an awful lot on the demand side, here are the priorities, here is what we consider important, not an awful lot on the supply side. Can these quality measures be supplied? What is the cost of producing them? Are the data available? Does it take a lot of physicians' time, expensive time versus staff time? Is that a consideration in the quality measures that you choose to go forward with?
DR. BURSTIN: Absolutely. And actually it is a little strange, because there is a second presentation I will do shortly, on what makes a measure meaningful which would have been logically pulled together, and in there I will specifically talk about and outline what is feasibility, for example. What are the bars for which we would set a measure to do in terms of how difficult it is to collect the cost of measurement, the feasibility of measurement, the capacity to pull it off of IT enabled systems?
But I think the other important consideration is the fact that there has not been a lot of dollars out there for measure development. So we can only go so far, we can help with some of the retooling, but there still is a gap in terms of some measure developers wanting to move forward with some of these newer, more complicated measures. Most outcomes require pretty significant risk adjustment, for example, that is not inexpensive to do in tests.
So I think the ability to pull that stream of funding in, which is part of some of these proposals going forward, would be very exciting.
DR. FITZMAURICE: I remember when NQF was first set up, it was set up as a consensus body, a standards body, so that, among other good features, a federal agency who wanted to adopt quality measures, could adopt them by reference. They did not have to go through and develop them themselves since it was a United States standard, could put them into regulations. Indeed, I would expect something like CMS, maybe VA and DOD, to be very interested in that feature as part of the National Technology Transfer Act and OMB has got good directives. We are all executive directors; we are all encouraged to do that. Do you see a lot of feedback from that in interest of CMS, DOD, VA for just that reason?
DR. BURSTIN: It is an excellent question. It is actually a huge part of what we do and it is particularly on the CMS side. We have not had a lot of interaction specifically with VA and DOD. But CMS, for example, routinely looks to NQF endorsement as a requirement to move forward, unless there is a compelling reason not to with a measure set.
DR. CARR: I think why don't we just move into our next section and then it is you again.
Agenda Item: What Makes a Measure Meaningful?
DR. BURSTIN: So just continuing on that theme, I will skip some things that may be duplicative. Their specific question was what makes a measure meaningful?
It is amazing how many layers of definition meaningful has taken on in the last couple of years. Specifically, I have already gone through this concept of trying to move towards where we are hoping measurement will move us to.
I think it is important to consider where we are. We have had a huge growth of measures, based on several important drivers. The need for measures for pay for performance, specifically at the individual physician and clinician level, disparity sensitive measures, patient experience of care measures, cross-cutting measures, but we still from our perspective from where we sit, have a couple of key questions. Do we have too many? Do we have too few? And are they the right measures? And I think that is a lot of what I talked about in the first presentation.
The availability of data sources for measurement becomes critical and then, obviously, this whole transition to EHRs, I am hoping will be transformative in the way we look at what is a meaningful measure.
Just to focus in on the criteria a bit, I talked already about importance to measure and report and I will go through each of these in a bit more detail. The three other criteria that are especially important are scientific acceptability of the measurement properties, which is really about the measure itself. What is the reliability of the measure going forward? Usability, can the intended audiences understand and use the results for decision making? The ultimate be all and end all for NQF-endorsed measures, is to be publicly reported and used to make better decisions, something that we are certainly in a transition phase for at the moment.
And lastly, feasibility, can that measure be implemented, Mike's point earlier, without undue burden, capture it with electronic data or use EHRs to capture it?
So I will run through each of these because I think at least the lists of bullets are identified. Some of the key issues are already captured within these criteria. So just to compare the old and the new, the key issues that I mentioned are importance to measure and report is now a must pass criterion and then feasibility now has a much stronger emphasis on Health IT. And it is probably not a surprise that Paul Tang has his hand in developing these criteria as well for us. Then, lastly, the issue of usability.
So importance to measure and report, I mentioned some of this in the earlier talk, but essentially, is the juice worth the squeeze? Is the effort expended to produce these measures worth it because it allows us to get at measurement and reporting in an important area in which improvement is possible?
The specific sub-criteria, as I mentioned: is it related to a priority area? Is there evidence to support the focus and opportunity for improvement? And this, I just pulled this in, I have seen this certainly many times in the Meaningful Use slides from the Health IT Policy Committee, but this is really what we are attempting to do here with these new criteria of trying to move towards the outcomes piece. The advanced clinical processes here, I sort of view as the process measures most proximate towards the improved outcomes, as you will see in the way we think about how measures that are meaningful are developed.
So specifically, we would want to have evidence for each of these kinds of measures, and measures are very, very different so on an outcome measure, for example, you want to have evidence that that intermediate outcome, for example blood pressure control, leads to improved health or avoidance of harm.
On a process measure, as I mentioned earlier, we want to specifically know that that process measure is proximate enough to an outcome, that it actually has an impact on improving the desired outcome. And those can be intermediate outcomes. But we have had measures submitted to us, for example, that say, was the patient assessed as to whether they needed a flu vaccine? As opposed to, did the patient have a flu vaccine? So we no longer want to deal with the measures of assessment and things like that, unless they are closer to the ultimate end game of being able to track the outcome.
Structural measures continue to be very important and actually Blackford co-chaired a committee for us just about a year or year and a half ago, specifically thinking about Health IT structural measures. So what is the evidence that that structure ensures consistent delivery of effective processes or access to get to avoidance of harm or improved benefit.
And lastly, efficiency, an area of increasing emphasis for us, is thinking about the association between the measured resource use and the level of performance. So again, as I mentioned, we are only interested in resource use or cost as it is coupled to quality, so we can see the two of them together and get at the concept of efficiency.
I want to talk a moment about clinical guidelines because I think as we think about the evidence base here; so much of what we do is driven by the clinical guidelines. As much as we can complain about the state of quality measurement, in fact much of that is driven by the state of clinical guideline development.
The clinical guidelines are often not developed with quality measurement or clinical decision support, our ultimate goal of improvement in mind. There is often a lack of specificity, for example, we may know what services we should be doing, but there is no emphasis on the periodicity of that testing because often the evidence base is not as clear to make that determination.
There is lack of precise definitions. High risk patients are a classic example, if we cannot specify that sufficiently, a good quality measure cannot follow it. And then as a corollary there, the decision support rules cannot, in fact, pick up the right patients either. And then the imprecise action terms, you know, may consider doing something, does not lend itself well towards translating to a measure.
So the consideration of appropriateness, for example, makes this very difficult. It is hard to take the three-inch tome of an appropriateness guideline developed by some of the specialty societies, which may be elegant, beautiful work, but it is hard to distill that into what becomes a meaningful measure.
We also tend to have a focus on those measurable branch points. They may not be the most important, but they are the ones we can kind of grab the data on.
I used to have the pleasure of working for John Eisenberg for three years and John used to love to talk about the drunk looking for his keys under the lamplight, you know under the lamp post. And I think, unfortunately, a lot of the way we have been doing measure development has been focusing on, well I can see those data on the lamp post; let's make measures out of that. As opposed to saying, what is the most important thing to measure and then trying to find the data to do that? -- Again, obviously, a pretty big IT proponent. I think this is where we can hopefully get at the right data to get to the better measures.
And then, ultimately I think, as we think about standards on the quality measurement side, we equally need standards on the clinical guideline side to ensure we are getting clinical guidelines that are computable and useable for both measurement and improvement.
This is just a slide that Danny Rosenthal, who works with us who is a medical informaticist has put together, just making the case that all of what we do is a case of shared evidence. We all rely on the same evidence in terms of measurement and improvement. Guidelines really become the trunk of that tree and ultimately those clinical decision points are those branch points. Clinical decision support should be focusing in on the things that, if you can remind somebody to do something or encourage a different clinical path, you can improve outcomes. Ultimately, I think we are hoping that quality measures will move from those very early narrow branches where there is something measurable, towards those leaves at the outer point of getting towards outcomes.
Scientific acceptability of the measure properties, we want to ensure that a good measure going forward should have precise specifications so that you could replicate it from site to site and do effective comparisons. You want to ensure that there is some level of testing.
This has been a challenge for us, as measures have moved into the field with such rapidity to meet the needs of various programs. We are not getting a lot of measures that have been adequately tested. We are now in the process of this year beginning to see all the testing data for that first year of measures that came in under our time limited endorsement option. We will begin to see how well they work going forward.
We need to see some demonstration of comparability of different data sources we are using. I do not think we are going to get this in the short term. I think one of the real challenges for us going forward, will be that as we have EHR enabled measures, it is not clear that they could be compared to measures off of administrative data or compared to measures off of chart review data.
And I am hoping some of the research work will help us begin to see, in fact, how often we have measures that allow that comparability across different data sources. We often think of the quality of the data sources, again some work Paul led with us in our Health IT Technical Expert Panel, of thinking about the quality of the data that is within the measure itself.
I think as we begin thinking about measures that may be based purely on administrative data, probably the quality of that data, certainly on the outpatient side, is not great. But as we move up that path towards getting at clinical data from an EHR or other clinical registries and moving up towards that, we are sort of assuming that chart review data and EHR data should be fairly similar. We do not expect a whole lot of comparability to measures built purely off of administrative data. But the jury is still out.
We want to make sure that the specifications should allow us to look for disparities, as I mentioned, should have risk adjustment, certainly if it is an outcome measure, and this issue of exclusions is the one I think is going to be a really critical issue for us moving forward.
We all know exclusions significantly increase the complexity and the measurement burden on what we do. It limits our ability to use electronic sources, we often times have measures, for example, especially at the hospital level, that require you to go to a chart, or pull an EKG, or pull a vital signs sheet to get at exclusions. And it is a real barrier towards getting at harmonization.
So we in our updated evaluation criteria, and this is an area where we have not been as strict as I think we need to be going forward, need evidence presented that the measurement results would be significantly distorted without the use of that exclusion.
We oftentimes have measures where there are 25 or 30 exclusions, and if you actually did a sensitivity analysis, one could argue probably 20 of them may increase the confidence of the provider being measured but, in fact, in terms of the actual impact on a result, are really quite small. And we need to have a better sense of what happens with those going forward.
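[Editorial illustration, not part of the spoken record: one simple way to frame the sensitivity analysis mentioned above is to recompute the measure rate with each exclusion switched off in turn and compare it with the rate when all exclusions apply. The exclusion names, patient counts and function names below are invented for illustration; this is a sketch, not an NQF method.]

```python
def rate(records, active_exclusions):
    """Measure rate when only `active_exclusions` remove patients."""
    denominator = [r for r in records if not (r["exclusions"] & active_exclusions)]
    if not denominator:
        return None
    return sum(1 for r in denominator if r["met"]) / len(denominator)


def exclusion_sensitivity(records, all_exclusions):
    """Change in the rate when each exclusion is dropped, one at a time."""
    baseline = rate(records, all_exclusions)
    deltas = {}
    for name in all_exclusions:
        without = rate(records, all_exclusions - {name})
        deltas[name] = without - baseline
    return baseline, deltas


# Hypothetical population of 100 patients.
records = (
    [{"met": True, "exclusions": frozenset()}] * 80
    + [{"met": False, "exclusions": frozenset()}] * 10
    + [{"met": False, "exclusions": frozenset({"contraindication"})}] * 8
    + [{"met": False, "exclusions": frozenset({"hospice"})}] * 1
    + [{"met": True, "exclusions": frozenset({"transfer"})}] * 1
)
all_exclusions = {"contraindication", "hospice", "transfer"}

baseline, deltas = exclusion_sensitivity(records, all_exclusions)
for name, delta in sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{name}: rate changes by {delta:+.4f} when this exclusion is dropped")
```

[In this made-up population one exclusion moves the result by several points while the other two barely change it, which is the kind of evidence the updated criteria ask for.]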
We need to have this issue, we talked about earlier that Justine raised about SES, but also patient preferences. We need to have transparency to understand when a patient preference is potentially the reason for an exclusion.
And we have very strongly made the case that the last thing we need to do, as we are trying to move towards measures that are more feasible, is require additional data sources beyond what is needed to do the actual measure itself to get at an exclusion, unless really without it, you would be significantly hurting the validity of the measure. This is a challenge for all of us.
And I think going forward, it is not clear what the best approach is going to be in an EHR, in fact, from talking to folks about, do you want to have a great deal of specificity on some of the clinical exclusions, for example, that are contraindications and embed those into the EHR? Or do you want to allow for more open ended exclusions with the ability to go back and audit those fields to figure out what were the most likely exclusions and what was appropriate?
Usability, I mentioned really from our perspective, requires evidence that those measure results would be both meaningful and understandable to intended audiences. And we really do, again as I mentioned earlier, focus on measures that are appropriate and usable for public reporting and informing quality improvement.
So measures purely used for internal QI do not need to come through NQF. Those are great. But if they could not pass the four NQF criteria, we do not necessarily need to bring them through our process. They may be very meaningful, but they are not going to be meaningful for the comparisons between providers and public reporting.
And we have also now specifically honed in on this idea that we have to have measures that are harmonized, to ensure that we are adding measures that have distinct or additive value. This is probably the toughest to implement. But I think is the one, in some ways, that is the most important.
CMS supported our project for us about a year and a half ago on immunizations for flu and pneumococcal vaccine. Because we had this cacophony of measures across nursing homes, and home health and clinics and hospitals and, in fact, when we began looking at the project, we had 35 candidate measures for flu and pneumococcal vaccine.
One could immediately go - that makes no sense and so the idea is, we came up with a set of what we thought were the appropriate specifications that we think all the measures should align to. We recognize there may be different data sources, OASIS or MDS, but at least the science should not be modified or different, based on the kind of measures you are using going forward.
Feasibility, obviously probably the most important from the perspective of where we are sitting today, in terms of thinking about getting at the data without undue burden, and as much as possible, trying to use data that are routinely generated as part of care. So again, EHRs, clinical registries, whatever the case may be, and we want to ensure that the required data elements that are there, and I will mention some of the work that we have been doing on our quality data sets shortly, are either in electronic sources or there is at least a credible near term path towards getting to electronic data collection.
We are probably expecting, I would guess, within the next one to two years depending on who you ask and I am not sure exactly what that curve will look like, but we are going to require specifications for EHRs at submission.
As well as the fact that all measures are now going through measure maintenance, thanks to our HHS contract, we were able to do a much more rigorous job on measure maintenance. And I suspect many of those measures that are currently endorsed, will not pass through measure maintenance going forward.
And the requirement will be, going forward I suspect again and probably around that same time period of one to two years that all the measures up for maintenance, will have to submit their EHR enabled specifications. Again the appetite for chart based measures except in very limited areas has really dropped considerably.
And the last issue from our perspective on feasibility is since we are now allowing proprietary measurement systems to submit to NQF, one of the feasibility considerations is the cost associated with the use of that proprietary measure system.
This is Paul's slide, you have probably seen it, but again as we are thinking about the HIT enabling of this and we want to get at that shared data element, that sweet spot, as Paul likes to call it, between quality measures, decision support and clinical guidelines.
And so I will just end with just a couple of thoughts about some of the work we have been doing around this quality data set. And trying to make the case that from where we sit, the way to make measures more meaningful going forward, is to ensure there is harmonization and ensure we are getting at the right kind of clinical data. This would enable us to, for example, be able to very clearly always know the code set, the code list that is required. For example, if it is an active diagnosis of diabetes that is required, we will specifically indicate, that has got to come off a problem list -- that cannot come off an ICD-9 coding in the outpatient setting.
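[Editorial illustration, not part of the spoken record: the idea that a data element carries both a required code list and a required source could be sketched roughly as below. The class, the field names, the source labels and the placeholder codes are all hypothetical; they are not taken from the quality data set itself.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A measure data element: which codes count, and from which source."""
    name: str
    code_list: frozenset
    allowed_sources: frozenset


# Placeholder codes only; a real code list would come from the measure specification.
ACTIVE_DIABETES = DataElement(
    name="active diagnosis of diabetes",
    code_list=frozenset({"EXAMPLE-DM-1", "EXAMPLE-DM-2"}),
    allowed_sources=frozenset({"problem_list"}),
)


def satisfies(entry: dict, element: DataElement) -> bool:
    """True only if the code is on the list AND it came from an allowed source."""
    return (
        entry["code"] in element.code_list
        and entry["source"] in element.allowed_sources
    )


chart = [
    {"code": "EXAMPLE-DM-1", "source": "outpatient_billing"},  # right code, wrong source
    {"code": "EXAMPLE-DM-2", "source": "problem_list"},        # right code, right source
]

has_active_diabetes = any(satisfies(entry, ACTIVE_DIABETES) for entry in chart)
print(has_active_diabetes)
```

[Keeping the source requirement next to the code list is one way a shared definition could stop a billing code from standing in for a problem list entry.]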
So this is a real transition for us. And ultimately, as you will probably hear from Floyd this afternoon, we are envisioning this quality data set getting built into a measure authoring tool, that will allow measure developers to have again a publicly available measure authoring tool, that will allow them to immediately pull up the right code list. Pull up where you would find the data within an EHR so that again we are trying to build measures that are more consistent and get at what are the clinical areas of importance.
This was a list of some considerations we had come up with early on as part of our Health IT Expert Panel around what would make measures especially meaningful from the IT perspective. And we have already talked about national priorities or high impact, but we also talked about, does the measure reflect leverage of something really important in Health IT?
Are you using your system to get at what is most important? Can you, for example, pull aspirin off your medication list in your EHR? Something notoriously absent from most things because it is an over the counter drug, it is the most important preventive services indicator for adults in terms of impact and prevalence and yet most of the time we cannot get at it now. So if you leveraged your IT system to get at that, would that be another consideration of a meaningful measure?
Are you getting a more credible representation of quality because you are using good clinical data as opposed to assuming what comes out of administrative data for example would work adequately.
The next one here, I think, is more future tense but I was really pleased to see it in the 2013 and 2015 considerations for Meaningful Use, which is the measure of innovative patient centered data sources. So as patients begin to submit some of these data, do we get a different set of measures potentially through their input?
And lastly, is the measure sensitive to effective coordination of care and data sharing across sites and providers. We know just having the measure within your own setting is not sufficient if you really cannot share across settings of care.
And this was your work, again some of the HIT Policy Committee had done, but I think we are moving towards trying to think about what is a meaningful measure, I think is in evolution. And I think getting to the point where we have those advanced care processes with decision support on the path to outcomes, we really see much of our work focusing in on getting us towards that path of the more meaningful measures at the end of the day. We have already talked about the goals.
And the last slide I wanted to show is a wonderful slide that the RWJF Aligning Forces for Quality group have put together about thinking about the comprehensive data that is needed to generate performance information. And I think if we really begin thinking about what makes a meaningful measure, we are going to have measures that are a whole lot more meaningful if we can, in fact, have these data streams that allow you to pull in the information from pharmacies, labs, EHRs, hospitals, registries and I would also add to this patient report, PHRs.
That is the way to get at that comprehensive integration of data, to get at what is truly a meaningful measure. With that I will stop and take questions? Or whatever you would like.
DR. CARR: Thanks, that was great. David, why don't you start now? David is the Archstone Foundation Chair and Professor of the David Geffen School of Medicine at the University of California, UCLA, and chair elect of the American Board of Internal Medicine Board of Directors.
DR. REUBEN: Let me introduce myself a little bit. I know a number of the people in the room and some I do not. I am Dave Reuben and I am a geriatrician. All my patients are Medicare patients. My oldest is now 100 and my youngest is probably in her late 60s. So that is one of my pediatric patients. Most of my patients are in their 80s and 90s. Part of my day job is running the division of geriatrics at UCLA, but another part of my day job is working on both quality measurements, developing quality indicators and improving physician performance on quality indicators.
I have been part of the ACOVE Team which won the Eisenberg Award last year, so I have a little bit of experience with this and I will draw in some lessons from that. My nights and weekends job, and where I am actually supposed to be right now, is in South Carolina because I am the chair elect of the American Board of Internal Medicine. And Chris Cassel apologizes for not being able to be here. She sends her regrets and as a public disclaimer, I will say, I am no Chris Cassel. But, glad to be here.
So I am going to talk a little bit about meaningfulness criteria. My take on this is a little bit different and I think complementary to what you heard from Helen. We could call it the gospel according to Reuben. But I will talk a little bit about validity, importance and longevity of measures. Then I am going to talk a little bit about physician organizations and how they relate to quality measurement and improvement. And then specifically, I am going to speak about efforts of the ABIM, with respect to board certification, rote learning into quality measures and unrelated quality measures. And then finally I will close with talking about how these board measures align with other efforts.
So in terms of validity, one of the things you always want to be concerned about with validity, is does the measure capture what it is intended to? So, for example, a pretty common measure is smoking cessation counseling. Indeed, if your patients are smoking, you want to tell them to do smoking cessation counseling.
So, I have two patients left that still smoke. All the other ones either have died or I have convinced them to stop smoking, so I have two that are left. And every time I see them, I tell them to stop smoking. I just tell them to stop smoking - if they are still smoking five or six cigarettes a day, stop. And sometimes I sit down there and I go with a very scripted routine of, let's set up a start date, here are the different patches and different ways of doing it, I will call you on certain dates, and that is more or less what is meant by smoking cessation counseling. Other times I say, you know, you ought to stop smoking. It is really important to your health. And that is it.
For smoking cessation counseling, those would be valued as equal in some senses. I would satisfy the measure either way. But it is not always the same thing. So does that measure really capture what was intended? What was intended by the measure, is me sitting down and spending about 10 minutes with a patient going over how to stop smoking.
The second, does measurement discriminate performance among providers? And here you have to decide, what is a reasonable sample to distinguish one physician from another physician in terms of how they are behaving? We at the American Board of Internal Medicine and I will show you some examples later, think that probably 25 is a reasonable sample size, with 25 diabetics, you can get a reasonably decent measure of somebody.
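[Editorial illustration, not part of the spoken record: to give a feel for what a sample of 25 patients can and cannot distinguish, the sketch below computes a Wilson score interval for an observed proportion. The choice of the Wilson interval, the 95 percent level and the example counts are assumptions made for illustration; the speaker does not say how ABIM arrived at 25.]

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95 percent)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical physician: 20 of 25 diabetic patients met the measure.
low_25, high_25 = wilson_interval(20, 25)
# Same observed rate with ten times as many patients, for comparison.
low_250, high_250 = wilson_interval(200, 250)

print(f"n=25:  {low_25:.2f} to {high_25:.2f}")
print(f"n=250: {low_250:.2f} to {high_250:.2f}")
```

[With 25 patients the interval around an observed 80 percent is wide, so two physicians whose rates differ by only a few points would be hard to tell apart; the interval narrows considerably with a larger sample.]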
Then the question is whether you are measuring at the individual level, in other words, the individual provider level versus in the practice or the system, so some of the things that are measured, in terms of quality measures, are things that are very easily measured, because they are physician behavior. Some are actually really dependent upon how good the infrastructure of the practice is, so what are you really measuring, the individual provider or the system?
The third question about validity is, does improvement on the measure result in improved outcomes? So what you really want, and this is kind of the way measurement works in general, is this: you have a randomized clinical trial and it shows that a new therapy is effective -- beta blockers for MI. And then you will have a professional organization say this is a guideline now. And then you will develop quality measures that reflect whether that guideline has been implemented. Then you will do a quality improvement step to improve the quality on that. And then, what you should see is better outcomes. So you are linking RCT data to better outcomes.
And we have a fair amount of information on this first part up to about here, but for the back translation, going from clinical trial data to quality improvement to better outcomes, sometimes we do not have those data. In fact, sometimes we can improve the process of care and not improve the outcomes of care. So this is the linkage that you really want to have, but we do not always have it, especially this final link.
And finally let me just say a word about forced responses to move on to the next screen. In a lot of the electronic health records, to get to the next screen, you have to answer something. You cannot get out. You cannot escape. And there is some data actually on pain control where they have taken a look at quality of care in practices that had to have a forced response. So you had to click to get onto the next screen versus those that did not. And the quality was actually worse when you had to click to get to the next screen. So people who would do it would say, get this out of my way. I need to move on and do other things and they actually did not do the process.
Importance, how much impact does satisfying the quality measure have? So here you have two quality measures. One would be weighing the patient at every visit to see if they have weight loss. And the second would be providing nutrition counseling. They may be two quality indicators that are both valid, but they are probably not of equal importance. You could say that weighing a patient, which takes about 10 seconds, may not have a whole lot of importance compared to providing nutritional counseling which may take 20 to 30 minutes, but they may, in fact, be treated equally.
Related to this also is the value of individual measures versus composite scores and I think Helen was mentioning that the field is really moving much more towards composite scores. In some of the work we did with ACOVE, we actually showed that improvement of quality resulted in better survival. This is really a nice thing to find. So we said, gee, we can take it down and find out what were the real drivers of that. We dug and we dug and we found that the one thing that consistently showed better improvement in survival was Pneumovax. Now rationally, it does not make any sense. But, in fact, it was that surrogate marker.
So trying to rely on a couple of markers and saying that if you move these, it is really going to make a difference, is probably not the way to do it. You probably need a composite outcome.
How long does a measure remain current? Well, one of my colleagues, Paul Shekelle, looked at this a number of years ago and said that, based on the evolving science, a guideline lasts about three years. So if you build in quality measures and they are there forever, guess what? They may be old and may not be measuring the right thing. Witness the example of estrogen. One year it was a quality measure that if you did it, you got credit for it. The next year, it was a quality measure that if you did it, you got penalized. So in fact, the science moves on.
How long does it take to game the system? There are a lot of electronic health records now that are tailored towards quality measures and if you do not do anything, actually the quality measure is satisfied because the default goes to satisfying the quality measure. This is some scary stuff -- that if you do not do something, you can actually satisfy it.
I am also very impressed by the marked capacity of the American free enterprise system to quickly respond to economic incentives. Indeed if it is being measured and if you are being paid for it, they will find a way to solve the problem.
So I am going to shift gears entirely here and talk about physician organizations. This is the view from 30,000 feet. The first are medical societies such as the American College of Physicians, the American College of Cardiology, and the society that I belong to, the American Geriatrics Society, and these are membership organizations.
And they are designed to advance the field, to advance the field or discipline, to help the profession, and also for the general good of the public. They promote education, they typically provide CME, they publish clinical guidelines, and they publish journals. They are clubs and they are somewhat parochial. They really represent their members.
The second is licensing boards and licensing boards are generally state based and they are required for practice. And they are very bureaucratic, they are just state regulated.
The third level is certifying boards. The overall certifying board governing body is the American Board of Medical Specialties. But within that, you see the American Board of Surgery, the American Board of Family Medicine, the American Board of Internal Medicine, Orthopedics, there are about 27 or so boards here. These are not for profit. They are oversight organizations and they are not membership organizations.
You do not become a member of the American Board of Internal Medicine. You become a diplomate of the American Board of Internal Medicine. They do not accept support from any kind of pharmaceutical or device companies and one of their responsibilities is to define the field. What is a cardiologist? When does a new specialty become a specialty? So they are in a sense, a very trusted agent.
So a little bit about the American Board of Medical Specialties, actually 24 boards -- The American Board of Internal Medicine is the largest of these. We account for about a third of all practicing physicians; many of the subspecialists, the cardiologists, the endocrinologists, the rheumatologists, the geriatricians, are all internists. An important fact is that about 85 percent of American physicians are board certified.
So whatever the boards do and whatever you have to do for maintenance of certification has a lot of teeth, a lot of reach.
The ABIM's mission is of the profession and for the public. We feel that we are accountable to the public. And we work through the profession to benefit the public.
So how do we improve quality? Improving quality is a very important aspect of the mission of the American Board of Internal Medicine. We have something that is called maintenance of certification. And basically, in lay terms, maintenance of certification is keeping up. It is keeping up. It is to make sure that the physician you are seeing is as capable on the day you are seeing them as on the day they first got certified. This has been required since 1990.
So the majority of internists out there have to maintain their certification continuously. There are four parts. The first is maintaining a valid license.
The second is a process called self-evaluation. And the self-evaluation is basically, you would receive questions you would have to answer, you would look up the material, you see how you do, if you do not meet a certain threshold you have to take the test again. But it is a self study program. It is designed that you would evaluate where you are in terms of knowledge and improve it.
The third component is a written examination of knowledge. This is a test that is a high stakes examination. The way it currently works is that you drive or fly to a testing center, you have to strip naked essentially, you cannot bring a pencil in there, you cannot bring a piece of paper in there, you have to get fingerprinted to get into the room, you cannot bring anything with you. And you sit there for four to six hours and you take an exam on a computer - very high stakes. And you can pass or you can fail it. There is no in between.
The fourth is something that may be the most relevant to today's discussion, and that is evaluation of performance in practice. Looking at what you do in practice and improving it.
So board certification is not to be taken lightly. It is important. It is important on the quality landscape. A series of studies has been published on some of the value of being board certified versus not being board certified: better outcomes and more reliable care, and better quality of care for patients who are being treated for high blood pressure. Actually the time since your certification correlates with worse care. So if you were certified a long time ago and you had not maintained your certification, the quality of care you deliver for high blood pressure declines.
There is 15 percent lower mortality for myocardial infarction, higher rates of preventive services, lower rates of mortality for colon resection, and fewer low birth weight babies. So by and large, maintenance of certification is a quality measure in itself for the physician. And those of you who have physicians, next time you go there make sure they are board certified.
So let's talk just very briefly about parts two and three, which are the knowledge assessment and the examination. And these are very complementary to the performance measures. First of all they test diagnostic acumen. So one of the things about quality measures: if you have the wrong diagnosis, and you do really well on the quality measures, it really does not make a difference. So if the diagnosis is acute MI, and you get all the quality measures right on acute MI, guess what? You probably have not helped the patient much.
So this is really important that you are making sure that you are working with the right diagnosis. The other thing that the examination and the self-evaluation modules do is test clinical judgment, which is really important. They also allow us to really explore conservative management, things that should not be done maybe.
That said, we believe very strongly that performance measures matter. So the ABMS as the parent organization requires all the boards to implement assessment of performance. And we will give you an example of the ABIM's practice improvement module, which is web based, uses NQF measures when available and includes a rapid cycle PDSA to address areas.
Also included in these practice improvement modules are patient experiences. The voice of the patient is, we feel, exceptionally important. We also assess practice infrastructure, essentially using the NCQA PPC to see what is available there. Also included in this are peer surveys of how a physician is doing. So these are very broad.
So when you think about the overall landscape here and where these fit in, if you go down quality measurement, first of all, national priorities are not set by the ABIM. They are set by governmental agencies, the IOM, NQF. This is the goal line. This is where we are headed. These are our priorities. Then guidelines are done frequently by medical societies; researchers develop guidelines; voluntary health organizations like ADA develop guidelines. And they are operationalized and endorsed. NCQA, PCPI, think tanks like RAND, and then NQF endorses them.
The assessments are developed to see whether those measures are being completed appropriately. NCQA does that and the boards do a lot of that as well.
Finally providing reports and feedback, NCQA does as well as the boards and finally reassessment to see whether improvement has occurred and to our knowledge only the boards are doing that.
I am going to show you kind of an interesting example of that. So this is the anatomy of a practice improvement module. It has three components to it. It has the patient survey, it has the practice survey and it has a medical record review, actually digging into the medical records. Most of this is done now manually, through hand-written records, because that is mostly what is out there. But these can be adapted to using electronic health records. There is the performance report. The diplomate has to have an improvement plan, which is plan, do, study, act. And then there is the impact, what was learned. Those are the basic principles driving a practice improvement module.
So let me give you an example of one. This is the Diabetes PIM and a lot of work has been put into the Diabetes PIM. And I am going to show you kind of a testing sample of these. This was a sample of 957 physicians, most of whom were general internists. And to do this practice improvement module, a total of 20,000 patient charts, or roughly 21 patients per physician, and almost 19,000 patient surveys, about 20 per physician, were required.
So one of the things that was done here was the composite score. How well the docs did on these diabetes measures. And the way that was done was to convene an expert panel - what we mean by an expert panel is internists and ophthalmologists who are actually practicing docs and who are academic experts -- who review the performance on the measures.
How well did these docs do? We reviewed the reliability of individual measures, selected clinical and patient measures, weighted the importance of these measures, reviewed the reliability and reproducibility of the composite measure, looked at the actual performance on the composite measure, and defined a borderline candidate. So what is a borderline candidate? A borderline candidate is the person who is the cut point between a pass and a fail.
We do this all the time when we certify physicians on the examination. We use a process called the Angoff Process that asks the people who are on the committee or the expert panelists to say, what would the borderline candidate do on this? And finally, setting a standard for performance.
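[Illustrative sketch, not part of the testimony. In an Angoff-style standard setting, each panelist estimates how a borderline candidate would perform on each item, the estimates are averaged per item, and the averages are combined into a cut score. The measure names, panelist ratings, and point weights below are hypothetical; the testimony does not give the panel's actual ratings or the exact aggregation the ABIM used.]

```python
from statistics import mean


def angoff_cut_score(ratings, weights):
    """Combine panelist estimates into a cut score.

    ratings: measure -> list of panelist estimates (0..1) of the rate a
             borderline candidate would achieve on that measure.
    weights: measure -> points assigned to that measure.
    Returns the cut score in points.
    """
    return sum(weights[name] * mean(estimates) for name, estimates in ratings.items())


# Hypothetical panel of three raters and two measures.
panel_ratings = {
    "eye_exam": [0.25, 0.30, 0.29],
    "foot_exam": [0.40, 0.50, 0.45],
}
point_weights = {"eye_exam": 40, "foot_exam": 60}  # assumed weights

print(round(angoff_cut_score(panel_ratings, point_weights), 1))
```

[Averaging across panelists keeps any single rater's view of the borderline candidate from setting the standard alone.]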
So this is an example that goes from NCQA data and these are measures for diabetes. An eye examination -- the criterion for passing this would be 60 percent of their patients, and they would get a pass for that if 60 percent of their patients had an eye examination. Similarly, a foot examination, 80 percent of their patients would have it. And then these have been weighted in terms of how many points you get based on how important they are. So there are 100 points possible and those are the measures.
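[Illustrative sketch, not part of the testimony. One possible reading of the scoring just described is that a physician earns a measure's points when the share of patients meeting it reaches the measure's threshold. The 60 and 80 percent thresholds come from the testimony; the point values and the all-or-nothing rule are assumptions made for illustration.]

```python
# Thresholds are from the testimony; point values are HYPOTHETICAL
# (the testimony only says the points total 100).
MEASURES = {
    "eye_exam": {"threshold": 0.60, "points": 40},
    "foot_exam": {"threshold": 0.80, "points": 60},
}


def composite_score(rates, measures=MEASURES):
    """Sum the points for every measure whose observed rate meets its threshold.

    rates: measure -> share of the physician's patients meeting the measure.
    """
    total = 0
    for name, spec in measures.items():
        if rates.get(name, 0.0) >= spec["threshold"]:
            total += spec["points"]
    return total


# Hypothetical physician: just misses the eye exam threshold, passes foot exams.
example_rates = {"eye_exam": 0.58, "foot_exam": 0.85}
print(composite_score(example_rates))
```

[Under this all-or-nothing assumption, a physician at 58 percent on eye examinations earns nothing for that measure, which is one reason the choice of threshold matters as much as the weights.]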
So here is actually some data, using the data on those 957 physicians and their patients. And this shows both the mean and the reliability of the abstraction, but the mean score with 58 percent of them, had eye examinations done for 60 percent of their patients or more. And you see the overall clinical measure score was 73 percent. And this is simply going through the NCQA types of data.
And you see how they did. They did pretty well on some, like smoking cessation, of course you do not know what they said was smoking cessation. But they did not do so well on others like the foot examination.
So the reliability, how that was done? No, I am not going to invent something. Rebecca Lefner(?) can tell you about that.
PARTICIPANT: (Comment off mic.)
DR. REUBEN: I cannot tell you the details of the reliability measure, I am sorry.
So here is where the really interesting stuff comes. Those were simply using the NCQA, how they would score it. Here they used the expert panel to actually do a couple of things. Here it is to set the criteria for the minimum acceptable candidate. What would you expect the minimally competent internist to be able to perform? And on an ophthalmology exam, 28 percent would be the pass rate essentially.
And then they weighted the relative importance of those, and the weights are a little different than the NCQA's, but they still add up to about 100. So this is the minimal threshold. So if your overall score on your diabetes patients is less than 47 percent, you would fail this threshold. You would be below the minimally acceptable candidate.
And here is how these 957 physicians scored. So you will see that the mean score is about 66 percent here. And that the standard, the minimum threshold, was 47. You have these physicians here who scored below what we considered minimally acceptable. So 89 percent were considered competent and about 11 percent were considered not to be competent. And so this tells us, even with a self-abstracted instrument, that you can differentiate physicians who are doing pretty well from those who do not do well.
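[Illustrative sketch, not part of the testimony. Once a standard is set, the split between physicians at or above it and those below it is a simple comparison. The standard of 47 is from the testimony; the list of scores is made up and is not the 957-physician sample.]

```python
STANDARD = 47  # minimum acceptable composite score, from the testimony


def share_below(scores, standard=STANDARD):
    """Return the fraction of composite scores that fall below the standard."""
    below = [s for s in scores if s < standard]
    return len(below) / len(scores)


# Hypothetical composite scores for ten physicians.
sample_scores = [72, 66, 45, 81, 58, 39, 70, 63, 90, 55]
print(f"{share_below(sample_scores):.0%} below the standard")
```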
And guess what? They looked at these physicians, who we like to call the bottom feeders; these are the people who are not doing so well. And they looked at their exam scores and they looked at everything else that is measured in maintenance of certification, and they do not do very well there. So this tells us, this is a population that needs to be focused on.
So this is just the Diabetes PIM, the most popular ones are diabetes, preventive cardiology, and hypertension but there are a whole slew of these PIMs. And other ones that are under development. So just about any condition, any specialty, there is a PIM for.
And PIMs make a difference. There are 11 articles published in print, that show the validity of these and they have clinical meaning. I will show you a couple of them.
Five studies, including two controlled trials, have demonstrated positive changes in care. And guess what? The re-measurement aspect, the one we talked about there before, is that the PIMs make a difference. If you get your data back and you develop a plan to improve it and then you re-measure, these are for hypertension, but this is the mean 28 percent improvement when they re-measured -- for blood pressure or lipid control, medication adherence, non-pharmacological treatment, 50 percent improvement. So, in fact, this makes a difference.
So what about the docs? How do they respond to the PIMs? The experience of over 5,000 physicians, 73 percent with completed PIMs, says it changed their practice. 82 percent would recommend it to a colleague. Now it does take time. Let me just tell you, it takes time to do a PIM. And doing hand abstractions of your records is painful to say the least. But in fact, the docs are doing them. If you would like to test drive a PIM, the slide here has the demo site where you can walk yourself through it.
So finally, I would like to end a little bit with board alignment efforts. How are we working with other organizations? How do you change behavior? You can regulate, you can use economic forces, or you can rely on professionalism. And the professionalism is where ABIM comes into place.
So how have we aligned? We have aligned with health plans. Maintenance of certification has been integrated into some reward and recognition programs. So, for example, Aetna, Cigna, a number of the health insurance companies, if you are board certified, they recognize you as being so. You are on their preferred list; sometimes it actually includes increased payment.
The second are Bridges to Excellence programs. What the Bridges to Excellence program has done, is take data from our PIMs, and we are working with them in terms of providing some help, but they are using that to provide bonuses to physicians for better performance, using the PIM platform.
Alignment with other quality improvement efforts - now let us just say that you have a great electronic health record and you are generating these kinds of data all the time and you are giving feedback to docs, there is something called an Approved Quality Improvement Pathway, where we can actually certify a healthcare organization, like a Kaiser Permanente, to do this themselves.
And in fact, for Mayo Clinic, we have given them, essentially, a five year blanket to continue their quality improvement organizations and those will be recognized for Part 4. So the idea here is that, if an organization here is really stepping up to the plate with this and doing it well, we are not going to force them into this widget where we are going to make sure that, in fact, those standards are being met.
With the public sector, we have been very aligned with PQRI, the board modules function as a registry, and in the Senate Finance Bill, participation in maintenance of certification is currently included as a pathway to PQRI. And we strongly believe that it should remain there -- this is a way of really putting some teeth into it.
Discussions are underway currently about alignment with Meaningful Use. Once again, we believe, as Helen was alluding to earlier, EHRs that are going to be marketed need to have the components of maintenance of certification in there with the measures that are going to be used. Because what we have to do, and this is critically important for physicians, is to reduce the redundancy. We cannot have them doing 50 different things to get credit for 50 different stakeholders.
So there are some people, including the VA System, who can do the data collection component of these practice improvement modules in just a few minutes. Because those measures are in the electronic record and can be spit out so they can see what they are doing.
Modifying and building new MOC assessment tools that align with and support meaningful use goals -- the MOCs are always a work in progress.
So just to sum up, where ABIM is and where the boards are - they are aligned with where the quality field is headed. The efforts at ABIM are complementary; they are consistent with everything that you will hear from NQF or NCQA. We are all headed in the same direction.
The board requirement for maintenance of certification, reaching those 85 percent of physicians, which is going to be even higher, engages physicians in improving quality of care. We have this hook, a very powerful hook to physicians, to get them to participate. So we are really part of the solution here.
Maintenance of certification and other tools are comprehensive. They include performance measures, but more, the judgment, and the diagnostic acumen. We have data now showing that PIMs change physician behavior for the better. They are readily adoptable and not too burdensome. I am going to say not too burdensome - they are somewhat burdensome at this point, but in the future, hopefully, they are not burdensome at all.
And finally, public and private payers can leverage this existing, well regarded infrastructure to align QI efforts and accelerate improvement. We want to work with other organizations to make this work. So I am going to stop here and ask for any questions. Just do not ask me about the reliability.
DR. CARR: Thank you, that was just very, very thought provoking, very interesting, very great work, so I think there are a number of questions. Mike, do you want to start?
DR. FITZMAURICE: A very good presentation and thank you for coming. I really got a lot out of it. The first one is -- are PIMs equivalent to board certification or are they separate from board certification?
DR. REUBEN: They are a component of board certification. So there are four components of maintenance of certification. The PIMs are one, the test is another, maintenance of licensure is a third, and the fourth is a self-assessment.
DR. FITZMAURICE: You mentioned 85 percent of physicians are board certified and so immediately it comes into my mind, why aren't the other 15 percent? And when I look at docs, I know I can go online and find out if they are board certified. So why aren't the other 15 percent of the docs board certified?
DR. REUBEN: So there are a couple of reasons. One, there are some that are not eligible to be board certified. A lot of the international medical graduates cannot sit for board certification. Others, not so much now, but in the past, in the 50s and 60s and early 70s, did not think it was very important, and never got certified. And then there is that last category (actually two other categories) of people who wanted to be certified and could not pass the exam - just could not get certified. And then there is another category, and that is lapsed certification. So as I said, from 1990 on, and in family medicine this has been since the inception of the field, you have to maintain your certification, and if you do not, you lapse and you become uncertified.
DR. FITZMAURICE: Did I see that a physician has to be recertified every four years now?
DR. REUBEN: Well, that is a really good question. It varies by board, and where the field is moving is, instead of saying that you have to re-certify every ten years, every five years, every seven years, it is really what they are calling continuous maintenance of certification. So you have to do something all the time. So, for example, you would have to do a PIM every two years. You would have to do a self-assessment piece every two years. But you might not have to do the exam but once every ten years.
DR. FITZMAURICE: So if you do a PIM, then you have got something like an automatic reminder. I was low on this from last year, so every time one of these things comes up, have a reminder somewhere to let me do this so I get a good score the next time. Plus, it is better for the patient.
DR. REUBEN: Yes. And the way I would think about this, is that every couple of years I would do a different PIM. So for a couple of years, I am going to work on my asthma treatment and then I will work on my heart failure treatment, because you are not only making a quick fix, you are making a permanent fix. You are doing this PDSA cycle to change how you approach these patients in general. So if you do five over ten years, you have got five big conditions that you have addressed.
DR. FITZMAURICE: Thank you.
DR. CARR: Just one question. Do you have an assessment of what you said that 85 percent are certified, and that includes those that are grandfathered in, prior to 1990? Do you know how many there are in the group that are grandfathered in?
DR. REUBEN: I think it is 25 or 30 percent. Yes, 30 percent.
DR. CARR: I think that is powerful because basically it lowers that number down significantly.
DR. MIDDLETON: Thank you, David. Blackford Middleton from Brigham and Women's Hospital in Boston. I guess the one sort of obvious question on the table is, if we look at certification, 85 percent being certified, and then we look at Beth McGlynn's data, suggesting 54.6 percent of the time, we are not actually doing what we think we know we should be doing. So where is the disconnect? If we are certifying, getting up to snuff on the knowledge base, but it is not being applied, how do we address that disconnect?
DR. REUBEN: Well, essentially, that is where I spend most of my time. It is the same thing if you take a look at the ACOVE data, the exact number for older people. And if you look at geriatric conditions like osteoporosis, and falls and incontinence, it is even worse. So why does it happen? There are a variety of reasons. One is knowledge but many times it is not knowledge. It is not knowledge. It is systems and time.
So there was a study that was published five or six years ago by one of the family docs, and they said if you followed guidelines for ten conditions, just followed guidelines, and you did not do anything for anybody else, it would take you 10.2 hours every day just to meet those guidelines for ten conditions. And that does not include what the patient came in for. So that tells us a lot -- if you are a primary care doc, then you are on this continuous treadmill.
So what we have to do, we have to keep physicians' knowledge up but even more so, you have to make it easier for them to do the right thing. And part of that is actually better systems. It may be electronic health records, it is team care, it is delegation, and there are many, many solutions to that.
But the PIMs really get at that. The PIMs get at how you are going to change your practice, not just how you are going to prove your knowledge. Because knowledge generally is not the issue; it is generally being able to implement appropriate behaviors.
DR. MIDDLETON: I think you are right on target. I guess the challenge is, that when we think about kind of overcoming the inertia of clinical practice in the heat of the battle, and you are acknowledging it may not be a knowledge based issue, how do we actually address then the clinical requirements for the physician to be a knowledge broker, more than a knowledge manager if you will. So on behalf of the care team, on behalf of the patient, he or she becomes more expert in processes of information management, knowledge access use and application, as opposed to the repository of knowledge. Where will a certification go to address those kinds of issues?
DR. REUBEN: Yes, that is a terrific question. And I have to tell you, based on my own practice, we have a group practice, there are some docs in our practice who are really great docs but I will never be able to teach them about systems. They are never going to be leaders in system redesign. And I think what is going to happen is that there are going to be docs who are good at that, who are going to be leaders within groups to do redesigning of systems and the others are going to be just good docs and they are going to follow these systems. But you cannot expect every doc to be a system change leader.
DR. MIDDLETON: Maybe just this last one, if I may, you know, I think in this line of reasoning, you have to think about what are we measuring with quality measures and certification? You alluded to this I think appropriately in your talk, but I think sometimes about the concept of sort of attributable qualities across a care team, borrowing from attributable risk in epidemiology: how would you actually say, well, this internist versus this nurse versus this care manager versus this sister, brother, mom, dad, cousin at home, is actually contributing to the diabetes outcomes of interest? It strikes me that, in a way, we have kind of a myopic view in the traditional, paternalistic medical model. We are assessing the physician component, but perhaps ignoring, currently, the rest of these components.
DR. REUBEN: I think you are absolutely right. I think the way we have to think about this is working backwards. So working backwards is -- what is the outcome that you want? You want the process to be done, you want the better care and how you get there. And it is both. You have to have a competent physician working in a well-functioning practice.
Sometimes you have a really good doc working in a system that is not so good. So the really good doc can offset some of the deficiencies. And sometimes you have a really good system that can offset some of the deficiencies in the doc. So you have this kind of wiggle room here. But when you get to the extremes, when you have a totally dysfunctional healthcare system or practice, or you have a really bad doc, then you have got huge problems. Then this doc has got to be either remediated or eliminated or this system has got to be scratched. But within that kind of a little bit out of range, you can compensate one way or the other.
DR. FITZMAURICE: Thank you.
MR. REYNOLDS: Now this is for both of you. Excellent, I am not a physician and so I learned an incredible amount this morning, so thank you. But as you know, we were involved in a Meaningful Use Hearing and then I am sitting here with a document in front of me, the latest one, and I look out at 2015 and it talks about clinical outcome measures, efficiency measures and safety. And then I hear these presentations on measures and then you have somebody who has got to set them as a group.
In North Carolina right now we are working on all of this obviously, and about 60 percent of our small docs do not have electronic health records, and so I am sitting here thinking about them as I listen to this whole ecosystem. So you go to accepted use and both of you mentioned understanding, people have to understand it. And that is everybody involved.
And then it has got to get in their work flow and it has got to get in their capture, it has got to get in their system that they are going to buy, which, oh by the way, we are talking about where we are going to be in 2015 and they may have to buy that now.
And then you talked about how measures might not last more than three years. Now again, that was just a statement, that is not an indictment or anything, it is just a statement. And so you find yourself in a situation that the measures are extremely important, but also how we move this entire environment to a place, and as we make recommendations and other things, it is almost like you have a structure of measures so that the right information could be captured this year and then next year and if one goes away, another one comes in, so that you are not chasing a ghost, literally, in being an implementer of systems.
I see a lot of ghosts flying around. Because about the time I figure that that would be important, what you are saying is that this might be important. And the way to grab it might be important and this percentage might be this or that. And so help some of us that are not as attuned to this as you, how are you guys going to be, when we think about really meaningful measures?
How do we build an ecosystem, that whatever you decide is a meaningful measure can take it and use it and use it not in ten years and not use it in five years. And oh, I got involved in this and got my incentive yesterday, but oops, it does not work quite so nicely tomorrow. A long winded question, but it concerns issues.
DR. BURSTIN: Incredibly important question though. I think two responses on the big picture level. I think one of the key issues is that I do not think it is so much of an issue on embedding the measures into electronic health records, as it is embedding the key data pieces, the key data elements and data types. The key data elements and data types are not going to significantly change.
We could add to that quality dataset going forward, but no one would argue that the ejection fraction with patients with congestive heart failure will not be an important piece of a measure going forward, regardless of whether there are varieties in the kinds of medications you may use going forward. So I think that is the first thing.
It is less about embedding the measure logic into the system, as much as it is about making sure you have got the right data in your hand. And I do think that quality dataset should evolve to take on the additional important measures that come forward to capture that. I mean outcomes for example, may be difficult if it is just the EHR on my desk as opposed to an EHR that is interoperable to other systems and registries and things along those lines. So I think that is the first piece.
So I think the second issue is that you know we increasingly are moving towards the outcome measures piece of it. So I think the less we rely on the narrow processes, I think increasingly what we are going to see, is that those process steps become clinical decision support. At least that is my hope. Remind me to do the key steps along the path.
Build that into decision support systems, the knowledge management piece that Blackford was just mentioning, but in some ways allow the measurement systems to rely on whatever the data source may be. And I think we are going to have to accept that fact that probably depending on your optimist/pessimist view of the world, the next five years, for example, I am an optimist, we are going to live in an environment that is going to be uncomfortable. We are going to have lots of different data sources measuring people, docs, and hospitals, whatever the case may be. I think we are just going to have to live with that.
And I hope there are sufficient dollars around to, in fact, research and help us to understand what those key differences are. But I have no doubt that we need to build the right kind of measures we need for the systems that are coming, even if we do not have them in our hands yet, Harry.
But I think that at the same time, we have to acknowledge the fact that we are going to continue to have to have a library of measures that say, if you can only do charts, this is what you do. If you have got administrative data and you can pull in a couple of key pieces of clinical information electronically, this is what a clinically enriched administrative measure may look like. And this is what it is if you have an EHR.
But this next five, X number of years, I think, is just going to be a difficult time period that we are just going to have to accept. And those 60 percent of providers in your community probably are not going to be ready by 2011 or 2012 even to turn the button on and immediately produce their EHR based specs. But hopefully, they will be able to do a combination of clinically enriched administrative measures or, as necessary for some of the outcomes, do something chart based.
DR. REUBEN: Yes, I think I agree with Helen in that perspective. Think about a launch to the moon - which is what this is, basically, you know where you are at Cape Canaveral and you know where the moon is and you are going to shoot for it. But there are going to be a lot of mid-course corrections. You do not just set on one trajectory. And as long as the goal is there, and the techniques and the equipment are there to be able to make these mid-course corrections, we are okay. But we are not going to get it right the first go around. There are going to be a series of revisions and revisions and revisions until we get to the goal.
So as long as there are mechanisms and data elements, then you can revise these things. If a big study comes out saying that estrogen is bad for heart disease or bad for breast cancer, you can switch it from one to another. So those are possible.
I would like to say a little bit of disagreement about outcome measures because I am always a little nervous about outcome measures because the relationship between process measures and outcome measures is not one to one. It is not one to one for a variety of reasons. One it takes a long time to go from a process measure to getting an outcome. And the second is that sometimes they do not occur. Sometimes you can do everything right and get a bad outcome. Sometimes you can do everything wrong and get a good outcome, so some of these outcomes are beyond our control.
So, for example, my oldest living parent died at 62. My dad died of terrible heart disease and this and that and he was doing everything wrong. He just did everything wrong. And I am doing everything right. But there may be something in my genetic code that at 62, I am going to flip off. And that I cannot do anything about.
So outcome measures are a little less controllable. Process measures, either by getting the physician to behave well and the system to behave well, you can do something about. So obviously, you want to get them as closely related or as distal as you can, but those are things that you can change. You cannot change some outcomes.
DR. BURSTIN: I think the whole point that there is also an important interim step as we get towards getting to look at more outcomes, which is also just the process of assessing outcomes. It sounds a little odd, but for example, just a clinical example from where I practice, you know, we have got a small community health center, and just got an EHR about a year ago, actually. We now have our nurses' aides, who have at best a high school education, doing a mental health assessment of patients, to screen for depression. They walk in the door, it is color-coded right in my EHR, and it is color-coded in red if somebody does poorly on their PHQ. I know exactly who I need to intervene on.
Again, the outcome may not be at the end of the day, that I have significantly improved the outcomes for patients with depression, although ultimately that is what I think we need to be held accountable for, but boy, if nothing else, you are in fact doing the outcomes assessments of patients, the screening, the functional status assessments, to allow us to even see what patients are on that trajectory. Until we know what patients are on our trajectory, it is just a difficult situation.
Our end-stage renal diseases project, a couple of years ago, had two patients on the Committees who insisted, insisted, we had to have a functional status measure for patients on dialysis, because you feel terrible. But a lot of the docs say, I cannot be held accountable for how patients on dialysis feel. But as an interim step, at least, there was a requirement that every year, all patients on dialysis have a functional assessment done. So I think there are steps on that path that get us closer towards what is meaningful to patients, in terms of outcomes.
DR. MIDDLETON: Just following up that line Helen and David, one of the things that is sort of akin to the attributable quality idea, is to recognize that actually the determinants of premature death only are impacted to a small degree by what we do in healthcare, of course.
Larry Green not being here, I have to speak on his behalf, that he would say sometime about now, that healthcare in fact accounts for 10 or 15 percent of the determinants of premature morbidity, but the rest of the determination is based upon community and behavior and genetics and, et cetera.
So I guess what I am suggesting is that one of the things I would love to see research focus on, is how to then take this case-mix idea and really account for the heterogeneity of the patient population, based upon all those other determinants so the doc does not feel that he is doing the right thing and getting a bad outcome. Or the patient is doing the wrong thing, but still getting a good outcome -- all those kinds of vagaries.
If we could better, actually tune the measure, to account for not the typical connotation of case-mix, but a more subtle connotation if you will, that accounts for genetics, for behavior, for community or social exposures, et cetera.
DR. CARR: Thank you both. These have just been tremendous presentations. And interestingly, they come from, in my mind, I am seeing them on two ends of the continuum, where if we work hard and less is more to measure certain things that NQF endorse, and yet I am impressed with the engagement of physicians in this practice management. My observations are often that as we are pulling together, whether it be in-patient or out- patient measures, maybe you are a physician who is involved in one case or one off here or one off there, and there is a little work around to kind of make sure that that discharge summary does not go out without that input or whatever.
And I am impressed by the process of looking at your cohort of patients and looking at what you wrote, and how you manage, because implicit in that, is your ownership of the continuum. So that when you get to the one off measure, you have a context to put it in. And I think there has been an asynchrony with the physicians being involved in measurement and they are often just hearing about what you forgot, what you did not do, what you should have done. And I love this idea of really owning the cohort.
The second thing is the whole P4P campaign in Boston. People spend an inordinate amount of work getting from 98 percent compliance to 99 percent. And similarly, groups are tiering practices in Boston and those that fell below 98 percent might be tier 2 and those above 98 percent. So there is a, perhaps I am exaggerating a little but not that much, the point is that we are wasting effort worrying about that one thing that was perhaps not even preventable, when in fact this kind of in-depth analysis is very rich. And I think as we do these measures, we have to give some thought to how they are used in P4P, because when we talk about waste that is a waste. And it is at the expense of this kind of rich look.
DR. TANG: Well I think part of what Helen and David mentioned about the goodness of a measure, was a reflection of an opportunity and 98 percent represents very poor opportunity. It also, I almost think, defectively says high gaming because it is just not humanly possible to achieve that kind of quote performance. So can I go to the next topic?
DR. BURSTIN: One brief response and I think it is an excellent point, and again something that does not have an option for improvement, would not at least bring forward the NQF endorsement. It just would not be a measure that we would consider important enough to make the effort to publicly report. You may want to do it internally for QI to make sure you do not fall off the cliff, as soon as you stop publicly reporting it. But at the same time, it probably does not rise to that level.
I do want to make one other, I think important point, that I think has not had a lot of discussion, which we often talk about just the absolute value of a measure as opposed to the trajectory for getting there. And I think, particularly for a lot of providers who may not be at the 90th percentile perhaps, the trajectory of recognizing a huge improvement over time, is something that should also be rewarded. I do not think we have seen a lot of that. It is a bit in the value-based purchasing program from CMS, some hospitals on that path towards the trajectory, the absolute threshold, get payment as well.
And as a safety net provider on my day a week, I have the same perspectives. I can have somebody walk in my door with an A1C of 15 and I can get them to eight pretty quickly. Getting them from eight to seven can often be difficult. So I think again, understanding the trajectory, as well as the absolute threshold, I think is another consideration.
DR. TANG: Part of what I felt was so exciting about both presentations is the direction towards better quality indicators of performance; let me put it that way. So in the NQF way, it is looking at the quality of the data and what David talked about, I think that is what is in the reliability column. But let me check my understanding of the vision, particularly on the certification side, the ABIM, is just like in CME we know how ineffective it is to plop somebody in a seat in Hawaii. And there is a trend towards going towards incremental CME, using adult learning methods of, gosh the best time is when I have a question with a patient in front of me could I get credit for looking something up right at that time? That would be reinforcing good behavior and better care. Your counterpart was the PIMs, because in a sense, what you have done is, I think, is created a notion of self-examination and doing something about it, versus sitting down for a written test in front of some computer.
Now, I think you talked about continuous accreditation or certification. If I interpreted you correctly, you are trying to instrument people's practice to figure out whether they are still practicing up to date, good quality medicine. And to the extent that we can both on the NQF endorsement side and the accrediting side use the same measures that depend on the same data in the same EHRs, to essentially like, Carolyn Clancy F7, would get me both my quality reports and my certification, if you will. Is that sort of the vision?
DR. REUBEN: That is the vision. And the vast majority of the PIM measures and we have I think 525 PIM measures or something like that, are NQF endorsed. Now not all measures for all conditions have NQF endorsement yet. So there are some that are not. But, in fact, we are trying to align with NQF, so that it is the same measures. We are all working towards the same goal.
DR. TANG: So that is where I go to, is I really am interested in the reliability, because it stuck out that that high gaming smoking cessation counseling measure, I believe, has a low reliability and we should probably go away from things like that. Because it is not clear to me that the evidence says forcing something into the documentation about smoking cessation counseling, improves the smoking rate.
And so why create the gaming because with EHRs as you know, it is instant verification. You can cause it to happen. And the other things that I looked at, I noticed in your diabetes PIM, I think you showed, is the A1C greater than nine measure, so as the not bad measure. And in a sense we are trying to go, I think, away from those kinds of measures in NQF endorsement process, because it does not align well with what physicians think about their own performance. So you do not think you want to be not bad. You know what the guidelines say and you want to head towards those guidelines and to the extent that your mental model of what it is to do good quality measures, lines up with the measure, that, essentially I think, is positively reinforcing in terms of the good behavior. So in some sense, it may be instead of going after the current NQF measures, we want to go towards where the puck is going, for the measures that are aligned with practice and the way physicians think, and I think can really create a positive reinforcement move.
DR. REUBEN: Yes, we have actually thought about this a lot. Not so much with ABIM, but certainly with the ACO measures which is really a floor. That is, if you are not performing these - they are not a ceiling, they are not the real goal. But if you are not performing these things, you probably are not a very good doc kind of thing. And as you get higher and higher up that aspirational ladder of what you really would like to have, you get much more push back. You get much more push-back because other things happen. You cannot achieve that 98 percent, that theoretical 98 percent. But if you say, you know this is a floor and if you do not meet this floor, that you are not doing a good job. How can you argue if you are in that group that is the bottom feeders, that less than 44 percent? How can you justify that you are doing a good job as a doctor when the floor is low.
DR. TANG: So maybe that is my real question. Philosophically, is a certification organization like ABIM going after the floor to eliminate people or going for the goal that motivates people. And I almost think that you are going to capture far more docs on the latter.
DR. REUBEN: So you are absolutely right. And if you take a look at our strategic plan, our goal is to continuously set the bar higher. So here we are today in 2009 the bar is here. In 2011, the bar is going to be a little higher. And the bar is going to be higher in 2013 so you keep dialing it up.
DR. TANG: So the question is are you setting the bar for the floor or the top part? I feel better about getting 60 and then 70 and then 85 towards my real goal, versus passing over the 20 percent goal.
DR. MIDDLETON: I think Paul and I have a similar vision or aspiration here. In many ways, I would much rather be pulled up than smacked from behind. In a way, the low bar is the smack from behind. The aspirational goal or the stretch goal might be that in fact, certification takes on a new flavor. It is actually continuously monitoring, continuously educating and providing instantaneous feedback at the point of care, so that I actually know, is my diabetic population within the guard rails or is Mrs. Smith actually within the guard rails of her diabetes care.
There is a problem though, I think, that we have to sort of recognize here. If history tells us anything about the Internet, I am concerned we may see some cognitive substitution in healthcare, in ways that have happened in other industries. That is, many more decisions may be made by many more different types of people. So rather than raising one bar, I would suggest maybe a spectrum of pro-active certifications, if you will, in the manner in which we just described, that might actually delineate a range of professionals rather than an increasingly god-like physician.
In other industries, of course, many different types of folks are making many different types of decisions based upon tools and utilities and knowledge and the access on the Internet, all the disintermediation stuff, et cetera. So they can gain knowledge, rather than accessing the provider, per se. So it is just a thought, thinking about the high bar and a range of professionals, all perhaps warranting certification under the ABIM, but in different roles.
DR. REUBEN: We have talked about this. ABIM is famous for talking and being very careful and ruminating and considering many things. And in fact, we have talked a lot about whether currently it is a dichotomous threshold you are certified or not. But the whole idea of caring physicians and having physicians who are exemplary and fine and good and whatever, we have not crossed that threshold yet.
It is not that it is out of the question for the future, but it is not there in 2009. And part of the reason, it is really interesting, this came up yesterday at the board meeting, is do all physicians have to be superb physicians or can they be fine physicians? And is it okay to go to a fine physician, or do you have to go to that person who scores in the top 10 or 20 percent. For me, I am fine with my physician being certified and being competent. I do not need that top one or two percent. I do not gain that much from it. So these are really good questions, but the idea is to move the bar, continuously raise the bar. And that is right in the strategic plan.
DR. BURSTIN: I want to follow up on that. I think that beyond even certification, the capacity of that magical F7 button, allows providers within a group, and I say providers in the broadest sense of the word, Blackford. I mean, all clinicians sitting down together at a systems level reviewing the data out of your practice. I mean to really have that in real time obviously, with my time at the Brigham, when I headed quality measurement, I had that. I would hand out to all the docs our entire internal medicine practice, here are the measures. We initially had it A through Z. That lasted about ten minutes, because we all unblinded in about 30 seconds because we wanted to see who was best.
But that is that kind of real time feedback that allows for really dramatic systems improvement and helps you figure out -- it really should be the medical assistant who just routinely does flu shots without an order from me. I mean that is the kind of stuff that, I think, becomes a systems piece of it and becomes so apparent when you have the ability to rapidly report. Not a year later get the reports from an external body saying this is how you did. But in real time and even without necessarily having to do your own chart review, seeing this is how you are doing on a continuous basis.
DR. REUBEN: And that is the Holy Grail for us. The Holy Grail is that doctors in real time can monitor the care for a number of different conditions. That said that is not going to be possible without electronic health record. You just cannot get there.
MR. QUINN: Thank you both for your presentations -- This sort of gets into one of the thoughts that I had. The PIMs and the MOC is a really compelling lever for motivating clinicians. And many well-intentioned and well-designed quality improvement efforts, have not really gained physician buy-in for a variety of reasons.
It seems to me that if you can align these two, the PIMs as well as the measures that gain buy-in, there is an opportunity for a synergistic effect. Is it the measures that have to gain buy-in from clinicians themselves or how do you design this so that it just makes sense, and that you do gain maximum uptake?
DR. REUBEN: I do not think the measures have been a problem. The measures are based on randomized clinical trials. They are based on guidelines. They are not the problem. The problem is getting the docs to be able to do it.
And there are a lot of issues why you know, for example, if you have an old person who has got 12 problems, for some reason or other, doctors feel compelled to do a lousy job addressing all 12 at each visit rather than spend an entire visit addressing one problem, doing a good job. It is just somehow or another, how we were trained as doctors.
And there is a tremendous amount of inertia in what goes on in a doctor's office. A little quality improvement project I did a few years ago, I sat and watched every one of our docs interact with patients and saw what went on in the visit. And there was a lot of time with talking and a lot of time with counseling, there was a lot of time addressing those issues, but the kinds of things that would be measured under quality performance just were not there. They did not have the time to do that.
So I do not think knowledge is the issue. I do not think what the measures are, is the issue. I think being able to get that behavior, the knowledge into practice is really the limiting step.
DR. BURSTIN: I agree with David. And just to add to that, I think that while the measures themselves may be okay and they are evidence based, I think, in many clinicians' view, they just do not go far enough. They are not even thinking about the 2013, 2015 kinds of measures that are envisioned for meaningful use, care coordination, patient and family engagement, does take us to a different place.
And those are not the traditional measures that most docs have had fed back to them or even the ones that we have had as part of our assessments, other than patient experience, which I think many would argue, has been really transformative for a lot of health systems to have that HCAHPS data or other kinds of patient experience data.
But I think going beyond that to get at even other levels of how often do in fact, do you get a discharge summary back in a timely manner? Are all the labs followed up at discharge? I know what happened. I mean, those are the kinds of things that I think providers, and I do not want to say docs, because I think it is bigger than that, it is really the team, would find so meaningful.
So it is not just measurement for the sake of measurement, but it is actually measurement that is meaningful to me as a clinician to do a better job because it gets more at the systems.
DR. FITZMAURICE: You have painted a picture of really how difficult it is and yet how far we have come from NCQA to NQF to the board certifications and holding people more and more accountable. I sense it is not going to get easier. We have the quality measures and I work with Floyd at the HITSP population technical committee, where we worked on quality measures and surveillance measures and since then there is this continuous, this is not precise enough, we need to send it back to NQF to the HITECH Committee that was chaired by Paul to get more precise measures of just what was meant in the quality measure.
Now as we come up into 2010, 2011, 2012, 2013, 2014, we are going to be moving from ICD9 to ICD10 and we are going to be changing some of the claims information. Beyond that, we may be moving more and more to SNOMED as the underlying coding system. What that means regarding difficulty for quality measures and for meaningful use measures as well, is that the kinds of things that the electronic health records are massaging are going to be different. They are going to be split into parts and some things will have to be aggregated.
Is NQF working with the quality measure developers to bring some realization to this? I have seen some really good work that Floyd has done in seeing a piece of the ICD9 codes, these are the exclusion codes. It is all going to have to evolve into something better and better as we get into better and better coding systems. Is this on the radar screen, Helen?
DR. BURSTIN: It is completely on the radar screen and in fact we are also doing some additional work under our HHS contract that is an expert panel that is meeting shortly to begin understanding what are those transition issues and how do we even envision them? When all the measures come up for maintenance, for example, by 2013, they will have to be in ICD10 or SNOMED, part of the determinations of what happens with the HIT Policy Committee. So that is definitely the trajectory.
And again, this idea ultimately of saying, if you are offering a tool from an EHR to NOVA in 2011, 2012, 2013, and you have the capacity to go to a measure offering tool, it will automatically pull up the appropriate code set.
I think part of the challenge is that we do not often have all the crosswalks we need between ICD9 and ICD10 and SNOMED. But I think, to me, those seem like technological issues that are achievable, as opposed to, I think, the harder stuff which is actually around capturing some of the logic and are we actually staying true to what was intended through the guidelines, through some of those transitions.
And I think some of the measures that have been developed, again, have been based on clinical guidelines that are not as precise or pristine as we would like, and so some of the logic of trying to translate some of that to an EHR is actually the harder piece than the coding. I would argue, last time known well, of when stroke symptoms began, is pretty darn hard to put in an EHR. So you have to really rethink from scratch, how do you develop a measure that gets at the timeliness of stroke treatment as opposed to starting with the measure that we have, which was developed for a very different environment that is not IT based?
DR. FITZMAURICE: Could I ask another question? And that has to do with proprietary quality measures. If I am a health plan or a group of providers, and I want to measure quality, I may say, this is really a good quality measure, but do I have to pay 25 cents a patient every time I apply it to the quality measure developer? For 1,000 patients that is $250.
Is there some way to handle the economics? We did it in one case with SNOMED. We bought out the license for five years, here is a $5 million payment and so much per year, to use it and then at the end, we can use it as it stands in perpetuity. That is one subscription model that could be used for paying for quality measures. There may be others. Do you see having to pay for the use of quality measures to be a large or a fairly small barrier to the use of quality measures?
DR. BURSTIN: It is an excellent question, Mike. To date, all the current NQF measures are available without fee. There is definitely a move towards some more of these proprietary systems, especially if you look at the complex proprietary systems where there have been years and years of investment in a complex risk adjustment data base. Some of those coming through may have associated charges. I think one of the challenges going forward, if for example, the grouper methodology is something we want to move forward with, maybe should that ultimately become a public sort of analogous situation to SNOMED? I do not think we know the answer yet. But I think you know, increasingly, we are seeing that without the public support for the measurement side, and that I think has been a challenge for us, you know as the clinical registries have been developed, oftentimes by specialty societies with specialty society funds, as they transition that to being for public reporting, how does that evolve? And I do not think we know the answers yet. But I think it is something we are all, kind of, keeping a close eye on. Hopefully, the additional support, as I mentioned early on that is hopefully in some of that health reform, for measure development will get us to maintain more of that critical measure development expertise in the public domain.
DR. REUBEN: I would just like to echo that. I think the way to do this is really to have these measures in the public domain. And to do that, you are going to have to pay for the development and the maintenance of these measures. And rather than doing that on the back end after they are developed as a for profit industry, that this is a wise investment for the government, is to invest in the development and the maintenance of these measures. It is just going to make it much easier to distribute and much more widely accepted. Otherwise you get into the terrible issue of the best measure may be proprietary, but you cannot use it.
DR. CARR: Thank you so much for this very rich and exciting discussion. Could I ask that we could get copies of your power points? Thank you and we will distribute them. And so I think we are going to break now actually, for a long lunch break. A number of folks are flying in and will arrive at one. So we will reconvene at 1:00 p.m. Thank you.
(Whereupon, the meeting was adjourned for lunch at 11:32 a.m.)
Agenda Item: Current Measure Development, Endorsement, and Adoption Process
DR. TANG: On behalf of the Quality Subcommittee of NCVHS, I would like to welcome you back to the second part of today's program and we had really good testimony this morning. I am sure we will be continuing this afternoon with really excellent panelists. I appreciate them being here.
We talked this morning about the process of identifying high priority conditions, NQF for example, and the vision of sort of continuous certification from David Reuben, the Chair Elect for ABIM. Now we are going to move into sort of the measure development. We talked about there are lots of measures out there but maybe not enough good measures and so we are trying to hear from this panel what is the process for getting a measure developed and endorsed and how can we even encourage, promote, and pull some more good measures out of this system so that we can ultimately get - now that we have this window of opportunity with the HITECH Act and saying we are actually going to pay for some of these measures in a substantive way in terms of relief or compensation for implementing EHRs. This may be a big moment of opportunity where we can put some more effort into this area because there is money on the table so to speak for the adopter side.
With us today we have Karen Kmetik who is the Director of Clinical Performance Evaluation at the AMA, and works with Bernie at the PCPI, the Physician Consortium for Performance Improvement. Sarah Scholle is the Assistant Vice President for Research Analysis at NCQA, a major measure developer, and Dr. Bernie Rosof is the SVP for Corporate Relations and Health Affairs at North Shore Long Island Jewish. I think you are the hospital that made news saying you even up the ante even more, almost doubled the potential incentive.
I think we are really at an inflection point where we can use if we had good measures, we can really change the face of quality and quality improvements. And Frank and we will let Justine do because you are such good friends.
DR. CARR: Frank Opelka, superb surgeon and President Executive of the Louisiana State University Healthcare Network, physician executive, and recognized national leader in patient-centered healthcare for the surgical patient. Thank you, all of you, for being here today.
DR. TANG: Is the order that we introduce you a satisfactory one or you have a different order that would make sense?
DR. ROSOF: Thank you very much, Paul, for inviting me and for the opportunity to provide some comments, which I hope will be useful to the topic you have just introduced. From my perspective performance measurement as a science has many purposes and you have probably heard a few of them already this morning. It should be an integral part of all efforts to improve the quality of care. It should encourage performance improvement with the ability to benchmark individual or group performance against regional and national standards. It should advance efforts to support quality improvement at the point of care specifically by integrating measures into electronic health records and electronic health record systems, which you will hear a little bit more about in a few moments. And when the data is valid, when it is tested and it is risk adjusted where appropriate, it can be used for public reporting to help shared decision making and ultimately choice.
Now the Physician Consortium for Performance Improvement and other organizations, are beginning an effort to advance the alignment of performance measures to be integrated into maintenance certification programs. You probably heard that from ABIM this morning. All of this clearly will enhance our efforts to provide patient-centered care, the object that we are all after.
The tools necessary to accomplish this include commitment from the profession, further research into the science of performance measurement and quality improvement, education of healthcare professionals providing care as to the value of performance measurement, adequate funding, health information technology integrated into the work flow of the providers of care, and a true commitment from the academic educational community to incorporate into the curriculum of medical students learning about the principles of quality and measurement. No longer is it appropriate to ask the question is it necessary to teach quality. It is.
Assistance for some of this is already underway with funding appropriated for comparative effectiveness research, information technology, and a clearly stated commitment from healthcare professionals, from professional societies, and of course from the American Medical Association.
How does one go about selecting a high priority clinical area to develop quality measures? There are competing pressures to accomplish this: CMS and the PQRI program, consumer purchasers for public reporting, well-defined gaps in measure development as articulated by many medical specialties, particularly as they become necessary for PQRI and other pay for performance type programs, and of course accrediting bodies and maintenance certification programs among others.
Now recognize also that the smaller specialty societies perhaps have a little bit more difficulty in this area having not had the resources that other large specialty societies do and so this may add to the problem additionally.
The selection process must include the necessities of developing specifications for multiple data sources including EHRs, a protocol to test measures, a compendium of clinical guidelines and best practices to facilitate measure development, and truthfully a plan for implementation. We never should forget the necessity, as we begin this process, of a plan for implementation.
A hierarchy to enable decision making in this complex setting as determined by PCPI and others, including the National Quality Forum, includes importance, includes scientific accessibility and acceptability and the evidence becoming available, usability, and not only usability but interpretability of that usability, feasibility, and to determine where gaps in measurement exist.
Now the gold standard would obviously be that the measures would be generated from a strong evidence base, that they be clinically rich data, and that they would employ a strong risk adjustment mechanism.
Now a real concern and somewhat of a danger that I ought to mention at this particular point is that we move forward conceptually as this kind of a group and leave behind the essential drivers who are both absolutely required to effect and create change and will maintain the public trust. Those are the practicing physicians, nurses, and other healthcare professionals. We need to be certain that in what we do we don't lose them as we move forward in the agenda that we are about to move forward in, and be sure that we incorporate both education and learning at all levels so they are involved in this decision making, and that we don't make the decisions without them being involved.
If we look at the first slide, what we are interested in doing is increasing these numbers. Only one physician in five receives process of care data. It kind of builds on what I was saying just a few moments ago, and less than one physician in five receives clinical outcomes data. We all know that physicians like to respond to data. They like to have accurate data. They do not want to be outliers and they respond very, very effectively to data if it is provided appropriately.
Also, in terms of meaningful use of electronic health records meaningful use needs to be meaningful to all healthcare providers and that includes all healthcare providers. The goal probably would be to engage all physicians and healthcare professionals from varying specialties in meaningful use of measures and meaningful use of electronic health records.
Now the PCPI is convened and staffed by the AMA. Let me go through this although many of you may know this already. Membership consists of more than 125 national medical specialty and state medical societies, ABMS and its member boards, CMSS, the Council of Medical Societies, AHRQ, CMS, Joint Commission, NCQA, and NQF, are also active participants and represented at PCPI meetings. Experts in methodology are an integral part of the PCPI and 13 non-physician healthcare professional organizations participate actively at PCPI meetings. In addition we now have a healthcare consumer/purchaser panel, which I will chat about in just a moment.
Our current measures portfolio consists of 42 measurement sets, 260 plus individual measures, and approximately 70 percent of the measures in CMS PQRI were developed by the PCPI.
In terms of the consumer/purchaser panel, fresh off the press, we had a meeting with the consumer purchasers; Peter Lee, Debra Nest, David Hopkins were part of that meeting. As part of that discussion we brought up the issues of measures that matter, consumers' and purchasers' views of uses and users. There was some consensus related to this, so I thought I would bring that to you this morning, and the consumer purchasers felt that would be a good opportunity to make this part of the presentation.
The consumers and purchasers want measures that are useful for accountability and for quality improvement, performance improvement. These include outcomes, for example, functional health status, morbidity, mortality, et cetera, composites of multiple process measures, resource use, care coordination, patient experience, measures that taken together provide a comprehensive picture of a provider's care, which has become more and more important as we move this forward, measures that show gaps and/or variation in care, and also measures that show disparities among different populations. Where are the gaps in equity that impact the delivery of care and impact, specifically, outcomes? A very important point, maybe a little difficult at the moment to include as part of all measures but has to be a goal as we move measurement forward.
In terms of the criteria for a topic selection the required characteristics I mentioned briefly, but important also are gaps and variations in care. Remember that the measures we have been able to accomplish up to the present really are measures that are more appropriate for large specialty societies. The small specialty societies have been not involved as much, but must be and those gaps are obvious and we will move forward to correct those gaps. Evidence base and high impact.
In terms of high value characteristics, care coordination is extremely important. Patient safety is involved where care coordination is eliminated, specifically when it comes to transitions of care. So care coordination is an important aspect, patient safety and appropriateness and overuse. As you know I chair the Overuse Committee for the National Priorities Partners, and part of the work flow for the PCPI going forward is to coordinate the effort between the National Priorities Partners and our development of measures as we move this forward.
Our portfolio of measures, again, to make mention as you can see. There are measures across all specialties recognizing once again that there are clear gaps that we will make efforts to fill as we move forward, but we have tried to create measures that are cross specialty and cross cutting so that it can be used by a variety of practitioners.
Now the current work plan is to fill those gaps to include specifically appropriateness topics. Now we can say appropriateness. We can call this in addition, overuse and appropriateness. For example, the surgical and nonsurgical management of back pain, which is a target that we have with the National Priorities Partners, a percutaneous intervention for chronic, stable coronary artery disease, maternity care, for example, induction of labor and C sections, sinusitis with antibiotic prescriptions and sinus radiography, and diagnostic imaging. Diagnostic imaging seems to be a theme as we move forward in the overuse arena.
Important to recognize when we discuss overuse, it is always overuse while delivering appropriate care. It's not overuse in itself as a single word, but it is overuse in the delivery of appropriate care. We don't want to misuse the word overuse without understanding that this is for the delivery of appropriate care.
Currently also in the work plan is care coordination. Phase I is transition from hospital discharge to home or other facility, transition from emergency department, this area of transition, which is so important, and phase II transitions across ambulatory care, outside of the hospital environment where transitions are as important.
I think if we look at the current work plan to round out our measurement sets, let me give you an example for the Heart Failure Measurement Set because that is exactly what Karen is working on or has been working on recently, they want to round that out by including outcomes, intermediate outcomes, and process. Once again we get into the area of appropriateness. But we can't always talk about appropriateness and overuse without considering under use in addition. There are many areas. A quick example is in the treatment of pediatric asthma where under use of medication is a specific concern and so under use is sometimes as important as overuse. And also we want to be certain that we include both the inpatient and outpatient arena.
As we move forward the current work plan is also to integrate into EHRs. Karen is going to talk to you about that in a moment. We are working collaboratively with NCQA. We have a very specific collaborative relationship with NCQA in developing certain measures particularly the measures of appropriateness, and with NQF once again to try to coordinate and align our efforts with their efforts in the National Priorities Partnership's goals.
We are working with the EHR vendors to incorporate the use of measures within the electronic health records, and also we haven't forgotten the physician users of EHRs. Actually here is the group that is going to implement and make things happen we hope.
That is briefly the overview I would like to present. Karen will go into a little more specifics related to some of the work we are actually doing. If there are any questions we will be happy to answer that in addition. Thank you.
DR. CARR: Thank you. That was a great overview. I think what we will do is go through all speakers and then come back with questions.
DR. KMETIK: Thanks very much for the opportunity to share some further information with you about what we are trying to put in place and operationalize so that we do have those measures that are meaningful to different stakeholders and we are getting them into electronic health record systems. On the next slide is what I want to overview with you today: a model that we put in place in our measure development process to try to move forward in this arena.
I also wanted to throw out a proposal about how we might take some of the lists that are out there now of measures and maybe make it a little more tracking of progress, a readiness kind of list, and then lastly, to go into a little more detail about Bernie mentioned about specific areas that we are focusing on to try to fill the gaps.
When we thought about this activity of building on the core foundation of the PCPI, which is we have all those specialties around the table. We have those 13 other healthcare professionals around the table. We have a consumer purchaser panel. We have consumers, patients, and employer groups on every work group. We have that foundation. How can we move it forward in the right direction?
We put this model in place where we said for thinking about the measures and putting them into electronic health record systems and taking advantage of that new rich clinical source that we have all been hungry for so long, we said first we are going to start with our core base that we have these folks around the table and we have this growing portfolio of measures across specialties and subspecialties.
We want to make sure then for the measures that they work on that they are covering those critical areas. We want to then make sure we are developing the appropriate specification so that those measures can be integrated into EHRs. We want to vet that with those physicians who already have the EHRs, as well as the EHR vendor community, sit down around a table, and roll up our sleeves. Does this work? And then our notion as part of our model is to have a series of what we call incubator groups. I will explain a little bit more about that. That is sort of the model we have in place to make sure we have the right people at the table, we are advancing, evolving the types of measures that everyone is looking for and we are building them in a way that they can be integrated into EHRs.
If I just go to the next slide, this is just a visual representation. Now, I am from Chicago and we lost the Olympics, but we are in denial. We are using rings in everything that we do because we are just in denial. But this is just a physical representation of the model I talked about where in the center there the one and two, that is our body that we work with to develop the measures. We specify the measures. We then vet them with the EHR vendor community in circle number three, and then around the periphery there the fours, are these incubator groups, which we think are so important and again I will explain those in a minute.
In the next slide if I just take an example and I am building off the heart failure example that Bernie mentioned. This is a measurement set that we developed with the ACC and AHA and it has been around for a while now and primarily up until now included process measures.
What you will see coming out soon then is a new set of measures for heart failure that includes inpatient and outpatient, as Bernie said. It includes measure of overuse. It includes a functional assessment type of measure. We are building them out as we said and it is really gratifying to be able to pull those groups back together who have learned and are now ready to go in that next direction.
We are also benefiting from some of the researchers. I don't know if you have seen the article by Steve Persell from Northwestern who is saying, let's take advantage of that data. Maybe it's not just what is the blood pressure but how many drugs is that person on. What are the doses of those drugs? What have you done recently to try to control that blood pressure? We are trying to take advantage of the research as well. If I look at one measure out of that set just by way of example, it is the ACE/ARB measure for LVSD.
What do I mean by saying we are trying to develop the right specifications? In working with Floyd at NQF, we are using this language so we can talk to each other and not get tripped up too much. We have level 1 EHR specifications we call them. That means we are going to take that measure in words and we are going to turn it into all of the code sets, the algorithms, the calculations, the rules, that one would need to be able to use that measure in an EHR environment. And I will be honest. Right now we are going to provide every code set option because the nation has not exactly landed on the code set nor have these code sets been built into every EHR product yet. We are tracking that with HITSP and others, but right now we are going to put it down there so everybody can see. We are going to look at are there SNOMED codes that work, are there the ICD-9 codes that we need, are there CPT codes. In the area of drugs, we have got to give both NDC and RxNorm right now. Happily, when everybody jumps on one bandwagon we will be right there with everyone. We are giving the CPT-II codes also, as an option as well as some of the codes there that can be used for the exception reporting. That is what we call level 1. That is now a product of the PCPI and we are doing much of that in concert with NCQA and NQF.
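[Editorial illustration, not part of the testimony.] As a rough sketch of what a level 1 specification carries, the code below represents one data element of a measure as a set of codes per code system and checks an exported patient record against whichever system the EHR happens to use. The class and function names, and every code value, are hypothetical placeholders; they are not actual PCPI specifications or real NDC or RxNorm codes.

```python
from dataclasses import dataclass, field


@dataclass
class ValueSet:
    """Codes that all stand for one clinical concept, grouped by code system."""
    concept: str
    codes_by_system: dict[str, set[str]] = field(default_factory=dict)

    def matches(self, system: str, code: str) -> bool:
        # An entry matches only if its code is listed under its own code system.
        return code in self.codes_by_system.get(system, set())


# Placeholder codes only; a real specification would list actual codes.
ace_arb_therapy = ValueSet(
    concept="ACE inhibitor or ARB therapy",
    codes_by_system={
        "NDC": {"NDC-PLACEHOLDER-1", "NDC-PLACEHOLDER-2"},
        "RxNorm": {"RXNORM-PLACEHOLDER-1"},
    },
)


def patient_has_concept(entries: list[tuple[str, str]], value_set: ValueSet) -> bool:
    """entries is a list of (code_system, code) pairs exported from an EHR."""
    return any(value_set.matches(system, code) for system, code in entries)
```

Listing every code system side by side is what lets the same measure run against products that store drugs under different code sets, at the cost of keeping several lists in step.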
Level 2 specifications we call something different. That is putting it into - I just call it an IT savvy format and that is for the programmers at the EHR vendor companies to be able to take our specifications and put them into their products without having to rewrite code. We came up with a prototype for that. It is now going to a standard development organization with NQF sponsorship and we are tracking that and we are ready to take that step when again everyone lands on what that should be. Those are two parts.
If you go to the next slide then, the evaluation. We think this is an important part and that is why we built it into the PCPI's model. We want to talk early and often with the EHR vendors, not that we want to be limited by what the products are today, but to be able to have that conversation to say you know, we really need to someday have in that system not only prescribed data but dispensed data. If you are doing the ePrescribing can that be connected. We are trying to have those conversations again toward being able to move toward a next generation of measures. We are having those conversations as Bernie said, with physician users right now to say how are you using the EHR now? What pops up on your screen? Where in there are you reminded about the aspect of this measure and can you report the data? We think that is an important part of the model if this is going to be successful.
The next slide then talks about these incubator groups that I mentioned, which are just something that I love personally because it gives us a chance to really sit down with a group of physicians who have electronic health records. We give them the actual national specifications. We give them those level 1 specs I talked about. We collect data. We have it sent to a warehouse and we can analyze where are the issues. And again, we think this needs to be part of the model of measure development, that it needs to be yes, the expertise, the evidence base as Bernie mentioned to have the measures, to put them into the specifications, to vet them with those who are going to use them, but then we need some real live groups that can give us that rapid feedback to say you know what, you described it this way and that ain't working or how about this way.
We feel honored that we put this group together actually four years ago now with some grant funding, different practice sites, different specialties, different EHR products and that is critical, different EHR products.
On the next slide I will just share with you a little bit about what we have been able to find from that. Again, if you just stick with that measure as an example, the ACE/ARB measure, we found discrepancies between the NDC codes in our measure specifications and the NDC codes that are in different EHR products in those practice sites. Even if we could have the exact same coding, first of all it depends on the EHR vendor sending the NDC update to the practice site, which is on all different schedules, and then the practice site actually installing the update.
We found that we had errors not because anybody was doing anything intentionally wrong but because we may have had what we thought was the most up to date NDC list for ACE and ARBs, but that list had not yet been uploaded in those practice sites. That sheds a great light on something. RxNorm could really help with that and conversations I know, are going along nationally about that.
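[Editorial illustration, not part of the testimony.] The kind of mismatch described here can be surfaced with a simple set comparison between the codes a specification expects and the codes actually loaded at a practice site. The function name and the code values below are hypothetical placeholders.

```python
def missing_at_site(spec_codes: set[str], site_codes: set[str]) -> set[str]:
    """Codes the measure specification expects that the site's drug list lacks."""
    return spec_codes - site_codes


# Placeholder codes; a site whose vendor update has not yet been installed
# would typically be missing the newest entries.
spec_ndc = {"NDC-A", "NDC-B", "NDC-C"}
site_ndc = {"NDC-A", "NDC-B"}

print(sorted(missing_at_site(spec_ndc, site_ndc)))  # ['NDC-C']
```

A prescription recorded under a code in the missing set would not be picked up on export, which is how a drug that was actually prescribed can still show up downstream as a measure failure.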
But we would never be able to articulate this without those incubator groups, without actually trying it seeing what happens. In this scenario for this incubator group for the cardiovascular measures we have data now being sent quarterly from these practice sites to warehouse. They are able to use the data themselves of course, first and foremost in their practice. We can calculate performance rates. We can calculate exception rates and we can validate the data.
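[Editorial illustration, not part of the testimony.] The testimony does not give the formulas behind the performance and exception rates. The sketch below assumes one common convention, in which patients with a documented exception are removed from the denominator before the performance rate is computed; the function names and that convention are assumptions.

```python
def performance_rate(met: int, eligible: int, exceptions: int) -> float:
    """Share of eligible patients, excluding exceptions, for whom the measure was met."""
    remaining = eligible - exceptions
    if remaining <= 0:
        raise ValueError("no eligible patients remain after removing exceptions")
    return met / remaining


def exception_rate(exceptions: int, eligible: int) -> float:
    """Share of all eligible patients who had a documented exception."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return exceptions / eligible
```

Reporting the two rates side by side keeps a high performance rate from hiding heavy use of exceptions.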
In this scenario for exceptions reported to the warehouse, we went in and took a sample, manually abstracted the data from the EHR, had a hundred percent agreement. It was reported; we found it. The same for if the data warehouse said the measure was met, we were able to validate that 90.48 percent of the time. Then we also wanted to validate what appears as a failure. When the warehouse said this measure was not met, we went back in and actually there we see a big disconnect. In only 19 percent of the cases reported to the warehouse as a failure was it in fact a failure when we looked at the data, and the mismatch was things like the NDC code. The drug, ACE/ARB, actually was prescribed, recorded in the medical record, but it wasn't exported to the warehouse because it didn't pick up the right code.
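[Editorial illustration, not part of the testimony.] The validation described above compares what the warehouse reported with what manual chart abstraction found, separately for each reported status. A minimal sketch follows; the function name, status labels, and sample counts are hypothetical and are not the incubator group's data.

```python
from collections import Counter


def agreement_by_status(samples: list[tuple[str, str]]) -> dict[str, float]:
    """samples holds (warehouse_status, chart_status) pairs for sampled patients.

    Returns, for each warehouse status, the share of sampled cases in which
    manual chart abstraction found the same status.
    """
    totals: Counter[str] = Counter()
    agreed: Counter[str] = Counter()
    for warehouse_status, chart_status in samples:
        totals[warehouse_status] += 1
        if warehouse_status == chart_status:
            agreed[warehouse_status] += 1
    return {status: agreed[status] / totals[status] for status in totals}


# Hypothetical sample: 10 cases reported as met, 5 reported as not met.
sample = (
    [("met", "met")] * 9
    + [("met", "not_met")]
    + [("not_met", "met")] * 4
    + [("not_met", "not_met")]
)
rates = agreement_by_status(sample)
```

Splitting agreement by reported status is what exposes the asymmetry described in the testimony: reported successes can validate well while most reported failures turn out to be export problems.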
These are things we certainly want to figure out and solve before we go national with this and put different aspects of importance on the different data. I wanted to share that with you and that is what we mean by incubator groups. We have one cardiovascular care. We have one HIV/AIDS measures. We would love to be able to have a dozen of those in different disciplines. That we believe would be very powerful.
I will move now to another item I wanted to mention, which we want to build into the PCPI process, which is tracking the progress. It is one thing to say these are the measures that are important and we have covered all the bases of a set and that is down the left hand column there. You want process. You want intermediate outcome, cost/utilization so we can track that. But I would suggest that we need to also then track the readiness for sort of going live in a national meaningful use program. This is just a simple way in which we are trying to keep track of it to say we got the measures. Have they been NQF-endorsed or is there a chance to go through NQF? Do we have the level 1 and level 2 specs? Have they been vetted by the vendor and physician communities? Have we tested it in the incubator group? And then are all the fields that we need in X percent of EHRs today? There has to be some comfort level we have that on the measures that we land on, these are ones we want, have all these steps occurred and then we at least have a comfort level. I am suggesting that maybe it would be helpful to us as we move toward defining more the measures we want to define in meaningful use, we start to track it this way as well. Everybody has a comfort level and it's not, well, that's a great measure but we are nowhere near. Well, let's see. What pieces along this continuum do we have or are missing?
Lastly then, there was a comment about we got a lot of measures but maybe we still don't have the complement that we need. Again, we feel like we got the people around the PCPI table. We are going back to all the groups and saying where we have sets let's round them out, as Bernie and I use that term, meaning as we did for heart failure, let's add what is needed there to make that a full set, bring in the inpatient as well as the outpatient, et cetera. Some particular areas that we are putting a focus on, in addition to what Bernie said, just to put a fine point on it: the care transition measures that we have done and are through NQF right now going through the process are from the hospital side to another location or to home. And a big part of our effort going forward is ambulatory care transitions, just the hand offs, the referrals to specialists, et cetera. We think that will be very valuable to a lot of different specialties and sub-specialties.
We are also putting emphasis on pediatrics. I think everybody would agree there is a big shortage of measures in that area, and then again looking at the different sub-specialties that still don't even have that core yet that is going to be meaningful for that patient population.
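[Editorial illustration, not part of the testimony.] The readiness list described above can be pictured as one checklist per measure with a way to report which steps are still open. The class and field names below are hypothetical labels for the steps named in the testimony, not an actual PCPI tracking tool.

```python
from dataclasses import dataclass, fields


@dataclass
class MeasureReadiness:
    """One row of a readiness list: which steps a measure has completed."""
    name: str
    nqf_endorsed: bool = False
    level1_spec: bool = False
    level2_spec: bool = False
    vetted_by_vendors_and_physicians: bool = False
    tested_in_incubator: bool = False
    fields_available_in_ehrs: bool = False

    def missing_steps(self) -> list[str]:
        # Every boolean field that is still False, in declaration order.
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name)
        ]


example = MeasureReadiness("example measure", nqf_endorsed=True, level1_spec=True)
print(example.missing_steps())
```

A list like this answers the closing question directly: for any measure, it shows which pieces along the continuum are in hand and which are missing.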
I will wrap up. I am sorry if I have taken too much time, but again our thought is for the measure development side, to have a model in place to get where we all want to go and we feel that it includes working with the specialties and sub-specialties, building on the measures that we have, rounding out those sets, new measures where we need them, making sure we have the level 1 and level 2 specifications, vetting them with the vendors, testing them in these incubator groups, and tracking this progress. I think it is very transparent then for everyone to know where are we and where do we need to focus next. Thank you for the time.
DR. GREEN: I am Larry Green. I am the member of the committee, member of the subcommittee and no conflicts. Is that the routine?
MS. SCHOLLE: Good afternoon. I am Sarah Scholle from NCQA and I am happy to talk with you this afternoon about the work that we have been doing in thinking about how we can expand our measurement opportunities using EHRs and health information exchanges. What I wanted to talk about today were the steps in creating eMeasures and some of the activities that we are working on to try to develop new measures that take advantage of the capabilities of electronic data sources and also our process for updating.
Today what kinds of measures do we have and what data have we been working from? Our measures have focused on retrospective review where we are looking at care after it has occurred, usually with a single point in time over a set period of time where we use a specific threshold; for example, is blood pressure less than 140 over 90, and where we are thinking about multiple levels of healthcare but that means different data sources for the same measures because we are looking at different organizations or people.
The data sources that we are using are most often claims data, visits, procedures, and labs. To some extent we have electronic lab results available. Sometimes less frequently, we have clinical data, like the results of labs and radiology or CPT category II codes or medical records data or patient survey data. The measures that we have today are framed in our data sources today, and what we could do with those existing data sources.
But as we look to the future we are thinking about a different measurement setting and different capabilities, different data sources. We are hoping that the measurement will be concurrent with clinical services so that we will be able to influence care at the same time that we are measuring and monitoring. It will be linked to real time clinical decision support so that we will be giving clinicians an opportunity to look at the guidelines and say, what should I be doing. We will be working with the data source that is not dependent on who you are measuring. We are not using health plan claims data to measure health plans and medical records for physicians, but trying to think about electronic data sources that can build up and be used at multiple levels, and it brings us the opportunity to look at more clinically relevant measures. We can look at change over time. It is really hard to do in a chart review or with claims. We can look at actual levels and the amount of improvement, instead of trying to say did you pass the threshold and then if you are one point below the threshold we don't give you any credit.
We can look at multiple values. We can look at treatment intensification. We can try to say not just is the blood pressure at goal but also we can try to take into account the different situations where, well, we tried. We did everything the guidelines said we should do and this person just hasn't come to goal. We are giving more benefit and recognizing the efforts of clinicians to really do well by their patients.
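[Editorial illustration, not part of the testimony.] The contrast between a single pass/fail threshold and a measure that also looks at change over time and treatment intensification can be sketched as follows. The 140 over 90 goal comes from the earlier example in the testimony; the function names, the result labels, and the order in which the checks are applied are assumptions.

```python
GOAL_SYSTOLIC = 140
GOAL_DIASTOLIC = 90


def at_goal(systolic: int, diastolic: int) -> bool:
    """Traditional threshold check: is blood pressure less than 140 over 90?"""
    return systolic < GOAL_SYSTOLIC and diastolic < GOAL_DIASTOLIC


def assess(readings: list[tuple[int, int]], treatment_intensified: bool) -> str:
    """Classify a patient from a time-ordered list of (systolic, diastolic) readings."""
    if not readings:
        raise ValueError("at least one blood pressure reading is required")
    first_systolic = readings[0][0]
    last_systolic, last_diastolic = readings[-1]
    if at_goal(last_systolic, last_diastolic):
        return "at goal"
    if treatment_intensified:
        # Not at goal, but the clinician acted on the elevated reading.
        return "not at goal, treatment intensified"
    if last_systolic < first_systolic:
        # Not at goal, but systolic pressure fell between first and last reading.
        return "not at goal, improving"
    return "not at goal"
```

Under a threshold-only measure the last three outcomes would all count as the same failure; separating them is one way to give credit for effort and improvement.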
The data sources that are available in the future we think will be claims combined from multiple health plans. There are some real advantages to claims data. That is where you get the payments, and some services are better captured in claims data, but there will also be electronic records, electronic patient surveys, and personal health records. In the future we think the data sources are going to be richer and that we can even dream of an environment where it is all linked together and would allow us to really look at care over time, in a patient-centered and population-based way.
What does this mean for measure developers and evaluators like NCQA and PCPI? Immediately what we need to do is convert our existing measures into measures that can be used in this electronic environment. At the same time we need to be thinking about creating new measures that can really capitalize on what electronic data can offer, and moving to evaluation models that take into account the electronic data collection and outcome measures, so that we are able to look at the full range of care.
But there are a number of issues that we need to be thinking about as we try to move forward. What are going to be the formats for EHR-based measures? Where do we look for the data in the EHR? Does the diagnosis come from the diagnosis field or the problem list? Does it come from the medications that were prescribed or from the lab results? In an EHR you can use any of those. What is the hierarchy for data searches? Does problem list trump medication list, or whatever? What code sets should be used? Should it be concurrent or retrospective? Should we be thinking about care for this individual patient at this point in time? Did you do the right thing today, or should we be looking retrospectively? One of the reasons that will get complicated is if everybody has sort of a different look-back period, because we are looking at one patient who came in May and another patient who came in December and then we are trying to put that information together. Should it be visit or population based? How do we update the measures, the codes? Research changes. The guidelines change. We need to change the measures. What is that whole process? As we move forward these are going to be some new challenges for us.
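[Illustrative sketch, not part of the spoken remarks.] The "hierarchy for data searches" question above can be made concrete with a small example: consult the sources in a fixed order and report the first one that supports the diagnosis. The source names, their ordering, and the evidence terms are hypothetical placeholders, not real code sets or any published specification.

```python
from typing import Optional

# Hypothetical search order: which part of the record is consulted first.
SOURCE_PRIORITY = ["problem_list", "medications", "lab_results"]

# Hypothetical evidence of diabetes in each source (placeholder strings,
# standing in for whatever code sets a real specification would name).
DIABETES_EVIDENCE = {
    "problem_list": {"diabetes mellitus"},
    "medications": {"metformin", "insulin"},
    "lab_results": {"elevated_a1c"},
}


def find_diagnosis_source(record: dict) -> Optional[str]:
    """Return the highest-priority source that supports the diagnosis."""
    for source in SOURCE_PRIORITY:
        entries = {entry.lower() for entry in record.get(source, [])}
        if entries & DIABETES_EVIDENCE[source]:
            return source
    # No source contained supporting evidence.
    return None
```

Changing the order of `SOURCE_PRIORITY` is the code-level equivalent of deciding whether the problem list trumps the medication list; the point is that the order has to be written down somewhere for two systems to agree.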
How does meaningful use in an EHR setting, in this new world, change our measure development process? This slide shows you on the left the traditional measure development process: review of evidence, develop clinical logic, data sources, evaluate feasibility, field test, specs. In that traditional measure development we have different specifications depending on where it is being implemented. We have HEDIS for health plans and we have HEDIS for physicians.
In the meaningful use environment a lot of the basic steps are similar, but after we review the evidence and develop the clinical logic we are going to have to identify the data elements that we need, where we want to pull them, and what the source codes are. We need to put those into an XML format, a format that is machine readable, test with EHRs the way that Karen described the work that they have done with these incubator sites, and then provide vendors standardized and encoded measure specifications, machine-ready specs. We are really looking at moving into that process.
There is a proposed draft standard for what this should look like, the eMeasure or HQMF, which is Health Quality Measures Format, and that is the model for how we would get to this electronic measurement. It is a structured representation of the performance measures using XML to tag the elements, and this is what would allow us to import data elements and measure logic into EHRs. This is the specification that needs to happen to allow the EHR vendor to say, I have taken this logic and now I can spit out a report of a performance measure. What it will look like in practice is that you go from what is on the left here, which is the HEDIS specification for A1c poor control. It has all the instructions. And then on the right side, that is what the XML code would look like. I can't explain it to you, but Shane can read it.
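[Illustrative sketch, not part of the spoken remarks.] To give a sense of what "using XML to tag the elements" means, the fragment below builds and then reads back a tiny tagged measure with Python's standard library. The element names are invented for this example and are not the HQMF schema; the measure text is placeholder wording, not the HEDIS specification shown on the slide.

```python
import xml.etree.ElementTree as ET


def build_measure_xml(title, population, numerator):
    """Tag the parts of a measure so software can read them.

    Element names here are invented for illustration; they are not the
    HQMF schema.
    """
    measure = ET.Element("measure")
    ET.SubElement(measure, "title").text = title
    ET.SubElement(measure, "population").text = population
    ET.SubElement(measure, "numerator").text = numerator
    return ET.tostring(measure, encoding="unicode")


def read_numerator(xml_text):
    """The receiving side: parse the XML and pull one element back out."""
    return ET.fromstring(xml_text).findtext("numerator")


if __name__ == "__main__":
    xml_text = build_measure_xml(
        "A1c poor control",
        "patients with diabetes",
        "most recent A1c result above the measure cut point",
    )
    print(xml_text)
    print(read_numerator(xml_text))
```

Because each part of the measure sits in its own tagged element, a vendor's system can locate the numerator logic without a person reading narrative instructions, which is the property the machine-ready specification is meant to provide.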
What is our path for retooling our existing quality measures? Right now we have a number of quality measures that have been developed. They have been endorsed. They are in implementation now. Our steps are we need to make these ready for pulling from EHRs. We are actually working with PCPI and with NQF to convert the specifications to basic EHR value sets and logic using the level 1 kind of EHR that Karen described. We will be reviewing those converted specifications early next year with NQF and beginning to test those measures and incorporate them into EHRs. This depends on vendors working to incorporate those measures and for vendors to be able to report those measures out in a standard reporting framework. That would be the path to get from the measures we have today into measures that can be reported out of EHRs.
Right now, with support from HHS and NQF, we are actually doing this. We are beginning to convert our high-priority measures to EHR-ready measures. We are looking at 35 existing measures that would be available for use in 2011, going through those steps I just mentioned.
That is sort of taking what we have and making it EHR ready, but what we are really excited about are the new opportunities. With these electronic data systems we have an opportunity to really think about measurement as enhancing work on a number of different fronts. If you think about the measurement process I described, it started with the evidence development and with guidelines and then we develop measures. But what we have here in the middle of the chart is clinical decision support, performance measurement, patient decision support, and patient education materials. With electronic systems our vision is that these things would all work together, that the guidelines and that information would be available to the clinicians and to the patients, that it would be used as the basis of performance measurement, and that you would use the information to track the results and track outcomes, track what happens, because a lot of those guidelines aren't going to apply to every patient, and a number of subgroups of particular importance, low income or different racial and ethnic groups, are not going to be represented in all those randomized controlled trials that are used for guideline development.
So having information that comes out of the electronic systems to help us understand and learn what is happening in the real world, and then being able to feed back that information into updating guidelines and again updating performance measures, clinical decision support, patient decision support, so that you really get to an ongoing process of quality improvement that builds on what we are learning.
There are a number of areas where we are really interested in trying to use the capabilities of electronic systems to support new measure development: overuse, care coordination, treatment intensification, and I will talk about these in a minute. I am going to hold the discussion of care coordination because I will be talking with you about that later this afternoon.
But we wanted to talk a little bit about the priorities for meaningful use measures that have come out and where our work, or the work that we are doing with the PCPI, is trying to address these issues and where it is building on the opportunities that EHRs bring to us. In imaging, that is where we are looking at overuse measures building on the kinds of appropriateness criteria that have already been developed. Patient experience, how can we use electronic systems for that? Some of these items actually may better fit in tools like the Physician Practice Connections, patient-centered medical home, so they become structural requirements for how EHRs are used. That applies to some other things like home monitoring and comprehensive patient data. But we are also working on readmission measures, care coordination, and things like preventive services, having comprehensive preventive services.
I wanted to just touch on a couple of these activities that we think are particularly important. First, overuse and appropriateness. With support from AHRQ, we sponsored a national working meeting in June to think about the issue of measurement of overuse and appropriateness. This is a difficult topic. Some of the early work that RAND did on measuring appropriateness of care was done 30 years ago, and it hasn't led to measures, primarily because of the difficulty and the lack of feasibility of doing the detailed chart reviews and then also the reliance on an expert consensus database and the challenges of doing that.
Nonetheless given the concerns in the economy and the national priorities we heard a lot of interest in moving in this direction but that we should proceed with caution and focus on overuse measures in a particular area and we are working with PCPI on that, but also thinking about how we might take those appropriateness criteria, building on research like the work that American College of Radiology and American College of Cardiology have done, that required detailed clinical data, often detailed clinical data both from ambulatory and inpatient settings and to be able to develop new measures and think about the opportunity for measurement and decision support.
Another opportunity that we are involved in is looking at the Archimedes model, which maybe some of you are familiar with. What is interesting about this model is that it is very different from the way our current HEDIS measures are set. This approach combines clinical decision support with measurement of outcomes. Instead of saying the guidelines say everybody should have a blood pressure less than 140 over 90, you calculate a specific risk score for a patient and you adjust, and it provides decision support to the clinician based on that patient's particular characteristics. We are working with Kaiser in Hawaii to test this.
Then some other opportunities for measurement relate to developing new outcome measures that take into account risk adjustment at the physician level and then looking at treatment intensification and the opportunities for using electronic survey models to get to patient experiences.
Finally, I wanted to mention the idea of updating measures. In some ways, having electronic measures, there should be some opportunities for making updating more feasible because you can make it electronically available to vendors, but it depends on having a process for updating. NCQA formally reevaluates all HEDIS measures at least every three years, but if there is new clinical evidence we might update it more frequently. We have updated our diabetes measures every year for the past three or four years. It is going to be important that there is support and opportunity to do that kind of updating. It is an important role for metric developers. Thank you.
MR. OPELKA: For those I haven't met, Frank Opelka with the American College of Surgeons at Louisiana State University in New Orleans. Going last is great because all the important things have already been said. I will try and shorten this and hit some highlights that might perhaps open up some dialogue in some other areas for measure development to enhance specific issues for quality improvement and to help patients with patient decision making as they move forward through the healthcare system.
I am involved with several different aspects of performance measurement. At home I am in a 10-hospital safety net hospital system. We have a very elaborate ambulatory patient measurement system that pulls clinical information together on about 8 to 10 chronic diseases, and then in house, in the hospital, on the inpatient side, we have a very elaborate system that looks at current measures that are being used, and that whole program is in a learning network across all 10 hospitals for quality improvement. That is separate from some of the other aspects I will talk about today, which is from the College of Surgeons standpoint, where we have multiple registries, some of which are very old. We have the National Trauma Registry. We have the National Cancer Data Base that the college runs in conjunction with the American Cancer Society. We have a new registry that is out there, the Trauma Quality Improvement Program. We have an old system that is about 10 years old for the college, but it is about 20 years old given its VA history, and that is the National Surgical Quality Improvement Program. Those are all measurement systems that we have that are up and running today. Some of them have highly validated, highly reliable data. Some of them have very poor data entry points and thus are problematic when you get to the point of performance measurement and quality improvement because of the problems with the data itself.
I will speak more about the procedural-based care because we have talked about global aspects of measurement and I want to just focus a little bit since there is a whole different realm of I think, performance measurement when we get into procedural-based care and it has a much harder drive for outcomes. This is much more active, aggressive, short care that needs to have some kind of aspect of outcome assessment. We are very strong proponents of those being risk adjusted.
The National Surgical Quality Improvement Program currently over 30 days captures about 130, 135 data elements. About 30 or 35 of those are just part of the risk adjustment. We have now actually taken that system and deployed it. It is in 300 hospitals across the country. We find a lot of problems with that kind of intensity of measurement.
First of all we have learned that we don't need 35 risk adjustors. We could probably get by with, in fact we have shown we can get by with, about six to eight risk adjustors in the care of the surgical patient. That is a big help.
Secondly, we currently sample multiple procedures, but the sampling is only around maybe 20 or 25 percent of the total procedures that we are looking at in a hospital. One of the things we have done is said, can we assess a hospital and its physicians by decreasing the number of samples we get and increasing the number within a given particular condition. If we did 80 to 90 percent of all colons, rather than 25 percent of all colons, and picked up a whole bunch of other background noise of very low volume procedures, we have actually intensified the high volume, high risk areas in their sample size and shortened the number of data elements that we are collecting for risk adjustment, which allows us to actually create a much more I think, detailed view and a less expensive view from the hospital standpoint for performance measurement.
I think the second key area once you get into procedural-based care, and I didn't include patient safety on here because I think that one is already out there, but the one area we haven't put enough focus on is appropriateness. There are huge challenges in appropriateness of care, but perhaps we are trying to take too much of the elephant. Perhaps we need to take a smaller bite and get down to something that is actually more easily collected.
Then finally I think if you are going to put this into some meaningful aspect on the patient side and perhaps other stakeholders in this, it is to create a composite and put composites together to try and bring it all together on a particular condition for a patient in the procedural-based world.
I have mentioned the NSQIP. It is facility-level measurement at this point in time. We have done some modeling at provider-level measurement, but you really run into sample-size problems. If somebody has only done 8 or 10 of something in a given year, can we reliably talk about performance and can we reliably assess where quality improvement efforts need to be trained on one individual? We have not found that to be successful. But if we take the entire group of people in a given hospital who are doing this we can literally make movements for huge efforts in quality improvement. We have demonstrated that now by creating several different learning networks across many of our hospitals that are in the same region. We have a learning network in Tennessee. They all collaborate. They see their own data and then they see the group aggregate data, and it has made huge efforts towards quality improvement and enormous cost savings.
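[Illustrative sketch, not part of the spoken remarks.] The sample-size problem described above can be seen by putting a confidence interval around a complication rate. The sketch uses the standard Wilson score interval; the case counts are made-up numbers chosen only to contrast one surgeon's 10 cases with a hospital's 500, and this is not NSQIP's risk-adjustment method.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion events/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def interval_width(events, n):
    """Width of the interval: a rough gauge of how uncertain the rate is."""
    low, high = wilson_interval(events, n)
    return high - low


if __name__ == "__main__":
    # Hypothetical counts: the same 10 percent observed rate at two volumes.
    print("surgeon, 1 of 10:", wilson_interval(1, 10))
    print("hospital, 50 of 500:", wilson_interval(50, 500))
```

With these hypothetical counts both observed rates are 10 percent, but the single surgeon's interval is several times wider than the hospital's, which is why a rate built on 8 or 10 cases says little about an individual while the pooled facility rate can support quality improvement work.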
If you were today to take all the complications that NSQIP has prevented in 300 hospitals, and put those in the major hospitals across the country, you would be saving somewhere in the neighborhood of $50 billion to $80 billion per year in this country in the cost savings from quality improvement. That is not hitting every hospital. That is just hitting some of the major hospitals. The potential savings are astronomical.
What comes from this though, most importantly, is that this is trusted and meaningful to the providers. The reports they get and put in their hands are actionable. In measure development we really have got to be thinking in terms of actionable measures. We have measures out there now in perioperative antibiotic use and then we try to correlate that with SSI, surgical site infection. Well, there are so many other drivers for surgical site infection, perioperative glucose control being one of them, but also the patient's other comorbidities, all of those factors, and the surgical skill and judgment, which we don't have a good way of measuring today.
When we look at antibiotics and SSI and we try to make that correlation, since it doesn't correlate to the provider there is not a huge effort in there to correct it. It is more for the hospital to get this done in a standardized and reliable fashion, but it may not have the impact we are looking for in SSI. In looking at the current CMS data, in fact it is showing there is no correlation. Great antibiotic usage still had significant surgical site infections. Poor antibiotic usage didn't necessarily have significant site infections. We have to make sure that what we are measuring meets that level of appropriateness and meaningfulness at the provider level.
We talked about a lot of the comments have been about updating all the aspects of measures and how we put them together. The risk adjustment actually requires updating. We have looked at the NSQIP today and we went in and we looked at the beta weighting of all these different data elements that are collected. They were based on a VA population and there in the VA population albumin was a huge clinical driver of outcome, but it is a poorly nourished population that has a lot of alcohol abuse and it certainly does show up that albumin is a key driver and it gets heavily weighted. So that beta weighting needed to be changed to meet a particular market that is being assessed for that improvement and those kinds of changes are ongoing if you look at the different variables that are a part of the risk adjustment.
A couple of comments on appropriateness. I think the best way to approach this is to be condition specific, evidence based. From a procedural basis I have thrown up two examples just to think about this and Bernie already mentioned this. Spine surgery for low back pain after failed medical trial in patients with no urgent triggers. There is a definition of appropriateness and we could enhance that through a panel of experts with the evidence behind it, but that is the kind of thinking that if we put that out there as a measure of appropriateness that we think that that will actually improve the quality of care that people get at the right care at the right time for the right reason.
Cholecystectomy, a very common operation. But if you look from laparoscopic surgery forward, there has been a change in cholecystectomy. We used to operate on gall bladders that had gall stones and if you were symptomatic. If you were diabetic we did it if it was asymptomatic because of the problems in discerning symptoms in the diabetic. If you were in the ICU and you were septically ill and you had no gallstones and an inflammation of your gall bladder, we called this acalculous cholecystitis and it was an indication for surgery.
Today a lot of patients present with right upper quadrant pain and they have biliary dyskinesia. There are no stones present. There is some mild thickening of the gall bladder. They might or might not have a gall bladder ejection fraction determined and they might or might not have that gall bladder ejection fraction related to reproduction of their symptoms. But there are a lot of cholecystectomies performed for this acalculous cholecystitis and in fact that is an area that we should explore: is that most appropriate? Do we have the parameters for defining that in a most appropriate fashion? How does this level of appropriateness merge and fit into the issues of comparative effectiveness, and how do you then push that into patient shared decision making? These measures are being developed for the entire downstream effect and the point of care, and I think it makes a clinical difference for patients and it is a big gap in what we do.
Patient-specific factors, however, really confound us. That is where appropriateness gets to be a mess. If somebody has an inguinal hernia, you could say, well, it is appropriate to operate on someone with an inguinal hernia, but then you start to change the story. They have an ejection fraction in their heart of less than 20 percent. They are 90 years old. They are bedridden and that hernia never bothers them. Perhaps it's not appropriate to operate on that hernia.
Trying to develop measures for that level of detail I think will fail. There are just too many confounding variables to try to put that together. But a good beginning is to start with condition-specific, evidence-based appropriateness. I think that can be done in a meaningful way.
I also think that when we start with developing measures particularly in areas of procedures, we have to look at this as it all comes together so that when we are beginning the measures we can see how it is actually going to complement and fulfill what that patient needs and their decision making, what other stakeholders are looking at and want to see for the patient. There is value in structure from a procedural basis. That value may be volume driven. We know that there are some procedures out there that have volume association with them and we have to be mindful and respectful of that on behalf of representing that to patients.
The process measures. There is not a lot of enthusiasm for these because we can't cleanly tie them to outcomes, but it is still a good, important step in creating standards and reliability in care. It has a role, but I don't weight it as heavily as I would risk-adjusted outcomes. However, I still think that measure is at the facility level. When you build the composite, the first two, structure and process, are provider level and some facility level, but outcomes are more facility-level based.
We have completed and just put into the public domain, or it is about to be launched into the public domain through AHRQ, a surgical CAHPS, which is much more specific for the surgical patient. It was developed by those patients through a proper facilitator to represent those things that they are seeking, and not the general CAHPS that you see in the hospital, which will tell you much about the parking and the painting but not so much about the expected outcomes and did we achieve that and were you apprised of what you thought you needed to be apprised of when you went into your operation, now after your operation. And then the appropriateness I mentioned, and then finally efficiency.
Currently under the NSQIP forum we can measure efficiency. We can add in appropriateness either within the registry itself today or we can add it in a parallel registry that is about to be launched in conjunction with NSQIP, which is referred to as a Surgical Quality Alliance registry. But these are different pathways than these other pathways we are talking about for measures, and yet all these pathways are important. We have to bring all of this together to represent the structure and process with these risk-adjusted outcome measures.
That completes my remarks about this. The last thing I would say is that I think from a procedural basis as we develop the measures not just based on priorities that we set but this has to be in conjunction with the payor community. We have gotten a lot of information and help by sitting down with payors and asking them specifically even market by market, where are they seeing variance, where are they seeing gaps, and how do we partner with the measures that they have to complement those with the clinical measures?
We are working with the vendors who run the current clinical databases and all of those now are dropping patches into the EHRs that run through the practice management system, to at least bring in some of the initial data. We can pull in all the demographic data and reduce the burden of data aggregation through those patches. That is working. To get it to the point where we can pull in necessary clinical information such as in NSQIP, we have actually had to use certified nurses who are certified to the process to do that data entry so that we have auditable, reliable integrity in the database. Thank you.
DR. CARR: Thanks very much. Really excellent presentations and remarkable synergy in terms of where we are headed. I would like to open it up to questions now and I will start with Paul.
DR. TANG: Thanks to the panel for excellent discussion. Maybe starting with Bernie in terms of some of the measures, what do you see the timeline or maybe the process of transitioning from the kinds of well, actually I think Karen addressed that in terms of you are working on the EHR-derived measures.
One topic that came up earlier from Helen Burstin actually was the whole ACE/ARB. Let's say you have the ACE/ARB measure. How would you propose we account for, let's say, the ACE is on the formulary but the ARB is not and they have an adverse reaction to the ACE? Is there a way to account for those kinds of things, or how are you viewing that in terms of measure definitions?
DR. KMETIK: I don't have the specifications in front of me but I am pretty sure we acknowledge that with ACE/ARB there is a way to document intolerance.
DR. TANG: I really liked your matrix where you talked about checking off where a measure was with respect to the meaningful use. Have you by any chance done that for all of the MU measures?
DR. KMETIK: We are in the process. I would be happy to share that. I think it makes an important point, if I could elaborate on that a little bit, which is we all appreciate the need for deadlines, but to arbitrarily sort of, and I don't know if it is arbitrary, but to pick a date and say that's when we go live is troubling because we want to all be assured that we are there, and I think that tracking exercise helps to acknowledge where we are, and we can focus our efforts on the things that still need to happen and let that drive the appropriate dates of things.
DR. TANG: I have got to go back and say, fine, the Medicare trust fund is going bankrupt, and it is fine to say let's wait until we are ready. How do we make that compromise between when we think, at the current pace, we can be ready versus when we could be made to be ready?
DR. KMETIK: I'm not saying we sit back and just wait until it all happens. I am just saying that laying it out that way, I think, makes it pretty transparent to all of us, and in some cases we are going to need to set some dates and say this is what we need to achieve by then. But without that specificity, you tend to get people a little frustrated sometimes and they will throw up their hands in frustration. I think it is better to articulate this is the goal we want to reach. Here is where we are in different things, so let's try to put some aggressive but reasonable dates to each aspect of that, because without that sort of reality-tracking mechanism, it is hard to engage in a conversation.
DR. ROSOF: I can also say that this is a resource-intensive kind of work, especially with the extensive number of measures we have on our plate, plus the measures that the consumers and purchasers would like to put forward, et cetera. Adequate funding obviously is an issue to help Karen along in this quest.
DR. MIDDLETON: Let me join Paul in thanking you all for really great presentations and thanks to all for coming. I guess I am sort of sitting here wondering where is the breakthrough. If each of you could respond perhaps with what you see as the top one or two or three breakthrough opportunities for us to develop measures, to accelerate the development of measures, to implement measures. I am feeling a little underwhelmed so far, in the sense that we have a lot of activities going on. We certainly know that we haven't yet impacted, to a large degree, behavior change or changed quality, outcomes, and cost of care. Trying to be a little bit provocative, but gently, what are the top three breakthroughs you would like to see in your own work or the work of others?
MR. OPELKA: I think that is a great question and in fact I think we have asked that of ourselves many times. I think that one very important breakthrough that we have to have is harmonization across the payors. It's just not working to have multiple different payors hitting these providers with multiple different quality programs. It is highly inefficient. If we want to talk about removing inefficiencies or improving efficiencies in medicine, let us not be the cause of more inefficiency in this process. We lose credibility with the providers immediately. They already see these unfunded mandates coming in as one more burden, and it's not an opportunity for quality improvement. They really want to give good care. I have never met anyone in medicine who didn't want to give the best care every day, and if these things come in and they are just burdensome and they get one from United, from CIGNA, from Blue Cross, from WellPoint, from all the different plans, and then there is CMS. They are not harmonized. That is not helping the American patient one bit.
I think that is going to take government intervention to tell everyone you are going to get in a room. We are going to get this together. It is very important.
Secondly, just from the surgical perspective, we do have, and it's not compelling evidence, it is overwhelming evidence, that we have improved the quality of care in the NSQIP hospitals. It is huge. It is no longer deniable. The return on investment is also there. That is the business case. So the upfront funding is returned. Good quality care is cheaper care, and not only that, there is massive buy-in; the surgeons love this. The information that comes back to them is scientific. They see risk-adjusted data that is being shared. They see themselves. They see their hospital, and then they see here are the high-performing hospitals, here are the low-performing hospitals, and there is this gaggle in the middle. But that starts the learning network and the quality-improvement process.
I would say there is an upfront investment to make that happen and I think that is also an all-payor responsibility. I don't think it is a government responsibility alone, but I think it is an all-payor responsibility.
MS. SCHOLLE: When I think about the breakthrough I think about two pieces. One piece is having the information that we are collecting be something that a clinician can use and can explain to a patient. That is the piece of trying to look at how these measures can tie to decision support and could be reported out to patients. Choosing a small set of measures that you can really build into, get out of the problems of free text and other stuff so that you can actually get a report that is meaningful to the clinician that there is decision support that can help guide that clinician and that they can also use that to help patients understand what their care needs to be I think is an important piece.
That can only happen when you have an HIT system that can support the collection, the use of these data elements into a measure and when you have a workflow process that supports that and the work flow that staff that are trained that have responsibility for printing out the report or for entering the data and using that. Our experience as we have been looking at measurement with the EHRs is that those are the critical first steps and so that happens by having structural measures that say this is what the system should be capable of doing and these are the tasks that the team needs to be able to perform and need to be assigned so that you can get those reports that they can build into your decision support that they can be shared with people so everybody is working on the same page and maybe trying to do that on a few really good measures that makes sense to everybody would help us to understand how to do this bigger and better for different commissions.
DR. ROSOF: So as Frank said that is a very good question and it is a question that I asked when we started doing the work on overuse not knowing where to start. I started with the clinicians and specialty societies. Their response to me was try to eliminate the confusion. Try to develop national standards that would be agreed upon by the providers, by the payors, the consumers, the specialty societies, and the boards. Yes, you can build that into my workflow. That would be terrific so that I don't have to do other things during the day but it would harmonize current performance measures and perhaps I could use the same for my maintenance of certification as I would for my pay for performance, PQRI, et cetera.
That whole issue of confusion amongst the already overburdened clinician is something that we would really like to accomplish. If I could look at one thing, I would say let's try to develop national standards agreed upon by all those stakeholders.
DR. MIDDLETON: This is really terrific because I think you guys are laying out kind of the inner thinking behind your presented thinking, if you will. I wanted to follow up, Frank, with you. In a way we had a conversation this morning with David Reuben from UCLA and ABIM, and it was very interesting to think about how in fact measurement of physician performance through quality directly or indirectly through other means, process assessments, what have you, may raise the issue of whether or not we are establishing a floor of competency or whether or not the process can be turned on its head to establish a goal, an aspiration or reach or stretch goal, which then changes the dynamics, not only the human dynamic and the psychology, but also the measurement process in responding to measurement and assessment as a tool for me to continue self-improving and lifelong learning and a learning network as you described, as opposed to the kick in the butt when necessary if underperforming or overperforming in untoward ways.
I wondered if you could comment on the learning network and whether or not there is an individual component to this, which has this feature or this characteristic or whether or not you think something like that might be possible?
MR. OPELKA: I have had this discussion many times and with many of our stakeholders and it is fascinating, one of the stakeholders Peter Lee, he and I came to the same conclusion that at a national level we needed a baseline. We needed the floor. But at a personal level, we needed the upper 10 percent, so you have to do both. I don't think we can escape that. I think all of us want to know where the floor is, that you have to be above this. If you are the outlier on the low end for a particular condition or treatment of that condition, it has to be identified and the public has a right to know. This is their health. And at the high end, we need that so that we can get into those systems, look at those systems, and see what is happening.
What is so different about that system because most of those successes are not one individual who is doing something absolutely extraordinarily great and miraculous. It is the fact that they put together those systems that created the standardization and reliability. If you look at what Geisinger does that is so good, all their goals and all their targets are standardization and reliability. It is nothing more than that and they let performance of the individuals distinguish themselves beyond that. I think you have to have both.
The reason you have the learning network isn't so much to ferret that out, because that is intuitive I think in everybody. Nobody wants to be on the bottom. Everyone wants to be on the top. But we all are coming to work already thinking we are doing the right thing. Medicine moves and changes so fast, particularly in this day, that we have to teach each other on a continuous basis. Going to a monthly or even a semi-annual society and listening to some of the latest and greatest doesn't translate into getting back into home.
I think the quality improvement efforts have to be driven by these measures that are specific to where our issues are and then we have to own them. We need a team effort to own that to figure out how we are going to change. It is very hard for an individual who fully believes in what they are doing to change alone. You need the learning network.
DR. KMETIK: I want to just echo the themes and maybe say in terms of a breakthrough. I do worry that we are missing in our vocabulary today in our conversations about meaningful use et cetera, both these pieces that I think you are getting at Blackford, and that Frank has articulated. I worry that we have had past national programs where the emphasis has been on get the data somewhere and we have done that. We have done that exporting.
That has done pretty little to get timely data into the hands of physicians. I think the vocabulary needs to make sure we are giving those two things equal weight. We have got to get the data in the hands of the physicians. They are going to use it, as Frank said. If it is good data they are going to react to it. We understand the need for exporting data and sending it to others but if that is all we do and we haven't built this first, we have missed the golden opportunity and we have just replicated what we did with claims and that would be so frustrating I think to all of us if we look back five years from now.
DR. TANG: I am picking up on Blackford's and I really liked this last discussion. I think Frank answered the question about why the floor because I challenged ABIM on let's stop raising the floor and let's start shooting because I find that individuals, and this is to your point of facility versus individuals, maybe the floor is for the facility but the individuals react very positively, embracing even transparent measures about themselves, and that has been our experience.
The breakthrough might be to make sure that our quality measures align with the physician's line, which is towards the individual. I am finding less, and I can understand why we developed failure quality measures. I just don't see much room for them anymore. An example is the hemoglobin A1c greater than nine. It doesn't tell me any way - it doesn't tell me how close I am to the goal of less than seven. It seems like we want more measures that help individuals because I think the public reporting, and that is what the evidence shows, is that consumers, maybe because we don't present them in a usable way, but that is not what is changing their choice. It is actually changing the docs' behavior to more improve it. Maybe that is an example of an improvement. Design the quality measures to help docs perform at a higher level.
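The contrast Dr. Tang draws here, between a threshold measure (A1c greater than nine) and a measure of distance from the goal (less than seven), can be sketched roughly as follows. This is an illustration only: the function names are hypothetical and the two patient panels are invented values, not data from any organization mentioned in the testimony.

```python
from statistics import mean

POOR_CONTROL = 9.0  # threshold named in the testimony (A1c > 9)
GOAL = 7.0          # goal named in the testimony (A1c < 7)


def poor_control_rate(a1c_values):
    """Share of patients whose A1c is above the poor-control threshold."""
    return sum(v > POOR_CONTROL for v in a1c_values) / len(a1c_values)


def mean_gap_to_goal(a1c_values):
    """Average amount by which patients exceed the goal (0 when at goal)."""
    return mean(max(0.0, v - GOAL) for v in a1c_values)


# Hypothetical panels, invented for illustration only.
panel_a = [7.2, 7.4, 9.5]
panel_b = [8.8, 8.9, 9.5]

for name, panel in (("A", panel_a), ("B", panel_b)):
    print(name, round(poor_control_rate(panel), 2), round(mean_gap_to_goal(panel), 2))
```

Both hypothetical panels have the same share of patients above nine, yet panel B sits much further from the goal, which is the information the threshold measure alone does not convey.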
The second potential breakthrough, and this is in reversed order because Floyd is going to talk about the quality data set and I know some of you are familiar and some of you aren't. Do you think that kind of - because you talked about efficiency and each health plan having their own definition, et cetera, and the problem of getting that into the EHRs and clinical decision support. Do you think the idea of a QDS, a quality data set, that all your organizations could draw from would be one of those breakthroughs, to the extent that you understand what is going on there?
MR. OPELKA: Let me just ask a question to help me answer that, Paul. If I were to get a feed that gave me a list of data metrics, that feed could then be mined locally even further.
DR. TANG: It is more that the - somebody showed the supply chain and I don't know if that was Helen from before. Was it Helen? Okay. You start out with creating the evidence, to creating the guidelines, to creating the things that would affect behaviors like the rule sets, clinical decision support, to creating the quality and the performance measurements and then feedback. Each one of those steps is done by a different part of our sector and they all invent their form of the data definitions and the measure definitions. Wouldn't it be great if we had one place where everyone could go and draw from, from the very start of the design of the protocol and the data you collect all the way through, and spit out the quality measures? In that sense that would facilitate the data and hopefully we would agree on 6 instead of 35 data elements to feed into your registry, and case adjust these things and spit out the quality measures that would go back to the EHR. Maybe that is the bi-directional aspect. The outcomes that you get from your registry data could sit in front of my face at the time I am making these decisions.
MR. OPELKA: I am thinking of two things with this right now. I have two measure data sets; one is this LSU data set on, take my diabetics. Our diabetic measures are pulled from the lab. The hemoglobin A1c is pulled from the lab and we track less than 7, 7.5, 8, 8.5 all the way up to 10. When we report we don't say anything. We give an individual doc all their scores and the system has all their scores, but the system then allowed those interested parties to mine the data further. When we mined the data further is where we found the answer.
What we noted in our own population was -- just as an example, if you were over 65 in our system your hemoglobin A1c tended to be pretty well controlled. You were more mature. You paid attention. You read other problems. Your doctors were important to you but for the group that was under 35, which just happened to be the musicians of New Orleans, the lifeblood of the city who drink and smoke all night long but make beautiful music, none of them cared much about their diabetes. Then we put in improvement programs focused. In this instance performance measurement to physicians meant something because it came home to treat and make a patient better.
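The kind of mining Mr. Opelka describes, banding A1c results and then breaking them out by age, might look something like the sketch below. It rests on several assumptions: the band edges between 8.5 and 10 are taken to continue in 0.5 steps, the age groups and function names are hypothetical, and the sample records are invented rather than drawn from the LSU data.

```python
from bisect import bisect_right
from collections import Counter

# Assumed band edges: the testimony lists <7, 7.5, 8, 8.5 "all the way up to 10";
# the 0.5 steps between 8.5 and 10 are an assumption.
EDGES = [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0]


def a1c_band(value):
    """Return a band label such as '<7.0', '7.0-<7.5', or '>=10.0'."""
    i = bisect_right(EDGES, value)
    if i == 0:
        return f"<{EDGES[0]}"
    if i == len(EDGES):
        return f">={EDGES[-1]}"
    return f"{EDGES[i - 1]}-<{EDGES[i]}"


def age_group(age):
    """Hypothetical grouping based on the two ages named in the testimony."""
    if age < 35:
        return "under 35"
    if age > 65:
        return "over 65"
    return "35 to 65"


def mine(records):
    """records: iterable of (age, a1c) pairs -> {age group: Counter of bands}."""
    result = {}
    for age, a1c in records:
        result.setdefault(age_group(age), Counter())[a1c_band(a1c)] += 1
    return result


# Invented sample records, for illustration only.
sample = [(70, 6.8), (72, 7.1), (28, 9.7), (31, 10.4)]
by_group = mine(sample)
for group, bands in by_group.items():
    print(group, dict(bands))
```

Keeping every band rather than a single pass/fail cut point is what leaves room for the further questions a physician might ask of the data.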
Looking at that data if the data allows me to re-mine it and the NSQIP data does give you a dump and you can do a lot of mining and re-looking at things. If there was a system whether it had the parameters you could pick and choose from allowing you to get the other analytics so you can do a deeper dive. As you know that is how physicians think. Okay you give me this but now answer this, this, and this.
DR. TANG: I think that is exactly right. The QDS would allow you to have this is an A1c, this is what it means, this is how I calculated it and how you interpret it so you can do that mining that you suggested. The breakthrough I think I am thinking about is our quality reports the quality measures we currently report on actually take away that ability and that is the fallacy. I think the breakthrough is to report differently. In our organization where we are improving quarter by quarter because of the timely reporting back to the docs we have a separate team to create the measures that we use and then a measure that we have to report to all the various definitions that the health plans have and that seems counterproductive. Maybe the breakthrough is pay attention to what it is that will change the physician behavior. I think the rest will follow versus the other way around. Report to health plans and then expect physicians and patients to change.
MR. REYNOLDS: Frank, you commented earlier about the payors. The quality of whether somebody has quality shouldn't be decided on whether they are a blue patient or an patient or something else. As I sat and listened to this, and I do a lot of standards work, I sit and listen to this and I hear "we" a lot around the table, and every single presentation has been fascinating to me, but each one of them has a different slant and even the most recent discussion in Paul's comment about what they do in their institution.
How do we take what all of you are working on and again as you look at the chart from ONC and you look at the states that everybody in every state is asking for money to go ahead and implement these things for all their docs and everything else. What is the process that you all see that allows this to be raised above whether it is payor provider or anything else, and you set the quality standards and you have it in an open forum and as you say it is agreed to like a lot of the other standards that go on and then once that is done then - so for example as a payor, I'm not speaking for our company, but it would be awful nice if there was a set of those out there that everybody agreed on that was the right thing and oh by the way it played for Medicare, it played for Medicaid, it played for everybody else and you can use it and improve North Carolina one patient at a time, regardless of who they are covered with.
But in listening, each one of you is doing something really good and even some of our members on the panel are doing good things, but left as it is then others are going to come and say, here is what we are going to do or here is what a state is going to do and everything else. That is what worries me about getting to the meaningful use and getting to the things that are going on: there is really, really good work but no process to actually vet it in one environment. Being NCVHS we get stuff all the time. Not recommending that there would be it but it is vetted. It is vetted in one environment. It is vetted in a way that is agreed to, and then once it is agreed to it could be adopted by HHS or others, and that is how we start rallying around.
I chair a committee of all the payors on eligibility claim status, the same kind of thing. You know you get the one. Any comments you could make because that could be a way NCVHS could make some recommendations as to how to help you get to that point, if that is a worthwhile point. If not then it is probably going to be one state, one payor, one issue, one situation at a time, and that's not going to get any of us I think where we are all trying to be when you put up any of these thoughts.
MS. SCHOLLE: I think that is the major challenge. What we have had with HEDIS is that everybody wants to make a little bit of change specific to their state or their population, and it is really hard to get people to agree to a standard, and working with NQF to get to endorsed measures should really help that. But I think this idea of having standards and then getting vendors to agree to put it in, because what we have had is the vendors don't want to put the resources into any kind of measurement standard unless they think this is going to work for everybody.
MR. REYNOLDS: Where do you see the process that says vendors you should do this? I know some of you talked about meeting with vendors, but where is the one body that says okay, this is it guys, go do it?
DR. TANG: We had a discussion of this at lunch. This moment or opportunity where the vendors all said, well our customers aren't asking. Well all of a sudden because of the meaningful use dollars the customers are asking the vendors, which the measure developers could never get the vendors to do. Now the vendors have to listen to the measure developers because
MR. REYNOLDS: If the measures are accepted across the board as that's what it is and let's go.
DR. TANG: That is our moment of opportunity and I think maybe this hearing is a plea to the measure developers and the people on that supply chain, hey, this is time to hit the breakthrough measures and meaningful use is the excuse to do it.
DR. ROSOF: This is not a new discussion, obviously. When I talk as a call for national standards it was the discussion. At the beginning of the effort at health reform this discussion began. Who is going to be the organizer of national standards? Who is the recipient that is going to make the rules that this is the national standard for X, Y, and Z or for the vendor situation?
That led to a lot of confusion, internal concerns, et cetera and it is still not resolved. There are some answers that could be accomplished. Organizationally this could be accomplished within the framework that is already established; however, it would have to be agreed upon by so many stakeholders that unless there is somebody who says you must do it, unless there is somebody who says national standards are the gold standard, it's just not going to happen. Meaning the discussion started perhaps with National Quality Forum being the individual who would be the group that would handle that; that created a lot of discussion and didn't go to where it was going to go. The same thing with the national standards related to development of measures. PCPI could have that role. There is no question about that. It has the background with the medical specialty society, the clinical side, the methodology side, et cetera. It is a group that should have that responsibility. It would have to be agreed upon by so many stakeholders that it becomes - it is not a forum where you say yes or no and 50 percent you win or 60 percent like Senate. We will see.
DR. CARR: A quick follow on and then Larry
DR. MIDDLETON: Just two quick thoughts. Just a quick reaction is sufficient. In a way I am struck with the assertion that the physicians of course are interested in providing high-quality care and aim always to do so but of course the result isn't that. I am concerned that we are developing a new measurement framework and all the stuff we have been talking about but absent a sustainable business case for quality. I am just concerned it's not going to happen. Not to suggest that docs are mercenaries who are working only for quality or the almighty dollar but could you each react to what is the sustainable business case for quality post stimulus?
MR. OPELKA: I think you have to stop paying for volume. If you are going to have a quality system and you want this transparency in all the key pillars of healthcare reform, there has to be a realignment of incentives across the board. Until you do that I don't think all of this stuff will go exactly where it has to go and be as successful as it can. I also think you must address the issues of defensive medicine. That if you are going to incentivize me and my practice by $2000 of PQRI money and I put myself at risk for a $20,000 increase year after year in my malpractice because I no longer practice defensive medicine, and I was caught in a court system that isn't educated to quality metrics, it is easy for me to decide which risk I am going to take. I am going to keep my malpractice insurance rather than $2000 on quality. That isn't necessarily something I bought into as the best quality.
You have to have meaningful quality and really something that can at the local level create quality improvement, excitement about the practice of medicine because of the state and this information. You have to have aligned incentives that have to include payment realignment that fits that and you have alignment incentives that correct the disparities that we see in the way we practice defensive medicine.
MS. SCHOLLE: I think you also have to think about the accountability not just being with an individual clinician for an individual patient at an individual point in time; care happens over time to patients and they are getting care from different people. The accountability needs to recognize that it's not just the surgeon. It is the primary care physician, the hospital, the health plan, that are all taking responsibility for the care for this patient. That is the other kind of financial alignment that needs to happen, particularly since some of the measurement that you might want to do at the clinician level is useful for quality improvement, but in terms of knowing it is useful to let that clinician know what is happening with the patients but to really get something that is stable, you are going to need a lot more observations.
Like Frank was talking some things make more sense at the facility level. Some things are going to make more sense at a community level to try to understand how well the community is doing; the community of healthcare providers and the healthcare system are doing for a population of patients.
DR. KMETIK: I will just add a little bit to Larry's previous point as well. I think we should use this moment of meaningful use. We should leverage the heck out of it in a thoughtful way because it has enabled us at least to begin to have a different dialogue with the vendors than before, as Paul said, step one.
Another important step though is I think we can begin to have the different conversation with the payors because as opposed to an old conversation of you want to merge your big databases that you have. It is a new day. We are both getting data we never had before. If done right, the physician has data they have never seen before at the moment of the patient care and the plans, the payors, they never had cancer staging data before. They never had prescription data to marry with dispense to see is the patient picking up the drug or not. Again, our vocabulary maybe needs to have a shift to say what do we all win here if we do it right as opposed to just we are going to get backed and that is what you have to do. What is the thing where we all win and it is information we never had access to before and that is even important to the plans.
DR. GREEN: I want to try something here. I want to ask all of you to respond to questions that in my mind tie back to prior NCVHS work, the work of several of our subcommittees over the last few years. All the questions I am going to ask are capable of being answered with yes or no. The first question also I built off of Sarah's slide, for she had the word dream and she also had another slide that had the word stewardship on it. This question would be: are you experiencing in the current measure development, endorsement, and adoption process difficulties that relate to privacy, confidentiality, or what a research community would call IRB issues? Yes or no? Or is this a nonexistent problem? Privacy and confidentiality are not in play here.
DR. ROSOF: The answer to your first question is no.
MR. OPELKA: Yes.
MS. SCHOLLE: Yes.
DR. GREEN: Two yeses, two nos. Then we in the last hour and a half heard Harry's "we." Does the "we" share a common data model? The "we" of the vendors, data developers, the payors, the purchasers, the government, providers, et cetera. As we massage this thing anywhere lays a common data model. We are all working on that type of data model.
MR. OPELKA: No.
DR. GREEN: No, that doesn't exist. Then what was not said in this section: is there current active work on measure development, endorsement, and adoption of measures of function or a patient feeling like they got some relief? Four yeses. Thank you.
DR. TANG: As a closing comment I just want to thank this panel so much because I think there was such really good information and discussion. The other part is maybe on an upbeat note. Bernie said that you need this power. I think that you have more power than you realize because of the moment and it really is leveraged through the meaningful use and really NQF because it is a voluntary consensus standards organization, CMS has to use those when they exist. The measure developers have a lot of power at the moment. Maybe that is the upbeat way of saying this. I think the work is really good and to just to accelerate it not because it is comfortable to but because I think the country needs it.
DR. ROSOF: Some need the encouragement to utilize that power.
DR. CARR: With that I think we will just take a 10-minute break and then reconvene for Floyd and Blackford and I add my thanks to this group. I know you made this timing work and I appreciate it very much and thank you so much for enlightening us.
(Break)
Agenda Item: Building Meaningful Measures - Adoptability
DR. CARR: We are running a bit behind but really very enriched by all the discussion. The next session is building meaningful measures, adoptability, and Floyd and our own Blackford Middleton. Floyd Eisenberg and Blackford Middleton are going to present. We will start with Floyd and I suspect we are going to run over. Do we have Kathy McDonald on the line yet? Not yet. We can just let her know that we are probably running 25 minutes behind but looking forward to her testimony and if that is a problem for her just let us know.
MR. EISENBERG: Thank you very much. I thank you for the ability to speak here today and I am Floyd Eisenberg, Senior Vice President of Health Information Technology at National Quality Forum. I am very pleased to discuss the topic of building meaningful measures and adoptability. Just in our last discussion I heard a lot about vendor participation with using measures and implementing them and also is there a model of information. What I am going to be addressing in this presentation is pretty much a model of information for quality. It was developed by the Health Information Technology Expert Panel, chaired by your own Paul Tang, and one of our work group co-chairs was Blackford Middleton. I feel like I am home here. It was funded by the Agency for Healthcare Research and Quality, where iteration as the health IT panel.
Our original goal, the task we were given, was to develop a quality data set and to identify workflow for quality in a clinical setting. I will be discussing that with you as well as then how do we get this implemented, which is the rest of the presentation.
To start with, in thinking about quality we call this a quality data set, thinking of the way, if you remember when Helen Burstin presented this morning, the quality tree: everything starts with guidelines and evidence and the branch points are where guidelines recommend different components and the leaves are the measures. It starts with the study designer, the guideline developers that start with a concept. This is all about diabetes, as your example. As we look at that we want to understand what do we mean by diabetes and here the measure developers often provide a list of codes to say what is diabetes. In the example provided it is ICD-9, could easily be SNOMED, and will be SNOMED in the not too distant future: the list of codes that represents diabetes.
In order to provide additional meaning for any part of the measure I need to know that there is an active condition of diabetes. I have now merged together a concept, the codes, and the data type, the type of data that is active. It is a problem or a condition and it is active. All of these together make up a model of what we call a QDS element or a quality data set element. That element then can be stored and properly identified, be reused in a database of other quality data elements so any time we need to understand for a measure in a numerator or a denominator the use of diabetes active diagnosis that combination of code list, data type, concept together as a QDS element can be identified and reused. Understand I am talking here about the quality side. I have had a lot of interest from those in public health for public health reporting and public health identification of illness, as well as a research side either folks working with the same concept concern at National Cancer Institute and elsewhere. The quality data set in these elements can be reused for data out of electronic records and electronic data streams for multiple uses not just quality, but we are going to be addressing quality mostly here.
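A minimal sketch of the QDS element described above, combining a concept, a code list, and a data type into one reusable unit, might look like this. The class, field, and function names are hypothetical and do not come from any NQF specification, and the two ICD-9 codes are only a small illustrative subset, not a complete diabetes code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QDSElement:
    """One quality data set element: concept + code list + data type."""
    concept: str         # e.g. "diabetes"
    code_system: str     # e.g. "ICD-9" (could later be SNOMED)
    codes: frozenset     # code list supplied by the measure developer
    data_type: str       # e.g. "diagnosis, active"

    def matches(self, code_system, code, data_type):
        """True when a coded record entry satisfies this element."""
        return (
            code_system == self.code_system
            and code in self.codes
            and data_type == self.data_type
        )


# Illustrative subset of codes only; a real code list would be much longer.
diabetes_active = QDSElement(
    concept="diabetes",
    code_system="ICD-9",
    codes=frozenset({"250.00", "250.01"}),
    data_type="diagnosis, active",
)

# A hypothetical store of elements, keyed so that any measure can reuse them.
registry = {(diabetes_active.concept, diabetes_active.data_type): diabetes_active}


def lookup(concept, data_type):
    """Fetch a previously defined element for reuse in another measure."""
    return registry[(concept, data_type)]
```

A numerator or denominator definition would then refer to `lookup("diabetes", "diagnosis, active")` rather than restating the code list each time.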
Once I have done that I now have my quality data element and in the context of a measure I want to know where is it coming and what is its source. What is the workflow of it? When we talked about workflow in the HITEP we were talking about the source of the data, that is, the originator. It could be a device. It is a reading of a blood pressure off of a blood pressure monitor. The recorder could also be the device electronically. It could be a clinician and it could be the source as a patient and the recorder as the patient. We are talking about those two elements.
We also want to identify in the flow what is the setting in which it occurred. Is it in a hospital, in a home, ambulatory care setting, skilled nursing setting? And in what health record field do I expect to find the information, especially if I am submitting data from one EHR to another, and in this case we want to know that this is an active diagnosis of diabetes on a problem list. The data flow would identify the location and also the source and health record field. Once I have multiple ones of these, multiple elements make up a measure and those measures then each are stored in a measure database.
Again, looking at the model, I have the quality data set elements on your right, the measure database on the left, which is composing those into measures and each of those measures then can be used to manage measures in EHRs, registries, health information exchanges, et cetera. I give credit to Danny Rosenthal, who is tremendous with graphics and he did a very good job on this.
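The data flow attributes just listed (source, recorder, setting, health record field) and the way several elements compose a measure could be sketched as below. All class and field names are hypothetical, and the example measure is invented purely to show the shape of the structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFlow:
    """Where a data element comes from and where it is expected to live."""
    source: str               # originator, e.g. "clinician", "patient", "device"
    recorder: str             # who or what recorded it
    setting: str              # e.g. "hospital", "home", "ambulatory care"
    health_record_field: str  # e.g. "problem list"


@dataclass(frozen=True)
class MeasureElement:
    """A QDS element as used inside one measure, with its data flow."""
    qds_element: str          # label of the element, e.g. "diabetes / diagnosis, active"
    flow: DataFlow
    role: str                 # "denominator" or "numerator"


@dataclass
class Measure:
    """A measure composed of multiple elements."""
    title: str
    elements: list = field(default_factory=list)

    def elements_for(self, role):
        return [e for e in self.elements if e.role == role]


# Hypothetical measure, invented for illustration only.
measure = Measure(title="Example diabetes measure")
measure.elements.append(
    MeasureElement(
        qds_element="diabetes / diagnosis, active",
        flow=DataFlow("clinician", "clinician", "ambulatory care", "problem list"),
        role="denominator",
    )
)
measure.elements.append(
    MeasureElement(
        qds_element="blood pressure / physical exam finding",
        flow=DataFlow("device", "device", "ambulatory care", "vital signs"),
        role="numerator",
    )
)

# A hypothetical measure database is then just a collection of such measures.
measure_database = {measure.title: measure}
```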
Once we have identified all that and we have our data types. If I were thinking of something like medication, the medication is represented by a set of codes and it is either medication administered, which would be one data type, or medication ordered or prescribed being another; each one of our concepts has multiple data types. If I look at those as the center bar in this particular drawing, that is kind of the Rosetta Stone, the quality data set against which measure developers identifying the elements for their denominators and their numerators, and those providing clinical decision support identifying how to define my population and what intervention should have occurred, can reuse that quality data set. Rather than an EHR vendor on the other side looking at a text set of criteria in a measure, reading every sentence in every line and trying to figure out what that means in their EHR, they can identify: I know this is a QDS element. This is active problem of diabetes or just active problem. I know where active problem is in my EHR. EHR 1 has it mapped once, and anytime I have an active problem whether it is diabetes, congestive heart failure, whatever, I know I expect to find it in that same location in EHR 1.
That might be slightly different in EHR 2 because of some innovation in the way that it is configured but they handle problems in a standard way also. Once they have mapped that to the active problem then they have mapped to the Rosetta Stone on your right. The measure developers and users and the seekers of information map to it from your left. Using that same model, without having to individually look for the same element every time there is an element in a measure or a guideline or somewhere else, to go find it individually in the EHR, as long as you find it in the center on the quality data element and the quality data type then you are able to do that mapping much more easily.
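The "map once" idea can be sketched as a lookup from QDS data type to each EHR's own storage location. Everything here is an assumption made for illustration: the EHR names follow the speaker's "EHR 1" and "EHR 2", while the location strings and function names are invented and do not describe any real product.

```python
# Each vendor maps a QDS data type to its own location exactly once.
# Location strings are invented placeholders, not real product schemas.
EHR_MAPPINGS = {
    "EHR 1": {
        "active problem": "problem_list.entries",
        "medication administered": "mar.administrations",
        "medication ordered": "orders.medications",
    },
    "EHR 2": {
        "active problem": "encounter.problems",
        "medication administered": "nursing.med_admin",
        "medication ordered": "cpoe.med_orders",
    },
}


def locate(ehr, data_type):
    """Return where a given EHR stores a QDS data type."""
    return EHR_MAPPINGS[ehr][data_type]


def locations_for_measure(ehr, elements):
    """elements: iterable of (concept, data_type) pairs used by a measure."""
    return {concept: locate(ehr, data_type) for concept, data_type in elements}


# Two different concepts that share one data type resolve to one location.
elements = [
    ("diabetes", "active problem"),
    ("congestive heart failure", "active problem"),
]
print(locations_for_measure("EHR 1", elements))
print(locations_for_measure("EHR 2", elements))
```

The design point is that the measure side never names a vendor-specific location; it names a data type, and each EHR resolves that data type through its single mapping.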
In order to do that we had to link. The next step to implement is how do I link each of these data elements to a model of information that EHRs might be using. Now HITSP has done a lot of work on this effort. The Quality Interoperability Specification, known affectionately sort of by some of us as IS06, has a table in the appendix, which will be going to panel, I think in the next month, that identifies all of the standard elements, the HITEP data elements, a definition for the HITEP quality data types, and identifies where in an interoperable way that information would be found in an electronic setting and it identifies the terminology that most effectively could be used to do that.
HITSP's terms for that would be C83, which is the location in a structured document, and C80, which is the vocabulary. There will be new numbers after comment. I won't comment on them. I think it is C154 that is going to replace one of those but let's not confuse the issue.
This was to take the HITEP elements and definitions, make sure there is a setting in the record that represents each of those. A measure developer doesn't have to understand the entire model of electronic records but understands the data type. The electronic record doesn't have to understand everything quality developers are thinking about. They know where that information is mapped to and they can deal with that. That is the process. The table is obviously longer than two rows, quite a bit longer, but I have just showed that as an example.
What we did find was in the HITSP side there were some gaps. Some of it was in thinking about transferring data and using data for individual patient care the discussions and standards go so far but when we think about sometimes we need more granularity where it can be free text and it doesnt have to be specifically defined for individual care as long as they know its in a packet that deals with physical examination.
It does matter if which part of the examination should be identified I SNOMED and which in LOINC or is there a difference because right now what HITSP will tell its one or the other and what the EHR vendors told us is which one and when. There is increasing granularity that needs to be built into the standards work and that is where there is some more harmonization and specification that is required that was one example.
There are also areas where there isnt a good clear evidence of where that should be or is in an EHR setting and some of that patient functional status, a functional status survey exam. Where do I include that? Its not necessarily in the same place in any one electronic setting that needs work; the patients care experience, the providers care experience both of which are HITEP data types, communication to and from the patient. Devices. There are standards for devices but sometimes measures are actually looking for the use of devices where there are no standards. One of the examples was TED hose or antithrombotic devices put on the leg. There wasnt a good standard for that. That is where there are some gaps.
To understand a patient has declined without asking a physician to enter a code to say the patient refused this or declined it, how do we get that information one from the patient and two how do we standardize how that is represented so we can reuse it or treatment offered, which should happen before it is declined we would hope.
Another area where there is a gap is we do have, and Blackford chaired for NQF, a team on structural measures so we have a number of those. Most of them are basically indicating I have the capability in my office, that I can do this procedure, that I can do ePrescribing, and sometimes I can identify when each patient left the office. Did I send any prescription or did I not and the reason?
What our next step and this is one of the panels that NQF will be calling for nominees very shortly is looking for folks who can tell us what kinds of information can I get out of routine EHR utilization to determine that the EHR was used for this purpose. Rather than have a doctor have to put in a code to say I just wrote an ePrescription, which sounds like double work, how do I just get that stream of how many of those happen and how many prescriptions were written directly out of the EHR? The reason for that is to identify if we think of meaningful use, what are you actually doing with your EHR? Are you using it? That was a gap that we are working toward filling.
In order to then now take the quality data types and allow them to be built into measures this is a very early picture of a prototype. This is not what the tool will eventually look like but to try to give an example of what would happen is a measure developer will be deciding on a new measure. They will say I have a new measure, in this case hemoglobin A1c management. They will look through a QDS dictionary where there already are quality data set elements and they can pick one, see that it is there, the hemoglobin A1c. It is a lab test. It is identified by LOINC codes. Here are the codes. If it doesn't exist they can then add it here. All labs in this tool should be in LOINC and then the ability to add them in so that they can add elements that don't exist or reuse those that do. Once that is selected all of the quality data elements for this particular measure will be down in this box to show here is everything related and there will be a place for describing the logic of what goes in numerator, what gets added, the mathematical operators to fit those together.
The intent is once that is all defined is to put that into an electronic format. In the background this measure authoring tool will create an electronic measure. This is now going through HL7. It has gotten through ballot. It has not had a final vote to determine if it is accepted or goes to second ballot but it is an electronic measure. It has a representation of all the elements within the measure mapped to the reference information model, the model of information within EHRs, and all of the measure elements come from the data types. It then provides logic and with it provides a human readable style sheet and an XML version from which EHRs can then extract the components they need to implement the measure. That is now in progress. We will be testing that in the next couple of months with some meaningful use measures that are being retooled.
Let me just mention on that and I think I will talk about that more tomorrow. There are 72 measures that we were asked to have retooled with existing measure developers. The original stewards of those measures who understand the logic, the evidence and what went into it to create them more into a QDS format and electronic format meaning that hemoglobin A1c greater than nine becomes hemoglobin A1c show me the value and not check a code that says it is greater than nine but A1c give me the value.
Now there have been folks who asked us in a measure like that can you tell me hemoglobin A1c six months ago and the one now and what is the delta. There is a lot of value in that but that would change that measure. In order to retool we have to keep the measure intent similar to what it was. To create new measures we are very happy to look at those and send them through the endorsement process but in order to retool we have to keep the meaning of the original measure to keep it the same.
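Illustrative sketch of the retooling idea above: rather than checking for a code that asserts "A1c greater than nine," the measure reads the most recent A1c value and applies the threshold itself. The record layout is hypothetical, the LOINC code 4548-4 is assumed here for hemoglobin A1c, and patients with no A1c result are simply left out of the denominator, which is a simplification and not the logic of any endorsed measure.

```python
HBA1C_CODES = {"4548-4"}  # assumed LOINC code for hemoglobin A1c
THRESHOLD = 9.0           # percent, from "hemoglobin A1c greater than nine"


def most_recent_a1c(observations):
    """Return the latest A1c value, or None if the patient has no A1c result."""
    a1c = [o for o in observations if o["code"] in HBA1C_CODES]
    if not a1c:
        return None
    # ISO 8601 date strings sort chronologically.
    return max(a1c, key=lambda o: o["date"])["value"]


def a1c_above_threshold(patients):
    """Return (numerator, denominator) computed from stored values, not flags."""
    numerator = 0
    denominator = 0
    for observations in patients.values():
        value = most_recent_a1c(observations)
        if value is None:
            continue  # simplification: no result, not counted
        denominator += 1
        if value > THRESHOLD:
            numerator += 1
    return numerator, denominator
```

Because the value and its date are kept, a different measure (for example the six-month delta mentioned above) could be built from the same data without changing this one.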
What are some new data types, new sources? Some of these were the gaps I showed you but functional status, care experience, communication. How do we get information from the patient? How do we get information transmitted to manage the different new types of information we need? What we will see as we work on the national priorities and look at the patient's care experience, the patient engagement and care, we will find new data types and new elements that we need to add to this and address so we are expecting that.
I mentioned the HITEP data flow. The reason I included those here was new data sources will be coming directly from devices. How can we identify if I see a blood pressure what the source was? The more we can persist the origin of the data with the data as it goes through the system the better. It might be a health information exchange having come from an EHR which came from a PHR which came from a device in the patient's home. If I know it is from a PHR I don't know if the patient entered it. I don't know if it was the device. How can we manage that, and I don't know if it is clear yet how that happens. Here is the new source. How do I keep persisting where it originated so that I know the value of this data to use it for quality research or other purposes?
Now this is purposely complicated because it is. One of the comments I was asked to talk about a little was data collection. In the standards committee we were asked what are our transactions and we tried to look at, well, we have to look at what this flow is. I have a measure source so the measure developer provides this measure. Let's assume it is electronic. It goes to an electronic record. It might go to a personal health record, PHR. It might go to a registry. Perhaps that EHR does everything. It assembles all the data in a registry within it. It creates the report and sends it out, which is why you are seeing all these different boxes that connect it. It may send the information and work directly with feedback with the data assembly system, a third-party warehouse, a health information exchange, a registry like American College of Cardiology, and that might be collecting the data. There is a transaction here that needs to be addressed. Sometimes the processing entity that reviews the data to validate it is correct is separate from the assembler. This is drawn to show there are various architectures that exist today. Perhaps one architecture is best but I don't know that we can necessarily get easily to one. I think what we want to do is address the transaction from the direct care delivery to whoever is assembling the data, and feedback has to be frequent so that the care site knows how to improve care and not just get an end of year report. But this one is done to look at what are the different levels of data collection. It can be complex but I think what is really important is this collection of the data directly from its source.
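Illustrative sketch of persisting the origin of a data value as it moves from system to system, as described above. The Reading structure, its field names, and the hop labels are hypothetical; no existing provenance standard is being represented here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    kind: str         # e.g. "blood pressure"
    value: str        # e.g. "128/82"
    entered_by: str   # "device" or "patient": who produced the value
    path: tuple = ()  # every system the value has passed through, in order


def forward(reading, system):
    """Pass the reading to another system, keeping the full path so far."""
    return replace(reading, path=reading.path + (system,))


def origin(reading):
    """Return the first system on the path, or None if none was recorded."""
    return reading.path[0] if reading.path else None
```

A reading forwarded through "home device", "PHR", "EHR", and "HIE" still reports "home device" as its origin and "device" as who entered it, which is the distinction a PHR-only label would lose.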
In order to keep them current one of the issues as we talk about the quality data set is maintaining currency of the very atomic particles for those measures. That is the codes. If I am looking at RxNorm, for instance, as a source for the codes for my medications, that changes frequently. Every time it changes that code set or value set as some IT people would call that, is going to change. Does that change the measure and at what point does it? We have a team scheduled to meet in a couple of weeks to help us set criteria for when does that change or make it a new measure. When does it make a new code list and how do we version those and how do we manage them?
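Illustrative sketch of the versioning question above: when a code list changes between releases, a first step is to see exactly which codes were added or removed. The function name and the codes are hypothetical placeholders, and deciding whether a given change makes a new measure is left open, since those criteria had not yet been set.

```python
def diff_value_sets(old_codes, new_codes):
    """Compare two versions of a code list and report what changed."""
    old_codes, new_codes = set(old_codes), set(new_codes)
    return {
        "added": sorted(new_codes - old_codes),
        "removed": sorted(old_codes - new_codes),
        "unchanged": sorted(old_codes & new_codes),
    }


# Hypothetical versions of one medication value set (placeholder codes).
VALUE_SET_VERSIONS = {
    "v1": {"RX-A", "RX-B", "RX-C"},
    "v2": {"RX-B", "RX-C", "RX-D"},
}

changes = diff_value_sets(VALUE_SET_VERSIONS["v1"], VALUE_SET_VERSIONS["v2"])
```

Keeping every version, instead of overwriting the list, lets a report state which version of the code list a given result was computed against.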
How do we maintain the QDS is our next job, which is to make sure we add to it and modify it as we learn more through creation of measures. The measure maintenance process, I think you heard some of from Helen Burstin this morning, and what we do have is a regular endorsement process. Every measure has to be reevaluated every three years, and every year, if there is a change, it is reevaluated. All measures now being retooled electronically will go through a modified process because they are not changing the measure. So the process will evaluate, one, did it change the meaning of the measure when it was retooled, and if it did change then that means it is a new measure, and two, was it retooled appropriately so it addresses the QDS correctly? That is an alternate process for retooled measures.
I am trying to keep to time but that is a broad overview of a new model for managing measurement, which I think will work well for not only quality but also public health use of data coming in and out of the EHR and also research.
On the code list issue I think what is important is there is currently in public health something called PHIN VADS, Public Health Information Network Vocabulary Access Directory, and they create value sets or code lists for all reporting for public health. NCI has something similar. I don't know the meaning of the acronym as clearly, but I think it is CDAR and they do something similar for research.
What we don't want to do is create another silo of creation of code lists, but our intent is to talk to the National Library of Medicine, CDC, and NCI and get together with stakeholder groups to figure out how to do this best nationwide and very shortly. Thank you.
DR. CARR: That is great. Thank you. It is so helpful the way you organize the data.
MR. QUINN: Mike Fitzmaurice isn't here but there is a thing called USHIK, which I don't know what it stands for but it is a big meta-data repository.
DR. EISENBERG: US Health Information Knowledgebase. Let me just say I did not mean to leave AHRQ and USHIK out of the picture. USHIK actually is doing a lot of work with HITSP to store all HITSP documents and to create a kind of a data repository for all of this and they have offered to do some of this for the quality enterprise. What I think is important is that we all sit together and figure out what is the way to do this. I don't know if it would be productive to have all the quality data sets or the quality information sitting in USHIK and the public health sitting in PHIN VADS and the research sitting at CDAR(?). I think it would be more helpful if there were a way that even if that did happen they are all coordinated.
MR. QUINN: To put them all in USHIK.
DR. EISENBERG: If that is the answer that is fine.
DR. CARR: Blackford, go ahead. Do we have Cathy McDonald on the phone now?
MS. MCDONALD: Yes, I am here. Thank you.
DR. CARR: It is Justine Carr. We ran a little bit over because of various schedules. Are you okay to stay on the line?
MS. MCDONALD: Yes, I am.
DR. CARR: Great. Thank you.
DR. MIDDLETON: Good afternoon everybody. Blackford Middleton. I am from the Brigham and Women's Hospital, Partners Healthcare, a member of the quality subcommittee and a member of the full committee. I was invited this morning to share a few thoughts with you about these quality measures' adoptability. I was jotting on the spot and put together a few slides.
I guess when I think about this, and I am very happy actually to have the opportunity to share with you some thoughts really from the implementer's point of view. I really tried to take a cut at this as an EMR developer and implementer having worn that hat more than a few times. It struck me that maybe there are four core components to think about when assessing these measures and their implementability or adoptability in HIT. I just thought I would consider HIT broadly. It might be that these are implemented in a variety of HIT systems, EMR and hospital-based records and other hospital systems as well as ambulatory EMR, et cetera.
Anyway first and foremost the quality of the measure, secondarily implementability or the adoptability, thirdly practicality of use in clinical practice and then lastly the maintainability or as I think Justine said this morning perhaps intentionally adaptability in addition to adoptability. Let me talk about each of these very briefly.
I think many of these issues have already been discussed. This is a bit of an eye chart and I apologize, but clearly we have to start with a high-quality measure, well specified, clear conceptual understanding, interpretability as we heard before, measurable and discriminating among the candidate population being assessed, precise enough but not too much perhaps, statistically sound and valid, independent repeat measures and coherent and composite measure and the like, free of any bias structurally or random bias error that might occur, and what is the underlying distribution of the population? Is the measure appropriate for the underlying distribution? Is the requisite instrumentation from the HIT well understood? How is this data gathered? Is it gathered in the process of care or as a byproduct of care or in some other automatic way through an interface to an ancillary system or what have you? Are these different instrumentation requirements considered by device or by interface appropriately? Is the customer of the measure well understood? Is their use and intent with the measure well described whether it is payor, medical management, quality assessment, et cetera, and is it maintainable? Does it recognize changing, evolving code standards and nomenclature, workflow processes, roles and responsibilities in the clinical care environment, and accountabilities, again, as has been mentioned already, and of course has it been validated?
There are fewer words per slide going forward. To address the implementability part though, really getting at the task at hand in HIT, I think as we have seen in the QDS work and the NQF work here that already as Floyd described, can we assess the numerator and denominator using standard data elements? Is that even possible? Is it realistic? By the standard data elements therefore then if they are to be used in the EMR typically and even if they are in the EMR, are they well populated? There are several hurdles to scale, mountains to climb. Does the measure rely on a particular HIT feature or function for its acquisition? If it does is that being used? Does this require use of a standard template or a form or a documentation widget? Or if a physician is dictating, is the data element achieved in another way? Are these functional requirements considered actually at the time of the measure specification? It should be clear to the HIT implementer how the designer expected the data to be gathered.
Practicality of use in clinical practice with HIT. Are the standard data elements well populated as I mentioned? Are they being captured automatically or as a byproduct of care? Do the methods of data capture bias the measure in any way as the measure is being assessed? And of course two critical forms of error are systematic, nonrandom error and random error as well, and they will of course bias the results. Are they captured as a byproduct of care or is it outside the routine clinical workflow? If it is outside the routine clinical workflow, who is going to get it into the system so we actually have it as a source of measurement? Does the workflow in which the measure is captured bias the measure in any way? And were these workflow considerations considered when the measure was being specified?
Second and last on the practicality of use of these measures in HIT. Does the data source itself bias the measure in any way? One lab versus another may have systematic, nonrandom, or even random error and they could be different. When these data are pooled they may therefore bias measures. Are there different coding schemes with partial concept consonance? I think that is a new phrase, Paul. I have not heard that before. The terms defining the elements being measured as stored in the record may not be exactly the same thing even if they have a standard code and label applied to them. If they are not exactly the same thing we will have this concept dissonance rather than consonance.
In the manual data sources does the quality of the data used in the measure vary by which person and which role is gathering the relevant data? A classic example here I think is the nurses are trained to gather the BP only measuring at the 5s. We are trained as physicians depending upon the level of intensity of your professor of medicine to either measure on the even numbers or even on the even and odd numbers. The nurse might take the BP and get 95 over 65 whereas I might take it myself and get 98 over 62 whatever.
Can a measure report be implemented? It is one thing to have the data even if we have gotten this far in this system. Can the measure report be implemented in a useful way for each use of the measure or each user of the measure? Can the same measure scale for multiple uses? Can the measures be used for different purposes, et cetera, at the point of care for the medical director, for the payor, for public health reporting? I think this is one of the visions in the NQF CDS work or QDS work rather that there will be a standard toolkit, a set of measures which could be used and reused in multiple different ways and have the same validity and integrity hopefully.
The last point, though, is about this maintainability or adaptability and this is something I think those who implement EMRs and try to derive data from them wrestle with on a daily basis. Even if you have gotten this far and there is some kind of data in the record and you are trying to get it out of the record to make reports, does the measure support quality reporting at the point of care? Is it biased in all the ways we have talked about or is it biased actually in use in some interesting and usability type ways?
The key problem we face routinely is simple things like provider assignment. Is the report about my patients only or is it about any number of patients, which I might own or not own in my provider panel specification, et cetera.
But equally important I think and from a sustainability point of view, can the measure be updated easily and practically. Is there a way to change numerator and denominator specifications easily and practically? Can we change coding standards and evolve and maintain the history of evolution of definition or coding standards being used and can we evolve the different nomenclatures or vocabularies that might be used, SNOMED or other? And even within a terminology can we actually handle the way they do the update process? In SNOMED this is actually pretty difficult because SNOMED will do things like move an entire branch of a tree from one place on the hierarchy to another place on the hierarchy and any time you have any definition of anything using that branch and if you use any of the hierarchical relationships, you have to go and look back at all of your data and all of your specifications. For example, if you define diabetes as a subset of a variety of trees in SNOMED and they move one tree or one sub-tree from one place to another, that whole subset classification, which could be a critical element of the measure, has to be updated and that can be very tricky.
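Illustrative sketch of the update problem above: if a measure defines "diabetes" as everything under a branch of a hierarchy, then moving a sub-tree between releases silently changes that subset. The concept names and both releases are invented placeholders, and each concept has a single parent here, which is a simplification of how SNOMED actually organizes concepts.

```python
def descendants(parent_of, root):
    """Return every concept that sits anywhere beneath root in the hierarchy."""
    found = set()
    for concept in parent_of:
        node = concept
        while node in parent_of:  # walk upward until the top is reached
            node = parent_of[node]
            if node == root:
                found.add(concept)
                break
    return found


# Hypothetical release 1: child -> parent links.
RELEASE_1 = {
    "diabetes": "endocrine disorder",
    "type 1 diabetes": "diabetes",
    "type 2 diabetes": "diabetes",
    "gestational diabetes": "diabetes",
    "pregnancy disorder": "disorder",
    "endocrine disorder": "disorder",
}

# Hypothetical release 2: one sub-tree has been moved to a different branch.
RELEASE_2 = dict(RELEASE_1, **{"gestational diabetes": "pregnancy disorder"})

dropped = descendants(RELEASE_1, "diabetes") - descendants(RELEASE_2, "diabetes")
```

Here dropped contains the one concept that left the subset, so a measure defined by the hierarchy would start excluding those patients unless the specification were reviewed after the release.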
I guess lastly I just want to point to the issues therefore more broadly of semantic and syntactic integrity. How do we maintain these concept definitions and how do we evolve and migrate concept definitions as our needs change based upon measure requirements? I think Floyd raised the issue or rather Larry asked the question of is there a common information model for measures. Certainly the HL7 QRDA reporting data architecture is kind of working towards that end and I don't know a great deal about this. It's not yet been relevant to our work but I think they are attempting to kind of define a meta-model if you will or a common information model for quality reporting. I think that might actually make sense in the long run but it may simplify things in ways that we really need to have them simplified.
Lastly, are there messaging and reporting standards, methods for the extract, transform, and load procedures that have to be done in the HIT with the measure specification to get the data out and report on it? What is the architecture for knowledge management and curation? One of my favorite topics at Partners Healthcare. We have built a group there to worry about this kind of problem, and as definitions and terminology and specifications change on a regular basis it's not just define it once. It is define it and update it and update it continuously. Thank you.
DR. CARR: Excellent. You didn't need that time to prepare. It was excellent. Let me open this up to questions. Maybe we could get the lights back up as well. Could I start off with one question and maybe it is a naïve question and maybe I missed something. As we are looking toward our future state, certainly having the elements in the EHR will take us a long way, but as I think about registries and taking that data and turning it into specific measures and/or even to do drill down, it seems that there is a need for an intermediary data aggregator who can take and pull data from perhaps different sources or can do manipulations and apply risk adjustment. Is that a fundamental piece of the big picture?
DR. EISENBERG: Yes, I think a data aggregator definitely is a fundamental piece of the picture. In enterprises, say hospital enterprises, it may be that the hospital system acts as the aggregator as opposed to necessarily using an outside registry. In other cases it would be a health information exchange or registry. But I agree. There does have to be an aggregator capability.
DR. MIDDLETON: And I would agree too, Justine. From an architectural point of view it is clear that the sources of data on which we wish to report will come from a variety of different places even within one institution and even potentially within one clinic. Depending upon the instrumentation of that clinic, laboratory data sources, other types of physiologic assessment, BP assessment tools, whatever, the data will be aggregated once for clinical purposes in the EMR in some form or another but often times for reporting it needs to be combined with administrative claims data or other sources of data again. And further not only is it combined but also it really is cleaned up. I haven't seen a record yet, and I have seen a few in detail, that really has data in the clinical record in a way that is ready for this kind of quality reporting.
DR. CARR: It seems that as we think about that for all the things that our earlier panel talked about in terms of risk adjustment, are there 35 risk adjustors or 6 and are we all using the same ones because even today I think that is an issue across various roads. Do we have resources or committees or panels focused on this?
DR. TANG: Focused on what?
DR. CARR: On the data aggregation function and risk adjustment. We had EHR and meaningful measures.
DR. TANG: Somebody I'm sure knows more about this than me but there are PQRI registries, for example, and I believe some vendors will actually extract the information needed and will act as the PQRI registry and extract the information from your EHR for submission. That would be almost the best. I mean that is an intermediary, but the nice part about that is it is coming from your EHR. Now we need to have an API so that you can have some kind of aggregator act for multiple EHRs but that is at least a model for how things should be.
DR. CARR: In the administrative data world UHC is that kind of thing where there is this standardized risk adjustment model and a cleaning up of the data, but even hearing today about the variability and risk adjustment that seems like a whole body of work that will be critical to meaningful measurement.
DR. TANG: But even without risk adjustment though - I mean the whole clean up - I guess my question back to the panelists is how feasible is it to get to the F7, and the other is, Blackford, you had a long list of considerations for the make up of the quality measures. Who should be responsible for considering all that?
DR. MIDDLETON: Well the NQF obviously. Right, Floyd? First of all there is a question about the method if you will on a national scale. What we are figuring out on a local scale is somewhat arcane and laborious procedures that you have to go through to get the data from all the different systems in this aggregation place. We call it quality data warehouse. It is parallel to and corollary to the clinical data repository and even another one called the research data repository. But whatever the architecture I think this is an excellent line of thinking. What do we actually need to have by a way of an architecture nationally for quality data reporting and management. Starting locally with QDS and specification, implementation, and use, but then that is all the afferent limb on the efferent limb. What do we do about data aggregation, extract, transform, and load and et cetera? The problem I think will be ameliorated to a degree when we have much more standardization of the cardinal data types in the record. When the problem list has to be SNOMED if it is going to be SNOMED then theoretically over time the problem list will clean up itself in use; however, when the problem list is in evolution from ICD-9 to ICD-10 or maybe partly to SNOMED and all of the rest of it, we are going to live in this room of cleaning up the data all the time. I think it is going to be extraordinarily valuable service if you will of the second question is who pays for that if it is coming from small office environment, et cetera. How does that get managed all the way to CMS and all the way back down?
DR. TANG: It almost seems like we have to, in Floyd's term, the workflow. We almost have to prescribe from the measure developer point of view or upstream the clinical trials folks deciding who enters the blood pressure. Therefore manifest in the EHR has to be a place for that kind of person to enter the blood pressure so that information can be stored and passed on to fill that measure definition.
DR. EISENBERG: Well I think when I was talking about workflow it is what information do you want. If I wanted patient engagement I really might want not blood pressure from - I'm not trying to create a measure here but just hypothetically. If I want to make sure the patient is doing their own blood pressure at home then I want it to come from the patient and I need to find a source for that. If there is not a source obviously that is a problem. I don't know that it is prescribing what the EHR looks like as much as I know the data I want, and how the EHR captures clinician entered or patient entered data if you will is in some ways up to the EHR as long as what we are looking for is identified, as long as we are clear what we need, because then both can kind of build toward what would be the Rosetta Stone in the middle.
DR. GREEN: I want to thank Floyd for what to me was a pretty darn elegant model and I appreciate it very much. As is often the case someone presents an elegant model and you realize how sloppy yours is. I still practice medicine and as I was looking at that model I kept trying to map it to a mental model of how care is obtained and rendered. Then your last comment just then about if you want a blood pressure. The entire asynchrony of healthcare that is our future seems to me has to be harmonized with this process and it sounds to me that NQF is further along in its modeling than physicians and nurses and other folks are with their models. I am ventilating now.
I really have a question for Blackford. Could you just say more about two things on those slides? One is you asked the question about are repeated measures independent and can you just talk a little more about what the nature of that independence is to you and the other one is about your scaling. I'm not sure I understood your point. I thought you were talking about scaling them for multiple uses across settings and sectors and that sort of thing, independence and scaling.
DR. MIDDLETON: These are relatively off the cuff remarks because I literally did prepare them this morning, but here is my concern. In many ways the statisticians will worry about kind of whether or not measures, either repeat, one time, or composite, are of course actually reflecting the underlying concept being assessed. The issue with repeat measures is if they are not independent, if they are dependent, there may be a false assurance that you are actually measuring something. The measurement may be better than the state actually is because the dependent measures will confound one another.
Similarly, what did I say the first time? The other one. You can be over confident or under confident based on the independence or the lack of independence of measures. This is the same kind of thing that happens in decision making actually, when you ask physicians of course about their a priori assessments of the probability of disease and relationship to findings. If you find the coexistence of elevated ALT, elevated AST, and maybe a total bili elevated, it doesn't give you much additional information than simply seeing elevated total bili for hepatitis or some obstructive disease in the liver. Similarly, I worry that if we have multiple measures of quality that are highly dependent, we may not actually be measuring more than one central concept. If you have independent measures and you can assess them as being independent, you actually get much more information.
The second thing about scaling, sort of the simple thought there I think is does the measure have the same applicability when measuring 20 or 100 or 100,000 or a million, does the same measure apply to populations in some kind of way? I'm not sure I am going to be able to express it well, as well as can it be reused in multiple contexts, multiple purposes. There is a scaling issue about measuring more than just my clinic, how I measure entire populations.
DR. CARR: Just one comment on that. Just to get back to one of the comments from the earlier session again the importance of being able to drill down on data because having this data source as well as other descriptors of the population who is being measured becomes critically relevant. In other words, if you have a bimodal cut population and your composite is average then you have missed the opportunity to improve the low performers. Again, it goes back to this intermediary aggregator. How do you configure the data and make it available in such a way as to take it to the next level?
DR. MIDDLETON: That was the notion of understanding the underlying population distribution for whatever the measure is trying to assess. If it is not normal, if it is bimodal or some other type of distribution occurs, you may have completely different subpopulations and simply miss them.
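Illustrative sketch (not part of the spoken remarks): the point about a bimodal population can be shown with a few made-up numbers. The scores and the cutoff of 70 below are hypothetical; the sketch only shows that an overall average can sit between two groups and describe neither of them.

```python
from statistics import mean

# Hypothetical composite scores (0-100) for ten practices: five low, five high.
practice_scores = [42, 45, 40, 44, 43, 91, 94, 90, 93, 92]


def group_mean(values):
    """Mean of a list, or None when the list is empty."""
    return mean(values) if values else None


def summarize(scores, cutoff):
    """Compare the overall mean with the means of the groups below and above a cutoff."""
    low = [s for s in scores if s < cutoff]
    high = [s for s in scores if s >= cutoff]
    return {
        "overall_mean": group_mean(scores),
        "low_group_mean": group_mean(low),
        "high_group_mean": group_mean(high),
        "n_low": len(low),
        "n_high": len(high),
    }


print(summarize(practice_scores, cutoff=70))
```

Here the overall mean is in the high 60s, a score that no practice in the example actually has, while half of the practices sit in the low 40s. Reporting the distribution, or at least the subgroup results, is what makes the low performers visible.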
DR. FITZMAURICE: I guess the second part of my question has been answered, about how do you get the distribution reported. The first part is the data aggregator. Is the aggregator a part of the enterprise that is doing the reporting, part of the single hospital, the single physician group, or the single chain of hospitals, so that they have control, or maybe they get to see it before it is reported? That is as opposed to some coming in from here, some coming in from there, and then it goes up, and the person who is responsible for it at the enterprise offices says, I don't know where these numbers came from. It seems to me that it has to be aggregated at the enterprise level first. Then when it is reported up, you have to report at least the denominator so that people can multiply the rate by the n and aggregate across, getting the total numerators and then the total denominators. If you just get fractions, you are getting averages of fractions rather than an aggregate of the whole population.
And the second part, which I think Justine brought up and you answered, is how do we get the distribution? You find out a lot more from distributions than just from one point.
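Illustrative sketch (not part of the spoken remarks): the difference between averaging reported fractions and aggregating numerators and denominators can be seen with two hypothetical sites of very different size. The site names and counts are made up for the example.

```python
# Hypothetical reports from two sites of very different size.
sites = [
    {"name": "clinic_a", "numerator": 9, "denominator": 10},
    {"name": "clinic_b", "numerator": 500, "denominator": 1000},
]


def average_of_rates(reports):
    """Unweighted mean of each site's rate; every site counts equally."""
    rates = [r["numerator"] / r["denominator"] for r in reports]
    return sum(rates) / len(rates)


def pooled_rate(reports):
    """Total numerators over total denominators; every patient counts equally."""
    total_num = sum(r["numerator"] for r in reports)
    total_den = sum(r["denominator"] for r in reports)
    return total_num / total_den


def numerator_from_rate(rate, n):
    """Recover a site's numerator when only its rate and denominator are reported."""
    return round(rate * n)


print(average_of_rates(sites))  # treats the 10-patient clinic like the 1000-patient one
print(pooled_rate(sites))       # reflects the whole population
```

The average of the two rates is 0.7, while the pooled rate is about 0.50, because the small clinic's high rate is given the same weight as the large clinic's in the first calculation. Reporting the denominator alongside the rate is what lets a downstream aggregator compute the pooled figure.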
DR. MIDDLETON: This is helpful actually to discuss these thoughts, which are completely off the cuff. In a way, Paul, maybe a heuristic Justine that we can take away is at a level we want to argue for a parsimony of measures. We want to have the minimal set of required measures to assess quality because more than that is just burdensome in all the different ways we have been discussing. Is there one diabetes measure that will really cut the mustard?
DR. TANG: One of the challenges is that we have multiple siloed measure developers and diabetes of course is cross cutting among specialties. And even though there are NQF endorsement criteria that ask them to look at the contribution of the next measure, we have a challenge when it comes to - who is making the decision? Is this one better but actually it would be better yet if the such and such - and then you actually have to retire the other measure otherwise the public reporting folks are going to ask you to do that and it just adds to the measure burden. The cycle time for that is every three years. It is timing, et cetera. As you were talking I was thinking about how to have some kind of coordinating body over the measure developers and that is where it actually goes back to QDS. If the QDS and its authoring tools somehow at least inform the different silos creating measures, perhaps there would be fewer new marginally contributing measures developed.
DR. CARR: We have time at the end to come back to some of this discussion.
MR. EISENBERG: I was just going to agree with Paul, that I think the QDS will help that because one is it will keep the elements and the sets of codes you are looking for somewhat consistent because without that reuse of the measure with different levels and versions of coding is going to change reporting and is going to affect that. I also think that there are sometimes and the additional data elements are often used for risk adjustment or for adjusting performance. Those additional elements can be added to the measure and reported even though they are not calculated in it. As long as that availability is there, I think that may keep measures fairly consistent so that you could do some analysis based on some variation with additional elements. Does that make sense?
DR. TANG: Maybe NQF can offer that service as proposals come in using the QDS and instead of measures to try to view its own sensitivity analysis in a sense. Is NQF funded to convene the supply chain to start talking about the use of and population of QDS?
MR. EISENBERG: We are. We are funded to continue and that is advice we are looking for you as our HITEP chair, where we move forward for this.
DR. CARR: Did you have a comment? With that then, I would like to reintroduce Sarah Scholle, and then Kathy McDonald is on the phone. Kathy, why don't we start with you?
Agenda Item: Meaningful Measures for Care Coordination
MS. MCDONALD: Would you like me to just go ahead and start with my remarks?
DR. CARR: Yes.
MS. MCDONALD: Do you have my slides there?
DR. CARR: We have handouts. Let me make sure everybody has them in hand. I think they are in our blue books here. They are in these blue folders. Meaningful measures for care coordination. Kathy, can you just say a little bit about your background and then start right in. We have our slides.
MS. MCDONALD: Are they showing too, because there are a few spots there in color? If they are not in color I can make sure to point out what I am talking about.
DR. CARR: I think we are set. Yes, we have them on the screen. Just say when you want us to change and we will take care of that.
MS. MCDONALD: Okay. Thank you. Can everybody hear me okay? Okay great. I work at Stanford University. I am a Senior Scholar here and Executive Director of two research centers that operate under the umbrella of Stanford Health Policy. I have been involved in quality measurement development for about 12 years, and will sort of do some further introductions in my remarks here.
I appreciate the invitation to speak on this topic of meaningful measures of care coordination. On the second slide I can introduce some of the work that I will be drawing from as I make my remarks. If that slide were up that would be great.
I have been the Associate Director and an Investigator at Stanford and the UCSF Evidence-based Practice Center for the last decade and during this time we have produced numerous evidence reports on healthcare quality and patient safety. The three that are the most relevant building blocks for today's discussion are listed on this slide.
The most recent evidence report focused on care coordination as part of a series called Closing the Quality Gap that reviewed quality improvement strategies from the 20 topics identified by the Institute of Medicine's National Priorities Report as most promising for improvability. Care coordination was listed as one of the cross-cutting areas, and our report included background on the topic of care coordination, especially stakeholders' suggestions about areas that required more research, including measure development; a working definition of care coordination; conceptual frameworks; and systematic reviews pertaining to care coordination quality improvements.
This work will serve as a foundation for a new project recently started and supported by the AHRQ Quality Indicators Program on care coordination measure development for ambulatory care.
The other two reports listed in my background as building blocks for this conversation constitute the initial evidence reviews for the first three AHRQ quality indicator modules: the inpatient quality indicators and the prevention quality indicators, which were both covered in the 2001 report, and the patient safety indicators, for which the evidence was presented in the 2002 report.
Next I would like to give you a little history of the AHRQ quality indicator program since this is the home for the new care coordination measures project that I am leading. AHRQ's QI program and the evidence-based practice center program each represent wonderful examples of producing research that is tied to the actual needs of the field. The EPC program routinely and publicly asks for nominations of topics where the expectation is that the stakeholder will use the product of an EPC's work, that is, the evidence base synthesized well and comparatively where applicable.
AHRQ's QI work started within the EPC world after AHRQ and HCUP partners requested an evidence project to refine the original HCUP quality indicators. The AHRQ quality indicator development has always been grounded in evidence-based medicine methods applied to measurement.
In addition the measurement work has always been tightly coupled to users' needs since AHRQ already had its HCUP partners who wanted tools to work with their data. The motivation for the original work was to satisfy the needs of those who were collecting the data and who were working in their states to supply hospitals, legislators, policy makers, and of course the public at large, patients and their families with something meaningful based on the routinely collected administrative data sets available at the time and still in use.
As the program evolved AHRQ initiated the support contract to assure ongoing refinements of the indicators, and this represents the guiding philosophy of the program that of continuous quality improvement based on user experience and changes in the medical evidence. In addition, the program includes expansions within domains and data sets initially covered as well as expansions to new domains without ties to any particular data set in reflecting new priorities in the healthcare field, such as, the new project on care coordination measures.
All along AHRQ and the QI team have continued to innovate to expand measurement methods always evaluating measures from initial assessments to implementation of the measures and then feedback and support throughout the life of the measures in the field.
I would like to suggest two analogies based on these observations of the history of the AHRQ quality indicator program. First there is an analogy between continuous quality improvement for industries that deliver goods or services and a similar need for measurement to involve continuous improvement as well.
Second, and more apropos for the current session, there is an apt analogy between coordinating care to deliver seamless and effective healthcare to patients, and coordinating measurement efforts to deliver a non-burdensome and effective picture of the system's ability to deliver high quality coordinated care. In both cases it is crucial to minimize important gaps, gaps in care or gaps in measurement. We should be most interested in gaps in care that have the potential to contribute to bad outcomes for patients, and similarly in the analogous case we need to pay special attention to gaps in measurement that could result in missed opportunities to improve the quality of care. It is often more difficult to pay attention to what is missing as opposed to evaluating what is present.
I'm going to shift gears a bit and give my perspective on the four questions posed as the goal for this hearing with the focus on care coordination measures. The first question that was part of the agenda was how do we approach building meaningful measures. For care coordination I think it is important to think carefully about this domain and how measures might best promote the best care. For care coordination I think there are four main concerns. First, in care coordination we have to have a working definition of what that is. In our previous work in the evidence-based practice center we found over 40 unique definitions of care coordination, which we used to develop a working definition. I would be glad to share that if asked.
Second, in considering care coordination measure development we need to draw from one or more conceptual frameworks for being able to deconstruct activities that lead to well coordinated care so that we can determine the desired outcomes that might be related to that and establish a causal chain about where in that process measurement would be most valuable. I know that in an earlier session it looked like there was discussion of structure process and outcomes, the classic Donabedian model. That would be one framework. With care coordination we might also want to think more creatively given the conceptual challenges of this particular domain, and I will share some of our thinking on this in a moment.
Third, in terms of developing meaningful measures we require some research evidence that shows that any measurement proposed actually maps to components of any of the frameworks that allows us to say that there is some base validity to the measure.
Finally, in building meaningful care coordination measures we need to make sure that a measure set adequately covers the areas most likely to drive quality improvement efforts, have transparency, and ultimately gain the outcomes that we believe are sensitive to coordination or failures in coordination.
It was suggested that we needed to have a conceptual framework or frameworks of care coordination as the basis for measurement development and evaluation in this domain as in any domain. I offer two examples. Go to the next slide with conceptualization option number one. These aren't meant to be definitive or exhaustive. They are simply to exemplify the importance of using some logic regarding the connection between what might be measured and how that measure could monitor the situation.
In this slide we see a diagram adapted from management sciences and organizational design. The key point of this framework is that care coordination is the product of a good fit between the information processing requirements of a particular care situation and the information processing capacity of the system, or in many cases the non-system, delivering care. When we think about an area prone to failures of coordination, such as handoffs or transitions of care, we can keep in mind this organizational design framework and are reminded that we might want to monitor the fit. That is what is in the middle circle between information processing capacity and information processing requirements. On the left side of the diagram, processing requirements depend on the situation, and empirical work has shown that varying levels of interdependence, uncertainty, and complexity can be addressed best by varying the mechanisms on the right side that are used to provide information processing capacity.
Eric Coleman from Colorado has led the development of the Care Transitions Measure or the CTM. There is a 15-item version and a 3-item version, and also interventions to improve transitions from hospital to home. One of his studies about an intervention to improve these transitions offers a good example of taking stock of the care transition situation by getting input from patients and caregivers about the areas of interdependence, uncertainty, and complexity similar to that shown on the left side of the diagram. He refers to them as the four pillars: medication management, personal health information, follow up visits, and specification of red flags. This in turn allows for establishing and testing the mechanisms to address the coordinating needs of the situation. In the case of some of his work a transition coach is a key feature of one of the interventions that Dr. Coleman has evaluated, and could be thought of as an example of structural linking between settings of care.
Another key feature is a personal health record, which is part of the operational processes box on the right hand side as a supporting tool to provide coordination, and it is used along with specific tasks for patient and coach aimed at specific goals.
For measures we could target any of these steps in assessing the fit between the needs of the situation and the mechanisms to address those needs, or with measures we could focus on understanding patients' preparation for the discharge transition, as is the case for the care transitions measure, which is based on direct questions to the patient.
Another approach to conceptualization is shown in the next few slides that again try to reduce care coordination into components that might be measurable. This is the next slide up, conceptualization option number two. We built upon NQF's work and Mathematica's evaluation of care coordination demonstration projects, and the slide lists essential care tasks in bold and associated coordination activities in italics. Measure development could be organized around ways to observe or quantify the activities and their efficacy.
Then on the next slide we note common features of interventions to support coordination activities in bold and list examples in italics. To the extent that there is evidence that the presence of the supporting feature improves care coordination, it would make sense to consider measuring the presence of the feature. For example, in a 2008 article in Health Affairs by Diane Rittenhouse and colleagues they used data from the Second National Study of Physician Organizations and the Management of Chronic Illnesses or the NSPO2 to quantify the extent of adoption of infrastructure components of the patient-centered medical home. One of these measures is the care coordination integration component consisting of several questions and resulting in an index from zero to five, which asks questions about use of electronic medical records, exchange of information across settings and presence of registries and nurse managers for specific chronic diseases. All of these concepts and examples are shown in our second framework.
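Illustrative sketch (not part of the spoken remarks): a structural index of the kind described, scored from zero to five, can be thought of as a count of infrastructure features a practice reports having. The five item names below are hypothetical placeholders chosen to echo the topics mentioned; they are not the actual survey questions or scoring rules from the cited study.

```python
# Hypothetical yes/no items; not the actual questions from the cited survey.
ITEMS = (
    "uses_emr",
    "exchanges_info_across_settings",
    "diabetes_registry",
    "asthma_registry",
    "nurse_care_managers",
)


def coordination_index(practice):
    """Count how many of the five hypothetical features a practice reports (0 to 5)."""
    return sum(1 for item in ITEMS if practice.get(item, False))


example_practice = {
    "uses_emr": True,
    "diabetes_registry": True,
    "nurse_care_managers": False,
}

print(coordination_index(example_practice))
```

Items a practice did not answer are treated as absent in this sketch, which is a simplifying assumption; a real instrument would need an explicit rule for missing responses.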
These frameworks can be used by measure developers to consider the relationship of potential measures to the ways the measures might be meaningful to improving care coordination.
Now turning to the second question posed for this hearing, what is the current process for developing measures? Does it adequately address measure development for key national priorities and subpopulations? I would like to give a quick thumbnail sketch of the AHRQ QI process and its application to care coordination and measure developments. The remaining three slides provide an overview of the process and evaluation criteria.
The first slide, labeled indicator set development, shows a standard AHRQ QI development process, which starts at producing a list of candidate indicators based on a variety of sources and leads to an evaluation of each individual indicator, the development of a set of indicators after a selection process, and then evaluation of the indicators in practice, which leads in turn to further evidence and ultimately feedback to improve measures.
For care coordination, given the dynamic nature of this area with lots of work underway, we have added the box in red called in development to highlight the importance of looking for measures in development.
The next slide shows the steps, sort of a stair step, to evaluating each promising indicator in our candidate list. We start at the bottom and work our way up the steps. If an indicator doesn't capture an aspect of quality that is important and subject to control by the healthcare community, it isn't going to be meaningful. For care coordination we have added the red text emphasizing the patient in the first step. This is an area where failures in coordination are often only experienced by the patient. This is particularly true in transitions across settings for example. Of course there are also times where the patient does not know immediately or perhaps ever that a failure of coordination has occurred, as in a missed diagnosis based on a test result that the provider missed getting or seeing or reacting to.
The rest of the stair steps are no doubt well understood by this group and already covered to some extent in remarks from earlier speakers. I can go through these in more detail if desired later. The main point is that we apply these evaluation criteria to all AHRQ QI development efforts but can emphasize new aspects if the new domain calls for such enhancements.
Similarly, in the next slide, labeled evaluation stages for indicators, we undertake three main activities of background research, external input and supplementary research. Again, items in red suggest enhancements that may be necessary for the care coordination domain. Two of these, coding consultation and empirical analyses, lead me to the next question posed for this hearing. That question is how do we introduce new data sources, clinical data from EHR, user generated data, et cetera into the measure development process.
It is important to start without anchoring to any one data set, keeping the options for candidate indicators wide open and looking for data sources that support those indicators or concepts. Expertise from those who have worked with identified data sources has been critical, as have empirical analyses to test alternative definitions, assess rates, variation, and relationships among potential indicators, and also to test risk adjustment methods.
A given data source sometimes starts a measure development process, as is the case of the HCUP data and the initial AHRQ QIs, but in the case of care coordination I believe it is vital to draw from a variety of data sources to reflect patient, clinical, and organizational knowledge and experience.
The last question for the hearing is how do we maintain and update measures and what are the health IT system implications. Here I would suggest that we draw from the experience of the AHRQ QI program, which follows a continuum from measure development to translation of measures into practice and then support for users and feedback to the developers about opportunities for improvement.
In conclusion I anticipate that we might expect care coordination measurement to follow a similar path to what our team experienced with the development of the patient safety indicators, one of the AHRQ QI modules. Eight years ago healthcare's heightened attention on patient safety resulted in numerous research efforts to understand practices for improving patient safety and ways to measure progress. Our group worked on a patient safety practice evidence report called Making Healthcare Safer and at the same time developed patient safety indicators, which started from a modest evidence base that built over time with validation efforts by our team and others, ultimately achieving NQF endorsement.
Similarly, the AHRQ QI care coordination measures project will draw on our experience understanding the state of evidence in care coordination based on our research for the EPC report on care coordination. Given the increasing level of activity in this area we plan to form a stakeholder and informants work group to assist in identifying all relevant current activities, in addition to our usual practice of inventorying and assessing measures from prior published measure development studies.
In addition we recognize the importance of the research community in this developing area. Specific interventions to improve coordination are still under development and testing either as part of research studies or demonstration projects. As a result evaluators of care coordination improvement strategies need guidance about what to measure in their evaluations and how to prioritize potential measures. To address this need the AHRQ QI team's care coordination measures project will form a second work group of evaluators, evaluatees, and research experts to assist in developing an evaluation tool to guide choice of measures for research purposes. This part of the project responds directly to one of the conclusions of our care coordination evidence report, and that is that studies of care coordination quality improvement strategies have immense heterogeneity in measures, making comparative assessments of what works best to improve care coordination challenging, if not impossible.
Any QI development process must adapt to the needs of the particular domain under study. We have a toolbox of AHRQ QI development methods from which we choose approaches best suited to a new area. As needed we also develop new research methods or tools to tackle unique aspects of a new domain such as care coordination. In contrast to the dynamic and adaptable nature of our growing toolbox, the AHRQ QI development approach has kept the same standards and criteria for achieving meaningful measurement, and I think that this experience is highly useful for the goals of this hearing.
Thank you for the opportunity. I am honored to convey my experience and thoughts on this important topic and I would be glad to respond to any questions at any point.
DR. CARR: Thank you very much. I think what we will do is go straight onto Sarah Scholle and then come back and take questions for both speakers.
MS. SCHOLLE: I would like to talk with you about some of the work that we have underway with colleagues at Johns Hopkins and Park Nicollet to look at care coordination and I just wanted to start off with key points that I will try to make in this discussion. First of all to suggest that care coordination can be measured by thinking about the structures, the processes, and outcomes and that processes are most actionable but that is where we have the fewest measures. What is challenging about the measures is that when you are measuring this process, you really need to have it embedded in care. Your measurement needs to be embedded in the care delivery and it needs to help with decision support as well as be something that leads you to measures that can help you monitor care and improve care. In particular when we think about measuring care coordination in an electronic environment we need HIT systems that can track essential data elements and can support the care coordination process, but we also need the workflows that use the systems. I will try to expand on that.
What is care coordination? It is the information sharing that happens across providers, patients, different services, sites, and across time. What are we trying to do when we coordinate care? We are trying to make sure that patients' needs and preferences are respected and that care is both efficient and leads to good quality outcomes. We know that care coordination is most important for people that have the more complex situations because they are seeing multiple doctors, because they have more complex needs, but it is likely to change over time.
Care coordination. It can be structure, process or outcome measures. I think structural measures are an important starting place. They lay out what capabilities need to be in place, information systems, staffing. They articulate expectations. Process measures try to get at what is happening. How is the information being exchanged? Is it being exchanged? Outcomes of care coordination are probably the things that are most relevant to families and policy makers. These are measures like readmission rates and failures of care coordination are probably easier or more relevant to families and policy makers, but the problem is that you need risk adjustment. Some of these things like risk or readmissions are rare and they may be difficult to attribute to particular actions and players.
We have an ongoing grant right now where we are trying to think about this whole measurement framework in the context of vulnerable children, children who are at risk for developmental delay. This diagram tries to lay out. It is a starting point for thinking about this structure, process, and outcomes. What do we mean by structure? Well it is for care coordination. In a primary care practice it could be a process for tracking referrals and it might be different in different levels of the healthcare system. At a state it might be service capacity. Within a community it might be whether you have someone to act as a navigator to help families understand what are the service opportunities.
The process measures. Here are some examples of process measures and I will talk about it some more, but it is really: does the information get sent from the primary care provider to the specialist? Does the information get back? Does somebody act on it? Is there a care plan that identifies who is responsible for what?
Then outcome measures could be clinical outcomes. Poor control of diabetes could be a failure of adequate care coordination. Patient and family perceptions are an important factor but I'm not sure. The challenge right now is that we don't really have very good measures of care coordination. I think we are starting to have better measures on the structural side. I think we have some sense of what are good or bad outcomes like readmission rates but they are hard to measure. We have measures of care coordination from patient surveys, but that is probably one of the weakest points of a CAHPS survey today and part of it is that I'm not sure that people really know what it is like to be in a system of care that is really well coordinated. It is hard to measure that.
In our research work we are really going to try to hone in on this from the perspective of looking at children with chronic conditions.
I said that we have been doing a lot of work on the structural side and I think that physician practice connections, patient-centered medical home. These standards for what practices should have in place to organize care well through managing their patient population, try to get at some of these structural aspects of care coordination by looking to see whether practices have tools and processes for tracking tests and referrals.
To some extent the measures that are in PPC are limited by what we think a practice should be responsible for, is accountable for. It looks to see did the practice have a way to track whether information went to the specialists, but in some ways that primary care practice can't really be responsible for getting the information back from the specialist if the specialist doesn't want to do it. That is one of the challenges that we have with measures of care coordination: where is the accountability and how do you measure in that setting?
I have been working with Jonathan Winer at Johns Hopkins University and several other of his colleagues there as well as Jinnet Fowles from Park Nicollet, to think about measuring care coordination specifically in the ambulatory setting and to look at the process of referrals and consultations, communication between a primary care practice and medical specialists. We are just finishing up this work that was an initial phase of trying to identify potential measures, develop preliminary technical specifications and then to work with some practices and clinicians to try to understand whether these were usable and acceptable measures.
This is the model for ambulatory care coordination that we developed in our project through a number of iterations with clinicians and key stakeholders and on the left side you see the logic model, the process flow of care coordination from the physician discusses options for referral with the patient, identifies relevant information that needs to go to the specialist, sends the information with the request. The specialist receives the information, sends the results back, and shares it with the family. The primary care physician sees the information acts on it whether that is updating the care plan or talking with the family.
That was the process flow that we developed and then we talked with our expert panel and with clinicians and others in practices about what are the meaningful points along this process where it would make sense to measure that the measure could be useful for informing the care process. It could be useful for monitoring care and measuring.
What you will see is that I have these in different colors. The measures that have the dotted lines are measure concepts that we thought about but dropped, and the reason we dropped those, actually several of them have to do with communication with patients about shared decision making. Our panel said focus on a narrow set, a small set of measures that we can really get implemented, and they said that is really important but it is not happening so don't start there. Updated care plan was the same way. They said really important but there's not one yet so don't try to measure it yet. Start with the pieces that are most reasonable today, and that is where we get the critical piece, which is really referral feedback. It is that the PCP sends critical information to the specialist. The specialist gets the information. The specialist sees the patient, takes care of it, sends the information back, and the primary care physician gets it back, the referral loop.
Now what we have also added in here is that some of that information is communicated to patients both from a primary care physician this is why I want you to go see the specialist for this reason and also that the specialist communicates the results to the family. The green ones are the ones that we thought were primarily on the accountability of the primary care provider and the blue are on the accountability of the specialist. These are the ones we are moving forward with.
I also want to mention the one about the visit scheduled within a requested timeframe. One of the things that we were interested in from a community accountability point of view was did this visit happen when a referral was suggested. We had to drop that for a couple of reasons. One, because it wasn't clear who would be accountable for it, but the other because of this idea about a timeframe, or did the referral happen within a particular timeframe. This is not something that people document and that is what we found. I want to talk about some of the measurement issues that we have identified. This applies both in a paper-based world and in an electronic world.
First of all, to really do any measurement here of completed referrals, if that is your goal, whether you get completed referrals, you have to kind of know which ones you really expect to happen. When we talked with clinicians, they said basically there are some referrals that are stat, some referrals where you really want them to happen maybe in the next couple of weeks, and then other referrals are just kind of when you think about it, or they might be I want you to get a colonoscopy in the next six months, but they are not triggered to a timeframe. That makes measurement very hard, and that information, we found, even if it was in there, wasn't defined and wasn't in a form that would lead to measurement.
Effective communication with patients and families. We saw where practices were using clinical summaries, visit summaries that they print out from the EHR. When we talked with them about that being a measure, like did the family get the results, they said I could tell you that the report was printed. I don't know whether it actually got to the family or anybody talked to them about it. That is a challenge for measurement.
Accountability is an issue that I talked about a little bit that there was concern about are you going to hold me responsible for getting the completed report if that specialist never sends reports. What should those be?
The other interesting thing is that we did our work in both integrated settings and non-integrated settings, communities where we were just talking with private practices that were interested and were using an EHR, and we saw really different concerns about care coordination that affect measurement. One issue is are you worried about patient dumping or patient stealing. In the integrated systems they were concerned that the primary care physicians were sending them too many patients that really didn't need to be seen in a specialist setting. In the non-integrated settings they were more concerned about, well, if I refer this patient to see a cardiologist just to get a question answered, I will never see that patient again. Then there is the distinction between referrals and consultations. Medicare pays differently for a visit if it is a consultation than if it is a referral. They pay more for a consultation than for a referral. The integrated settings are really concerned about making sure that they identify what is a consult versus a referral. A consult means that you are just asking a question and you intend as a primary care provider to provide ongoing care for that patient, whereas a referral is where you are transferring care to the specialist. The integrated systems were really concerned about this, and in the non-integrated settings we really didn't see a lot of sensitivity to this. This could really change in an accountable care organization model where those payments would differ.
The other piece is that actually, in talking with integrated systems about this idea of the feedback loop and the information, they say we share a medical record. They know that they can see in the medical record whether the patient saw the specialist. They can look at the report. What does it mean? How do you know whether somebody actually looked at the relevant pieces in the shared medical record to be able to say that the information was exchanged? This is so hard to measure.
We actually have done site visits in three integrated settings and two non-integrated settings to try to think about how to measure this and we have a grant that we just submitted to AHRQ that we hope will be able to conduct some research to try to understand this. What we learned is that even in all of these settings where we had leadership wanting to improve care coordination and HIT services to support it, we found that either there wasnt electronic functionality to report measures or the workflows didnt really support it or both. Even in integrated settings where they had this referral requesting, all the data was free text. We didnt have that timeframe. We didnt have the referral consult. It was very hard to construct a measure of completed referrals and the same was true in the non-integrated settings where sometimes the staff workflows didnt support it.
As we go forward in thinking about how to use electronic health records to measure this concept, we are really concerned about a couple of things with EHR-based measurement. One is whether we might have underreporting of the numerators. If you want to look at a completed referral in the EHR, often that information is going to come back from a scanned document. It is the specialist report coming back. It is scanned in, which means somebody has to take the scanned document and report, yes, the doctor reviewed it, and that is different from the referral coming back. You might have apparent quality failures where really and truly the loop was closed but the EHR can't identify it.
The other concern is the eligible population. Even when you have this referral tool available, will people use that and can you track all the referrals that happen? Those are some of the challenges that we have seen as we have really tried to get into the details of measuring care coordination in the ambulatory care setting.
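The numerator underreporting concern described above can be illustrated with a small sketch. It assumes, purely for illustration, that each referral carries two hypothetical fields: `loop_closed` (what a manual chart review would find) and `structured_flag` (whether the EHR holds a queryable record of the review rather than only a scanned document). Neither name, nor the sample counts, comes from the project described in the testimony.

```python
from typing import NamedTuple


class ReferralRecord(NamedTuple):
    loop_closed: bool      # hypothetical: chart review shows the report came back and was reviewed
    structured_flag: bool  # hypothetical: the EHR has a queryable "reviewed" field for it


def actual_and_measured_rates(records):
    """Return (actual closure rate, rate an EHR query would report)."""
    if not records:
        raise ValueError("no referrals to measure")
    n = len(records)
    actual = sum(r.loop_closed for r in records) / n
    # The EHR only counts a closed loop when the review was captured as structured data.
    measured = sum(r.loop_closed and r.structured_flag for r in records) / n
    return actual, measured


# Made-up sample: 8 of 10 loops closed, but only 3 captured in a structured field.
records = (
    [ReferralRecord(True, True)] * 3
    + [ReferralRecord(True, False)] * 5
    + [ReferralRecord(False, False)] * 2
)
actual, measured = actual_and_measured_rates(records)
print(f"actual closure rate: {actual:.0%}")
print(f"EHR-measured rate:   {measured:.0%}")
```

The gap between the two rates in this made-up sample is the kind of apparent quality failure the speaker describes: the loop was closed, but the EHR cannot identify it.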
We do think it is valuable to measure this process of care coordination and to try to figure out how we can get some of these measures to come out of EHRs, but we think that having structural measures to support what the capability should look like and what the workflow should look like and the staffing is going to be important. Monitoring the outcomes will be essential as well.
We are hoping that these kinds of measures could really help to trigger quality improvement, improve decision support, and help us to monitor the quality of care.
DR. CARR: What about this? Could you just say did you see the last note from the last provider and did you do what you were supposed to do? Just have it sort of yes, no. I am being facetious but I mean we heard about that this morning where we have so much complexity we become paralyzed and we lose sight of the immediacy of what we are trying to say.
MS. SCHOLLE: I think part of what we are going to be trying to look at is where would you put that question.
DR. CARR: Every time you log on to a new patient the first message that comes up is did you see the note from their last provider. Who was their last provider? Did you read the note? I am being simplistic.
DR. GREEN: Can I ask you to go back to one of your prior slides? It had figure one on it, model for ambulatory care coordination. I feel a little premature here in what I am about to do. What is setting up for me this afternoon is I want to think about this overnight and come back to it tomorrow. But what is setting up for me is I am learning something important I think, which is how essential it is that the development of meaningful measures move in lock step with explicitly articulated statements about the care process about which the measure is to be pertinent. Justine, your question about that and other comments that we make is teaching me how each of us at the table carries around with us an understanding of the care process in which we participate as a provider or a patient some way or another and that always is coloring the way we think about measures and their development.
If you will indulge me I want to make two comments. Most of the afternoons presentations have appealed to me as a health services researcher type but Sarahs and also Kathryns they appeal to me as a physician trying to take care of folks. Let me use this figure to exemplify this. As a physician who also served on the board of directors of a multi-specialty group practice, this model for ambulatory care coordination for a person with what we would take as part of our concept is that it was a dermatological condition. It had to do with skin. It would be completely different depending on one piece of knowledge. Is this person being in a payor system in which dermatology providers are capitated or are they in a fee for service model? Without knowing that piece of information this is nonsense. I assure you. This is nonsense. You can work on a lot of measures that will have no meaning if dermatology is capitated and you will be missing the measures that will be crucial if they are in fee for service.
Jump from that to just what I saw happen a few days ago. Imagine that you are seeing an 88-year-old, mostly deeply demented Alzheimer's patient who has an unhealing lesion on the top of his left ear. He has already disrupted the entire practice because he is confused about where he is. He is getting a little noisy. He is getting hard to handle. Now we need to decide how much of this guy's ear are we going to cut off? That is the crucial question. Or are we just going to let this thing keep eating away at his ear until he dies of the complications of his other morbidities? The way this problem actually gets done is, how many of you have a camera in your pocket right now? Debbie doesn't seem to have a camera. The way the ambulatory care coordination occurred was photograph, email to the dermatologist, phone call saying look at it now, 90 seconds later dermatologist says if that guy is going to live longer than 6 months he should have more surgery on this and we can do it Monday, and he goes home. Now look at the things on here that don't exist to measure what happened. I would argue that that was optimized care.
I want to go back to my two examples to say that developing measures that are not contextualized into the care process almost certainly compromises meaningfulness. What a humbling presentation you have made.
MS. SCHOLLE: Does it work for many situations of those outliers that you just described?
DR. GREEN: They work for many situations. Like I said I felt like I might be premature and I think I am a little premature. I would like to think about this probably the duration of my tour of duty on the NCVHS. This strikes me as a pretty significant challenge in getting to the meaningful use of measures.
MS. MCDONALD: Can I just comment on that comment? That is why I actually shared the somewhat complicated conceptual diagram of organizational design frameworks, because I think that does capture the statements that were just made, that you really do have to think about the exact setting and the exact patients and what the interdependencies are. You mentioned the form of payments, the types of contextual factors that would then relate to what is going on, that would then cause you to think about different coordinating mechanisms like taking a picture and sending that to a dermatologist. Under those circumstances you had a good fit, the best fit possible perhaps, and what is hard to do is to measure that, but we can intuitively tell that that was a better option under that kind of context than sending the Alzheimer's patient over to the dermatologist. That is why I showed that diagram, because I think thinking in a linear Donabedian framework may not be as appropriate for care coordination as it is for some other areas.
DR. CARR: Once we have the measures who is accountable?
MS. SCHOLLE: This was clearly a concern. We talked primarily with primary care physicians and they were interested in having the denominator changed. For the question about whether the critical information was sent to the specialist, the denominator was fine being everybody who they thought needed a referral. But on the last one, about whether the primary care physician reviews the specialist report, they wanted that denominator to be people who actually saw the specialist. But then what happens to the people who needed to see a specialist but didn't get to one? That is really the concern about accountability. At a community level it should be all referrals.
The other question on the specialists side was what about these self referrals. When patients go directly to the specialists, is the specialist responsible for trying to get critical information from the primary care physician when they didnt get a referral? It wasnt really a referral. It was a self referral. The patient just went to the specialist.
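The denominator question raised above can be made concrete with a short sketch. It assumes a hypothetical `Referral` record with three illustrative fields and computes the same numerator (PCP reviewed the specialist report) under the two denominators discussed: all PCP-initiated referrals versus only those where the visit happened. Self-referrals are excluded from both here, which is one possible choice and not something the testimony settles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referral:
    self_referral: bool    # hypothetical: patient went to the specialist without a PCP referral
    visit_completed: bool  # hypothetical: patient actually saw the specialist
    report_reviewed: bool  # hypothetical: PCP documented review of the specialist report


def report_review_rate(referrals, denominator):
    """Share of eligible referrals where the PCP reviewed the specialist report."""
    if denominator == "all_referrals":
        eligible = [r for r in referrals if not r.self_referral]
    elif denominator == "completed_visits":
        eligible = [r for r in referrals if not r.self_referral and r.visit_completed]
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    if not eligible:
        return None  # nothing to measure
    return sum(r.report_reviewed for r in eligible) / len(eligible)


# Made-up sample: 9 PCP-initiated referrals, 6 completed visits, 5 reports reviewed, 1 self-referral.
sample = (
    [Referral(False, True, True)] * 5
    + [Referral(False, True, False)]
    + [Referral(False, False, False)] * 3
    + [Referral(True, True, False)]
)
practice_view = report_review_rate(sample, "completed_visits")
community_view = report_review_rate(sample, "all_referrals")
print(f"denominator = completed visits: {practice_view:.0%}")
print(f"denominator = all referrals:    {community_view:.0%}")
```

In this made-up sample the same practice looks much better under the completed-visits denominator, because the three patients who never reached the specialist drop out of the measure entirely.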
MS. MCDONALD: We talked about this in terms of starting up our projects on inventory and care coordination measures and audiences and accountability and such. We thought that maybe the first step is to look at those patients that are in some sort of system of care in thinking about measurement within that system since a lot of care coordination failures occur in the white space where no one is accountable. That would be the next level of measurements that would be needed but then there is this problem of but no one is accountable. If we measure it what will really happen. Our thinking is that maybe what is most useful is the start at least with patients for whom they are saying a patient is under medical home, practice setting, or some sort of integrated system and that does not provide a full picture of everyone and all care but it would be a reasonable starting point.
DR. TANG: I am hung up on Blackford's challenge about looking for breakthroughs. It looks like the descriptions are of decomposing these processes into steps and then trying to instrument the steps. One question I have is has the cost of complying with reporting been studied already? Yes. You are nodding your head yes. You are nodding in agreement. My concern is in today's world, and that is the breakthrough I am talking about. You used to have to chase down people with paper and pencil and then come in it. That seems so 20th century and it seems like today care coordination is an outcome. I know it is made up of processes but it is sort of like what Larry was saying. We all have our own. They work locally. They work or they don't work. But we want care coordination. We don't want a letter with the date and time stamp, and all that does is add to the cost instead of trying to subtract cost.
It also does not leverage. In this office you might have the CNS, the clinical nurse specialist, who just drives your practice, and thank goodness for that person, and that is how it is going to get done. But if I measured all these other things, not only would I pay for all these other things, I actually am not helping pay for the person who is driving my care coordination.
Isn't the time now to focus on the output? For example, readmission. It's not such a rare thing. It is 20 percent. Plenty of opportunity to measure and to detect changes in whatever it takes for you to get care coordination done. The reason I say the other style is 20th century is I think we have new tools, and there are new tools to try to figure out what our processes are and how to streamline or redesign those processes, but they are local. We are really trying to measure the final goal. Is now the time to focus a lot on the final goals and worry less and burden less the precious resource we have? Primary care providers just aren't a dime a dozen, and if we spend a nickel chasing after processes that may not be relevant in a practice, I worry about that.
MS. MCDONALD: I think that is a really important point. There is a measure out of a group in Australia, I think it is Clara McGinnis and Beverly Sibthorpe, that simply asks the question of a patient: in the past three months how often did you feel the care you received was well coordinated? Of course the issue with having that be the outcome is that sometimes patients' perceptions are not going to pick up everything. But certainly prioritizing something related to the perception of coordination on the patient's part, the family's part and the health provider's part would be important in this whole picture.
DR. TANG: And I know Sarah wants to say something too, but that is what I thought about when Justine asked her question. In Amazon it pops up and asks how I like that vendor or how I like the product. Well, when you open up that consult result request we can ask, did it meet your needs? Just like, Kathy, you were saying, to the patient did it meet your needs and to the specialist did it meet your needs. That actually might be a whole lot closer and more meaningful in feedback to each other than all these processes.
MS. SCHOLLE: I guess I have been making a point that these are process measures, but in reality they really depend on structure. Essentially what you want is this structure to be in place so that somebody's asking, when they are sending a patient over for a referral for a specialist's visit, that you have given the patient enough information, you have given the specialist enough information to know what service is requested, so that the patient understands why they were supposed to go there and that somehow the information gets back. I do think your point about are we investing too much into getting a measure out is a big challenge to us. On the other hand, we are going to lose these. This is where the difference between integrated and non-integrated is huge. In the integrated system, okay, well, the medical record is there. You are going to have to assume that somehow the primary care physician is going to look at the data that came back and it is going to be there and have a chance to do it. But what happens in a non-integrated setting is that that information doesn't come back, or if it comes back it may not come back in a timely fashion and you have already seen the patient and you have already done the same set of tests over again, or when the specialist gets the patient they end up doing those things again. The other kinds of outcomes you might want to look at would be duplicate testing or numbers of visits. There may be other ways to try to get at that area of efficiency and they are going to be different in different settings.
I'm not convinced that we should say no to tracking this. When I talked with people in non-integrated settings, they think this is hugely important to try to understand what is happening because they think that is a way to understand their community and their system. If you talk with John Blair, this is why he set up all the work that he has done in the Mid-Hudson Valley of New York.
MS. MCDONALD: The other reason that the intermediate outcomes or process types of measures can remain important, as long as there is not too large a burden or pressure on especially these primary care doctors as was pointed out, is that that is often where there is actionability. Say you can measure it and there has already been linkage to something like lowered readmission rates when some process has been in place that coordinates care better. If you can measure that and pick that up as an issue, then that is exactly where the interventions and actions can occur and improve quality. That is what makes that part of the measurement potentially meaningful and useful.
DR. CARR: I just want to jump in here before Floyd, and really amplify what Paul was saying. Just as we were talking, I was looking up what the Commonwealth Fund has put together, Why Not the Best, and you can look at all the core measures. I just looked to see who were the top performers on did you understand your discharge instructions, and 95 percent of them are specialty hospitals, surgical hospitals, orthopedic hospitals. In that split second, I learned a lot of things from just one thing, one question, asking the patient, and immediately seeing that the procedural areas have got it down. I can go back and learn from that.
I am just concerned that the complexity of this is so overwhelming that we might never get there. We might not get where we are going as we are trying to figure out how to measure. I think a lot of the benefit of measures is to ask the question. Again, this was a story that was in Europe or something that some consultant from IBM told this story, that there is a mountain in Switzerland and there is a tunnel through it, and they wanted to say if it is daylight, lights are off, turn them on. If it is nighttime and your lights are on, turn them off. They struggled with this for weeks and weeks and finally they said, are your lights on and that was the sign that they put above. I feel like that is the right question.
We talked about this at the OHIMA meeting. We did this very complex glucose augment and finally STS and CMS said was your glucose less than 200 on the morning after surgery. I think sometimes asking one question can catalyze a universe of systems that will answer in a way that is right for them and for us to try to take the elements of the answer before anyone has completely answered it could put us in a cycle that would not see daylight.
MR. EISENBERG: I was just going to add to that because I see a lot of complexity and I dont want to go back to my QDS discussion, but a couple of the elements we identified in there are care goals and experience. The experience is not just the patients experience but also the providers experience. One of our other panels actually through HIT we have a couple of expert panels coming up. One is in care coordination to identify what elements are required and can we help push standardization of an EHR so that we can manage measurement and part of that will be care goals because it is based on the plan of care and the care experience. I hate pop ups so I understand, but when I see something did it meet your needs to be able to answer that answers a couple of questions. One is it means I have read it enough so you can know that I have looked at. It also means there can be feedback to specialists. If that specialist isnt meeting my needs on a regular basis I may not use them again but maybe somebody else ought to know they are not meeting my needs. A lot of this is not in EHRs today and it is written somewhere on the side.
I remember on paper every time I got something I would initial it to say I have read it. I know people initial it when they dont read it but it shows you have looked at it. I dont want to create that attesting in the EHR but I think there are ways to go about this that are kind of in progress that may help us out.
DR. TANG: I just hope we don't get to a point where we need measure coordination on top of our care coordination, but it really sounds a bit like that. Another example is Geisinger, which got a lot of attention for their program they called ProvenCare. All that is is an outcome measure. I leave it to them to figure out how to pay less output on that guaranteed price. I think that would serve us well as a way of measuring care coordination rather than all these process measures. I don't know. Maybe it's a bit much.
MS. SCHOLLE: I think these points are well taken. I can assure you these are questions that have come up in our mind as we have done this work and tried to think about what is worthwhile. Where should we be leading the field? It is really going to be a different issue for integrated versus non-integrated settings, but in the short term, for non-integrated settings, it is about having a pathway to what you should expect and what information should be shared with the patient and with the specialist. Is the specialist getting the information they need and is that information coming back? Having a simple way to know whether on a community level you are accomplishing that is important, and it is built on these pieces of did the information get shared and come back, or it is surveying patients and families, which is also another expensive approach. As you can tell from our broader framework, it is certainly something that is on our plate to try to think about the benefits and drawbacks of all the different approaches.
DR. CARR: I think we wanted to have a little bit of time to just recap and prepare for tomorrow.
MR. QUINN: I was going to say another way to look at this is not in the context of care delivery so much but in the context of social networks. Imagine a social network with the patient in the middle and all of these other people around them. There is a world of measurement around the activity of social networks, how well they are used, as well as indicators of how well connected a patient and the people in a social network are. To use the paradigm of friending someone, well, maybe we call that a data sharing agreement signed among a particular patient's providers to show that they are able to receive messages. And then we can also look at, and I'm no expert on how to measure social network activity, but how much traffic is on there. These are the sorts of things that can be done in an automated way versus this. I think what we are going to find is that looking at those patterns of social networks for patients, some of them are going to be better trafficked. Others are going to be less trafficked, and they are going to be very different networks in an integrated system versus not. I think that that is a different paradigm than this for looking at it and it's not necessarily tied to all the dependencies here.
DR. CARR: Thanks very much to both of you. It really stimulated a lot of thinking and a lot of appreciation for the excellent work that you both are doing. Thank you very much.
Agenda Item: Re-Cap and Discussion
DR. CARR: I realized it has been a challenging day in terms of incoming information but we had set aside a little time until five thirty anyway to recap and discuss. One thought I had would be to just get major themes that we heard today and think about them because our goal had been as we talk about the national priorities, the care coordination as one, tomorrow disparities, value, population and health, we want to think about the lessons learned about measurement today and then take that and apply to those priorities to identify again are these challenges present in these particular issues. I guess I will open it up to just have folks throw out themes that we heard today.
DR. TANG: Well, it was a very informative day I thought and I think the challenge that most fits where I am thinking is what Blackford said. Where can we go from what we have been doing which is very process laden and based on the paper world to where we would like to go recognizing that it will have a new application in health reform and it seems like we want to if you start with the morning discussion where we talked about the counterpart to CME where you sit down in some location and try to promote knowledge to practicing with ongoing feedback and that is where the quality measures can come on. Ongoing feedback about how you are doing and how to change. That would be really different and as a consequence that might free us up to instead of saying do you have what we think should be process A and B and C and look more towards are you getting outcome E again can be measured electronically through some of the systems that we are now putting into place. That would be a very different way of looking at both what to measure and how to measure it and at least it stimulated my thinking that.
DR. MIDDLETON: Just a couple of quick thoughts building on Pauls summary. I just wrote down several themes listening through the day. One is how do we enable this idea of the real time quality and management and feedback and reporting and perhaps have an aspirational new approach to quality as opposed to the sometimes pejorative approach, that is, Im not doing good enough. I would rather it tell me, how do I do better. And maybe even combine with that some of the CME and other sorts of affinity type activities that might relate very well to both data acquisition and management for quality.
We talked a lot about the payors becoming harmonized around a core set of national standard quality measures, which certainly would make things a lot easier on the payor side perhaps, the HIT side, and maybe the provider side, as well.
There was a subtle point I think, Frank Opelka made about actionable quality. Make the reports, make the data actionable from the providers point of view to try to bring them along. I raised the issue of what is the sustainable business case for quality post-stimulus. We are in a great transitional stage now period, but what happens when all that goes away?
I guess the last thing would be can we through all of this think about a parsimony quality measures and approach and what national architecture does that imply to get quality all the way from the point of care, all the way up to CMS and to get insights and payment reform all the way back to the provider.
DR. GREEN: I thought one of the most important consolidating statements was there has to be an aggregator data cleaner. I thought that was pretty much right on. I dont feel like we have matured our thinking adequately about this, but I do hear a solid conversation about what was at one point called, concurrency of clinical decision making, local care arrangement, and Pauls point about can there just be a few measures that are independent of all of these things including type of practice arrangements. Thats where the sweet spot is. They have to come together. They have to operate with this word concurrency is a notion.
When they are not concurrent one of the apparent consequences is that things get very complicated, very fast and they get very expensive very fast and they probably get undoable very fast. That struck me as an important thing.
Matt didnt develop it and we didnt really hear this except from his comments, but what a great thing to take home tonight to think about is instead of thinking about this around protocols of care for chronic diseases for mostly older people that have expensive problems is maybe we should be thinking about measuring social networks. That is a good question, in my opinion.
MR. REYNOLDS: What was that again, Larry?
DR. GREEN: That is an observation that instead of us continuing to think about this the way we are accustomed to thinking about it are these care processes, care protocols, the usual suspects and players that are trying to communicate with each other. That maybe we should jump and think about seeing this as a social networking problem and think about how do you measure the way a social network functions and does it accomplish what it is supposed to. Something like that. Im not giving it justice.
DR. MIDDLETON: I can see the NEJM article, Larry, with your name on it. Just like your health behaviors are basically contagious. You are only as smart as the people you practice with.
MR. QUINN: If you are a primary care doctor and you dont have a lot of friends, then you are probably not coordinating care.
(Laughter)
DR. GREEN: Let me invert that. I think it was D.F. Fox back in Lancet, getting dangerously close to 50 years ago, who also pointed out that one of the functions of one's personal doctor is to protect that person from the overzealous specialist, and we have a lot of evidence that this is true. When you start measuring the social network, you can argue that in a robust primary care system the referral to visit ratio is low. That measure can either be neglect or it can really be robust care, and we have to have a way of approaching that measurement that distinguishes that.
I really like Matt's idea. I wish we could talk about it more and I wish we had someone talking to us about it. Do we have someone testifying tomorrow that knows a lot about this?
DR. CARR: No, we did have that NCVHS afternoon. I don't know if you were there.
DR. GREEN: (off mic)
DR. CARR: That is why we keep meeting.
MR. REYNOLDS: First, my comments are not as chair of NCVHS. They are as a visitor to this committee. I actually was pretty disappointed. I heard an incredible number of experts be experts on their piece and experts on why their piece was good and what is happening with it and I think that is excellent. But I didn't hear any grouping up. I have actually been a huge fan of meaningful use and I use it a lot at home. I use it a lot everywhere I go. Some of the ability starts sliding further to the right as I listened today, as far as who is agreed with what's in what column. If the industry in essence is truly committed to move this versus prove that individuals or individual groups are the one. That worries me.
I think some of the things that I did like -- I think Blackford we ought to do a primer on how to think about this with some of the slides you did and that Paul did. It really decide this idea because again, when you go home and you start thinking of all the people that are going to have to catch up that don't even know this is going on today and haven't seen the chart and oh by the way they got to get somewhere between now and 2013 to do something. There is a lot of help that needs to be done at that end of it and we can use a social network and we use whatever we talk about to get that group to move. We have a lot of educated players around this table but there are a lot of people out there that don't even know what the table would look like. I think that is a key thing.
I think we have to have a light to go to and I think all of us complaining about things like HIPAA but somebody forced it to happen and said this is it and now we have come out with the next versions, which maybe it is going to make a difference, but it grouped everybody up. I think rather than just inviting the vendors, which some of the things that we are going to make sure that we do something that they will play in because I think we all would say there aren't many doctors writing their own systems and there aren't many practices out there with a whole lot of time to figure out how to use it so we better deliver that in the right way.
The standard data set, I will speak as a payor. I would love to know there was a standard data set and I am telling you we would do everything possible to push people there so that once you have the standard data set then you could start actually having audible discussions. Right now it is everybody picking their own side of the discussion and then you quickly start losing your thread of what I really think about and what data is or isn't actually there. I always call them home base. As an implementer I love a home base. We can argue away from home base but if we don't have a home base we just argue on lines and premises that don't happen.
I just think it is a great time for NCVHS to be doing this. I think it is a great time for us to be partnering with David and others, because maybe in some ways we may be able to say things that maybe wouldn't be able to be said in other ways.
PARTICIPANT: Say more about that.
MR. REYNOLDS: I would rather not right now. I am only speaking as a visitor today.
PARTICIPANT: (comment off mic)
MR. REYNOLDS: In other words, all the meaningful use hearings we had observations. Out of these hearings we could have observations and/or recommendations in doing so we could help move a ball forward because one thing is they have their shoulder to the ball. They are having to push a lot of balls at the same time. We happened to have a clean space to step back for a moment and we are looking at this we don't have columns, but yet we have understanding of columns and we have understanding of timeframes. We have understanding of things we have done in the past, and the industry has done in the past.
Hopefully we could create a body of work that could be picked up as Paul and I talked, as David and I talked and others have talked, could be picked up and used in ways to make recommendations that didn't necessarily come just from them. That would be my summary.
DR. CARR: I wanted to add one thing too. As we are talking very facilely today about SNOMED, and SNOMED and problem list. I would just like to point out that I don't know many physicians who are nimble with SNOMED. I think there is a whole educational piece that obviously we will have to do some catch up work, but I mean really building into curriculum not just physicians and nursing and all health education. There needs to be a track within the curriculum about measurement and documentation or taxonomy or something because we do not have a workforce that understands us right now.
DR. FITZMAURICE: Maybe the physicians don't have to understand SNOMED but their vendors do and have to give them choices that will lead them to a SNOMED concept so that they can be added across.
DR. MIDDLETON: Harry, thank you for opening the door to allow us to maybe say a little bit more about what we really feel. I struggled a little bit because I found myself, and it's not particularly that I know that much about quality, but I found myself kind of in and out. I found a lot of the same sort of presentations about kind of the same stuff in their points of view and what not. That is all well and good. People are expert in their individual areas and what not, but when I got to the late morning presentation, whatever that was, I wanted to kind of get us to think about the bigger picture here. We really have an opportunity to think about the bigger picture because CMS may not exactly be accountable for quality. AHRQ isn't exactly accountable for quality. Who is the quality czar in this country? Is there such a thing? Could that person with an HIT czar help reform with healthcare reform, help reform along? I think that is kind of the bigger set of issues that have to be put on the table. As one of the folks back home likes to say, as Steve always says put all the blood on the table and then let's see what is going on. I don't think we have done that yet.
MR. REYNOLDS: I would like to see the excitement on the table, excitement to go to the same place and to make a difference on that list and try to do some of this even though I don't always have the ball.
DR. FITZMAURICE: Social networking. Sunday, my son Jim who is 39 years old, went out playing touch football with my son Joey who is 41 who was a quarterback and a bunch of other kids, kids of 41. Somebody caught a pass and my son Jim put his hand in there to knock the ball out jammed the index finger so that it wound up breaking two metacarpal bones here. This finger was pointing in the wrong direction and he drove himself to the local hospital. They couldn't get it relocated. They brought in a surgeon who was able to move it and finally get it in, splinted it. They had a before and after. They didn't pin it or anything. He came to our house and his wife came to our house too, with the kids. He gave me a disc that had a DICOM reader on it and his images. I put it in and we looked at it and it looked horrible. It really did. You wonder is that stuff going to stick together or not with nothing holding it.
We sent it around to the FTSE Google group. It is a group of nine kids and their spouses, and maybe a couple of others. The suggestions started coming back. One of them has a relative that just got out of the military. He is a top hand surgeon in York, Pennsylvania. He was at the Ravens game but he is willing to not drink, not have any beer, so that he could look at it tomorrow and then operate on it. Somebody else said here is someone who did Johnny's ACL. You can try him. A lot of suggestions just from people who are not very knowledgeable in medical care. My wife is a nurse. One of my daughters is a nurse. But we got good information that at least gave him some paths to go. He decided to go to the ACL guy and get a referral to maybe a hand specialist if the guy thinks he needs it.
Can you imagine having that with medical professionals, that you send it around and you start getting this feedback?
What are the themes that I saw? Like Harry, I couldn't see how to fit it all together and move forward. I am struck by there is a lot of investment and demonstration of quality measure effectiveness in changing what we want changed.
I saw quality measure architecture and workflows. I liked seeing that. I wanted to see here is the way it should be but I don't know if we are there yet or not. But I think that is what Harry wants to see is what should we be working for. How did we make this happen? How do we aggregate quality measures for effective change? Provider information for clinical decisions and for patient choices.
Another theme. Is there a financial reward system that promotes/rewards quality improvement and coordination of care? I don't know. Should we focus on is the mean greater than the bar or should we focus on is the distribution changing? The percentage of patients who are below the bar, percentage of patients who are above the bar. Are we moving toward more and more above the bar? The bar is wherever we set it.
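[Illustration, not part of the spoken record: a minimal sketch of the contrast drawn above between asking whether the mean clears the bar and asking how the distribution is moving. The scores, the bar value of 70, and the function name summarize are all hypothetical, chosen only to show that the two questions can give different answers.]

```python
from statistics import mean

def summarize(scores, bar):
    """Return the mean score, whether the mean clears the bar,
    and the share of patients at or above the bar."""
    at_or_above = sum(1 for s in scores if s >= bar)
    avg = mean(scores)
    return {
        "mean": avg,
        "mean_above_bar": avg > bar,
        "share_above": at_or_above / len(scores),
    }

# Hypothetical bar and hypothetical per-patient scores (not real data).
BAR = 70
before = [40, 45, 50, 95, 100, 100]  # a few very high scores pull the mean up
after = [68, 69, 71, 72, 73, 74]     # scores cluster near the bar

for label, scores in (("before", before), ("after", after)):
    result = summarize(scores, BAR)
    print(label, round(result["mean"], 2), result["mean_above_bar"],
          round(result["share_above"], 2))
```

[In this made-up example the mean is above the bar in both periods and even falls slightly, while the share of patients at or above the bar rises from one half to two thirds, which is the kind of shift a mean-only view would miss.]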
We have a transition coming up, a transition from quality measures to other quality measures to performance measures. We need to keep track of the data elements and their attributes. We need to get into different versions of code sets and this is over time. This is not just in the next three or four years. We are having changes in coding systems. How do we handle that and does that impact do our measures mean the same thing? Do we have the same validity as we carry across these changes?
I have already covered the social networking. Maybe a social network is I am a patient and instead of giving me a disc, can you email this to this Google group? Can you email it to my primary care physician so that he can get on the ball and find the best specialist and get me some answers very quickly? It starts to pull everybody together as a team and with some accurate information and then maybe physicians come back and say, you know I really need this additional information. And the tingling at the end of the fingers, is it grossly swollen? That is what the military doctor said. I need to know that before I can do anything.
DR. CARR: Well, thank you everyone. Thank you for calling the question here because I think it really helps us to focus on the job that we set out to do. With that I think we will call it a day and we resume tomorrow morning at nine.
(Whereupon, the meeting adjourned at 5:35 p.m.)