Comments of Ciaran S. Phibbs, Ph.D.

HSR&D Center for Health Care Evaluation, Veterans Affairs Palo Alto Health Care System
Cooperative Studies Program, Veterans Affairs Palo Alto Health Care System
Department of Health Research and Policy, Stanford University

Address: Center for Health Care Evaluation
VA Medical Center (152)
795 Willow Road, Menlo Park, CA 94025.
(415) 493-5000 x22813. Fax: (415) 617-2736.
E-mail: cphibbs@odd.stanford.edu.


Dr. Phibbs is not testifying as an official representative of the Department of Veterans Affairs.

3. Data available for health policy analysis and assessments.

Adding unique identifiers to data sets will significantly increase the research potential of most, if not all, data sets. Unique identifiers will allow the linking of data sets, which can allow researchers to conduct more detailed analyses than would otherwise be possible. Based on my past experience with data linkages, I would recommend that these standards include a set of identifying information, instead of relying on just a unique identification number. This can be very important to the outcome of some research projects. A major problem that health researchers face today is that the information they really need is contained in multiple data sets that cannot be linked with each other. The researcher is thus faced with having to decide which data are less vital and must be excluded.

This is especially a problem when individuals obtain health care from different health care systems that do not report data to any common data set. For example, use of health care at Department of Veterans Affairs (VA) facilities is not reported to any other body. But many veterans also obtain care outside of the VA system. Some, but not all, of these encounters are reported in state Medicaid data, in state hospital discharge data, or in Medicare data. This presents problems for the analysis of care from many different perspectives, since use of any one of these data sources will exclude some health care utilization. While it is possible to link some of these data sets, it tends to be a laborious process. If all health-related data sets had a standard set of unique identifiers, these linkages would be much easier. There are many similar examples.
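The recommendation to carry a set of identifying information alongside a unique identification number can be illustrated with a minimal sketch. Everything in it is hypothetical: the field names (`id`, `dob`, `sex`, `zip`), the sample records, and the `link_records` helper are assumptions made for illustration, not a description of any actual VA, Medicare, or state linkage procedure. The sketch first tries the identification number and, when that number is missing or unmatched, falls back to exact agreement on the other identifying fields.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and trim a value so trivial formatting differences do not block a match."""
    return str(value or "").strip().lower()


def link_records(left, right, id_field, backup_fields):
    """Pair records from two data sets (lists of dicts).

    Match on the identification number first; if that fails, fall back to
    exact agreement on every field in backup_fields, accepting the fallback
    only when exactly one candidate record agrees.
    """
    by_id = {}
    by_backup = defaultdict(list)
    for rec in right:
        if normalize(rec.get(id_field)):
            by_id[normalize(rec[id_field])] = rec
        key = tuple(normalize(rec.get(f)) for f in backup_fields)
        by_backup[key].append(rec)

    pairs = []
    for rec in left:
        match = by_id.get(normalize(rec.get(id_field)))
        if match is None and backup_fields:
            key = tuple(normalize(rec.get(f)) for f in backup_fields)
            candidates = by_backup.get(key, [])
            # Require complete identifying information and an unambiguous match.
            if all(key) and len(candidates) == 1:
                match = candidates[0]
        if match is not None:
            pairs.append((rec, match))
    return pairs


# Hypothetical records: the second VA record has no identification number.
va = [
    {"id": "A1", "dob": "1950-03-02", "sex": "M", "zip": "94025", "visits": 3},
    {"id": "", "dob": "1948-11-20", "sex": "F", "zip": "94301", "visits": 1},
]
medicare = [
    {"id": "a1 ", "dob": "1950-03-02", "sex": "M", "zip": "94025", "claims": 2},
    {"id": "B7", "dob": "1948-11-20", "sex": "f", "zip": "94301", "claims": 5},
]

linked = link_records(va, medicare, "id", ["dob", "sex", "zip"])
id_only = link_records(va, medicare, "id", [])
```

In this toy example the identification number alone links one of the two individuals, while the additional identifying fields recover the second. Real linkages would typically need to tolerate recording errors in the identifying fields as well, which this exact-match sketch does not attempt.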

It is not just for research completeness that I think it is important to be able to link these data sets. Since the selection to participate in certain types of care can be nonrandom, findings based on unlinked data can have systematic errors, errors which could potentially misdirect policy.

Another potential gain from data linking occurs when complementary data are contained in different data sets. I will use as an example an analysis I recently published in JAMA. (Phibbs CS, Bronstein JM, Buxton E, Phibbs RH. The Effects of Patient Volume and Level of Care at the Hospital of Birth on Neonatal Mortality. JAMA 1996;276:1054-1059.) This study looked at how the patient volume and level of neonatal care available at the hospital of birth affected neonatal mortality. Using a previously unavailable linkage between birth certificates and hospital discharge abstracts, this study was able to do a more complete case-mix adjustment than was previously possible. The findings showed that the mortality advantage of being born in a hospital with a high-volume tertiary facility was much larger than had previously been reported. We are preparing the details of the comparison between our method and previously used methods in a manuscript we hope to submit for publication very soon. I have included a table from that draft manuscript that shows how the results differ when I use information from only one of the two data sets, compared to using information from both data sets. The combined data yielded much better predictive power and, more importantly, the policy-relevant findings were quite different. Using just birth certificate data, the qualitative effect is the same, but the estimated effects are smaller and not all of them are statistically significant. When only the discharge data are used, the qualitative findings are totally different.

The cause of my findings brings up an issue that applies beyond my study. The reason that the findings changed when I had linked data is that the addition of the discharge data allowed the analysis to control for the diagnoses of the patients. This partially corrected for the selection bias due to selective referrals of a disproportionate share of the highest-risk cases to tertiary centers. But the ICD codes only control for diagnoses, not the severity of the diagnoses. Given the direction of the selection bias, it is reasonable to expect that my estimates are still biased. As more complete clinical data sets become available, the linkage to clinical data could reduce the extent of this bias. This is a problem for many health research projects. While some research has found that the value of such data may be limited, I believe that these additional data can be quite important in cases where we know there is systematic referral of at least some of the highest-risk patients to selected hospitals.

7. Classification for outpatient transactions.

The problem with the current system is that different procedure coding systems are used for inpatient and outpatient care. Further, physician billing data use the outpatient coding system, even for inpatient care. The main problem from a research perspective is that the two coding systems do not perfectly map to each other. But forcing a standardization on either system would not solve all of the problems because the two coding systems are not perfect substitutes; each contains some complementary information in some cases. In an ideal revision, there would be a new coding system that included all of this complementary information.

9. Concerns about privacy provisions.

I have no objection to the implementation of privacy provisions for data. But I think that it is very important that the data remain available and accessible to qualified researchers. I acknowledge that as more data are linked, the risks to patient confidentiality go up. I believe that the level of protection for these data should also increase as the potential risks increase. But these procedures should remain reasonable so that qualified researchers can have access to the data. For example, California is now linking the data I used for my study (birth certificate and discharge abstract data). While public use versions of the individual data sets are available for research use, the linked data set will not be a public use data set, even when all of the confidential individual identifiers have been deleted. But the non-confidential version of these linked data will be available to researchers, subject to the completion of a review process. I view this as a reasonable precaution to prevent unauthorized use of these data. My only concerns are that any rules on the review process to obtain non-public data not be too burdensome, and that it remain possible to obtain the non-public data in a timely manner.

TABLE 3

The Effect of Adding Discharge Data to Birth Certificate Data on Logistic Regression Estimates of the Combined Effects of NICU Patient Volume and Level of NICU Care at the Hospital of Birth on Neonatal Related Mortality&,#,%

Entries are odds ratios compared to high-volume Level III hospitals (95% confidence interval)%

Level of Care          Average NICU Census   Discharge Data Only   Birth Certificate Data Only   Birth Certificate and Discharge Data@

Level I                                      0.95 (0.71, 1.29)     1.31 (1.01, 1.68)*            1.59 (1.17, 2.15)**
Level II               < 5 patients          0.86 (0.61, 1.21)     1.33 (1.01, 1.75)*            1.40 (1.01, 1.96)*
Level II               > 5 patients          1.01 (0.67, 1.52)     1.18 (0.83, 1.68)             1.42 (1.01, 2.02)*
Level II+              < 15 patients         1.03 (0.76, 1.40)     1.26 (1.03, 1.55)*            1.42 (1.11, 1.82)**
Level II+              > 15 patients         1.36 (0.98, 1.88)     1.32 (1.03, 1.69)*            1.52 (1.14, 2.03)**
Level III              < 15 patients         0.98 (0.71, 1.36)     1.41 (1.13, 1.76)**           1.55 (1.22, 1.96)**
Level III              > 15 patients         reference             reference                     reference

Area Under ROC Curve   ---                   0.82                  0.85                          0.92

* p<0.05
** p<0.01

& Mortality includes any death within the first 28 days of life, regardless of location, plus any deaths after 28 days within the first year of life if the infant was never discharged from the hospital.

# The regression estimates also control for birth weight, sex and race/ethnicity of the infant, type of insurance, maternal education, amount of prenatal care, and whether the birth occurred in a county hospital, a proprietary hospital, or a teaching hospital. The results with discharge data also control for clinical diagnoses.

% The standard errors have been corrected for within-hospital correlation using the method of generalized estimating equations.
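As a reading aid for the table, the following minimal sketch shows how an odds ratio and its 95% confidence interval relate to a logistic regression coefficient and its standard error. It assumes the conventional interval that is symmetric on the log-odds scale (coefficient plus or minus 1.96 standard errors); the testimony does not state how the intervals were constructed, so this is an assumption. The function names are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and standard error to (odds ratio, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_ci(lower, upper, z=1.96):
    """Back out the coefficient and standard error implied by a reported interval,
    assuming the interval is symmetric on the log-odds scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Level I row, combined-data column: reported as 1.59 (1.17, 2.15).
beta, se = coef_from_ci(1.17, 2.15)
odds_ratio, lower, upper = odds_ratio_ci(beta, se)
```

Under that assumption, the odds ratio implied by the reported interval endpoints agrees with the tabulated 1.59 to within rounding.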