On behalf of myself and my colleagues at the NCI, I thank the Standards and Security Subcommittee of the NCVHS for the opportunity to provide comments on the scope and criteria of terminologies for patient medical record information (PMRI). I am Margaret Haber, technical information specialist with the Division of Cancer Information Products and Systems, Office of Communications of the National Cancer Institute (CIPS/OC). I serve as a coordinator and project officer for efforts related to both division and institute-level vocabulary development, including the disease and drug terminology for NCI Thesaurus, and mapping between vocabularies for multiple systems at NCI.
The CIPS division is responsible for the production of Physician Data Query (PDQ), NCI's database of evidence-based review summaries on cancer treatment, screening, prevention, genetics and supportive care. The PDQ clinical trials database currently contains some 1,800 open and 12,000 closed cancer clinical trials. The division maintains Cancer.gov, the portal website to NCI, and provides primary content for the NCI Cancer Information Service. As providers of information at the institute level, we also work cooperatively with the NCI Center for Bioinformatics on NCI Thesaurus and NCI Metathesaurus, an institute-level vocabulary and a meta-vocabulary environment, respectively, which are core components of the Center's Enterprise Vocabulary Services (EVS).
The issue of specifications and standards for PMRI and related terminologies is one of acute concern to the NCI, as the Institute is both a user and a provider of vocabularies related to our diverse missions. The NCI represents a microcosm of the broader medical community on issues of standards for interoperability such as vocabulary. We face several challenges, among them:
1) Diverse missions, organizations, and goals
There is a historical disconnect of data collection priorities and methods, with information housed in discrete systems and coded using different vocabularies, for:
- portfolio management
- clinical care
- research
- public and provider information
- epidemiology
Furthermore, there can be differences in the meanings attached to a particular term in the vocabularies of these various communities. The term "prostate" may mean the gland to an anatomist, the tissue to a pathologist, and, as shorthand, a malignant lesion to an oncologist.
2) Inheritance of legacy systems
Achieving connectivity between established, mission-critical systems involves complex issues of data conversion, storage and retrieval, especially for retrospective research. Barriers include not only disparate electronic systems, but at times the lack of an electronic record entirely.
3) Changing health information models
Health care, and cancer care, is shifting from an emphasis on treatment to one of prevention. Even more fundamental is the rapid transformation of disease models from traditional to the new genetic and molecular classifications.
4) The pace of change in science, technology, and methods
Vocabulary content and structure must quickly reflect these advances if the vocabularies are to remain useful, while still permitting the identification and retrieval of artifacts tagged and coded using earlier systems and vocabulary versions.
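As an illustration only, the requirement above (update quickly, yet keep older coded material retrievable) can be sketched as a vocabulary in which retired codes are never deleted but point to their successors. This is a minimal sketch under stated assumptions, not a description of NCI Thesaurus or any EVS system; the class names, the codes `OLD-1` and `NEW-1`, and the version labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Concept:
    code: str                          # hypothetical identifier
    name: str                          # placeholder label, not real content
    introduced_in: str                 # vocabulary version that added the concept
    retired_in: Optional[str] = None   # version that retired it, if any
    replaced_by: Optional[str] = None  # successor code, if any


class VersionedVocabulary:
    """Keeps retired concepts so records coded under old versions stay resolvable."""

    def __init__(self) -> None:
        self._concepts: Dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        self._concepts[concept.code] = concept

    def get(self, code: str) -> Concept:
        return self._concepts[code]

    def retire(self, code: str, version: str, replaced_by: Optional[str] = None) -> None:
        # Mark the concept as retired instead of deleting it.
        concept = self._concepts[code]
        concept.retired_in = version
        concept.replaced_by = replaced_by

    def resolve(self, code: str) -> Concept:
        """Follow replacement links from an old code to the current concept."""
        seen = set()
        concept = self._concepts[code]
        while concept.replaced_by is not None:
            if concept.code in seen:
                raise ValueError("cycle in replacement links at " + concept.code)
            seen.add(concept.code)
            concept = self._concepts[concept.replaced_by]
        return concept


vocab = VersionedVocabulary()
vocab.add(Concept("OLD-1", "Traditional classification (placeholder)", "v1"))
vocab.add(Concept("NEW-1", "Molecular classification (placeholder)", "v2"))
vocab.retire("OLD-1", version="v2", replaced_by="NEW-1")

# A record coded with OLD-1 under version v1 can still be found and interpreted.
current = vocab.resolve("OLD-1")
```

The design choice illustrated is that retirement is recorded as data rather than performed as deletion, so retrospective retrieval does not depend on which vocabulary version was in force when an artifact was coded.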
We have approached these issues at NCI with enterprise-level initiatives to integrate both intramural and extramural data resources, particularly for clinical trials, in collaboration with outside partners such as the Clinical Trials Cooperative Groups. The Center for Bioinformatics is developing tools and technologies to support these efforts, including a cancer data standards repository (caDSR), to standardize metadata for clinical cancer research. The Center is also collaborating with CIPS/OC to develop NCI Thesaurus, a controlled reference terminology to support NCI's vocabulary needs from basic and translational research to clinical care. The effort was spurred by the realization that linking related concepts in the fields of populations, genetics, developmental therapies and disease was essential to our mission. The core NCI Thesaurus currently contains some 20,000 concepts and 80,000 terms across multiple domains in the cancer realm.
Cancer information encompasses a broad spectrum of medical knowledge, as cancer and its treatment potentially impact all aspects of human health. The rapid evolution of information models in cancer mirrors the challenge of addressing knowledge systems at the broader level of standardized PMRI. For the NCI, this fundamental and ongoing transformation means that any proposed information model not incorporating the assumption of change is destined to fail.
Such rapid changes in both information models and technological capabilities call into question the usefulness of a standard-setting process that moves too slowly. Those providing information services and clinical care in the here and now must cope with how to manipulate and present their data today. In contrast, standards development must anticipate the future evolution of PMRI. More specifically, vocabulary standards must be able to ensure:
These are perhaps more important indicators of the ultimate success of a standard vocabulary than simply an examination of its current contents.
HIPAA requires portability, accessibility and security of data, and hence requires standardized interchange of information across systems. Stable, reliable and consistent meaning of vocabularies and codes is a fundamental component of this requirement. Integrated coding vocabularies are also essential to realizing the goal of building and expanding the National Health Information Infrastructure (NHII), cooperative and Federal eHealth, and other related efforts to provide the structures and information resources for personal, provider, and public health data needs.
While I will refrain from commenting on the individual vocabularies, in examining those being considered as standards it would be instructive to see a table showing coverage of the various vocabularies and code sets by content domain, as several of the specified vocabularies have crossover content, for example the ICDs, LOINC, and SNOMED. Vocabulary providers should be asked to indicate depth of coverage in each domain. Some vocabularies necessary to provide adequate coverage of special domain areas, notably ICD-O-3 for oncology reporting, may have been omitted. A purely illustrative biaxial chart structured as shown in Table 1 might better reflect the content coverage of the specified products.
The PMRI selection process should accord priority to terminology development efforts that are linked to, and integrated with, related clinical document architecture, format and messaging standards, such as the HL7 vocabulary standards development process. Representatives from the major divisions of NCI will become more active participants in the standards activities of HL7 as we feel that an inclusive, collaborative process offers the best opportunity to leverage the power of multiple developers. Cooperative endeavors at the federal level such as the National Drug File Reference Terminology (NDF-RT), a joint development effort of the VA, FDA, NLM, and most recently NCI, may also serve as a model for further collaborations.
The criteria outlined by the committee for PMRI terminologies cover the main principles for sound practice in the creation and maintenance of medical vocabularies, and would generally apply to all sources. Under the category of general criteria, I would add that vocabulary sources should have explicit, defined, and regular processes for technical and content quality review and assurance. And I would emphasize again, under maintenance, the necessity of a consistent, reliable, responsive and rapid update process sensitive to the changing needs of the national health community. Content control and regular review by domain experts are essential for usability and validity, and should be a hallmark of every specialized vocabulary or vocabulary domain.
PMRI terminology developers should be encouraged, but not necessarily required, to be accredited as ANSI standard developers. Market acceptance of well-principled terminologies can create de facto standards. However, standards that purport to serve the needs of a diverse community ought to be open to community involvement.
Standard coding vocabularies, especially those promoted as major reference terminologies, should not be burdened by undue intellectual property constraints. Additions and enhancements to core structures must not be impeded by copyright restrictions. Terminologies are most likely to serve the community if the community has a sense of ownership and can bring its expertise to bear in making improvements. The inherent difficulties in overcoming such intellectual property barriers spurred recent initiatives to develop standardized, open resources such as the newly announced Open Health Terminology (OHT) effort.
Finally, PMRI terminologies cannot be examined without looking also at the tools available to create and maintain them, and the integrity of the mappings between them. In fact, accurate mapping is a task often as complex, and as much an intellectual product, as the vocabulary itself. The ownership, maintenance and integrity of mappings between all major, required reporting vocabularies for PMRI should be as explicit as the requirements for maintenance of the terminologies themselves. The relative lack of sophisticated tools for this task presents another barrier to integration.
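To make the point about explicit ownership and maintenance of mappings concrete, the following is a minimal sketch, offered as an assumption-laden illustration and not as any existing tool or standard. Each mapping record carries its owner, its review date, and the vocabulary versions it was built against, so that mappings left behind by a vocabulary update can be flagged for review. The vocabulary names `VOCAB_A` and `VOCAB_B`, the codes, the owner label, and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mapping:
    source_vocab: str
    source_version: str
    source_code: str
    target_vocab: str
    target_version: str
    target_code: str
    relation: str     # e.g. "equivalent", "broader", "narrower" (illustrative values)
    owner: str        # party responsible for maintaining this mapping
    reviewed_on: str  # ISO date of the last expert review


def find_stale(mappings: List[Mapping], current_versions: Dict[str, str]) -> List[Mapping]:
    """Return mappings built against a vocabulary version that is no longer current."""
    return [
        m for m in mappings
        if current_versions.get(m.source_vocab) != m.source_version
        or current_versions.get(m.target_vocab) != m.target_version
    ]


mappings = [
    Mapping("VOCAB_A", "v1", "A100", "VOCAB_B", "2003", "B-7", "equivalent",
            "hypothetical-owner", "2003-01-15"),
    Mapping("VOCAB_A", "v2", "A200", "VOCAB_B", "2003", "B-9", "broader",
            "hypothetical-owner", "2003-04-01"),
]

# Suppose VOCAB_A has moved to v2 while VOCAB_B is unchanged.
stale = find_stale(mappings, {"VOCAB_A": "v2", "VOCAB_B": "2003"})
```

Here only the first mapping is reported, because it was authored against an earlier version of its source vocabulary. Recording the owner and review date on each record is one way to make responsibility for a mapping as explicit as responsibility for the terminologies themselves.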
The challenges encountered by the NCI in creating a more unified terminology for oncology reflect many of the same complexities involved in terminology standards for PMRI. We will continue to work with the health care community on shared solutions. I would like to thank my colleagues Frank Hartel, Sherri de Coronado, James Oberthaler and Larry Wright for their contributions to this testimony. I thank the committee again for the opportunity to provide comments on this process.