
English Summary

Personalised multiscale models of diseases have the potential to predict future events and individual outcomes of therapy. This research project aims to accelerate predictive research and the translation of its results into clinical practice through a new generation of advanced semantic data integration tools and a corresponding information technology platform. The platform will facilitate the exploitation of heterogeneous health-related data sources for the development, validation and clinical usage of predictive models.

The platform will consist of a flexible, ontology-based information extraction and annotation framework that makes harmonised clinical data from structured and unstructured heterogeneous sources available in a data warehouse. This framework will be combined with a modelling workbench that supports further analysis of the data in the data warehouse, clinical model validation with corresponding data collection, and the deployment and subsequent clinical usage of the models, including supplying them with patient data in clinical practice. The performance of the resulting platform and tools will be assessed through high-impact modelling research in transplantation medicine.

Hematopoietic stem cell transplantation (HSCT) can cure or improve the outcome of a variety of primarily hematological diseases. The associated immune dysfunction causes severe complications such as viral infections, graft versus host disease (GvHD) and relapse. Depending on risk factors such as HLA matching, disease stage, conditioning regimen, and recipient and donor age, among others, severe GvHD occurs in up to 15 % and relapse in up to 30 % of recipients. Treatment options are limited and rely on cost-intensive and time-consuming procedures. In addition, diseases associated with viral infections in HSCT patients are of interest to this proposal. Cytomegalovirus (CMV) infection/reactivation is still the major viral complication after allogeneic HSCT and occurs in about 70 % of seropositive recipients. Only early diagnosis and therapy prevent high mortality rates.

Models that integrate viral kinetics, immune function and clinical patient data to predict the risk of infection, GvHD and relapse, and thereby support clinical therapy decisions, are currently lacking and would be highly desirable.

Consequently, XplOit pursues the following objectives:

  1. To exploit the full potential of all kinds of available semantic resources, such as controlled vocabularies, terminologies, coding systems, classifications, thesauri and ontologies, for data integration in order to provide a semantic information extraction and annotation framework for existing heterogeneous, structured and unstructured health data. This general semantic framework will allow data owners to easily make harmonised data available in a data warehouse in compliance with privacy regulations.
  2. To develop data integration pipelines for the most common types of data objects in the clinical domain and integrate them into the above framework. These include HL7-based clinical data sets, imaging data with a focus on pathology, flat files with lab data in comma-separated values (CSV) format, clinical trial data in CDISC-ODM format and omics data with a focus on miRNA data.
  3. To develop a generic information extraction tool for unstructured clinical documents, including scans of the still widely used paper-based health records, to make relevant parts of this data available for research.
  4. To provide a novel modular and extendable ontological framework which enables researchers to re-use and interrelate relevant parts of available semantic resources. This ontological framework is supported by a highly automated ontology aggregator tool that allows users to assemble individual ontology modules under a unified axiomatic structure.
  5. To provide a novel, highly automated and flexible data description tool which supports both semantic data access and semantic data requests. Data owners will easily be able to create coherent data annotations on the fly, and model builders will be able to create coherent semantic descriptions of data that have not yet been annotated with the help of a semantic form builder.
  6. To integrate these components into an IT platform for modelling support and complement them with an ontology-based clinical trial management system that can synchronize data with a data warehouse, so that retrospective data can be used for model development and prospective data for model validation and performance assessments.
  7. To facilitate the dissemination and application of predictive models by integrating a generic, scalable model deployment service into this platform that releases models with semantic descriptions via a portal for clinical usage.
  8. To ensure through a legal framework that the privacy rights of patients, as well as the intellectual property interests of data owners and researchers, are fully respected.
  9. To customize and deploy this IT platform for translational predictive research in the domain of clinical virology and transplant medicine.
  10. To validate the platform for predictive modelling in transplant medicine, with the aim of establishing models that can predict the course of viral infections and virus-associated morbidities such as post-transplant lymphoproliferative disorder (PTLD) and haemorrhagic cystitis, as well as the non-virus-related events of relapse and GvHD. In addition, the wider research community will further validate the IT platform to determine whether it meets their data integration and modelling needs.