Indivo Transforms

Introduction

For the pipeline to be functional, data must be transformed from its original format into processed medical facts ready to be stored in the database. Each schema in Indivo therefore defines a transform that can be applied to any document that validates against the schema.

Transform Outputs

The ultimate output of the transformation step in the data pipeline is a set of Fact objects ready for storage in the database. However, technologies like XSLT cannot produce Python objects as output. We looked around for a simple, standard way of modeling data that would meet our needs, and came up empty (though we’re open to suggestions if you think you have the silver bullet). As a result, we’ve created our own language, the Indivo Simple Data Modeling Language (SDML), to both define our data models and represent documents (in XML or JSON) that match them.

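Thus, transforms may output data in any of the following formats:

  • Native Python Fact objects
  • SDMJ, a JSON string that conforms to SDML
  • SDMX, an XML document that conforms to SDML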

Outputs are validated against the data model they claim to represent. For data model definitions and example outputs, see Indivo Data Models.

Types of Transforms

Indivo currently accepts Transforms in two formats:

  • XSLT documents
  • Python classes

This may change as Indivo begins to accept data in more, varied formats.

XSLTs

We won’t cover XSLTs in any detail here, as their format and use are clearly outlined in the XSLT specification. Since XSLT is traditionally used to transform XML to XML, the most natural output format for XSLTs is SDMX.

Python

For those unskilled in the arts of XSLT, we also allow transforms to be defined in Python. To define a transform, simply subclass indivo.document_processing.BaseTransform and define a valid transformation method. Valid methods are:

BaseTransform.to_facts(doc_etree)

Transform an etree into a list of Indivo Fact objects.

Subclasses should implement this method, which takes an lxml.etree._ElementTree (the result of calling etree.parse()), and returns a list of instances of indivo.models.Fact subclasses.

BaseTransform.to_sdmj(doc_etree)

Transform an etree into a string of valid Simple Data Model JSON.

Subclasses should implement this method, which takes an lxml.etree._ElementTree (the result of calling etree.parse()), and returns a string in valid SDMJ format.
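As a rough illustration of the method above, the following sketch builds an SDMJ string from a small, made-up input document. Several things here are assumptions and not taken from Indivo itself: the empty BaseTransform class is a stand-in for indivo.document_processing.BaseTransform so that the snippet runs on its own, the input XML and its element names are hypothetical, and the JSON layout (a "__modelname__" key plus one key per field) and the field names should be checked against Indivo Data Models. The snippet falls back to the standard library's ElementTree when lxml is not installed.

```python
import io
import json

try:
    from lxml import etree  # Indivo hands transforms an lxml tree
except ImportError:
    # Fallback so this sketch runs without lxml installed.
    import xml.etree.ElementTree as etree


class BaseTransform:
    """Stand-in for indivo.document_processing.BaseTransform (sketch only)."""


class Transform(BaseTransform):
    def to_sdmj(self, doc_etree):
        """Return a JSON string describing the facts found in the document."""
        root = doc_etree.getroot()
        record = {
            # Assumed SDMJ layout: model name plus one key per field.
            "__modelname__": "Allergy",
            "allergen_name": root.findtext("name"),
            "date_diagnosed": root.findtext("diagnosed"),
        }
        return json.dumps(record)


# Hypothetical input document, parsed the way the method expects.
SAMPLE = b"<Allergy><name>Penicillin</name><diagnosed>2009-05-16</diagnosed></Allergy>"
doc = etree.parse(io.BytesIO(SAMPLE))
sdmj = Transform().to_sdmj(doc)
```

In a real transform.py the class would subclass the real BaseTransform, and the document would be parsed by Indivo before being passed to the method.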

BaseTransform.to_sdmx(doc_etree)

Transform an etree into an etree of valid Simple Data Model XML.

Subclasses should implement this method, which takes an lxml.etree._ElementTree (the result of calling etree.parse()), and returns another lxml.etree._ElementTree instance representing an XML document in valid SDMX format.
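A minimal sketch of such a method might look like the following. It is not Indivo's own code: the empty BaseTransform class stands in for indivo.document_processing.BaseTransform, the input document is hypothetical, and the output element names (Models, Model, Field) and field names are assumptions about the SDMX layout that should be verified against Indivo Data Models. The snippet falls back to the standard library's ElementTree when lxml is not installed.

```python
import io

try:
    from lxml import etree  # Indivo hands transforms an lxml tree
except ImportError:
    # Fallback so this sketch runs without lxml installed.
    import xml.etree.ElementTree as etree


class BaseTransform:
    """Stand-in for indivo.document_processing.BaseTransform (sketch only)."""


class Transform(BaseTransform):
    # Maps assumed output field names to tags in the hypothetical input.
    FIELDS = (("allergen_name", "name"), ("date_diagnosed", "diagnosed"))

    def to_sdmx(self, doc_etree):
        """Return a new element tree holding the document's data as SDMX."""
        source = doc_etree.getroot()
        models = etree.Element("Models")  # assumed SDMX root element
        model = etree.SubElement(models, "Model", name="Allergy")
        for field_name, source_tag in self.FIELDS:
            field = etree.SubElement(model, "Field", name=field_name)
            field.text = source.findtext(source_tag)
        return etree.ElementTree(models)


# Hypothetical input document, parsed the way the method expects.
SAMPLE = b"<Allergy><name>Penicillin</name><diagnosed>2009-05-16</diagnosed></Allergy>"
result = Transform().to_sdmx(etree.parse(io.BytesIO(SAMPLE)))
```

Building the output with Element and SubElement, instead of concatenating strings, leaves escaping of special characters in field values to the XML library.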

Adding Custom Transforms to Indivo

Associating a new transform with an Indivo-supported schema is simple:

  • Write your transform, as either an XSLT or a Python module, as described above.
  • Drop the file containing your transform into the directory containing the schema. The file must be named ‘transform’, that is, transform.xslt or transform.py. See Adding Custom Schemas to Indivo for more details.
  • Restart Indivo after adding or moving transform files, or the changes won’t take effect.