Indivo Transforms ================= Introduction ------------ For the pipeline to be functional, data must be transformed from its original format into processed medical facts ready to be stored in the database. Each schema in Indivo therefore defines a :term:`transform` that can be applied to any document that validates against the schema. .. _transform-output-types: Transform Outputs ----------------- The ultimate output of the transformation step in the data pipeline is a set of :term:`Fact objects ` ready for storage in the database. However, technologies like XSLT are incapable of producing python objects as output. We looked around for a simple, standard way of modeling data that would meet our needs, and came up empty (though we're open to suggestions if you think you have the silver bullet). As a result, we've created our own language, :doc:`sdm`, to both define our data models and represent documents (in XML or JSON) that match them. Thus, transforms may output data in any of the following formats: * :ref:`Simple Data Model JSON (SDMJ) ` * :ref:`Simple Data Model XML (SDMX) ` * Python Fact objects Outputs are validated on a per-datamodel-basis. For data model definitions and example outputs, see :doc:`data-models/index`. Types of Transforms ------------------- Indivo currently accepts Transforms in two formats: * XSLT documents * Python classes This may change as Indivo begins to accept data in more, varied formats. XSLTs ^^^^^ We won't cover XSLTs in any detail here, as their format and use is clearly outlined `in the specification `_. Since XSLT is traditionally used to transform XML to XML, the most natural output format for XSLTs is :ref:`SDMX `. .. _python-transforms: Python ^^^^^^ For those unskilled in the arts of XSLT, we also allow transforms to be defined using python. To define a transform, simply subclass :py:class:`indivo.document_processing.BaseTransform` and define a valid transformation method. Valid methods are: .. automethod:: indivo.document_processing.BaseTransform.to_facts As an example, here's how you might implement this for a transform from our :doc:`Simple Clinical Note Schema ` to our :doc:`Simple Clinical Note Data Model `:: from indivo.document_processing import BaseTransform from indivo.models import SimpleClinicalNote from lxml import etree NS = "http://indivo.org/vocab/xml/documents#" class Transform(BaseTransform): def to_facts(self, doc_etree): args = self._get_data(doc_etree) # Create the fact and return it # Note: This method must return a list return [SimpleClinicalNote(**args)] def _tag(self, tagname): return "{%s}%s"%(NS, tagname) def _get_data(self, doc_etree): """ Parse the etree and return a dict of key-value pairs for object construction. """ ret = {} _t = self._tag # Get the date_of_visit ret['date_of_visit'] = doc_etree.findtext(_t('dateOfVisit')) # Get the date finalized ret['finalized_at'] = doc_etree.findtext(_t('finalizedAt')) # Get the visit_type visit_type_node = doc_etree.find(_t('visitType')) ret['visit_type'] = visit_type_node.text ret['visit_type_type'] = visit_type_node.get('type') ret['visit_type_value'] = visit_type_node.get('value') ret['visit_type_abbrev'] = visit_type_node.get('abbrev') # Get the visit location ret['visit_location'] = doc_etree.findtext(_t('visitLocation')) # Get the specialty of the clinician specialty_node = doc_etree.find(_t('specialty')) ret['specialty'] = specialty_node.text ret['specialty_type'] = specialty_node.get('type') ret['specialty_value'] = specialty_node.get('value') ret['specialty_abbrev'] = specialty_node.get('abbrev') # get the signature signature_node = doc_etree.find(_t('signature')) provider_node = signature_node.find(_t('provider')) ret['signed_at'] = signature_node.findtext(_t('at')) ret['provider_name'] = provider_node.findtext(_t('name')) ret['provider_institution'] = provider_node.findtext(_t('institution')) # get the chief complaint ret['chief_complaint'] = doc_etree.findtext(_t('chiefComplaint')) # get the content of the note ret['content'] = doc_etree.findtext(_t('content')) return ret .. automethod:: indivo.document_processing.BaseTransform.to_sdmj As an example, here's how you might implement this for a transform from our :doc:`Simple Clinical Note Schema ` to our :doc:`Simple Clinical Note Data Model `. Note the reuse of the ``_get_data()`` function from above:: from indivo.document_processing import BaseTransform SDMJ_TEMPLATE = ''' { "__modelname__": "SimpleClinicalNote", "date_of_visit": "%(date_of_visit)s", "finalized_at": "%(finalized_at)s", "visit_type": "%(visit_type)s", "visit_type_type": "%(visit_type_type)s", "visit_type_value": "%(visit_type_value)s", "visit_type_abbrev": "%(visit_type_abbrev)s", "visit_location": "%(visit_location)s", "specialty": "%(specialty)s", "specialty_type": "%(specialty_type)s", "specialty_value": "%(specialty_value)s", "specialty_abbrev": "%(specialty_abbrev)s", "signed_at": "%(signed_at)s", "provider_name": "%(provider_name)s", "provider_institution": "%(provider_institution)s "chief_complaint": "%(chief_complaint)s", "content": "%(content)s" } ''' class Transform(BaseTransform): def to_sdmj(self, doc_etree): args = self._get_data(doc_etree) return SDMJ_TEMPLATE%args .. automethod:: indivo.document_processing.BaseTransform.to_sdmx As an example, here's how you might implement this for a transform from our :doc:`Simple Clinical Note Schema ` to our :doc:`Simple Clinical Note Data Model `. Note the reuse of the ``_get_data()`` function from above:: from indivo.document_processing import BaseTransform from lxml import etree from StringIO import StringIO SDMX_TEMPLATE = ''' %(date_of_visit)s %(finalized_at)s %(visit_type)s %(visit_type_type)s %(visit_type_value)s %(visit_type_abbrev)s %(visit_location)s %(specialty)s %(specialty_type)s %(specialty_value)s %(specialty_abbrev)s %(signed_at)s %(provider_name)s %(provider_institution)s %(chief_complaint)s %(content)s ''' class Transform(BaseTransform): def to_sdmx(self, doc_etree): args = self._get_data(doc_etree) return etree.parse(StringIO(SDMX_TEMPLATE%args)) .. _add-transform: Adding Custom Transforms to Indivo ---------------------------------- Associating a new transform with an Indivo-supported schema is simple: * Write your transform, as either an XSLT or a Python module, as described above. * Drop the file containing your transform (``transform.xslt`` or ``transform.py``: make sure to name the file 'transform') into the directory containing the schema. See :ref:`add-schema` for more details. * Make sure to restart Indivo after moving transform files around, or the changes won't take effect.