Indivo Data Models

Introduction

Data Models in Indivo describe the format in which Indivo represents medical information. They are NOT the same as Schemas, which describe formats that Indivo recognizes as valid input data. Rather, data models describe the final processed state of medical data in Indivo: how data are stored, how they are queryable via the Query API, and how they are returned via the Reporting API.

We also introduce one additional term: Medical Facts. A Fact is one datapoint corresponding to a data model: for example, a latex allergy is a Fact that is an instance of the Allergy data model. Internally, Indivo represents facts as Python objects, so you’ll see us referring to medical facts as fact objects as well.

Defining a Data Model

At its most basic level, a data model definition is just a list of fields and their types. For example, our Problem data model is defined as (some fields omitted):

  • date_onset: Date
  • date_resolution: Date
  • name: String
  • comments: String
  • diagnosed_by: String

This is pretty simple, and we’d like to enable others to add new data models to Indivo just as easily. We currently allow two formats for defining data models:

Django Model Classes

Since our data models are directly mapped to database tables using Django’s ORM, they are most effectively represented as Django Models. Django has a flexible, powerful method for expressing fields as Python class attributes, so data models defined in this way can harness the full capabilities of the Django ORM. Of course, representing data models in this way requires some knowledge of Python. For a full reference of Django models, see Django models and Django model fields.

One important Indivo-specific note: when defining Django Model Classes, make sure to subclass indivo.models.Fact, which will ensure that your class can be treated as a data model. For example, your class definition might look like:

from indivo.models import Fact
from django.db import models

class YourModel(Fact):
    your_field1 = models.CharField(max_length=200, null=True)

    ...

    # Additional fields here

Custom Django Model Fields

For modeling medical data, Indivo provides some custom Field Subclasses. These fields represent their data as multiple separate database fields, with names formed from the original field’s name and some appended suffixes (see the classes below for some examples). You should use these fields as if they were any other Django Model Field:

from indivo.models import Fact
from django.db import models
from indivo.fields import YourFavoriteFieldSubclass

class YourModel(Fact):
    normal_field = models.CharField(max_length=200, null=True)
    special_field = YourFavoriteFieldSubclass()

Now YourModel has both a standard CharField, and also other fields defined by the Field Subclass. We define the following Field Subclasses:

class indivo.fields.CodedValueField(Type)

A field for representing coded data elements.

Creating a CodedValueField named ‘value’, for example, will (under the hood) create three fields:

  • value_identifier, the system-specific identifier that represents the element (e.g. an RxNorm CUI)
  • value_title, the human-readable title of the element
  • value_system, the coding system used to represent the element

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original value field name.
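The naming rule above can be sketched in plain Python. The helper below is hypothetical and not part of Indivo; it only appends the three suffixes listed above to a field name, which is a quick way to see which names to use in transforms and queries.

```python
# Hypothetical helper, not part of Indivo: lists the field names that a
# CodedValueField is documented to expand into.
CODED_VALUE_SUFFIXES = ("identifier", "title", "system")

def coded_value_field_names(field_name):
    """Return the expanded field names for a CodedValueField called field_name."""
    return ["%s_%s" % (field_name, suffix) for suffix in CODED_VALUE_SUFFIXES]

print(coded_value_field_names("value"))
# ['value_identifier', 'value_title', 'value_system']
```

The same pattern applies to the other Field Subclasses below, each with its own list of suffixes.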

class indivo.fields.ValueAndUnitField(Type)

A field for representing data elements with both a value and a unit.

Creating a ValueAndUnitField named ‘frequency’, for example, will (under the hood) create the fields:

  • frequency_value, the value of the element
  • frequency_unit, the units in which the value is measured

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original frequency field name.

class indivo.fields.AddressField(Type)

A field for representing a physical address.

Creating an AddressField named ‘address’, for example, will (under the hood) create the fields:

  • address_country, the country in which the address is located
  • address_city, the city in which the address is located
  • address_postalcode, the postalcode of the address
  • address_region, the region (state, in the US) in which the address is located
  • address_street, the street address (including street number, apartment number, etc.) at which the address is located

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original address field name.

class indivo.fields.NameField(Type)

A field for representing a person’s name.

Creating a NameField named ‘name’, for example, will (under the hood) create the fields:

  • name_family, the family (last) name of the person
  • name_given, the given (first) name of the person
  • name_middle, the middle name of the person
  • name_prefix, the prefix (e.g. ‘Mr.’, ‘Sir’, etc.) for the person’s name
  • name_suffix, the suffix (e.g. ‘Jr.’, ‘Ph.D.’, etc.) for the person’s name

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original name field name.

class indivo.fields.TelephoneField(Type)

A field for representing a telephone number.

Creating a TelephoneField named ‘phone’, for example, will (under the hood) create the fields:

  • phone_type, the type of the phone number, limited to h (home), w (work), or c (cell)
  • phone_number, the actual phone number
  • phone_preferred_p, whether or not this number is a preferred method of contact (True or False)

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original phone field name.

class indivo.fields.PharmacyField(Type)

A field for representing a pharmacy.

Creating a PharmacyField named ‘pharmacy’, for example, will (under the hood) create three fields:

  • pharmacy_ncpdpid, the pharmacy’s National Council for Prescription Drug Programs (NCPDP) ID number
  • pharmacy_adr, the address at which the pharmacy is located (an AddressField)
  • pharmacy_org, the name of the organization that owns the pharmacy

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original pharmacy field name.

class indivo.fields.ProviderField(Type)

A field for representing a medical provider.

Creating a ProviderField named ‘doc’, for example, will (under the hood) create the fields:

  • doc_dea_number, the provider’s Drug Enforcement Agency (DEA) number
  • doc_ethnicity, the provider’s ethnicity
  • doc_npi_number, the provider’s National Provider Identification (NPI) number
  • doc_preferred_language, the provider’s preferred language
  • doc_race, the provider’s race
  • doc_adr, the provider’s address (an AddressField)
  • doc_bday, the provider’s birth date
  • doc_email, the provider’s email address
  • doc_name, the provider’s name (a NameField)
  • doc_tel_1, the provider’s primary phone number (a TelephoneField)
  • doc_tel_2, the provider’s secondary phone number (a TelephoneField)
  • doc_gender, the provider’s gender, limited to m (male) or f (female)

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original doc field name.

class indivo.fields.VitalSignField(Type)

A field for representing a single measurement of a vital sign.

Creating a VitalSignField named ‘bp’, for example, will (under the hood) create the fields:

  • bp_unit, the unit of the measurement
  • bp_value, the value of the measurement
  • bp_name, the name of the measurement (a CodedValueField)

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original bp field name.

class indivo.fields.BloodPressureField(Type)

A field for representing a blood pressure measurement.

Creating a BloodPressureField named ‘bp’, for example, will (under the hood) create the fields:

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original bp field name.

class indivo.fields.ValueRangeField(Type)

A field for representing a range of values.

Creating a ValueRangeField named ‘normal_range’, for example, will (under the hood) create the fields:

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original normal_range field name.

class indivo.fields.QuantitativeResultField(Type)

A field for representing a quantitative result, and expected ranges for that result.

Creating a QuantitativeResultField named ‘lab_result’, for example, will (under the hood) create the fields:

  • lab_result_non_critical_range, the range outside of which results are ‘critical’ (a ValueRangeField)
  • lab_result_normal_range, the range outside of which results are ‘abnormal’ (a ValueRangeField)
  • lab_result_value, the actual result (a ValueAndUnitField)

When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original lab_result field name.

Simple Data Modeling Language (SDML)

For those less familiar with Python who are still comfortable thinking in terms of ‘fields’ and ‘types’ (which should be most people), we’ve defined a JSON-based modeling language for defining very simple data models easily. SDML is less flexible than Django’s modeling language, but is much quicker to get started with and is less verbose for describing simple models. See our documentation of the language here.

Feeling Lost?

For help getting started, see our core data models, below, each of which provides definitions in both SDML and Django Model classes.

Data Models and the Query API

Since the Query API allows app developers to directly apply filters and ranges to the data models they are selecting, they need to know what fields they are allowed to query against. The answer is simple:

ANY FIELD ON A DATA MODEL THAT IS NOT A RELATION TO ANOTHER MODEL MAY BE USED IN THE QUERY API!

For example, we introduced the ‘Problem’ model above, which has the fields:

  • date_onset: Date
  • date_resolution: Date
  • name: String
  • comments: String
  • diagnosed_by: String

If you were making an API call such as GET /records/{RECORD_ID}/reports/minimal/problems/, you could filter by any of:

  • date_onset
  • date_resolution
  • name
  • comments
  • diagnosed_by

If the problems model were a bit more complicated, and had another field:

  • prescribed_med: Medication

You wouldn’t be able to filter by prescribed_med, since that field is a relation to another model.

The only exceptions to this rule are custom Django Model Fields. Such fields are translated into fields with other names, as described above, and it is those translated fields that may be used in the Query API. For example, given a model with a CodedValue element such as:

  • problem_type: CodedValue

You would be able to filter by problem_type_identifier, problem_type_title, or problem_type_system, but not by problem_type itself.
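As a rough illustration of the rule, the sketch below computes the filterable field names from a list of (name, type) pairs. It is a plain-Python model of the rule as stated here, not Indivo code: the queryable_fields helper and the way relations are detected (any type that is neither simple nor a known compound field) are assumptions made for the example.

```python
# Illustrative only: type names mirror the ones used in this document.
SIMPLE_TYPES = {"Date", "String"}
COMPOUND_SUFFIXES = {
    "CodedValue": ("identifier", "title", "system"),
}

def queryable_fields(model_fields):
    """Return the field names that may be used as Query API filters."""
    names = []
    for name, type_name in model_fields:
        if type_name in SIMPLE_TYPES:
            names.append(name)
        elif type_name in COMPOUND_SUFFIXES:
            # Compound fields are queryable only through their expanded names.
            names.extend("%s_%s" % (name, s) for s in COMPOUND_SUFFIXES[type_name])
        # Anything else is treated as a relation to another model: not queryable.
    return names

problem_fields = [
    ("date_onset", "Date"),
    ("name", "String"),
    ("prescribed_med", "Medication"),
    ("problem_type", "CodedValue"),
]
print(queryable_fields(problem_fields))
```

Here prescribed_med is dropped because it is a relation, and problem_type appears only as its three expanded names.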

Core Data Models

Here is a listing of the data models currently supported by Indivo. Each instance might define other, contributed models: see below for information on how to add data models to Indivo.

Advanced Data-Model Tasks

Adding Advanced Features to a Data-Model

For complicated data models, a simple SDML definition just won’t suffice. For a few specific features, such as custom object serialization or creation-time field validation, you can define (in Python) an extra options file for a data model.

This file should be named extra.py, and can be dropped into the filesystem next to any data model, as described below. The file should contain subclasses of indivo.data_models.options.DataModelOptions, each of which describes the options for one data model defined in the model.py file in the same directory. Options are:

class indivo.data_models.options.DataModelOptions(Type)

Defines optional extra functionality for Indivo datamodels.

To add options to a datamodel, subclass this class and override its attributes.

Currently available options are:

  • model_class_name: Required. The name of the datamodel class to attach to.
  • serializers: Custom serializers for the data model. Should be set to a subclass of indivo.serializers.DataModelSerializers.
  • field_validators: Custom validators for fields on the data model. A dictionary, where keys are field names on the model, and values are lists of Django Validators to be run against the field.

For example, here’s our options file for the Problem data model:

from indivo.serializers import DataModelSerializers
from indivo.data_models.options import DataModelOptions
from indivo.validators import ExactValueValidator

SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'

class ProblemSerializers(DataModelSerializers):

    def to_rdf(queryset, result_count, record=None, carenet=None):
        # ... our SMART RDF serializer implementation here ... #
        return 'some RDF'

class ProblemOptions(DataModelOptions):
    model_class_name = 'Problem'
    serializers = ProblemSerializers
    field_validators = {
      'name_system': [ExactValueValidator(SNOMED_URI)],
    }

Make sure to restart Indivo after you add your extra.py file so that your changes take effect; there’s no need to reset Indivo.

Adding Custom Serializers to a Data-Model

By default, when returning data via the generic reporting API, Indivo will attempt to serialize data as SDMJ or SDMX, depending on the requested response format. If you need your data to come back in other formats, or if the default serializers aren’t smart enough to represent your data model correctly, you can implement custom serializers for the data model.

Defining the Serializers

Serializers for a data model are implemented as simple methods that take a Django queryset object, and return a serialized string. For a given data-model, you should define a subclass of indivo.serializers.DataModelSerializers, and add your desired serializers as methods on the class. Currently, available serializers are:

to_xml(queryset, result_count, record=None, carenet=None)

Returns an XML string representing the model objects in queryset.

Parameters:
  • queryset (QuerySet) – the objects to serialize
  • result_count (integer) – the total number of items in queryset
  • record (Record) – the patient record that the objects belong to, if available.
  • carenet (Carenet) – the Carenet via which the objects have been retrieved, if available.
Return type:

string

to_json(queryset, result_count, record=None, carenet=None)

Returns a JSON string representing the model objects in queryset.

Parameters:
  • queryset (QuerySet) – the objects to serialize
  • result_count (integer) – the total number of items in queryset
  • record (Record) – the patient record that the objects belong to, if available.
  • carenet (Carenet) – the Carenet via which the objects have been retrieved, if available.
Return type:

string

to_rdf(queryset, result_count, record=None, carenet=None)

Returns an RDF/XML string representing the model objects in queryset.

Parameters:
  • queryset (QuerySet) – the objects to serialize
  • result_count (integer) – the total number of items in queryset
  • record (Record) – the patient record that the objects belong to, if available.
  • carenet (Carenet) – the Carenet via which the objects have been retrieved, if available.
Return type:

string

For example, here’s a (non-functional) implementation of the serializers for the Problems data-model:

from indivo.serializers import DataModelSerializers

class ProblemSerializers(DataModelSerializers):
    def to_xml(queryset, result_count, record=None, carenet=None):
        return '''<Problems>...bunch of problems here...</Problems>'''

    def to_json(queryset, result_count, record=None, carenet=None):
        return '''[{"Problem": "data here"}, {"Problem": "More data here..."}]'''

    def to_rdf(queryset, result_count, record=None, carenet=None):
        return '''<rdf:RDF><rdf:Description rdf:type='indivo:Problem'>...RDF data here...</rdf:Description></rdf:RDF>'''

A couple things to note:

  • The to_*() methods DO NOT take self as their first argument. Under the hood, we actually rip the methods out of the serializers class and attach them directly to the data-model class.
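  • The model_class_name attribute is required on the DataModelOptions subclass that references your serializers (not on the serializers class itself), and indicates which data-model the serializers should be attached to.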
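To make the first point concrete, here is a minimal sketch of how self-less functions can be copied from one class onto another. It is not Indivo's actual attachment code: the stand-in classes, the attach_serializers helper, and the use of staticmethod are all assumptions, and a list of dictionaries stands in for a Django queryset so that the snippet runs without Django or Indivo installed.

```python
import json

class DataModelSerializers(object):
    """Stand-in for indivo.serializers.DataModelSerializers."""

class ProblemSerializers(DataModelSerializers):
    def to_json(queryset, result_count, record=None, carenet=None):
        # No 'self': the function is meant to be used detached from this class.
        return json.dumps([{"name": p["name"]} for p in queryset])

class Problem(object):
    """Stand-in for a data-model class."""

def attach_serializers(model_class, serializers_class):
    # Copy every to_*() function defined on the serializers class onto the
    # model class, wrapped so that it is callable without an instance.
    for name, func in vars(serializers_class).items():
        if name.startswith("to_") and callable(func):
            setattr(model_class, name, staticmethod(func))

attach_serializers(Problem, ProblemSerializers)
output = Problem.to_json([{"name": "Asthma"}], 1)
print(output)
```

Because the functions are moved rather than called as bound methods, a stray self parameter would silently receive the queryset, which is why the signatures omit it.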

Libraries for Serialization

When serializing models, the following libraries can come in handy:

Attaching the Serializers to a Data Model

Adding custom serializers to a data-model is simple: assign your DataModelSerializers subclass to the serializers attribute of a DataModelOptions subclass in an extra.py file (see above for info on adding advanced data-model options).

Adding Field Validation to a Data-Model

By default, data models defined in SDML are very permissive: all fields are nullable, and there are no constraints on valid data points other than their type (string, date, etc.). In some cases, a data element could satisfy these constraints, but still be invalid. For example, an Indivo Problem must have its name coded using SNOMED, so a problem without a SNOMED code is invalid.

Defining the Validators

In such cases, you can attach validators to the data model. Django Validators are essentially just python callables that raise a django.core.exceptions.ValidationError if they are called on an invalid data point. We’ve defined a couple of useful validators, though you could use any function you’d like.

For example, here’s a validator that will accept only the value 2:

from django.core.exceptions import ValidationError

def validate_2(value):
    if value != 2:
        raise ValidationError("Invalid value: %s. Expected 2"%str(value))

Built-in Validators

Django provides a number of built-in validators, for which a full reference exists here: https://docs.djangoproject.com/en/1.2/ref/validators/#built-in-validators.

In addition, Indivo defines a few useful validators in indivo.validators:

class indivo.validators.ValueInSetValidator(valid_values, nullable=False)

Validates that a value is within a set of possible values.

The optional ‘nullable’ flag determines whether or not the value may also be empty.

class indivo.validators.ExactValueValidator(valid_value, nullable=False)

Validates that a value is exactly equal to a certain value.

The optional ‘nullable’ flag determines whether or not the value may also be empty.
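For readers who want to see the shape of such a validator, here is a sketch of a callable that behaves the way ValueInSetValidator is described above. It is not Indivo's implementation: the class name ValueInSetSketch is hypothetical, ValidationError is a local stand-in for django.core.exceptions.ValidationError so that the snippet runs without Django, and treating None and the empty string as "empty" is an assumption.

```python
class ValidationError(Exception):
    """Local stand-in for django.core.exceptions.ValidationError."""

class ValueInSetSketch(object):
    """Sketch of a validator shaped like ValueInSetValidator (not Indivo's code)."""

    def __init__(self, valid_values, nullable=False):
        self.valid_values = set(valid_values)
        self.nullable = nullable

    def __call__(self, value):
        # Assumption: "empty" means None or the empty string.
        if self.nullable and value in (None, ""):
            return
        if value not in self.valid_values:
            raise ValidationError("Invalid value: %r" % (value,))

phone_type = ValueInSetSketch(["h", "w", "c"], nullable=True)
phone_type("h")     # accepted: returns without raising
phone_type(None)    # accepted because nullable=True
try:
    phone_type("x")
except ValidationError as exc:
    print("rejected:", exc)
```

Writing the validator as a class with __call__ lets the allowed values be configured once and the resulting object be used wherever a plain validator function is expected.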

Attaching Validators to a Data Model

Adding custom validators to a data-model is simple: add the validator to the field_validators attribute of a DataModelOptions subclass in an extra.py file (see above for info on adding advanced data-model options).

For example, let’s add the requirement that Problem names must be coded as SNOMED. We can write the validator using Indivo’s ExactValueValidator:

from indivo.validators import ExactValueValidator
SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'
snomed_validator = ExactValueValidator(SNOMED_URI)

We can then attach it to the name_system field of a Problem, which will guarantee that we only accept problems which identify themselves as having a SNOMED code for their names:

from indivo.data_models.options import DataModelOptions

class ProblemOptions(DataModelOptions):
    model_class_name = 'Problem'
    field_validators = {
      # Each field maps to a list of validators
      'name_system': [snomed_validator]
    }

Note that we put snomed_validator in a list, since we might theoretically add additional validators to the name_system field.
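To illustrate why a list per field is useful, the sketch below runs every validator registered for a field and collects the failures. How Indivo itself invokes field_validators is not described here, so collect_errors is a hypothetical helper; ValidationError is again a local stand-in for Django's exception, and not_blank is an invented second validator added only to show two validators on one field.

```python
class ValidationError(Exception):
    """Local stand-in for django.core.exceptions.ValidationError."""

SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'

def snomed_only(value):
    if value != SNOMED_URI:
        raise ValidationError("Expected %s, got %r" % (SNOMED_URI, value))

def not_blank(value):
    if not value:
        raise ValidationError("Value may not be blank")

field_validators = {
    'name_system': [not_blank, snomed_only],
}

def collect_errors(field_values, validators_by_field):
    """Run each field's validators in order and return {field: [messages]}."""
    errors = {}
    for field, validators in validators_by_field.items():
        for validator in validators:
            try:
                validator(field_values.get(field))
            except ValidationError as exc:
                errors.setdefault(field, []).append(str(exc))
    return errors

print(collect_errors({'name_system': SNOMED_URI}, field_validators))
print(collect_errors({'name_system': ''}, field_validators))
```

The first call reports no errors; the second reports one message from each validator for name_system.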

Adding Custom Data-Models to Indivo

As of version 1.1 of Indivo X, we’ve added a feature that makes it much easier to add (in a drag-and-drop fashion) new supported data models to an instance of Indivo. Adding a new data model to Indivo involves:

  • Creating the data model definition
  • Dropping the data model into the filesystem
  • Migrating the database tables to support the new model

Defining the Data Model

As you saw above, data models can be defined in two formats: SDML or Django model classes. Simply produce a definition in one of the two forms, and save it to a file named model.sdml or model.py.

Dropping the Definition into the Filesystem

Indivo data models currently have the following layout on the filesystem:

indivo_server/
    indivo/
          ...
        data_models/
            core/
                allergy/
                    model.[sdml | py]
                    example.[sdmj | sdmx | py]
                    extra.py
                  ...
            contrib/

The indivo/data_models/core/ directory contains all of our built-in data models, and you shouldn’t modify it. Since you are ‘contributing’ a data model to Indivo, add your data model to the indivo/data_models/contrib/ directory. Simply:

  • Create a new subdirectory under indivo/data_models/contrib/.

  • Drop your model definition into that directory. This file MUST be named model.py or model.sdml to be identified as a data model.

  • Add (optional) example files into that directory. Files should be named example.sdmj, example.sdmx, or example.py, and should be example instances of the data model as SDMJ, SDMX, or Fact objects respectively. If present, they will help others use and document your data model.

  • Add an (optional) extras file to the directory. The file must be named extra.py, and may contain extra options for your data-model, such as custom serializers.

  • Your final directory structure should now look something like:

    indivo_server/
        indivo/
              ...
            data_models/
                core/
                    allergy/
                        model.[sdml | py]
                        example.[sdmj | sdmx | py]
                        extra.py
                      ...
                contrib/
                    your_data_model/
                        model.[sdml | py]
                        example.[sdmj | sdmx | py]
                        extra.py
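Before running the migration, it can help to confirm that the new directory contains the file names listed above. The following is a small stand-alone sketch, not an Indivo tool: describe_data_model_dir is a hypothetical helper that only compares file names against the ones this section documents, and the demonstration uses a temporary directory rather than a real indivo/data_models/contrib/ path.

```python
import os
import tempfile

DEFINITION_FILES = {"model.py", "model.sdml"}
OPTIONAL_FILES = {"example.sdmj", "example.sdmx", "example.py", "extra.py"}

def describe_data_model_dir(path):
    """Return (definition files found, optional files found), each sorted."""
    files = set(os.listdir(path))
    return sorted(files & DEFINITION_FILES), sorted(files & OPTIONAL_FILES)

# Demonstrate on a throwaway directory standing in for contrib/your_data_model/.
demo_dir = tempfile.mkdtemp()
for filename in ("model.sdml", "extra.py"):
    open(os.path.join(demo_dir, filename), "w").close()

definitions, optional = describe_data_model_dir(demo_dir)
print("definition:", definitions)
print("optional:", optional)
if not definitions:
    print("No model.py or model.sdml found: the directory will not be identified as a data model.")
```

An empty definitions list is the case to watch for, since the definition file name is the only one this section describes as mandatory.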

Migrating the Database

Indivo relies on the South migration tool to get the database synced with the latest data models. Once you’ve dropped your data model into the filesystem, South should be able to detect the necessary changes.

To detect the new model and generate migrations for it, run (from the indivo_server directory):

python manage.py schemamigration indivo --auto

You should see output like:

+ Added model indivo.YOURMODELNAME
Created 0018_auto__add_model_YOURMODELNAME.py. You can now apply this migration with: ./manage.py migrate indivo

To do a quick sanity check that you aren’t about to blow away your database, run:

python manage.py migrate indivo --db-dry-run -v2

This should output the SQL that will be run. Make sure this looks reasonable, ESPECIALLY if you are running Indivo on Oracle, where the South tool is still in alpha. If the SQL looks reasonable, go ahead and run the migration, with:

python manage.py migrate indivo

And you’re all set!

Next Steps

Make sure to restart Indivo for your changes to take effect.

See also

Now you’ve added a new data model to Indivo: Congratulations! It can be stored in the database and queried via the API.

But until you map a Schema to it, you won’t be able to actually add data to your new model. To learn more, see: