Data Models in Indivo describe the format in which Indivo represents medical information. They are NOT the same as Schemas, which describe formats that Indivo recognizes as valid input data. Rather, data models describe the final processed state of medical data in Indivo: how data are stored, how they are queryable via the Query API, and how they are returned via the Reporting API.
We also introduce one additional term: Medical Facts. A Fact is one datapoint corresponding to a data model: for example, a latex allergy is a Fact that is an instance of the Allergy data model. Internally, Indivo represents facts as Python objects, so you’ll see us referencing medical facts as fact objects as well.
At its most basic level, a data model definition is just a list of fields and their types. For example, our Problem data model is defined as (some fields omitted):
This is pretty simple, and we’d like to enable others to add new data models to Indivo just as easily. So we currently allow two formats for defining data models: Django Model classes and SDML.
Since our data models are directly mapped to database tables using Django’s ORM, they are most effectively represented as Django Models. Django has a flexible, powerful method for expressing fields as python class attributes, so data models defined in this way can harness the full capabilities of the Django ORM. Of course, representing data models in this way requires some knowledge of python. For a full reference of Django models, see Django models and Django model fields.
One important Indivo-specific note: when defining Django Model Classes, make sure to subclass indivo.models.Fact, which will ensure that your class can be treated as a data model. For example, your class definition might look like:
from indivo.models import Fact
from django.db import models

class YourModel(Fact):
    your_field1 = models.CharField(max_length=200, null=True)
    ...
    # Additional fields here
For modeling medical data, Indivo provides some custom Field Subclasses. These fields store their data in multiple separate database fields, whose names are formed from the original field’s name plus an appended suffix (see the classes below for examples). You should use these fields as if they were any other Django Model Field:
from indivo.models import Fact
from django.db import models
from indivo.fields import YourFavoriteFieldSubclass

class YourModel(Fact):
    normal_field = models.CharField(max_length=200, null=True)
    special_field = YourFavoriteFieldSubclass()
Now YourModel has both a standard CharField, and also other fields defined by the Field Subclass. We define the following Field Subclasses:
A field for representing coded data elements.
Creating a CodedValueField named ‘value’, for example, will (under the hood) create three fields: value_identifier, value_title, and value_system.
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original value field name.
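The naming rule can be illustrated with a small, self-contained sketch. The helper expand_coded_value_field below is hypothetical and is not part of Indivo; it assumes only the suffixes _identifier, _title, and _system, which match the problem_type example in the Query API discussion.

```python
# Hypothetical helper, not part of Indivo: it mimics how a CodedValueField
# named `name` ends up stored as three suffixed database fields.
CODED_VALUE_SUFFIXES = ("identifier", "title", "system")


def expand_coded_value_field(name):
    """Return the field names backing a CodedValueField called `name`."""
    return [f"{name}_{suffix}" for suffix in CODED_VALUE_SUFFIXES]


# A CodedValueField named 'value' becomes three separately queryable fields.
print(expand_coded_value_field("value"))
# ['value_identifier', 'value_title', 'value_system']
```

These expanded names, not ‘value’ itself, are what you would use in transform outputs and Query API filters.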
A field for representing data elements with both a value and a unit.
Creating a ValueAndUnitField named ‘frequency’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original frequency field name.
A field for representing a physical address.
Creating an AddressField named ‘address’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original address field name.
A field for representing a person’s name.
Creating a NameField named ‘name’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original name field name.
A field for representing a telephone number.
Creating a TelephoneField named ‘phone’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original phone field name.
A field for representing a pharmacy.
Creating a PharmacyField named ‘pharmacy’, for example, will (under the hood) create three fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original pharmacy field name.
A field for representing a medical provider.
Creating a ProviderField named ‘doc’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original doc field name.
A field for representing a single measurement of a vital sign.
Creating a VitalSignField named ‘bp’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original bp field name.
A field for representing a blood pressure measurement.
Creating a BloodPressureField named ‘bp’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original bp field name.
A field for representing a range of values.
Creating a ValueRangeField named ‘normal_range’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original normal_range field name.
A field for representing a quantitative result, and expected ranges for that result.
Creating a QuantitativeResultField named ‘lab_result’, for example, will (under the hood) create the fields:
When describing instances of your model (either when defining a transform output or when referencing fields using the Indivo Query API), you must refer to these field names, not the original lab_result field name.
For those less Python-savvy who can still think in terms of ‘fields’ and ‘types’ (which should be most people), we’ve defined SDML, a JSON-based modeling language for easily defining very simple data models. SDML is less flexible than Django’s modeling language, but it is much quicker to get started with and less verbose for describing simple models. See our documentation of the language here.
For help getting started, see our core data models, below, each of which provides definitions in both SDML and Django Model classes.
Since the Query API allows app developers to apply filters and ranges directly to the data models they are selecting, they need to know which fields they are allowed to query against. The answer is simple:
ANY FIELD ON A DATA MODEL THAT IS NOT A RELATION TO ANOTHER MODEL MAY BE USED IN THE QUERY API!
For example, we introduced the ‘Problem’ model above, which has the fields:
If you were making an API call such as GET /records/{RECORD_ID}/reports/minimal/problems/, you could filter by any of:
If the Problem model were a bit more complicated and had another field:
You wouldn’t be able to filter by prescribed_med, since that field is a relation to another model.
The only exceptions to this rule are the custom Field Subclasses. Such fields are translated into fields with other names, as described above, and any of those fields may be used in the Query API. For example, when looking at a model with a CodedValueField such as:
You would be able to filter by problem_type_identifier, problem_type_title, or problem_type_system, but not by problem_type itself.
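The filtering rule can be expressed as a short function. This is a sketch under stated assumptions, not Indivo code: the dictionary-based field description and the helpers queryable_fields and build_filter_url are hypothetical, ‘notes’ is an illustrative plain field, and passing filters as plain query-string parameters on the minimal Problem report URL is an assumption. problem_type and prescribed_med come from the examples above.

```python
from urllib.parse import urlencode

CODED_VALUE_SUFFIXES = ("identifier", "title", "system")

# Hypothetical description of a model's fields (not Indivo's internal format):
# field name -> "plain", "coded_value", or "relation".
EXAMPLE_FIELDS = {
    "notes": "plain",              # illustrative plain field
    "problem_type": "coded_value",
    "prescribed_med": "relation",
}


def queryable_fields(fields):
    """Every non-relation field is queryable; custom fields are replaced
    by the suffixed fields they create."""
    allowed = set()
    for name, kind in fields.items():
        if kind == "relation":
            continue  # relations to other models can't be filtered on
        if kind == "coded_value":
            allowed.update(f"{name}_{suffix}" for suffix in CODED_VALUE_SUFFIXES)
        else:
            allowed.add(name)
    return allowed


def build_filter_url(record_id, filters, fields):
    """Build a filtered report URL, rejecting fields the rule disallows.
    Assumption: filters are sent as plain query parameters."""
    allowed = queryable_fields(fields)
    rejected = sorted(set(filters) - allowed)
    if rejected:
        raise ValueError("not queryable: " + ", ".join(rejected))
    path = f"/records/{record_id}/reports/minimal/problems/"
    return path + "?" + urlencode(sorted(filters.items()))


print(build_filter_url("RECORD_ID", {"problem_type_title": "Asthma"}, EXAMPLE_FIELDS))
# /records/RECORD_ID/reports/minimal/problems/?problem_type_title=Asthma
```

Asking for problem_type or prescribed_med raises ValueError, mirroring the two exclusions described above.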
Here is a listing of the data models currently supported by Indivo. Each Indivo instance might also define other, contributed models; see below for information on how to add data models to Indivo.
For complicated data models, a simple SDML definition just won’t suffice. For a few specific features, such as custom object serialization or creation-time field validation, you can define (in python) an extra options file for a data model.
This file should be named extra.py and can be dropped into the filesystem next to any data model, as described below. The file should contain subclasses of indivo.data_models.options.DataModelOptions, each of which describes the options for one data model defined in the model file (model.py or model.sdml) in the same directory. Options are:
Defines optional extra functionality for Indivo datamodels.
To add options to a datamodel, subclass this class and override its attributes.
Currently available options are:
For example, here’s our options file for the Problem data model:
from indivo.serializers import DataModelSerializers
from indivo.data_models.options import DataModelOptions
from indivo.validators import ExactValueValidator

SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'

class ProblemSerializers(DataModelSerializers):
    def to_rdf(queryset, result_count, record=None, carenet=None):
        # ... our SMART RDF serializer implementation here ... #
        return 'some RDF'

class ProblemOptions(DataModelOptions):
    model_class_name = 'Problem'
    serializers = ProblemSerializers
    field_validators = {
        'name_system': [ExactValueValidator(SNOMED_URI)],
    }
After you add your extra.py file, restart Indivo for your changes to take effect; there’s no need to reset Indivo.
By default, when returning data via the generic reporting API, Indivo will attempt to serialize data as SDMJ or SDMX, depending on the requested response format. If you need your data to come back in other formats, or if the default serializers aren’t smart enough to represent your data model correctly, you can implement custom serializers for the data model.
Serializers for a data model are implemented as simple methods that take a Django queryset object (along with a result count and optional record and carenet arguments) and return a serialized string. For a given data model, define a subclass of indivo.serializers.DataModelSerializers and add your desired serializers as methods on the class. The currently available serializers are:
to_xml(queryset, result_count, record=None, carenet=None): returns an XML string representing the model objects in queryset.
Return type: string
to_json(queryset, result_count, record=None, carenet=None): returns a JSON string representing the model objects in queryset.
Return type: string
to_rdf(queryset, result_count, record=None, carenet=None): returns an RDF/XML string representing the model objects in queryset.
Return type: string
For example, here’s a (non-functional) implementation of the serializers for the Problem data model:
from indivo.serializers import DataModelSerializers

class ProblemSerializers(DataModelSerializers):
    def to_xml(queryset, result_count, record=None, carenet=None):
        return '''<Problems>...bunch of problems here...</Problems>'''

    def to_json(queryset, result_count, record=None, carenet=None):
        return '''[{"Problem": "data here"}, {"Problem": "More data here..."}]'''

    def to_rdf(queryset, result_count, record=None, carenet=None):
        return '''<rdf:RDF><rdf:Description rdf:type='indivo:Problem'>...RDF data here...</rdf:Description></rdf:RDF>'''
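The placeholder methods above don’t show how a queryset actually becomes a string. As a runnable sketch (not Indivo’s implementation), the standalone function below uses Python’s json module. It assumes the queryset is any iterable of dicts, standing in for a Django queryset, and its output shape copies the placeholder above rather than Indivo’s real SDMJ format.

```python
import json


def to_json(queryset, result_count, record=None, carenet=None):
    """Minimal JSON serializer with the same signature as the methods above.

    Assumptions: `queryset` is an iterable of dicts (a stand-in for a Django
    queryset), and the output mirrors the placeholder shape above, not SDMJ.
    result_count, record, and carenet are accepted but unused in this sketch.
    """
    return json.dumps([{"Problem": dict(obj)} for obj in queryset])


fake_queryset = [{"name_title": "Asthma"}, {"name_title": "Hypertension"}]
print(to_json(fake_queryset, result_count=len(fake_queryset)))
# [{"Problem": {"name_title": "Asthma"}}, {"Problem": {"name_title": "Hypertension"}}]
```

With a real Django queryset, you would typically convert each model instance to a dict of the fields you want to expose before dumping it.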
A couple of things to note:
When serializing models, the following libraries can come in handy:
Adding custom serializers to a data model is simple: set the serializers attribute of a DataModelOptions subclass in an extra.py file to your DataModelSerializers subclass (see above for info on adding advanced data model options).
By default, data models defined in SDML are very permissive: all fields are nullable, and there are no constraints on valid data points other than their type (string, date, etc.). In some cases, a data element could satisfy these constraints but still be invalid. For example, an Indivo Problem must have its name coded using SNOMED, so a problem without a SNOMED code is invalid.
In such cases, you can attach validators to the data model. Django Validators are essentially just python callables that raise a django.core.exceptions.ValidationError if they are called on an invalid data point. We’ve defined a couple of useful validators, though you could use any function you’d like.
For example, here’s a validator that will accept only the value 2:
from django.core.exceptions import ValidationError

def validate_2(value):
    if value != 2:
        raise ValidationError("Invalid value: %s. Expected 2" % str(value))
Django provides a number of built-in validators, for which a full reference exists here: https://docs.djangoproject.com/en/1.2/ref/validators/#built-in-validators.
In addition, Indivo defines a few useful validators in indivo.validators:
Validates that a value is within a set of possible values.
The optional ‘nullable’ flag determines whether or not the value may also be empty.
Validates that a value is exactly equal to a certain value.
The optional ‘nullable’ flag determines whether or not the value may also be empty.
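To make the effect of the ‘nullable’ flag concrete, here is a Django-free sketch of validators with that behavior. It is not Indivo’s code: value_in_set_validator and exact_value_validator are hypothetical analogues of the two validators above, ValidationError is a local stand-in for django.core.exceptions.ValidationError, and treating None or the empty string as “empty” is an assumption.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (no Django needed)."""


def _is_empty(value):
    # Assumption: "empty" means None or the empty string.
    return value is None or value == ""


def value_in_set_validator(valid_values, nullable=False):
    """Hypothetical analogue of the value-in-set validator described above."""
    valid = frozenset(valid_values)

    def validate(value):
        if nullable and _is_empty(value):
            return  # empty values pass only when nullable=True
        if value not in valid:
            raise ValidationError("Invalid value: %r" % (value,))

    return validate


def exact_value_validator(expected, nullable=False):
    """Hypothetical analogue of ExactValueValidator: a one-element set."""
    return value_in_set_validator([expected], nullable=nullable)


severity = value_in_set_validator(["mild", "moderate", "severe"], nullable=True)
severity("mild")  # passes
severity(None)    # passes, because nullable=True
```

Building the exact-value case on top of the set case keeps the nullable handling in one place.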
Adding custom validators to a data model is simple: add the validator to the field_validators attribute of a DataModelOptions subclass in an extra.py file (see above for info on adding advanced data model options).
For example, let’s require that Problem names be coded in SNOMED. We can write the validator using the built-in ExactValueValidator:
from indivo.validators import ExactValueValidator
SNOMED_URI = 'http://purl.bioontology.org/ontology/SNOMEDCT/'
snomed_validator = ExactValueValidator(SNOMED_URI)
We can then attach it to the name_system field of a Problem, which guarantees that we only accept problems whose names are identified as SNOMED codes:
from indivo.data_models.options import DataModelOptions

class ProblemOptions(DataModelOptions):
    model_class_name = 'Problem'
    field_validators = {
        'name_system': [snomed_validator],
    }
Note that we put snomed_validator in a list, since we might theoretically add additional validators to the name_system field.
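How Indivo runs these validators internally isn’t described here. As a rough, runnable illustration only, the sketch below assumes each validator in a field’s list is called with that field’s value and that failures are collected per field. validate_fact and the ValidationError stand-in are hypothetical, chosen so the example runs without Django or Indivo, and snomed_validator is re-implemented locally to play the role of ExactValueValidator(SNOMED_URI).

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (no Django needed)."""


SNOMED_URI = "http://purl.bioontology.org/ontology/SNOMEDCT/"


def snomed_validator(value):
    # Plays the role of ExactValueValidator(SNOMED_URI) in this sketch.
    if value != SNOMED_URI:
        raise ValidationError("Invalid value: %r. Expected %s" % (value, SNOMED_URI))


# Same shape as the field_validators attribute: field name -> list of validators.
field_validators = {
    "name_system": [snomed_validator],
}


def validate_fact(data, validators):
    """Hypothetical helper: run every validator for each listed field and
    collect error messages per field instead of stopping at the first one."""
    errors = {}
    for field, field_validator_list in validators.items():
        value = data.get(field)  # a missing field is treated as None
        for validator in field_validator_list:
            try:
                validator(value)
            except ValidationError as exc:
                errors.setdefault(field, []).append(str(exc))
    return errors


print(validate_fact({"name_system": SNOMED_URI}, field_validators))  # {}
```

Because each field maps to a list, adding a second validator to name_system is just another list entry, which is why the example above wraps snomed_validator in a list.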
As of version 1.1 of Indivo X, we’ve added a feature that makes it much easier to add (in a drag-and-drop fashion) new supported data models to an instance of Indivo. Adding a new data model to Indivo involves:
As you saw above, data models can be defined in two formats: SDML or Django model classes. Simply produce a definition in one of the two forms, and save it to a file named model.sdml or model.py.
Indivo data models currently have the following layout on the filesystem:
indivo_server/
    indivo/
        ...
        data_models/
            core/
                allergy/
                    model.[sdml | py]
                    example.[sdmj | sdmx | py]
                    extra.py
                ...
            contrib/
The indivo/data_models/core/ directory contains all of our built-in data models, and you shouldn’t modify it. Since you are ‘contributing’ a data model to Indivo, add your data model to the indivo/data_models/contrib/ directory. Simply:
Create a new subdirectory under indivo/data_models/contrib/.
Drop your model definition into that directory. This file MUST be named model.py or model.sdml to be identified as a data model.
Add (optional) example files into that directory. Files should be named example.sdmj, example.sdmx, or example.py, and should be example instances of the data model as SDMJ, SDMX, or Fact objects respectively. If present, they will help others use and document your data model.
Add an (optional) extras file to the directory. The file must be named extra.py, and may contain extra options for your data-model, such as custom serializers.
Your final directory structure should now look something like:
indivo_server/
    indivo/
        ...
        data_models/
            core/
                allergy/
                    model.[sdml | py]
                    example.[sdmj | sdmx | py]
                    extra.py
                ...
            contrib/
                your_data_model/
                    model.[sdml | py]
                    example.[sdmj | sdmx | py]
                    extra.py
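Since a misnamed model file simply won’t be picked up, it can help to check the layout before running migrations. The function below is a hypothetical sketch, not an Indivo tool: check_data_model_dir uses only the filenames listed in the steps above and compares them exactly, so a file named MODEL.PY is flagged.

```python
from pathlib import Path

# Filenames taken from the directory layout described above.
MODEL_FILES = ("model.py", "model.sdml")
EXAMPLE_FILES = ("example.sdmj", "example.sdmx", "example.py")
EXTRA_FILE = "extra.py"
RECOGNIZED = set(MODEL_FILES) | set(EXAMPLE_FILES) | {EXTRA_FILE}


def check_data_model_dir(path):
    """Hypothetical pre-migration check for one contrib/<your_data_model>/ directory.

    Returns a list of human-readable problems; an empty list means the
    layout looks right.
    """
    # Listing the directory yields the real on-disk names, even on
    # case-insensitive filesystems.
    names = {p.name for p in Path(path).iterdir() if p.is_file()}
    problems = []
    if not any(name in names for name in MODEL_FILES):
        problems.append("no model.py or model.sdml found")
    for name in sorted(names - RECOGNIZED):
        if name.lower() in RECOGNIZED:
            problems.append(f"{name} should be named {name.lower()}")
    return problems
```

From the indivo_server directory, you might call check_data_model_dir("indivo/data_models/contrib/your_data_model") and fix anything it reports before generating migrations.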
Indivo relies on the South migration tool to get the database synced with the latest data models. Once you’ve dropped your data model into the filesystem, South should be able to detect the necessary changes.
To detect the new model and generate migrations for it, run (from the indivo_server directory):
python manage.py schemamigration indivo --auto
You should see output like:
+ Added model indivo.YOURMODELNAME
Created 0018_auto__add_model_YOURMODELNAME.py. You can now apply this migration with: ./manage.py migrate indivo
To do a quick sanity check that you aren’t about to blow away your database, run:
python manage.py migrate indivo --db-dry-run -v2
This should output the SQL that will be run. Make sure this looks reasonable, ESPECIALLY if you are running Indivo on Oracle, where the South tool is still in alpha. If the SQL looks reasonable, go ahead and run the migration, with:
python manage.py migrate indivo
And you’re all set!
Make sure to restart Indivo for your changes to take effect.
See also
But until you map a Schema to it, you won’t be able to actually add data to your new model. To learn more, see: