Knowledge Integration for Disease Characterization:
A Breast Cancer Example

Oshani Seneviratne, Sabbir Rashid, Shruthi Chari, Jim McCusker, Kristin Bennett, James Hendler and Deborah McGuinness
Tetherless World Constellation, Rensselaer Polytechnic Institute


With rapid advances in cancer research, the information used to characterize disease, stage tumors, and create treatment and survivorship plans is changing at a pace that makes it difficult for practicing oncologists and physicians to remain current. One example is the increasing use of biomarkers when characterizing the pathologic prognostic stage of a breast tumor. We present our semantic technology approach to support cancer characterization and demonstrate it in an end-to-end prototype system that collects the newest breast cancer staging criteria from authoritative oncology manuals to construct an ontology for breast cancer. Using a tool we developed on top of this ontology, physicians can quickly stage a new patient to help identify risks, treatment options, and monitoring plans based on authoritative and best-practice guidelines. Physicians can also re-stage an existing patient, which allows them to find the patients in a given cohort whose stage has changed. Using our proposed mechanism for ingesting new data from staging manuals as new guidelines emerge, which is grounded in semantic technologies, we have created an enriched cancer staging ontology that integrates relevant data from several sources with very little human intervention.



Bringing up a Whyis VM

To host our application on the Whyis framework, we need a machine with Whyis installed. You could install Whyis directly on your host machine (if it runs Ubuntu), but the preferred method is to bring up a VM. To bring up a VM, you need to have VirtualBox installed, along with Vagrant, which the commands in this section rely on.
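Before going further, it can help to confirm that the required tools are actually on the PATH. The following is a minimal sketch, not part of the Whyis documentation: missing_tools is a hypothetical helper name, and it assumes VirtualBox exposes its usual VBoxManage command and Vagrant its usual vagrant command.

```shell
# Pre-flight sketch: report which required command-line tools are missing.
# missing_tools is a hypothetical helper, not something shipped with Whyis.

missing_tools() {
    # Print the name of every argument that cannot be found on PATH.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumption: VirtualBox provides VBoxManage and Vagrant provides vagrant.
missing="$(missing_tools VBoxManage vagrant)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
else
    echo "VirtualBox and Vagrant both appear to be installed."
fi
```

If either name is reported as missing, install that tool before running the vagrant commands in this section.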

With VirtualBox installed, we can proceed to installing Whyis. First, create a whyis-vm directory within the home directory of your system. On a Mac or Ubuntu machine, you can do that with:
mkdir whyis-vm
cd whyis-vm

Once we are within the whyis-vm directory (which will serve as a folder shared between the host machine and the VM), we need to download the Vagrantfile and the install script used to launch a Whyis machine. They can be downloaded with the commands below.
curl -skL > Vagrantfile
curl -skL >

To launch a machine:
vagrant up

Once the machine is launched, we can connect to it and activate the virtual environment venv.
vagrant ssh
sudo su - whyis
cd /apps/whyis
source venv/bin/activate
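
To double-check that the activation took effect, one option is to inspect the VIRTUAL_ENV variable, which the standard activate script sets. This is a small sketch under that assumption; venv_status is a hypothetical helper name, not part of Whyis.

```shell
# Sketch: report whether a Python virtual environment is currently active.
# venv_status is a hypothetical helper; it only reads the VIRTUAL_ENV variable
# that the standard venv/virtualenv activate script exports.

venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```

If the helper reports "inactive", re-run the source command above from /apps/whyis before starting the server.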

Once these steps are done, we can start the Whyis server by running:
python runserver -h

The Whyis landing page can be accessed at:

Let’s go ahead and register now, using the below URL:

For further information on installation, please see:

Installing the heals2vis application

Now that we have a working Whyis instance, we can install the heals2vis application. It can be downloaded by running:
cd /apps
git clone

Once we have downloaded the repository, we need to install the application. We only need the heals2vis directory from the above repository, so we can move it up one directory.
sudo mv .
sudo chown -R whyis:whyis heals2vis/

Become the whyis user again to be able to install the heals2vis app:
sudo su - whyis
cp -r heals2vis/ whyis/

Change directory to the heals2vis directory and run the command below:
cd heals2vis && pip install -e .

Exit the whyis user session so that we can restart services. We need to restart the web server and the queueing scheduler for the above installation to take effect.
sudo service apache2 restart
sudo service celeryd restart

Now we can restart the server by switching back to the whyis user and changing directory to /apps/whyis.
sudo su - whyis
cd /apps/whyis
python runserver -h

We should be able to see a heals2vis landing page when we try

Loading the data into Whyis’s Blazegraph instance

We need to load the SEER patient records, the CIViC drug dataset, and the Cancer Staging Ontology in order to load the Whyis physicianView and derive knowledge through inferencing. Run the commands below to load this knowledge into the Blazegraph instance.

Picking up where we left off, we first need to stop the running server with Ctrl+C so that we can proceed with the data load:
python load -i /apps/heals2vis/data/viz.ttl -f turtle
python load -i /apps/heals2vis/data/cancer_staging_terms.owl.ttl -f turtle
python load -i /apps/heals2vis/data/civic-out.txt -f trig
python load -i /apps/heals2vis/data/seer-out-sample.txt -f trig
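
If one of the load commands above fails, a quick first check is whether the corresponding data file exists at the expected path. The sketch below does only that; missing_files is a hypothetical helper name, and the paths are the ones used in the load commands above.

```shell
# Sketch: list any expected data files that are not present on disk.
# missing_files is a hypothetical helper, not part of Whyis or heals2vis.

missing_files() {
    # Print every argument that is not an existing regular file.
    for path in "$@"; do
        if [ ! -f "$path" ]; then
            echo "$path"
        fi
    done
}

# Paths taken from the load commands in this section.
missing_files \
    /apps/heals2vis/data/viz.ttl \
    /apps/heals2vis/data/cancer_staging_terms.owl.ttl \
    /apps/heals2vis/data/civic-out.txt \
    /apps/heals2vis/data/seer-out-sample.txt
```

No output means all four files were found; any path that is printed should be restored (for example by re-cloning the repository) before loading.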

Now that the data is loaded, we need to pre-run the inferencer on these records. It can be run with the command below:
python test_agent -a heals2vis.inferencer.Infer

Accessing the view

Once these steps are done, we can restart the Whyis server by running:
python runserver -h

Now that the heals2vis application is installed and the data is loaded, we can explore the interactive physician view at: Make sure to open this URL twice; the first request returns an error due to a known bug.
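
When checking the view from a script rather than a browser, the "open it twice" workaround can be expressed as a single retry. The sketch below is illustrative only: try_twice is a hypothetical helper, PHYSICIAN_VIEW_URL is a placeholder variable that you would set to the physician view URL yourself, and it assumes the first-hit error surfaces as an HTTP error status that curl's -f flag can detect.

```shell
# Sketch: run a command and, if it fails, run it exactly one more time.
# try_twice is a hypothetical helper; it returns the status of the last run.

try_twice() {
    "$@" || "$@"
}

# PHYSICIAN_VIEW_URL is a placeholder; nothing runs unless you set it.
# curl flags: -f fail on HTTP errors, -s silent, -S still show errors,
# -o /dev/null discard the response body.
if [ -n "${PHYSICIAN_VIEW_URL:-}" ]; then
    if try_twice curl -fsS -o /dev/null "$PHYSICIAN_VIEW_URL"; then
        echo "Physician view responded."
    else
        echo "Physician view still failing after two attempts."
    fi
fi
```

A single retry mirrors the manual workaround; if the view still fails on the second attempt, the problem is probably something other than the known first-hit bug.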


If you have any questions about this work, please contact the Cancer Staging Ontology Developers.