Animal Diseases

Animal diseases in Switzerland

FSVO, Federal Food Safety and Veterinary Office, collects data on the animal diseases in Switzerland. This data is published as Linked Data.

In this tutorial, we will show how to work with Linked Data. Mainly, we will see how to work with data on animal disease.
We will look into how to query, process, and visualize it.

1. Setup
      1.1 Python
      1.2 SPARQL endpoints
      1.3 SPARQL client

2. Data
      2.1 Animal species
      2.2 Diseases
      2.3 Can we link disease to animal type?
      2.4 Reports
      2.5 Example: goats

3. Analysis
      3.1 Cattle: Mucosal Disease
      3.2 Bees: Sauerbrut
      3.3 Rabies
      3.4 Most common diseases
      3.5 Reports over time

Setup

Python

This notebook requires Python 3.9 or later to run.

SPARQL endpoints

For epidemiological data

Reports on animal diseases are published as Linked Data. It can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://lindas.admin.ch/query/.

For geodata

Animal diseases are closely linked to places. To understand their location, we will work with Swiss geodata. It is published as Linked Data. It can be accessed using API endpoint under https://ld.geo.admin.ch/query.

SPARQL client

Let's use SparqlClient from graphly to communicate with both databases. Graphly will allow us to:

  • send SPARQL queries
  • automatically add prefixes to all queries
  • format response to pandas or geopandas
In [1]:
import datetime
import itertools
import json

import folium
import mapclassify
import matplotlib as mpl
import matplotlib.cm
import networkx as nx
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from bokeh.io import output_notebook, show
from bokeh.models import (Circle, CustomJS, HoverTool, LabelSet, MultiLine,
                          NodesAndLinkedEdges, Plot, TapTool)
from bokeh.palettes import Paired
from bokeh.plotting import ColumnDataSource, from_networkx
from dateutil.relativedelta import relativedelta
from folium.plugins import TimeSliderChoropleth
from graphly.api_client import SparqlClient
from plotly.subplots import make_subplots
from enum import Enum
from collections import namedtuple
In [2]:
# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git
In [3]:
sparql = SparqlClient("https://int.lindas.admin.ch/query")
geosparql = SparqlClient("https://ld.geo.admin.ch/query")

sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "admin": "<https://schema.ld.admin.ch/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "disease": "<https://agriculture.ld.admin.ch/foen/animal-pest/>"   
})

geosparql.add_prefixes({
    "dct": "<http://purl.org/dc/terms/>",
    "geonames": "<http://www.geonames.org/ontology#>",
    "schema": "<http://schema.org/>",
    "geosparql": "<http://www.opengis.net/ont/geosparql#>",
})

SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.

Using the add_prefixes method, we can define persistent prefixes. Every time you send a query, graphly will now automatically add the prefixes for you.

Data

Our goal is to understand outbreaks of animal diseases. First, let's take a look at the animals found in our dataset.

Animal species

In [4]:
query = """
SELECT DISTINCT ?species ?group
WHERE {
  ?s disease:animal-specie ?speciesIRI.
  ?speciesIRI schema:name ?species.
  
  ?speciesIRI skos:broader/schema:name ?group.
  
  FILTER (LANG(?species) = "de")
  FILTER (LANG(?group) = "de")
} 
ORDER BY ?group
"""

df = sparql.send_query(query)
df[0:15]
Out[4]:
species group
0 Anderes Haustier Andere Haustiere
1 Bienen Bienen
2 Pferd Equiden
3 Fisch Fische
4 Krebs Fische
5 Hausente Gefluegel
6 Hausgans Gefluegel
7 Huhn Gefluegel
8 Wachtel Gefluegel
9 Hund Hund
10 Kaninchen Kaninchen
11 Katze Katzen
12 Echse Reptilien
13 Schlange Reptilien
14 Schildkröte Reptilien

Diseases

Now, let's take a look at the diseases the animals can suffer from.

In [5]:
query = """
SELECT DISTINCT ?disease ?group
WHERE {
  ?s disease:epidemics ?diseaseIRI.
  ?diseaseIRI schema:name ?disease.
  
  ?diseaseIRI skos:broader/schema:name ?group.

  FILTER (LANG(?disease) = "de")
  FILTER (LANG(?group) = "de")
}
ORDER BY ?group
"""

df = sparql.send_query(query)
df.head()
Out[5]:
disease group
0 Tollwut Auszurottende Seuchen
1 Brucellose der Rinder Auszurottende Seuchen
2 Milzbrand Auszurottende Seuchen
3 Infektiöse bovine Rhinotracheitis/Infektiöse p... Auszurottende Seuchen
4 Infektionen mit Tritrichomonas foetus Auszurottende Seuchen

Thus far, we have seen all animals, and all diseases. However, not all animals will catch all diseases.
What diseases are swiss animals exposed to?

In [6]:
query = """
SELECT DISTINCT ?disease ?species ?group
WHERE {
  <https://agriculture.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?disease;
       disease:animal-specie ?speciesIRI.
       
  ?speciesIRI schema:name ?species.
  ?speciesIRI skos:broader/schema:name ?group.
  
  FILTER (LANG(?disease) = "de")
  FILTER (LANG(?species) = "de")
  FILTER (LANG(?group) = "de")
} 

ORDER BY ?species
"""
df = sparql.send_query(query)
df = df[df.group != "Wildtier"]
In [7]:
# Graph of disease-species relations
graph = nx.from_pandas_edgelist(df, source='disease', target='species')
groups = {key: "disease" for key in df.disease} | {key: "species" for key in df.species}

nx.set_node_attributes(graph, groups, name="group")

colormaps = {"fill_color": 
                {"disease": Paired[4][0], "species": Paired[4][2]}, 
            "hover_color": 
                {"disease": Paired[4][1], "species": Paired[4][3]}
            }

for name, colormap in colormaps.items():
    colors = {k: colormap[v] for k, v in groups.items()}
    nx.set_node_attributes(graph, colors, name=name)
In [8]:
# Interactive graph plot
plot = Plot(plot_width = 1000, plot_height=1000)
plot.title.text = 'Swiss animals and their diseases'

graph_renderer = from_networkx(graph, nx.drawing.layout.bipartite_layout, nodes = df.species.unique())

# manipulating nodes
nonselect_node = Circle(size = 15, fill_color = "fill_color")
select_node = Circle(size = 15, fill_color = "hover_color")
graph_renderer.node_renderer.glyph = nonselect_node
graph_renderer.node_renderer.nonselection_glyph = nonselect_node
graph_renderer.node_renderer.selection_glyph = select_node
graph_renderer.node_renderer.hover_glyph = select_node
graph_renderer.node_renderer.data_source.data['group'] = [groups[g] for g in graph_renderer.node_renderer.data_source.data["index"]]

# manipulating edges
select_edge = MultiLine(line_color = Paired[7][-1], line_width = 2)
graph_renderer.edge_renderer.glyph = MultiLine(line_color = '#CCCCCC', line_alpha = .5, line_width = 1)
graph_renderer.edge_renderer.selection_glyph = select_edge
graph_renderer.edge_renderer.hover_glyph = select_edge


graph_renderer.selection_policy = NodesAndLinkedEdges()
graph_renderer.inspection_policy = NodesAndLinkedEdges()

plot.renderers.append(graph_renderer)

callback_code = '''
    if (cb_data.index.indices.length > 0) {
        const index = cb_data.index.indices[0];

        if (source.data.group[index] === "species") {
            hover.tooltips = [["Animal", "@index"]];  
        }
        else {
            hover.tooltips = [["Disease", "@index"]];
        }                                     
    }
'''

hover = HoverTool(
    renderers=[graph_renderer.node_renderer]
)
hover.callback = CustomJS(
    args = dict(source = graph_renderer.node_renderer.data_source, hover = hover),
    code = callback_code
)

pos = {k: v for k, v in graph_renderer.layout_provider.graph_layout.items() if graph.nodes[k]["group"] == "species"}
names = list(pos.keys())
x, y = map(list, zip(*pos.values()))

source = ColumnDataSource({'x': x, 'y': y, 'name': names})
labels = LabelSet(x_offset=15, y_offset=-6, x='x', y='y', text='name', source=source, background_fill_color='white', text_font_size='10px', background_fill_alpha=.7)
plot.renderers.append(labels)

plot.add_tools(hover, TapTool())

output_notebook()
show(plot)
Loading BokehJS ...

Reports

When animals get sick, the farmer will typically consult a vet. After diagnosis, the incident is reported to the Federal Office for Agriculture. This data is publicly available and can be found in our endpoint:

In [9]:
#Run in browser: https://s.zazuko.com/24fSKE
query = """
SELECT ?diagnosis ?commune ?species ?stock ?sick ?infected ?killed ?deceased ?disease
WHERE {
  <https://agriculture.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?disease;
       disease:diagnosis-date ?diagnosis;
       disease:animals-stock ?stock;
       disease:animals-sick ?sick;
       disease:animals-infected ?infected;
       disease:animals-killed ?killed;
       disease:animals-deceased ?deceased;
       disease:internet-publication ?date;
       disease:animal-specie/schema:name ?species;
       schema:containedInPlace/schema:name ?commune .
  
  FILTER (LANG(?disease) = "de")
  FILTER (LANG(?species) = "de")
} 
ORDER BY DESC(?diagnosis) ?commune
"""
df = sparql.send_query(query)
df.head()
Out[9]:
diagnosis commune species stock sick infected killed deceased disease
0 2022-09-05 Kallern Rind 99 3 0 0 0 Salmonellose
1 2022-09-05 Stein (AR) Rind 23 1 0 0 0 Coxiellose
2 2022-09-02 Châtel-Saint-Denis Rind 1 1 0 0 0 Coxiellose
3 2022-09-02 Cottens (FR) Rind 1 1 0 0 0 Coxiellose
4 2022-09-02 Langnau am Albis Anderes Haustier 1 1 0 0 0 Salmonellose

Aggregated reports

The reports give us many details. It tells us not only what species was affected, but also how many animals got sick, infected, or deceased.

These reports, however, have one limitation. The analysis of a report of a single animal holding does not give insight into the disease situation of the whole population. Therefore it is necessary to look at all reports for that disease in a given time.

We will hence work with number of reports. Instead of looking at how individual farms were affected, we will simply count how many farms reported health issues.

Our core data becomes:

In [10]:
query = """
SELECT ?diagnosis ?municipality_id ?municipality ?species ?disease
WHERE {
  <https://agriculture.ld.admin.ch/foen/animal-pest/observation/> cube:observation ?obs .
  ?obs disease:epidemics/schema:name ?disease;
       disease:diagnosis-date ?diagnosis;
       disease:internet-publication ?date;
       disease:animal-specie/schema:name ?species;
       schema:containedInPlace ?municipality_iri.
       
  ?municipality_iri schema:name ?municipality ;
                    schema:identifier ?municipality_id.
  
  FILTER (LANG(?disease) = "de")
  FILTER (LANG(?species) = "de")
} 
ORDER BY DESC(?diagnosis) ?municipality
"""
df = sparql.send_query(query)
In [11]:
def aggregate_reports(reports: pd.DataFrame, weekly: bool) -> pd.DataFrame:
    df = reports.copy()
    if weekly:
        df["diagnosis"] = df.diagnosis.apply(lambda x: datetime.date(x.year, 1, 1) + relativedelta(weeks=+x.isocalendar()[1]))
    else:
        df['diagnosis'] = df.diagnosis.apply(lambda x: datetime.date(x.year, x.month, 1))

    return df[["diagnosis", "species", "disease", "municipality_id"]].groupby(by=["diagnosis", "species", "disease", "municipality_id"]).size().reset_index(name="reports")
In [12]:
# Aggregate results by week/month
df_monthly = aggregate_reports(df, False)
df_monthly = df_monthly[["diagnosis", "species", "disease"]].groupby(by=["diagnosis", "species", "disease"]).size().reset_index(name="reports")
df_monthly = df_monthly.sort_values(by=["diagnosis", "disease", "species"], ascending=False).reset_index(drop=True)
df_monthly.head()
Out[12]:
diagnosis species disease reports
0 2022-09-01 Rind Salmonellose 1
1 2022-09-01 Katze Salmonellose 1
2 2022-09-01 Anderes Haustier Salmonellose 1
3 2022-09-01 Schaf Maedi-Visna 1
4 2022-09-01 Rind Coxiellose 8

Example: goats

We have seen that one animal can suffer from many diseases. Remember the animal-disease graph? A goat can get sick of twenty different diseases!

Goat diseases

drawing

Some of them occur more often than others. Which ones were the most often reported in the past?

In [13]:
DimensionData = namedtuple('Dimension', ['slice_of', "category"])

class Dimension(Enum):

    Disease = DimensionData("disease", "species")
    Species = DimensionData("species", "disease")

def plot_multiple_categories(df: pd.DataFrame, dim: Dimension, value: str, min_reports: int=15, min_reports_at_time:int = 5) -> None:

    category = dim.value.category
    df_slice = df[df[dim.value.slice_of] == value]
    counts = df_slice[[category, "reports"]].groupby(category).agg(["sum", "max"]).reset_index()
    categories_filtered = list(counts[(counts["reports"]["sum"] >= min_reports) & (counts["reports"]["max"] >= min_reports_at_time)][category])
    df_slice = df_slice[df_slice[category].isin(categories_filtered)]

    all_dates = [x.date() for x in pd.date_range(df_slice.diagnosis.min(), df_slice.diagnosis.max(), freq='MS')]
    combinations = list(itertools.product(*[all_dates, df_slice[category].unique()]))
    values_empty = pd.DataFrame(combinations, columns =['diagnosis', category])
    plot_df = pd.merge(values_empty, df_slice, how="left", on=["diagnosis", category]).fillna(0)

    fig = go.Figure()
    for cat in plot_df[category].unique():
        plot_df_slice = plot_df[plot_df[category] == cat]
        labels = ["<b>{}</b><br>Date: {}<br>Reports: {}".format(cat, row.diagnosis, int(row.reports)) for row in plot_df_slice[["diagnosis", "reports"]].itertuples()]
        fig.add_trace(go.Scatter(x=plot_df_slice.diagnosis, y=plot_df_slice.reports, name=cat,
                            text=labels,
                            hoverinfo='text',
                            line_shape='spline'))

    fig.update_xaxes(rangeslider_visible=True, range=[min(plot_df.diagnosis), max(plot_df.diagnosis)])
    fig.update_yaxes(title="reports", range = [0, int(max(plot_df.reports)*1.1)+1])
    fig.update_layout(
        title={"text": "Disease: {}".format(value), "x": 0.5}
    )
    fig.show()
In [14]:
plot_multiple_categories(df_monthly, Dimension.Species, "Ziege", 15, 5)