Population

Population in Zürich

The Zürich Statistical Office collects data on the city and its residents and publishes it as Linked Data.

In this tutorial, we will show how to work with Linked Data, using the population dataset as the main example.
We will look at how to query, process, and visualize it.

SPARQL endpoint

Population data is published as Linked Data that can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://ld.stadt-zuerich.ch/query.
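
To make "sending a query over HTTP" concrete, here is a minimal sketch that uses only the Python standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a form-encoded POST with a `query` parameter) and returns JSON results when asked for `application/sparql-results+json`. The helper name `build_sparql_request` and the sample query are illustrative and not part of graphly.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://ld.stadt-zuerich.ch/query"


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL query as a form-encoded POST request."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Assumption: the endpoint can serialize SELECT results as JSON
        "Accept": "application/sparql-results+json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


request = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 3")

# Sending the request needs network access, so it is left commented out:
# import json
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["results"]["bindings"]
```

A client library such as graphly wraps this kind of request and converts the response into a dataframe, which is why the rest of the tutorial does not build requests by hand.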

Let's use SparqlClient from graphly to communicate with the database. Graphly will allow us to:

  • send SPARQL queries
  • automatically add prefixes to all queries
  • convert responses to pandas or geopandas dataframes
In [1]:
# Uncomment to install dependencies in Colab environment
#!pip install git+https://github.com/zazuko/graphly.git
In [2]:
import re

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from graphly.api_client import SparqlClient
In [3]:
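def natural_keys(txt: str) -> list[int]:
    """Extracts the whitespace-separated integers from a string.
    Used as a sort key, so that "Kreis 2" sorts before "Kreis 10".
    Args:
        txt:             string with integer tokens, e.g. "Kreis 11"

    Returns:
        list[int]        integers found in the string, e.g. [11]
    """

    return [int(s) for s in txt.split() if s.isdigit()]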
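
As a quick illustration of why this helper is useful, the sketch below compares plain alphabetical sorting with sorting by `natural_keys`. The column names are a small sample chosen for illustration, and the function is repeated so that the snippet runs on its own.

```python
def natural_keys(txt: str) -> list[int]:
    # Same logic as the cell above: keep whitespace-separated integer tokens
    return [int(s) for s in txt.split() if s.isdigit()]


columns = ["time", "Kreis 11", "Kreis 3", "Kreis 10", "Kreis 1"]

# Plain string sorting compares character by character
plain = sorted(columns)

# Sorting by the extracted integers gives the numeric district order;
# "time" has no integer token, so its key is [] and it sorts first
natural = sorted(columns, key=natural_keys)

print(plain)
print(natural)
```

With plain sorting, "Kreis 10" and "Kreis 11" land before "Kreis 3". With `natural_keys`, the districts appear in numeric order and "time" stays in front.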
In [4]:
sparql = SparqlClient("https://ld.stadt-zuerich.ch/query")
sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "property": "<https://ld.stadt-zuerich.ch/statistics/property/>",
    "measure": "<https://ld.stadt-zuerich.ch/statistics/measure/>",
    "collection": "<https://ld.stadt-zuerich.ch/statistics/collection/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "ssz": "<https://ld.stadt-zuerich.ch/statistics/>"
})

SPARQL queries can become very long. To improve readability, we will work with prefixes.

Using the add_prefixes method, we define persistent prefixes. Every time you send a query, graphly will automatically add the prefixes for you.
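
Conceptually, adding prefixes means prepending one `PREFIX` declaration per entry to the query text. The sketch below shows that idea in plain Python. It is not graphly's actual implementation, and the helper name `with_prefixes` is hypothetical.

```python
def with_prefixes(prefixes: dict[str, str], query: str) -> str:
    """Prepend a PREFIX declaration for every entry to the query text."""
    header = "\n".join(f"PREFIX {name}: {iri}" for name, iri in prefixes.items())
    return f"{header}\n{query.strip()}"


prefixes = {
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
}

full_query = with_prefixes(
    prefixes,
    "SELECT ?name WHERE { ?s schema:name ?name } LIMIT 1",
)
print(full_query)
```

Once a prefix is declared, `schema:name` in the query body stands for the full IRI `<http://schema.org/name>`.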

Population in city districts

Let's find the number of inhabitants in different parts of the city. The population data is available in the BEW data cube.

The query for the number of inhabitants in the different city districts over time looks as follows:

In [5]:
query = """
SELECT ?time ?place ?count
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
  ssz:BEW a cube:Cube;
             cube:observationSet/cube:observation ?observation.   
  
  ?observation property:RAUM ?place_uri ;
                       property:TIME ?time ;
                       measure:BEW ?count .
  ?place_uri skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Kreis> ;
         schema:name ?place .
  FILTER regex(str(?place),"ab|Stadtgebiet vor")
}
ORDER BY ?time
"""

df = sparql.send_query(query)
df.head()
Out[5]:
time place count
0 1408-12-31 Kreis 1 (Stadtgebiet vor 1893) 5675.0
1 1467-12-31 Kreis 1 (Stadtgebiet vor 1893) 4750.0
2 1529-12-31 Kreis 1 (Stadtgebiet vor 1893) 5080.0
3 1637-12-31 Kreis 1 (Stadtgebiet vor 1893) 8621.0
4 1671-12-31 Kreis 1 (Stadtgebiet vor 1893) 9590.0
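
The `FILTER regex(...)` clause in the query keeps only places whose name contains either "ab" or "Stadtgebiet vor". For a pattern this simple, Python's `re.search` behaves the same way, so the effect can be sketched locally. Only the first label below appears in the output above. The other two are hypothetical labels, made up for illustration.

```python
import re

# Same pattern as in the SPARQL FILTER; it matches anywhere in the string
PATTERN = re.compile(r"ab|Stadtgebiet vor")

labels = [
    "Kreis 1 (Stadtgebiet vor 1893)",  # taken from the query output
    "Kreis 2 (ab 1893)",               # hypothetical label
    "Kreis 2",                         # hypothetical label
]

kept = [label for label in labels if PATTERN.search(label)]
print(kept)
```

Because the pattern is unanchored, any name containing the substring "ab" would pass the filter, not only names where "ab" is a separate word.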

Let's visualize the number of inhabitants per district. To do this, we shorten the place names to the district name and pivot the table so that each district becomes a column.
The cleaned dataframe becomes:

In [6]:
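# Shorten labels such as "Kreis 1 (Stadtgebiet vor 1893)" to "Kreis 1"
df.place = df.place.apply(lambda x: re.findall(r'Kreis \d+', x)[0])

# One row per date, one column per district; keep only dates with data for every district
df = pd.pivot_table(df, index="time", columns="place", values="count")
df.dropna(inplace=True)

# Order the columns by population in the first remaining year, largest first
df = df[df.columns[np.argsort(-df.iloc[0])]]
df = df.reset_index().rename_axis(None, axis=1)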

df.head()
Out[6]:
time Kreis 11 Kreis 3 Kreis 9 Kreis 7 Kreis 6 Kreis 10 Kreis 12 Kreis 2 Kreis 4 Kreis 8 Kreis 5 Kreis 1
0 1971-12-31 56863.0 52707.0 47257.0 39599.0 37837.0 36160.0 33664.0 32708.0 32231.0 20899.0 12833.0 9411.0
1 1972-12-31 56864.0 51674.0 47223.0 39118.0 37763.0 35760.0 33079.0 32561.0 31765.0 20371.0 12462.0 9007.0
2 1973-12-31 56464.0 50879.0 47215.0 38695.0 37059.0 35576.0 32201.0 31925.0 30906.0 19897.0 12235.0 8525.0
3 1974-12-31 56224.0 50175.0 47142.0 38045.0 36305.0 35449.0 31374.0 31706.0 30048.0 19552.0 12165.0 8076.0
4 1975-12-31 55627.0 49326.0 46491.0 37379.0 35294.0 35518.0 30943.0 31179.0 29061.0 19246.0 11798.0 7751.0
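
The reshaping step can be easier to follow on a toy table. The sketch below applies the same label shortening, pivot, and `dropna` to a few made-up rows. The counts and the "(ab 1893)" label are invented for illustration and are not taken from the dataset.

```python
import re

import pandas as pd

# Long format: one row per (time, place) observation; values are made up
long_df = pd.DataFrame({
    "time": ["1971-12-31", "1971-12-31", "1972-12-31", "1972-12-31", "1973-12-31"],
    "place": [
        "Kreis 1 (Stadtgebiet vor 1893)", "Kreis 2 (ab 1893)",
        "Kreis 1 (Stadtgebiet vor 1893)", "Kreis 2 (ab 1893)",
        "Kreis 1 (Stadtgebiet vor 1893)",
    ],
    "count": [100.0, 300.0, 90.0, 310.0, 80.0],
})

# Keep only the short district name
long_df["place"] = long_df["place"].apply(lambda x: re.findall(r"Kreis \d+", x)[0])

# Wide format: one row per date, one column per district
wide_df = pd.pivot_table(long_df, index="time", columns="place", values="count")

# 1973 has no value for "Kreis 2", so that row is dropped
complete = wide_df.dropna()
print(complete)
```

This also explains why the cleaned dataframe above starts in 1971: earlier dates presumably lack values for at least one district and are removed by `dropna`.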

And now we can graph it using a line plot or a histogram.

In [7]:
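sorted_df = df.reindex(sorted(df.columns, key=natural_keys), axis=1)
# Plot every district column against time; "time" is the x-axis, not a series
district_columns = [col for col in sorted_df.columns if col != "time"]
fig = px.line(sorted_df, x="time", y=district_columns)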
fig.update_layout(
    title='Population in Zürich Districts', 
    title_x=0.5,
    yaxis_title="Inhabitants",
    xaxis_title="Year",
    legend_title="District"
)
fig.show("notebook")