The Zürich Statistical Office collects data on the city and its residents and publishes it as Linked Data.
This tutorial shows how to work with Linked Data, focusing on data about economic activities.
We will look at how to query, process, and visualize it.
The data on economic activities can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://ld.stadt-zuerich.ch/query.
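To make the HTTP side concrete, here is a minimal sketch that builds such a request with the Python standard library. It assumes the endpoint follows the SPARQL 1.1 Protocol (a form-encoded `query` parameter sent via POST, JSON results requested through the `Accept` header). The helper name `build_sparql_request` and the smoke-test query are illustrative and not part of the tutorial.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://ld.stadt-zuerich.ch/query"  # endpoint named in this tutorial


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    # The query travels as a form-encoded "query" parameter in the request body
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        # Assumption: the endpoint can return SPARQL JSON results
        "Accept": "application/sparql-results+json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


# Hypothetical smoke-test query, not taken from the tutorial
request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 3")
print(request.get_method(), request.full_url)

# To actually send it (requires network access):
# import json
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Building the request separately from sending it makes it easy to inspect what goes over the wire before touching the network.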
Let's use SparqlClient from graphly to communicate with the database.
Graphly will allow us to send SPARQL queries, automatically add prefixes to every query, and receive the results as pandas or geopandas objects.
# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git
import mapclassify
import matplotlib
import matplotlib.cm
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from graphly.api_client import SparqlClient
sparql = SparqlClient("https://ld.stadt-zuerich.ch/query")
wikisparql = SparqlClient("https://query.wikidata.org/sparql")
sparql.add_prefixes({
"schema": "<http://schema.org/>",
"cube": "<https://cube.link/>",
"property": "<https://ld.stadt-zuerich.ch/statistics/property/>",
"measure": "<https://ld.stadt-zuerich.ch/statistics/measure/>",
"skos": "<http://www.w3.org/2004/02/skos/core#>",
"ssz": "<https://ld.stadt-zuerich.ch/statistics/>"
})
SPARQL queries can become very long. To improve readability, we will work with prefixes.
Using the add_prefixes method, we can define persistent prefixes.
Every time you send a query, graphly will now automatically add the prefixes for you.
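To illustrate what adding prefixes to a query amounts to, here is a minimal sketch that prepends `PREFIX` declarations to a query string. This is not graphly's actual implementation. The helper `with_prefixes` and the example query are hypothetical, and the dictionary reuses two of the prefixes defined above.

```python
def with_prefixes(query: str, prefixes: dict[str, str]) -> str:
    """Prepend one PREFIX declaration per dictionary entry to a SPARQL query."""
    header = "\n".join(f"PREFIX {name}: {iri}" for name, iri in prefixes.items())
    return f"{header}\n{query}"


prefixes = {
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
}

# Hypothetical query that relies on the "cube" prefix
full_query = with_prefixes("SELECT * WHERE { ?s a cube:Cube } LIMIT 1", prefixes)
print(full_query)
```

With the declarations in place, a short name such as `cube:Cube` stands for the full IRI `<https://cube.link/Cube>`.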
Let's find the number of restaurants in Zürich over time. This information is available in the AST-BTA data cube. To put the restaurant numbers in context, let's scale them by population size. The number of inhabitants over time can be found in the BEW data cube.
The query for the number of inhabitants and restaurants over time is as follows:
query = """
SELECT *
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
  {
    # Restaurants over time (AST-BTA cube)
    SELECT ?time (SUM(?ast) AS ?restaurants)
    WHERE {
      ssz:AST-BTA a cube:Cube ;
        cube:observationSet/cube:observation ?obs_rest .
      ?obs_rest property:TIME ?time ;
        property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000> ;
        property:BTA <https://ld.stadt-zuerich.ch/statistics/code/BTA5000> ;
        measure:AST ?ast .
    }
    GROUP BY ?time
  }
  {
    # Inhabitants over time (BEW cube)
    SELECT ?time ?pop
    WHERE {
      ssz:BEW a cube:Cube ;
        cube:observationSet/cube:observation ?obs_pop .
      ?obs_pop property:TIME ?time ;
        property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000> ;
        measure:BEW ?pop .
    }
  }
}
ORDER BY ?time
"""
df = sparql.send_query(query)
df.head()
Let's calculate the number of restaurants per 10 000 inhabitants.
# Forward-fill missing values; fillna(method="ffill") is deprecated in recent pandas versions
df = df.ffill()
df["Restaurants per 10 000 inhabitants"] = df["restaurants"]/df["pop"]*10000
# Plot the scaled restaurant numbers over time
fig = px.line(df, x="time", y="Restaurants per 10 000 inhabitants", labels={"time": "Years"})
fig.update_layout(title_text="Restaurants in Zürich over time", title_x=0.5)
fig.show()
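The forward-fill and scaling steps above can be tried offline on a toy table. The numbers below are made up for illustration and are not results from the endpoint. The column names `restaurants` and `pop` mirror the query, while `toy` and `per_10k` are hypothetical names.

```python
import pandas as pd

# Made-up numbers for illustration only; not results from the endpoint
toy = pd.DataFrame({
    "time": ["2018", "2019", "2020"],
    "restaurants": [2000.0, 2100.0, None],  # last value is missing on purpose
    "pop": [400000, 420000, 425000],
})

# Forward-fill: the missing 2020 restaurant count takes the 2019 value
toy = toy.ffill()

# Scale to restaurants per 10 000 inhabitants
toy["per_10k"] = toy["restaurants"] / toy["pop"] * 10000
print(toy)
```

Forward-filling carries the last known value into a gap, so the 2020 row reuses the 2019 restaurant count while still dividing by the 2020 population.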