Back to Blog

Real estate

Real estate in Zürich

the Zürich Statistical Office collects data on the city and its residents. This data is published as Linked Data.

In this tutorial, we will show how to work with Linked Data. Mainly, we will see how to work with the real estate dataset.
We will look into how to query, process, and visualize it.

SPARQL endpoint

Data on the real estate market is published as Linked Data. It can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://ld.integ.stadt-zuerich.ch/query.

Let's use SparqlClient from graphly to communicate with the database. Graphly will allow us to:

  • send SPARQL queries
  • automatically add prefixes to all queries
  • format response to pandas or geopandas
In [1]:
# Uncomment to install dependencies in Colab environment
#!pip install git+https://github.com/zazuko/graphly.git
In [2]:
import datetime
import re

import pandas as pd
import numpy as np

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from graphly.api_client import SparqlClient
In [3]:
sparql = SparqlClient("https://ld.integ.stadt-zuerich.ch/query")
sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "property": "<https://ld.stadt-zuerich.ch/statistics/property/>",
    "measure": "<https://ld.stadt-zuerich.ch/statistics/measure/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "ssz": "<https://ld.stadt-zuerich.ch/statistics/>"
})

SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.

Using the add_prefixes method, we define persistent prefixes. Every time you send a query, graphly will automatically add the prefixes for you.

Housing prices

Let's find the average price per m2 for an apartment in Zurich. This data is available in the QMP-EIG-HAA-OBJ-ZIM data cube. It will allow us to find the price per city district and apartment type. The data is also available for different points in time.

The query for housing prices in city of Zürich for different districts and apartment types over time looks as follows:

In [4]:
query = """
SELECT ?time ?place ?rooms ?price
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
  ssz:QMP-EIG-HAA-OBJ-ZIM a cube:Cube;
             cube:observationSet/cube:observation ?observation.   
  ?observation property:TIME ?time ;
                       property:RAUM ?place_uri;
                       property:ZIM/schema:name ?rooms;
                       measure:QMP ?price .
  ?place_uri skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Kreis> ;
         schema:name ?place .
  FILTER regex(str(?place),"ab|Stadtgebiet vor")
  FILTER (?price > 0)
}
ORDER BY ?time
"""

df = sparql.send_query(query)
df.head()
Out[4]:
time place rooms price
0 2009-12-31 Kreis 10 (ab 1934) 2- und 2,5-Zimmer Wohnung 7674.0
1 2009-12-31 Kreis 11 (ab 1970) 1- und 1,5-Zimmer Wohnung 9387.0
2 2009-12-31 Kreis 6 (ab 1934) 4- und 4,5-Zimmer Wohnung 9134.0
3 2009-12-31 Kreis 4 (ab 1915) 4- und 4,5-Zimmer Wohnung 5647.0
4 2009-12-31 Kreis 5 (ab 1915) 4- und 4,5-Zimmer Wohnung 8993.0

Prices per apartment type

Let's visualize the housing prices per apartment type. To do this, we will aggregate the prices per rooms.
The cleaned dataframe becomes:

In [5]:
df.place = df.place.apply(lambda x: re.findall('Kreis \d+', x)[0])
df.rooms = df.rooms.apply(lambda x: int(re.findall('\d+', x)[0]))
plot_df = df[["rooms", "price"]][df.time == df.time.max()].groupby(["rooms"]).mean().astype(int).sort_values(by="rooms").reset_index()
plot_df
Out[5]:
rooms price
0 1 12706
1 2 12841
2 3 12618
3 4 12138
4 5 13792
5 6 13361
6 7 12146
7 8 20804
In [6]:
fig = px.bar(plot_df, x="rooms", y="price")
fig.update_layout(
    title='Housing prices in Zürich', 
    title_x=0.5,
    yaxis_title="CHF per m<sup>2</sup>",
    xaxis_title="rooms"
)
fig.show()