ElCom, the Federal Electricity Commission, collects data on the electricity tariffs for household and companies. This data is published as Linked Data.
In this tutorial, we will show how to work with Linked Data. Mainly, we will see how to work with data on electricity tariffs.
We will look into how to query, process, and visualize it.
1. Setup
1.1 SPARQL endpoints
1.2 SPARQL client
2. Data
2.1 Electricity tariffs
2.2 Municipalities
3.1 Energy prices across regions
3.2 Energy prices for small and big enterprises
3.3 Energy prices for small and big households
3.4 Competition between energy providers
3.5 Grid costs and population density
3.6 Free grid usage
3.7 Free energy
3.8 Free aidfee
3.9 How many people benefit from fee waivers?
3.10 Which companies benefit from fee waivers?
Data on all electricity tariffs is published as Linked Data. It can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://lindas.admin.ch/query/.
Different municipalities may have different tariffs. To understand their location, we will work with Swiss geodata. It is published as Linked Data. It can be accessed using API endpoint under https://ld.geo.admin.ch/query.
import json
import re
import string
import folium
import mapclassify
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from graphly.api_client import SparqlClient
%matplotlib inline
# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git
sparql = SparqlClient("https://lindas.admin.ch/query")
geosparql = SparqlClient("https://ld.geo.admin.ch/query")
sparql.add_prefixes({
"schema": "<http://schema.org/>",
"cube": "<https://cube.link/>",
"elcom": "<https://energy.ld.admin.ch/elcom/electricityprice/dimension/>",
"admin": "<https://schema.ld.admin.ch/>"
})
geosparql.add_prefixes({
"dct": "<http://purl.org/dc/terms/>",
"geonames": "<http://www.geonames.org/ontology#>",
"schema": "<http://schema.org/>",
"geosparql": "<http://www.opengis.net/ont/geosparql#>",
})
SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.
Using the add_prefixes
method, we can define persistent prefixes.
Every time you send a query, graphly
will now automatically add the prefixes for you.
Our goal is to understand the varaiability in electricity prices. To do so, we will work with electricity tariffs, and municipality data.
Electricity prices are composed of:
Price for transporting electricity from the power plant to the home. The revenue is used to finance the maintenance and expansion of the electricity grid, for example overhead lines, pylons and transformers.
Price for the electrical energy supplied. The grid operator either generates this energy with its own power plants or buys it from upstream suppliers. The energy price is also influenced by the type of energy source. A high share of renewable energy (e.g. wind, solar, biomass energy) usually leads to higher energy prices.
Municipal and cantonal taxes and fees. These include, for example, concession fees or local political energy levies.
Federal tax for the promotion of renewable energies, and the protection of waters and fish. The amount of the aidfee is set annually by the Federal Council. The levy is the same everywhere.
These components vary across:
The place where you live, the consumption category to which you belong, the energy provider you chose, and the product you subscribed for will affect your monthly bills.
Let's take a look at these differences. To simplify the results, we will work with data from 2020. We will also focus on a single standard product, which is available across all of Switzerland.
We can query the tariffs for different municipalities and categories as follows (heads up, the query may take a while to execute!):
query = """
SELECT ?municipality_id ?category ?energy ?grid ?aidfee (?community_fees + ?aidfee as ?taxes) ?fixcosts ?variablecosts
FROM <https://lindas.admin.ch/elcom/electricityprice>
WHERE {
<https://energy.ld.admin.ch/elcom/electricityprice/observation/> cube:observation ?observation.
?observation
elcom:category/schema:name ?category;
elcom:municipality ?municipality_id;
elcom:period "2020"^^<http://www.w3.org/2001/XMLSchema#gYear>;
elcom:product <https://energy.ld.admin.ch/elcom/electricityprice/product/standard>;
elcom:fixcosts ?fixcosts;
elcom:total ?variablecosts;
elcom:gridusage ?grid;
elcom:energy ?energy;
elcom:charge ?community_fees;
elcom:aidfee ?aidfee.
}
ORDER BY ?muncipality ?category ?variablecosts
"""
tariffs = sparql.send_query(query)
tariffs = tariffs.groupby(["municipality_id", "category"]).first().reset_index()
tariffs.head()
Great! We can analyse all price components across municipalities and consumption categories. But what do these consumption categories stand for?
Let's query all categories and their description.
query = """
SELECT DISTINCT ?category ?description
WHERE {
GRAPH <https://lindas.admin.ch/elcom/electricityprice> {
?s <https://energy.ld.admin.ch/elcom/electricityprice/dimension/category> ?category_uri.
?category_uri schema:name ?category .
?category_uri schema:description ?description .
}
}
ORDER BY ?category
"""
df = sparql.send_query(query)
df
So we have 15 different categories. Those knowning german will notice that the categories cover househoulds and companies. The difference is in expected energy consumption.
The tariffs themselves are not very meaningful. "How much is my monthly bill?" is much more relatable question. Let's try to answer that.
def extract_consumption(description: str) -> int:
"""
Extract average electricity consumption from a description.
Args:
description: Category description for electricity tariffs
Returns:
int: Electricity consumption in kWh/year
"""
number_as_string = description.split(" kWh/Jahr")[0]
return int(number_as_string.translate(str.maketrans('', '', string.punctuation)))
cat2description = dict(zip(df.category, df.description))
cat2consumption = dict(zip(df.category, [extract_consumption(d) for d in df.description]))
tariffs["consumption"] = tariffs[["category"]].replace({"category": cat2consumption})
tariffs["monthly_bill"] = ((tariffs.consumption*tariffs.variablecosts/12 + tariffs.fixcosts)/100).round(2)
tariffs.head()
We can now use the tariffs
table to analyze regional differences in energy prices.
Electricity prices vary across municipalities. The tariff
table gives us an overview of prices at each municipality. But what are their names? Where are they located? And how many people live there?
This information can be obtained using the SPARQL Endpoind from Swisstopo, Switzerland's national mapping agency.
We can query the list of municipalities, alongside with its boundaries as follows:
query = """
SELECT ?municipality_id ?municipality ?population ?boundary
WHERE {
?muni_iri dct:hasVersion ?version ;
geonames:featureCode geonames:A.ADM3 .
?version schema:validUntil "2020-12-31"^^<http://www.w3.org/2001/XMLSchema#date>;
geonames:population ?population;
schema:name ?municipality;
geosparql:hasGeometry/geosparql:asWKT ?boundary.
BIND(IRI(REPLACE(STR(?muni_iri), "https://ld.geo.admin.ch/boundaries/", "https://ld.admin.ch/")) AS ?municipality_id)
}
"""
communes = geosparql.send_query(query)
communes = communes.set_crs(epsg=4326)
communes.head()
You may notice that we defined a corrdinate reference system. The default projection for this endpoint is EPSG:4326
, also known as WGS84.
SparqlClient
automatically returns a geopandas
datafrme. By defining the coordinate reference system, we can start using its geoanalytics capacities immediately!
Let's start by visually inspecting all the communes.
communes.plot()
Great! Now we have a geospatial representation of all swiss municipalities and we also have energy tariffs across the municipalities.
Let's join this information in one dataframe.
join = pd.merge(communes[["municipality_id", "municipality", "population", "boundary"]], tariffs, how="inner", on="municipality_id")
join.drop(columns=["variablecosts", "consumption"], inplace=True)
join.head()
Now that we know what is in the data, let's dive deeper into it.
We have seen the prices vary across regions. But how big are these differences? Let's take a look.
We will start with defining a plot_tariffs
function. It will classify prices into buckets, and show the differences on the map.
style_function = lambda x: {'fillColor': '#ffffff',
'color':'#000000',
'fillOpacity': 0.1,
'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#989898',
'color':'#000000',
'fillOpacity': 0.8}
def plot_tariffs(df, variable, variable_description, title):
classifier = mapclassify.NaturalBreaks(y=df[variable], k=5)
bins = [df[variable].min()] + list(classifier.bins)
m = folium.Map(location=[46.83, 8.13], zoom_start=8, tiles="cartodbpositron")
folium.Choropleth(
geo_data=json.loads(df.to_json()),
data=df,
columns=["municipality", variable],
key_on="feature.properties.municipality",
fill_color="YlGn",
fill_opacity=1,
line_weight=0,
smooth_factor=0,
bins=bins,
reset=True,
legend_name=variable_description,
).add_to(m)
hover = folium.features.GeoJson(
df,
style_function=style_function,
control=False,
highlight_function=highlight_function,
tooltip=folium.features.GeoJsonTooltip(
fields=['municipality', variable],
aliases=['Municipality: ', variable_description + ": "],
style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;")
)
)
folium.LayerControl().add_to(m)
m.add_child(hover)
m.keep_in_front(hover)
title_html = '''<h3 align="center" style="font-size:16px"><b>{}</b></h3>'''.format(title)
m.get_root().html.add_child(folium.Element(title_html))
return m
Now let's use it! How do the prices vary for an average household?
category = "H2"
plot_df = join[join.category == category]
plot_tariffs(plot_df, "monthly_bill", "Average monthly bill [CHF]", cat2description[category])
What about a small company?
category = "C2"
plot_df = join[join.category == "C2"]
plot_tariffs(plot_df, "monthly_bill", "Average monthly bill [CHF]", cat2description[category])
The energy prices can vary a lot! The location may become an important factor when starting a business.
What would be the best place to locate a small or a big company (based on exepcted energy consumption)?
cols = 2
titles = {"C1": "8 000 kWh/year\nSmall company", "C7": "1 500 000 kWh/year\nBig company with own transformator"}
categories = list(titles.keys())
fig, axs = plt.subplots(1, cols, figsize=(20,8))
for col in range(cols):
cat = categories[col]
plot_df = join[join.category == cat]
plot_df.plot(ax=axs[col], column="monthly_bill", cmap="GnBu", scheme="Natural_Breaks", k=5, legend=True, legend_kwds={"fmt": "{:.0f}"})
axs[col].set_axis_off()
axs[col].set_title(titles[cat])
fig = fig.suptitle("Average monthly bill for enterprises", fontsize=24)
What about the households? Are the regional differences the same for small and big households?
cols = 2
titles = {"H1": "1 600 kWh/year\n2-room appartment", "H5": "7 500 kW/year\n5-room house"}
categories = list(titles.keys())
fig, axs = plt.subplots(1, cols, figsize=(20,8))
for col in range(cols):
cat = categories[col]
plot_df = join[join.category == cat]
plot_df.plot(ax=axs[col], column="monthly_bill", cmap="YlGn", scheme="Natural_Breaks", k=5, legend=True, legend_kwds={"fmt": "{:.0f}"})
axs[col].set_axis_off()
axs[col].set_title(titles[cat])
fig = fig.suptitle("Average monthly bill for households", fontsize=24)
These plots enable us to easily find the cheapest regions. But how many people actualy pay the cheapest electicity price?
Let's now take a look at the number of consumers per price. The histograms will shows us how many people are elligible for certain tariff. They do not show how many people actually pay thos prices.
As people live in different households, their tariff varies with appartment size.
First, let's take a look at the price distribution of the H1 tariff (1600 kWh/year, corresponding to 2-room appartment)
category = "H1"
plot_df = join[["monthly_bill", "population"]][join.category == category]
plot_df["monthly_bill"] = plot_df["monthly_bill"].round().astype(int)
plot_df = plot_df.groupby("monthly_bill").sum().reset_index()
plot_df.head()
fig = px.bar(plot_df, x="monthly_bill", y="population")
fig.update_layout(
title='Electricity prices for 2-room appartment (1600 kWh/year)',
title_x=0.5,
yaxis_title="Elligible population",
xaxis_title="Average monthly bill"
)
fig.show()
What about bigger households?
Let's take a look at the price distribution of H5 tariff (7500 kW/year, corresponding to 5-room house)
category = "H5"
plot_df = join[["monthly_bill", "population"]][join.category == category]
plot_df["monthly_bill"] = plot_df["monthly_bill"].round().astype(int)
plot_df = plot_df.groupby("monthly_bill").sum().reset_index()
plot_df.head()
fig = px.bar(plot_df, x="monthly_bill", y="population")
fig.update_layout(
title='Electricity prices for 5-room household (7500 kWh/year)',
title_x=0.5,
yaxis_title="Elligible population",
xaxis_title="Average monthly bill"
)
fig.show()
At one place, multiple energy providers can operate. The customers are free to use the provider of their choice. This, in theory, should introduce competition. In long run, the electricity prices in these regions should also be lower.
Which municipalities have access to more than one electricity provider?
First, let's find all muncipalities with more than one provider. We can do it using the SPARQL aggregate query.
query = """
PREFIX cube: <https://cube.link/>
PREFIX elcom: <https://energy.ld.admin.ch/elcom/electricityprice/dimension/>
PREFIX schema: <http://schema.org/>
SELECT ?municipality_id (MAX(?providers) AS ?providers)
FROM <https://lindas.admin.ch/elcom/electricityprice>
WHERE {
SELECT ?municipality_id ?category (COUNT(*) AS ?providers)
WHERE {
<https://energy.ld.admin.ch/elcom/electricityprice/observation/> cube:observation ?observation.
?observation
elcom:category/schema:name ?category;
elcom:municipality ?municipality_id;
elcom:period "2020"^^<http://www.w3.org/2001/XMLSchema#gYear>;
elcom:product <https://energy.ld.admin.ch/elcom/electricityprice/product/standard>;
elcom:energy ?energy.
}
GROUP BY ?municipality_id ?category
HAVING (COUNT(*) > 1)
}
GROUP BY ?municipality_id
"""
df = sparql.send_query(query)
df.head()
All municipalities not on the list have one provider only. Let's add them doing a join on communes
.
plot_df = pd.merge(communes[["municipality_id", "municipality", "population", "boundary"]], df, how="outer", on="municipality_id").fillna(1)
plot_df.providers = plot_df.providers.astype(int)
plot_df.head()
Now that we know how many providers operate at each location, let's plot it. First, we will define a plot_categories
function. It will allow us to plot categorical variables on a map.
def plot_categories(df, variable, label, title, colormap):
m = folium.Map(location=[46.83, 8.13], zoom_start=8, tiles="cartodbpositron")
colormap = {str(k):v for k,v in colormap.items()}
df[variable] = df[variable].astype(str)
style_function = lambda x: {'color': "#a9a9a9",
'fillOpacity': 1,
'weight': 0.05,
'fillColor': colormap[x['properties'][variable]]}
categories = df[variable].unique()
for cat in categories:
data = json.loads(df[["municipality", "boundary", variable]][df[variable] == cat].to_json())
folium.features.GeoJson(
data,
smooth_factor=0,
style_function=style_function,
name=cat,
highlight_function=highlight_function,
tooltip=folium.features.GeoJsonTooltip(
fields=['municipality', variable],
aliases=['Municipality: ', label + ": "],
style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;")
)
).add_to(m)
folium.LayerControl().add_to(m)
title_html = '''<h3 align="center" style="font-size:16px"><b>{}</b></h3>'''.format(title)
m.get_root().html.add_child(folium.Element(title_html))
return m
Let's plot the number of providers per municipality.
categories = plot_df.providers.unique()
colormap = {cat: mpl.colors.rgb2hex(mpl.cm.plasma((i)/(len(categories)-1))) for i, cat in enumerate(categories)}
plot_categories(plot_df, "providers", "Electricity providers", "Electricity providers", colormap)
Grid costs are a core price component. They are used to finance the maintenance and expansion of the electricity grid, for example overhead lines, pylons and transformers.
Densly populated regions can share the infrastructure costs among more people. The space is smaller, and grid is denser. All else equal, this should, resut in cheaper maintenance costs in such areas.
Is this a valid hypothesis? Let's find out!
First, let's calculate population density. Remeber to use coordinate reference system that is meant for representing areas!
join["hectares"] = join.to_crs(epsg=3035).area/10000 # In hectares
join["population_density"] = join["population"]/join["hectares"]
join.head(3)
Let's now take a look at the relationship between grid costs, and population density
plot_df = join[join.category == "H1"]
fig = px.scatter(plot_df, y="grid", x="population_density", hover_data=["municipality"],
labels={"population_density": "Inbahitants per ha", "grid": "Grid costs per kWh"})
fig.show()
It looks like there is no relation between grid costs and population density. However, we can observe one other interesting thing:
Some communes pay zero grid fees!
Let's find out who pays zero grid fees. First, we will assign a binary variable defining whether a commune pays grid fees, aidfee and energy.
join = join.assign(pays_grid=lambda x: x.grid!=0,
pays_aidfee=lambda x: x.aidfee!=0,
pays_energy=lambda x: x.energy!=0)
join.head(3)
Now, let's use the plot_categories
function to find out who benefits from free grid usage.
plot_df = join[join.category=="C1"].reset_index(drop=True)
colormap = {True: "#636EFA", False: "#EF553B"}
plot_categories(plot_df, "pays_grid", "Paid grid usage", "Grid fees", colormap)
Which communes are those? How many people live there?
temp = join[["municipality_id", "municipality", "pays_grid", "population"]].groupby(["municipality_id", "municipality", "population"]).all().reset_index()
pop_free_grid = temp[(~temp.pays_grid)]["population"].sum()
temp[(~temp.pays_grid)].reset_index(drop=True)
We have seen that some places do not pay for grid usage. Another twist is that the energy world also knows negative or zero prices. Are there any places that have zero energy tariffs?
join["has_free_energy"] = ~join.pays_energy
temp = join[["municipality_id", "municipality", "has_free_energy", "population"]].groupby(["municipality_id", "municipality", "population"]).all().reset_index()
temp[temp.has_free_energy]
Not really. There is no municipality with free electricity!
What about aidfee?
The government states that everyone pays aidfee. Do we find the same insights in data?
plot_df = join[join.category=="C1"].reset_index(drop=True)
colormap = {True: "#00CC96", False: "#AB63FA"}
plot_categories(plot_df, "pays_aidfee", "Pays aidfee", "Aidfees", colormap)
Which communes do not pay aidfee? How many people live there?
temp = join[["municipality_id", "municipality", "pays_aidfee", "population"]].groupby(["municipality_id", "municipality", "population"]).all().reset_index()
pop_free_aidfee = temp[(~temp.pays_aidfee)]["population"].sum()
temp[(~temp.pays_aidfee)].reset_index(drop=True)
DISCLAIMER: ElKom states:
The amount of the aidfee is set annually by the Federal Council. The levy is the same everywhere.
However, that is not what we find in the data. Possible explanation that we have not verified?
Some communes managed to get a waiver for aidfee. The aidfee is hence the same for everyone. Exept that not everyone pays it. Or, maybe more probabale, there is an issue with the data as provided by ELKom.
Few communes benefit from fee waivers for aidfee, and grid usage. How many people benefit from these waivers?
Let's find out!
pop_total = communes.population.sum()
fig = make_subplots(rows=1, cols=2, subplot_titles=["grid usage", "Aidfee"], specs=[[{"type": "pie"}, {"type": "pie"}]])
fig.add_trace(go.Pie(
values=[pop_free_grid,pop_total-pop_free_grid],
labels=["Not paying", "Paying"],
name="Grid Usage"),
row=1, col=1)
fig.add_trace(go.Pie(
values=[pop_free_aidfee,pop_total-pop_free_aidfee],
labels=["Not paying", "Paying"],
name="Aidfee"),
row=1, col=2)
fig.update_annotations(yshift=-280)
fig.update_layout(height=400, title={"text": "Population benefiting from free tariffs", "x": 0.5})
fig.show()
We have seen which population benefits from fee waivers. What about companies?
For a manufacturing company in an energy-intensive sector it may make sense to optimize its location. Such companies may explicitly look for places with cheaper electricity. Do the places with free grid usage attract energy-intensive companies? Are there some companies that benefit from this waiver?
To find out who benefits from grid fee waiver, let's take a look at ZEFIX. ZEFIX is the Swiss commerce register. ZEFIX will provide us with company name, type, description, and address and ZEFIX is also available as Linked Data.
Let's find all companies registered in municipalities with free grid usage. To start, we will list all those municipalities:
municipalities_free_energy = join["municipality_id"][~(join.pays_energy)].unique()
municipalities_free_energy
Now let's find all companies registered in one of those villages (and hence are entitiled to free grid usage).
query = """
SELECT ?company ?type ?company_iri
WHERE {
?company_iri a admin:ZefixOrganisation.
?company_iri schema:legalName ?company.
?company_iri admin:municipality <https://ld.admin.ch/municipality/3695>.
?company_iri schema:additionalType/schema:name ?type .
FILTER(LANG(?type) = "de")
}
ORDER BY ?type ?company
"""
df = sparql.send_query(query)
df.head(7)
This gives us a list of ALL companies in the region. However, only the big energy consumers will see an impact of fee waiver on their accounts. While we don't know which enterprise pays how much, we can hypothesize the following:
Only the big, energy-intensive businesses will have high enough energy bills to justify moving their production lines. Such companies would typically be a join-stock entities.
We can hence filter our company list to only include join-stock companies only.
query = """
SELECT ?municipality ?company ?company_iri
WHERE {{
?company_iri a admin:ZefixOrganisation.
?company_iri schema:legalName ?company.
?company_iri admin:municipality ?municipality_iri.
?company_iri schema:additionalType <https://ld.admin.ch/ech/97/legalforms/0106> .
?municipality_iri schema:name ?municipality.
FILTER(?municipality_iri IN({}))
}}
ORDER BY ?municipality ?company
""".format("<" + ">,<".join(municipalities_free_energy) + ">")
df = sparql.send_query(query)
df
Et volià! The list of all potentail benefitiaries of free grid usage.