View on GitHub

Swiss commerce register¶

The Federal Office of Justice maintains ZEFIX, the swiss commerce register for all legally operating businesses.

The register provides us with company name, type, description, and address.
ZEFIX is also available as Linked Data.

Setup¶

SPARQL endpoints¶

For companies data¶

Swiss commerce register data can be accessed with SPARQL queries.
You can send queries using HTTP requests. The API endpoint is https://lindas.admin.ch/query/.

For geodata¶

To understand companies' location, we will work with Swiss geodata. It can be accessed with GeoSPARQL under https://ld.geo.admin.ch/query.

SPARQL client¶

Let's use SparqlClient from graphly to communicate with both databases. Graphly will allow us to:

send SPARQL queries
automatically add prefixes to all queries
format response to pandas or geopandas

import json
import os.path

import folium
import mapclassify
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from graphly.api_client import SparqlClient

%matplotlib inline

# Uncomment to install dependencies in Colab environment
#!pip install mapclassify
#!pip install git+https://github.com/zazuko/graphly.git

sparql = SparqlClient("https://lindas.admin.ch/query")
geosparql = SparqlClient("https://ld.geo.admin.ch/query")

sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "admin": "<https://schema.ld.admin.ch/>"
})

geosparql.add_prefixes({
    "dct": "<http://purl.org/dc/terms/>",
    "geonames": "<http://www.geonames.org/ontology#>",
    "schema": "<http://schema.org/>",
    "geosparql": "<http://www.opengis.net/ont/geosparql#>",
})

SPARQL queries can become very long. To improve the readibility, we will work wih prefixes.

Using the add_prefixes method, we can define persistent prefixes. Every time you send a query, graphly will now automatically add the prefixes for you.

Companies by company type¶

Swiss law offers many opportnities to business owners. They can choose from several legal entites. What entities can we find in ZEFIX? Which ones are the most popular among enterpreneurs?

query = """
SELECT ?type (COUNT(DISTINCT ?company_iri) AS ?ccount)
WHERE {
    ?company_iri a admin:ZefixOrganisation.
    ?company_iri schema:additionalType/schema:name ?type .
  
  FILTER(LANG(?type) = "de")
}
GROUP BY ?type
ORDER BY DESC(?ccount)
"""

df = sparql.send_query(query)

# Let's rename variables to english

de2en = {'Kommanditgesellschaft': "Limited Partnership",
         'Ausländische Niederlassung im Handelsregister eingetragen': "Foreign Branch",
         'Genossenschaft': "Cooperative",
         'Verein': "Association",
         'Kollektivgesellschaft': "General Partnership",
         'Schweizerische Zweigniederlassung im Handelsregister eingetragen': "Swiss Branch",
         'Stiftung': "Foundation",
         'Einzelunternehmen': "Sole proprietorship",
         'Aktiengesellschaft': "Joint-stock Company",
         'Gesellschaft mit beschränkter Haftung GMBH / SARL': "Limited Liability Company"}

df["type"] = df["type"].apply(lambda x: de2en[x] if x in de2en else x)

fig = px.bar(df[df.ccount > 500], y="type", x="ccount", orientation = "h", labels={"type": "", "ccount": "Company Count"})
fig.update_layout(
    title='Which company type is most popular?', 
    title_x=0.5,
)
fig.update_layout(bargap=0.40)
fig.show()

Companies by municipality¶

The economic activity varies significantly across regions. What are the most popular places to register a company?

query = """
SELECT ?municipality ?muni_id (COUNT(?sub) AS ?companies)
FROM <https://lindas.admin.ch/foj/zefix>
FROM <https://lindas.admin.ch/territorial>
WHERE {
	?sub a admin:ZefixOrganisation ;
      admin:municipality ?muni_id.
    ?muni_id schema:name ?municipality;
} 
GROUP BY ?municipality ?muni_id
ORDER BY DESC(?companies)
"""

df = sparql.send_query(query)

# Communes
query = """    
SELECT ?muni_id ?population ?boundary 

WHERE {
  ?muni_iri dct:hasVersion ?version ;
            geonames:featureCode geonames:A.ADM3 .
  
  ?version schema:validUntil "2020-12-31"^^<http://www.w3.org/2001/XMLSchema#date>;
           geonames:population ?population;
           geosparql:hasGeometry/geosparql:asWKT ?boundary.
  
  BIND(IRI(REPLACE(STR(?muni_iri), "https://ld.geo.admin.ch/boundaries/", "https://ld.admin.ch/")) AS ?muni_id)
}
"""
communes = geosparql.send_query(query)
communes = communes.set_crs(epsg=4326)
communes["center"] = communes.to_crs(epsg=3035).centroid.to_crs(epsg=4326)

/opt/hostedtoolcache/Python/3.9.13/x64/lib/python3.9/site-packages/pandas/core/dtypes/cast.py:1983: ShapelyDeprecationWarning:

__len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.

join = pd.merge(communes, df, how="inner", on="muni_id")
join.sort_values(by="companies", ascending=False, inplace=True)
join = join.reset_index(drop=True)
join[["municipality", "companies"]]

The most popular places are the biggest cities.

This should come at no surprise. The more people live in a certain area, the more business founders it will have. The company density is expected to be directly correlated with population density.

Now, let's take into account the population density. Which regions have most companies per 10.000 inhabitants?

join["companies_per_inhabitants"] = round(join.companies/join.population*10000)
join["companies_per_inhabitants"] = join["companies_per_inhabitants"].astype(int)
join.sort_values(by="companies_per_inhabitants", ascending=False, inplace=True)
join = join.reset_index(drop=True)
join[["municipality", "companies_per_inhabitants"]].head()

Interesting! Zug is the only bigger city that appears on the top of the list.

Where are the most popular registration destinations?

# Companies per 10.000 inhabitants

style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}

highlight_function = lambda x: {'fillColor': '#989898', 
                                'color':'#000000', 
                                'fillOpacity': 0.8}

def plot_heatmap(df, viz_column, hover_cols, col2label, title):
    
    classifier = mapclassify.NaturalBreaks(y=df[viz_column], k=5)
    bins = [df[viz_column].min()] + list(classifier.bins)
    cols = ["municipality", viz_column, *hover_cols]

    m = folium.Map(location=[46.83, 8.13], zoom_start=8, tiles="cartodbpositron")

    folium.Choropleth(
        geo_data=json.loads(df.to_json()),
        data=df,
        columns=cols,
        key_on="feature.properties.municipality",
        fill_color="YlOrRd",
        fill_opacity=1,
        line_weight=0,
        smooth_factor=0,
        bins=bins,
        reset=True,
        legend_name=col2label[viz_column],
    ).add_to(m)

    hover = folium.features.GeoJson(
        df,
        style_function=style_function, 
        control=False,
        highlight_function=highlight_function,
        tooltip=folium.features.GeoJsonTooltip(
            fields=cols,
            aliases=[col2label[i] for i in cols],
            style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
        )
    )

    folium.LayerControl().add_to(m)
    m.add_child(hover)
    m.keep_in_front(hover)
    
    title_html = '''<h3 align="center" style="font-size:16px"><b>{}</b></h3>'''.format(title)   
    m.get_root().html.add_child(folium.Element(title_html))

    return m

join = join.drop(columns="center")
col2label = {"municipality": "Municipality", "companies_per_inhabitants": "Companies per 10.000 inhabitants", "companies": "Companies ", "population": "Population"}
plot_heatmap(join, "companies_per_inhabitants", ["companies", "population"], col2label, "Where are most companies registered?")

Companies by address¶

We know which municipalities have the most registered companies. Now, let's dive one level deeper.

Every company has to register its business address. So what swiss addresses are most beloved among companies?

query = """
SELECT ?street ?zip ?town (COUNT(?company) AS ?ccount)
FROM <https://lindas.admin.ch/foj/zefix>
WHERE {
  ?company a <https://schema.ld.admin.ch/ZefixOrganisation>;
           <http://schema.org/address> ?address.
  
  ?address <http://schema.org/streetAddress> ?street;
           <http://www.w3.org/ns/locn#postCode> ?zip;
           <http://www.w3.org/ns/locn#postName> ?town.
}
GROUP BY ?town ?zip ?street
ORDER BY DESC (?ccount)
LIMIT 15
"""

df = sparql.send_query(query)
df

Wow! Over 400 companies are registered under the address of Telan AG in Luzern.

Some of those addresses represent big offices. For example, Technoparkstrasse 1 in Zurich is one of the biggest office buildings in the city.

Other addresses are simply registrations used by many, many people. Take a look, and google some of the addresses in Zug. How many companies can you find operating under this address?

Thus far we had a look at individual buildings. Now, let's aggregate this result on the street level. What are the most crowded streets with company registrations? Can we find some business centers?

query = """
SELECT ?thoroughfare ?zip ?town (COUNT(?company) AS ?ccount)
FROM <https://lindas.admin.ch/foj/zefix>
WHERE {
  ?company a <https://schema.ld.admin.ch/ZefixOrganisation>;
           <http://schema.org/address> ?address.
  
  ?address <http://www.w3.org/ns/locn#thoroughfare> ?thoroughfare;
           <http://www.w3.org/ns/locn#postCode> ?zip;
           <http://www.w3.org/ns/locn#postName> ?town.
}
GROUP BY ?town ?zip ?thoroughfare
HAVING (?ccount >= 100)
ORDER BY DESC (?ccount)
"""
df = sparql.send_query(query, timeout=60)
df.head(15)

Slightly different! We see clearly that Zug is the favourite destination for company registrations.

What are other popular locations?

def geocode(df: pd.DataFrame, cache_file: str="addresses.json"):

    if os.path.isfile(cache_file): 
        cached_locations = gpd.read_file(cache_file)
        cached_addresses = cached_locations.address
    else:
        cached_locations = gpd.GeoDataFrame(columns=["address", "geometry"], crs="EPSG:4326")
        cached_addresses = []
        
    is_new_address = ~df.address.isin(cached_addresses)

    if any(is_new_address):
        new_locations = gpd.tools.geocode(df.address[is_new_address], user_agent="zazuko", provider="nominatim")
        new_locations["address"] = df.address[is_new_address]
        new_locations = new_locations.set_crs(epsg=4326)
        new_locations = new_locations[~new_locations.geometry.is_empty]
        all_locations = gpd.GeoDataFrame(pd.concat([new_locations, cached_locations], ignore_index=True), crs=new_locations.crs)
        all_locations.to_file(cache_file, driver="GeoJSON")
    else:
        all_locations = cached_locations

    return pd.merge(all_locations, df, on="address", how="right")

df.loc[: ,"address"] = df["thoroughfare"] + ", " + df["zip"].astype(str) + " " + df["town"] + ", Switzerland"
plot_df = geocode(df)

bins = [250, 500, 1000, df.ccount.max()]
plot_df.loc[:,"text"] = df["thoroughfare"] + ", " + df["town"] + "<br>Companies: " + plot_df.ccount.astype(int).astype(str)
classifier = mapclassify.UserDefined(y=df.ccount, bins=bins)
plot_df.loc[:, "buckets"] = plot_df[["ccount"]].apply(classifier) 

norm = mpl.colors.Normalize(vmin=0, vmax=len(bins))
colormap = mpl.cm.ScalarMappable(norm=norm, cmap=mpl.cm.inferno)
labels = mapclassify.classifiers._get_mpl_labels(classifier, fmt="{:.0f}")

fig = go.Figure()

for bucket in range(len(bins)):

    subset = plot_df[plot_df.buckets == bucket]

    #sizes = (subset.ccount*0.05)**0.9
    #sizes[sizes<7] = 7
    #sizes[sizes>50] = 50

    fig.add_trace(go.Scattermapbox(
        mode="markers",
        lat=subset.geometry.y,
        lon=subset.geometry.x,
        hovertext = subset.text,
        hoverinfo = "text",
        name=labels[bucket],
        marker={'size': 7.5, "color": "rgba{}".format(colormap.to_rgba(bucket+1))}, #((subset.ccount)*0.05)**0.9
    ))

fig.update_layout(
    margin={'l': 0, 't': 50, 'b': 0, 'r': 0},
    mapbox={
        'center': {"lat": 46.80515, "lon": 8.1336},
        'style': "carto-darkmatter",
        'zoom': 6.9},
    showlegend=True,
    legend_title="Registered companies",
    title_text='Where are most companies registered?', 
    title_x=0.5,
    width=980,
    height=600
)

fig.show("notebook")

Thank you for following along and hopefully this notebook was helpful. If you want to get in touch with us, please reach out to us via email.

	municipality	companies
0	Zürich	46011
1	Genève	29999
2	Zug	17536
3	Basel	16449
4	Lugano	13835
...	...	...
2128	Rumendingen	2
2129	Schelten	1
2130	Bister	1
2131	Rebévelier	1
2132	Rongellen	1

	municipality	companies_per_inhabitants
0	Zug	5742
1	Soyhières	4527
2	Feusisberg	3527
3	Manno	3512
4	Chiasso	3388

	street	zip	town	ccount
0	c/o Telan AG\nMurbacherstrasse 37	6003	Luzern	384
1	Sihleggstrasse 23	8832	Wollerau	320
2	Dammstrasse 19	6300	Zug	215
3	Haldenstrasse 5	6340	Baar	213
4	Bahnhofstrasse 21	6300	Zug	206
5	Roosstrasse 53	8832	Wollerau	199
6	Neuhofstrasse 5A	6340	Baar	193
7	Chamerstrasse 172	6300	Zug	193
8	Gotthardstrasse 26	6300	Zug	190
9	Technoparkstrasse 1	8005	Zürich	172
10	Industriestrasse 47	6300	Zug	166
11	Churerstrasse 35	9470	Buchs SG	163
12	Alte Steinhauserstrasse 1	6330	Cham	162
13	Baarerstrasse 43	6300	Zug	161
14	Oberneuhofstrasse 5	6340	Baar	159

	thoroughfare	zip	town	ccount
0	Baarerstrasse	6300	Zug	3153
1	Bahnhofstrasse	6300	Zug	1450
2	Bösch	6331	Hünenberg	916
3	Chamerstrasse	6300	Zug	819
4	Seefeldstrasse	8008	Zürich	809
5	Zugerstrasse	6340	Baar	787
6	Churerstrasse	8808	Pfäffikon SZ	725
7	Bahnhofstrasse	8001	Zürich	721
8	Corso San Gottardo	6830	Chiasso	720
9	Industriestrasse	6300	Zug	681
10	Poststrasse	6300	Zug	655
11	Rue du Rhône	1204	Genève	622
12	rue du Rhône	1204	Genève	618
13	Gewerbestrasse	6330	Cham	597
14	Baarerstrasse	6302	Zug	569

Central Business Name Index - Zefix