AI

Assessing City Inexperienced Equality Utilizing Vienna’s Open Information Portal | by Milan Janosov | Sep, 2023

Photograph by CHUTTERSNAP on Unsplash

Regardless of their many benefits, accessing nature and inexperienced areas is getting more and more tough in extremely urbanized areas. Some worry that underserved communities are extra uncovered to those points. Right here, I suggest a data-driven strategy to discover this.

Specifically, I pose an city improvement query that has these days been gaining curiosity throughout skilled circles and native governments — now as inexperienced equality. This idea refers back to the disparities in individuals accessing inexperienced areas in several components of a specific metropolis. Right here, I discover its monetary dimension and see if there are any clear relationships between the accessible inexperienced space per capita and the financial stage of that very same city unit.

I’ll discover two completely different spatial resolutions of town — districts and census districts utilizing Esri Shapefiles supplied by the Austrian Authorities’s Open Information Portal. I can even incorporate tabular statistical knowledge (inhabitants and earnings) into the georeferenced administrative areas. Then, I overlay the executive areas with an official inexperienced space dataset, recording the placement of every inexperienced house in a geospatial format. Then, I mix this info and quantify every city district’s complete inexperienced house per capita measurement. Lastly, I relate every space’s monetary standing, captured by annual web earnings, to the inexperienced space per capita ratio to see if any patterns emerge.

Let’s check out the Austrian authorities’s Open Information Portal here.

Once I was writing this text, the web site’s English translation wasn’t actually working, so as a substitute of counting on my long-forgotten 12 years of German lessons, I used DeepL to navigate throughout the subpages and 1000’s of datasets.

Then, I collected a few knowledge recordsdata, each georeferenced (Esri shapefiles) and easy tabular knowledge, which I’ll use for the later evaluation. The info I collected:

Boundaries — the executive boundaries of the next spatial models in Vienna:

Land-use — details about the placement of inexperienced areas and built-in areas:

Statistics — knowledge on inhabitants and earnings similar to the socio-economical stage of an area:

  • Population per district, yearly recorded since 2002, and saved break up primarily based on 5-year age teams, gender, and unique nationality
  • Population per census district, yearly recorded since 2008 and saved break up primarily based on three irregular age teams, gender, and origin
  • Average net income since 2002 within the districts of Vienna, expressed in Euros per worker every year

Moreover, I saved the downloaded knowledge recordsdata in an area folder referred to as knowledge.

2.1 Administrative boundaries

First, learn and visualize the completely different form recordsdata containing every administrative boundary stage to have a more in-depth grip on town at hand:

folder     = 'knowledge'
admin_city = gpd.read_file(folder + '/LANDESGRENZEOGD')
admin_district = gpd.read_file(folder + '/BEZIRKSGRENZEOGD')
admin_census = gpd.read_file(folder + '/ZAEHLBEZIRKOGD')

show(admin_city.head(1))
show(admin_district.head(1))
show(admin_census.head(1))

Right here we make an observation that the column names BEZNR and ZBEZ, correspond to the District ID and the Census district ID, respectively. Unexpectedly, they’re saved/parsed in several codecs, numpy.float64 and str:

print(kind(admin_district.BEZNR.iloc[0]))
print(kind(admin_census.ZBEZ.iloc[0]))pyth

Ensuring we certainly have 23 districts and 250 census districts as the info recordsdata documentation claimed:

print(len(set(admin_district.BEZNR)))
print(len(set(admin_census.ZBEZ)))

Now visualize the boundaries — first town, then its districts, after which the even smaller census districts.

f, ax = plt.subplots(1,3,figsize=(15,5))

admin_city.plot(ax=ax[0],
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Reds')

admin_district.plot(ax=ax[1],
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Blues')

admin_census.plot(ax=ax[2],
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Purples')

ax[0].set_title('Metropolis boundaries')
ax[1].set_title('District boundaries')
ax[2].set_title('Census distrcit boundaries')

This code outputs the next visuals of Vienna:

Differet admin ranges of Vienna. Picture by the creator.

2.2 Inexperienced areas

Now, additionally check out the inexperienced house distribution:

gdf_green  = gpd.read_file(folder + '/GRUENFREIFLOGD_GRUENGEWOGD')
show(gdf_green.head(3))

Right here, one could discover that there isn’t any direct strategy to hyperlink inexperienced areas (e.g., no district id-s added) to neighborhoods — so afterward, we are going to accomplish that by manipulating the geometries to search out overlaps.

Now visualize this:

f, ax = plt.subplots(1,1,figsize=(7,5))

gdf_green.plot(ax=ax,
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Greens')

ax.set_title('Inexperienced areas in Vienna')

This code reveals the place the inexperienced areas are inside Vienna:

The official inexperienced belt of Vienna. Picture by the creator.

We could word that forestry segments are nonetheless inside the admin boundary, implying that not each a part of town is urbanized and considerably populated. In a while, we are going to get again to this when evaluating the per-capital inexperienced space.

2.3 Statistical knowledge — inhabitants, earnings

Lastly, let’s check out the statistical knowledge recordsdata. The primary main distinction is that these are usually not georeferenced however easy csv tables:

df_pop_distr = pd.read_csv('vie-bez-pop-sex-age5-stk-ori-geo4-2002f.csv', 
sep = ';',
encoding='unicode_escape',
skiprows = 1)

df_pop_cens = pd.read_csv('vie-zbz-pop-sex-agr3-stk-ori-geo2-2008f.csv',
sep = ';',
encoding='unicode_escape',
skiprows = 1)

df_inc_distr = pd.read_csv('vie-bez-biz-ecn-inc-sex-2002f.csv',
sep = ';',
encoding='unicode_escape',
skiprows = 1)

show(df_pop_distr.head(1))
show(df_pop_cens.head(1))
show(df_inc_distr.head(1))

3.1. Getting ready the statistical knowledge recordsdata

The earlier subsection reveals that the statistical knowledge tables use completely different naming conventions — they’ve DISTRICT_CODE and SUB_DISTRICT_CODE identifiers as a substitute of issues like BEZNR and ZBEZ. Nonetheless, after studying every knowledge set’s documentation, it turns into clear that it’s straightforward to remodel from one to a different, for which I current two brief capabilities within the subsequent cell. I’ll concurrently course of knowledge on the extent of districts and census districts.

Moreover, I’ll solely have an interest within the (newest) aggregated values and knowledge factors of the statistical info, reminiscent of the entire inhabitants on the latest snapshot. So, let’s clear up these knowledge recordsdata and maintain the columns I’ll use later.

# these capabilities convert the district and census district ids to be compatbile with those discovered within the shapefiles
def transform_district_id(x):
return int(str(x)[1:3])

def transform_census_district_id(x):
return int(str(x)[1:5])

# choose the most recent yr of the info set
df_pop_distr_2 = df_pop_distr[df_pop_distr.REF_YEAR
==max(df_pop_distr.REF_YEAR)]
df_pop_cens_2 = df_pop_cens[df_pop_cens.REF_YEAR
==max(df_pop_cens.REF_YEAR)]
df_inc_distr_2 = df_inc_distr[df_inc_distr.REF_YEAR
==max(df_inc_distr.REF_YEAR)]

# convert district ids
df_pop_distr_2['district_id'] =
df_pop_distr_2.DISTRICT_CODE.apply(transform_district_id)

df_pop_cens_2['census_district_id'] =
df_pop_cens_2.SUB_DISTRICT_CODE.apply(transform_census_district_id)

df_inc_distr_2['district_id'] =
df_inc_distr_2.DISTRICT_CODE.apply(transform_district_id)

# combination inhabitants values
df_pop_distr_2 = df_pop_distr_2.groupby(by = 'district_id').sum()
df_pop_distr_2['district_population'] = df_pop_distr_2.AUT +
df_pop_distr_2.EEA + df_pop_distr_2.REU + df_pop_distr_2.TCN
df_pop_distr_2 = df_pop_distr_2[['district_population']]

df_pop_cens_2 = df_pop_cens_2.groupby(by = 'census_district_id').sum()
df_pop_cens_2['census_district_population'] = df_pop_cens_2.AUT
+ df_pop_cens_2.FOR
df_pop_cens_2 = df_pop_cens_2[['census_district_population']]

df_inc_distr_2['district_average_income'] =
1000*df_inc_distr_2[['INC_TOT_VALUE']]
df_inc_distr_2 =
df_inc_distr_2.set_index('district_id')[['district_average_income']]

# show the finalized tables
show(df_pop_distr_2.head(3))
show(df_pop_cens_2.head(3))
show(df_inc_distr_2.head(3))

# and unifying the naming conventions
admin_district['district_id'] = admin_district.BEZNR.astype(int)
admin_census['census_district_id'] = admin_census.ZBEZ.astype(int)

print(len(set(admin_census.ZBEZ)))

Double-check the computed complete inhabitants values on the two ranges of aggregations:

print(sum(df_pop_distr_2.district_population))
print(sum(df_pop_cens_2.census_district_population))

These two ought to each present the identical end result — 1931593 individuals.

3.1. Getting ready the geospatial knowledge recordsdata

Now that we’re accomplished with the important knowledge preparation of the statistical recordsdata, it’s time to match the inexperienced space polygons to the executive space polygons. Then, let’s compute every admin space’s complete inexperienced space protection. Moreover, I’ll add every admin space’s relative inexperienced space protection out of curiosity.

To acquire areas expressed in SI models, we have to change to a so-called native CRS, which within the case of Vienna is EPSG:31282. You extra learn extra on this matter, map projection and coordinate reference methods here and right here.

# changing all GeoDataFrames into the loca crs
admin_district_2 =
admin_district[['district_id', 'geometry']].to_crs(31282)

admin_census_2 =
admin_census[['census_district_id', 'geometry']].to_crs(31282)

gdf_green_2 = gdf_green.to_crs(31282)

Compute the executive unit’s space measured in SI models:

admin_district_2['admin_area'] = 
admin_district_2.geometry.apply(lambda g: g.space)

admin_census_2['admin_area'] =
admin_census_2.geometry.apply(lambda g: g.space)

show(admin_district_2.head(1))
show(admin_census_2.head(1))

4.1 Compute the inexperienced space protection in every administrative unit

I’ll use GeoPandas’ overlay perform to overlay these two administrative boundary GeoDataFrames with the GeoDataFrame containing the inexperienced space polygons. Then, I compute the world of every inexperienced space part falling into completely different administrative areas. Subsequent, I sum up these areas to the extent of every admin space, each districts and census districts. Within the remaining step, at every decision unit, I add the executive beforehand computed official unit areas and calculate the entire space to inexperienced space ratio for every district and census district.

gdf_green_mapped_distr = gpd.overlay(gdf_green_2, admin_district_2)

gdf_green_mapped_distr['green_area'] =
gdf_green_mapped_distr.geometry.apply(lambda g: g.space)

gdf_green_mapped_distr =
gdf_green_mapped_distr.groupby(by = 'district_id').sum()[['green_area']]

gdf_green_mapped_distr =
gpd.GeoDataFrame(admin_district_2.merge(gdf_green_mapped_distr, left_on = 'district_id', right_index = True))

gdf_green_mapped_distr['green_ratio'] =
gdf_green_mapped_distr.green_area / gdf_green_mapped_distr.admin_area

gdf_green_mapped_distr.head(3)

gdf_green_mapped_cens = gpd.overlay(gdf_green_2, admin_census_2)
gdf_green_mapped_cens['green_area'] =
gdf_green_mapped_cens.geometry.apply(lambda g: g.space)

gdf_green_mapped_cens =
gdf_green_mapped_cens.groupby(by = 'census_district_id').sum()[['green_area']]

gdf_green_mapped_cens =
gpd.GeoDataFrame(admin_census_2.merge(gdf_green_mapped_cens, left_on = 'census_district_id', right_index = True))

gdf_green_mapped_cens['green_ratio'] = gdf_green_mapped_cens.green_area / gdf_green_mapped_cens.admin_area
gdf_green_mapped_cens.head(3)

Lastly, visualize the inexperienced ratio per district and census district! The outcomes appear to make a number of sense, with a excessive stage of greenery on the outer components and far decrease within the central areas. Additionally, the 250 census districts clearly present a extra detailed, fine-grained image of the completely different neighborhood’s traits, providing extra profound and extra localized insights for city planners. Then again, the district-level info, with ten instances fewer spatial models, as a substitute reveals grand averages.

f, ax = plt.subplots(1,2,figsize=(17,5))

gdf_green_mapped_distr.plot(ax = ax[0],
column = 'green_ratio',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
legend = True,
cmap = 'Greens')

gdf_green_mapped_cens.plot(ax = ax[1],
column = 'green_ratio',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
legend = True,
cmap = 'Greens')

This block of code outputs the next maps:

These two maps present the inexperienced space ratio in every district / census district. in Vienna. Picture by the creator.

4.2 Add inhabitants and earnings info for every administrative unit

Within the remaining step of this part, let’s map the statistical knowledge into administrative areas. Reminder: We now have inhabitants knowledge on each the extent of districts and the extent of census districts. Nonetheless, I may solely discover earnings (socioeconomic stage indicator) on the extent of districts. It is a typical trade-off in geospatial knowledge science. Whereas one dimension (greenery) is far more insightful on the increased decision (census districts), knowledge constraints could power us to make use of the decrease decision anyway.

show(admin_census_2.head(2))
show(df_pop_cens_2.head(2))
gdf_pop_mapped_distr  = admin_district_2.merge(df_pop_distr_2, 
left_on = 'district_id', right_index = True)

gdf_pop_mapped_cens = admin_census_2.merge(df_pop_cens_2,
left_on = 'census_district_id', right_index = True)

gdf_inc_mapped_distr = admin_district_2.merge(df_inc_distr_2,
left_on = 'district_id', right_index = True)

f, ax = plt.subplots(1,3,figsize=(15,5))

gdf_pop_mapped_distr.plot(column = 'district_population', ax=ax[0],
edgecolor = 'okay', linewidth = 0.5, alpha = 0.9, cmap = 'Blues')

gdf_pop_mapped_cens.plot(column = 'census_district_population', ax=ax[1],
edgecolor = 'okay', linewidth = 0.5, alpha = 0.9, cmap = 'Blues')

gdf_inc_mapped_distr.plot(column = 'district_average_income', ax=ax[2],
edgecolor = 'okay', linewidth = 0.5, alpha = 0.9, cmap = 'Purples')

ax[0].set_title('district_population')
ax[1].set_title('census_district_population')
ax[2].set_title('district_average_incomee')

This block of codes ends in the next determine:

The assorted statistical info on Vienna neighborhoods. Picture by the creator.

4.3. Inexperienced area-per-capita computation

Let’s sum up what we have now now, all built-in into respectable shapefiles similar to the districts and census districts of Vienna:

On the extent of districts, we have now inexperienced space ratio, inhabitants and earnings knowledge

On the extent of census districts, we have now a inexperienced space ratio and inhabitants knowledge

To seize inexperienced equality merely, I merge the data on the inexperienced space’s absolute measurement and the inhabitants in districts and census districts and compute the entire quantity of inexperienced space per capita.

Let’s check out our enter — inexperienced protection and inhabitants:

# a plot for the disticts
f, ax = plt.subplots(1,2,figsize=(10,5))

gdf_green_mapped_distr.plot(
ax = ax[0],
column = 'green_ratio',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Greens')

gdf_pop_mapped_distr.plot(
ax = ax[1],
column = 'district_population',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Reds')

ax[0].set_title('green_ratio')
ax[1].set_title('district_population')

# a plot for the census disticts
f, ax = plt.subplots(1,2,figsize=(10,5))
gdf_green_mapped_cens.plot(
ax = ax[0],
column = 'green_ratio',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Greens')

gdf_pop_mapped_cens.plot(
ax = ax[1],
column = 'census_district_population',
edgecolor = 'okay',
linewidth = 0.5,
alpha = 0.9,
cmap = 'Reds')

ax[0].set_title('green_ratio')
ax[1].set_title('district_population')

This block of codes ends in the next determine:

Inexperienced space and inhabitants ranges in Vienna on the ranges of districts and census districts. Picture by the creator.

To compute the inexperienced space per capita, I’ll first merge the greenery and inhabitants knowledge frames within the following steps. I’ll accomplish that through the instance of census districts as a result of its increased spatial decision permits us to look at higher patterns (if any) rising. Make certain we don’t divide by zero and in addition observe widespread sense; let’s drop these areas which can be unpopulated.

gdf_green_pop_cens = 
gdf_green_mapped_cens.merge(gdf_pop_mapped_cens.drop(
columns = ['geometry', 'admin_area']), left_on = 'census_district_id',
right_on = 'census_district_id')[['census_district_id',
'green_area', 'census_district_population', 'geometry']]

gdf_green_pop_cens['green_area_per_capita'] =
gdf_green_pop_cens['green_area'] /
gdf_green_pop_cens['census_district_population']

gdf_green_pop_cens =
gdf_green_pop_cens[gdf_green_pop_cens['census_district_population']>0]

f, ax = plt.subplots(1,1,figsize=(10,7))

gdf_green_pop_cens.plot(
column = 'green_area_per_capita',
ax=ax,
cmap = 'RdYlGn',
edgecolor = 'okay',
linewidth = 0.5)

admin_district.to_crs(31282).plot(
ax=ax, colour = 'none', edgecolor = 'okay', linewidth = 2.5)

This block of codes ends in the next determine:

Every census district is coloured primarily based on its inexperienced space to capita rating. Picture by the creator.

Let’s tweak the visualization somewhat:

f, ax = plt.subplots(1,1,figsize=(11,7))

ax.set_title('Per-capita inexperienced space innthe census districts of Vienna',
fontsize = 18, pad = 30)

gdf_green_pop_cens.plot(
column = 'green_area_per_capita',
ax=ax,
cmap = 'RdYlGn',
edgecolor = 'okay',
linewidth = 0.5,
legend=True,
norm=matplotlib.colours.LogNorm(
vmin=gdf_green_pop_cens.green_area_per_capita.min(),
vmax=gdf_green_pop_cens.green_area_per_capita.max()), )

admin_district.to_crs(31282).plot(
ax=ax, colour = 'none', edgecolor = 'okay', linewidth = 2.5)

This block of codes ends in the next determine:

Every census district is coloured primarily based on its inexperienced space to capita rating. Picture by the creator.

And the identical for districts:

# compute the per-capita inexperienced space scores
gdf_green_pop_distr =
gdf_green_mapped_distr.merge(gdf_pop_mapped_distr.drop(columns =
['geometry', 'admin_area']), left_on = 'district_id', right_on =
'district_id')[['district_id', 'green_area', 'district_population',
'geometry']]

gdf_green_popdistr =
gdf_green_pop_distr[gdf_green_pop_distr.district_population>0]

gdf_green_pop_distr['green_area_per_capita'] =
gdf_green_pop_distr['green_area'] /
gdf_green_pop_distr['district_population']

# visualize the district-level map
f, ax = plt.subplots(1,1,figsize=(10,8))

ax.set_title('Per-capita inexperienced space within the districts of Vienna',
fontsize = 18, pad = 26)

gdf_green_pop_distr.plot(column = 'green_area_per_capita', ax=ax,
cmap = 'RdYlGn', edgecolor = 'okay', linewidth = 0.5, legend=True,
norm=matplotlib.colours.LogNorm(vmin=
gdf_green_pop_cens.green_area_per_capita.min(),
vmax=gdf_green_pop_cens.green_area_per_capita.max()), )

admin_city.to_crs(31282).plot(ax=ax,
colour = 'none', edgecolor = 'okay', linewidth = 2.5)

This block of codes ends in the next determine:

Every district is coloured primarily based on its inexperienced space to capita rating. Picture by the creator.

Whereas the numerous traits are clear — outer rim, extra greenspace for everybody, built-in downtown, reversed. Nonetheless, these two plots, particularly the extra detailed one on the extent of census districts, clearly present a variance within the quantity of inexperienced house individuals get pleasure from within the completely different areas. Additional analysis and incorporating extra knowledge sources, as an illustration, on land use, may assist clarify higher why these areas are increased in inexperienced space or inhabitants. For now, let’s get pleasure from this map and hope everyone finds the correct amount of greenery of their house!

# merging the greenery, inhabitants and monetary knowledge
gdf_district_green_pip_inc =
gdf_green_pop_distr.merge(gdf_inc_mapped_distr.drop(columns =
['geometry']))

Visualize the connection between the monetary and the greenery dimensions:

f, ax = plt.subplots(1,1,figsize=(6,4))

ax.plot(gdf_district_green_pip_inc.district_average_income,
gdf_district_green_pip_inc.green_area_per_capita, 'o')

ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('district_average_income')
ax.set_ylabel('green_area_per_capita')

The results of this code block is the next scatter plot:

Constracting the typical web earnings and the inexperienced space per capita ratio within the districts of Vienna. Picture by the creator.

At first look, the scatterplot doesn’t notably set a powerful case for the financials figuring out individuals’s entry to inexperienced areas. Truthfully, I’m a bit shocked by these outcomes — nonetheless, in gentle of Vienna’s acutely aware, long-standing efforts in greening up their metropolis, it might be why we don’t see any main development right here. To verify, I additionally checked the correlations between these two variables:

print(spearmanr(gdf_district_green_pip_inc.district_average_income, gdf_district_green_pip_inc.green_area_per_capita))

print(pearsonr(gdf_district_green_pip_inc.district_average_income, gdf_district_green_pip_inc.green_area_per_capita))

Because of the heavy-tailed distribution of the monetary knowledge, I’d take the Spearman (0.13) correlation extra critically right here, however even the Pearson correlation (0.30) implies a comparatively weak development, aligning with my earlier observations.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button