Discovering Patterns in Convenience Store Locations with Geospatial Association Rule Mining | by Elliot Humphrey | Apr, 2023

Photo by Matt Liu on Unsplash

When walking around Tokyo you’ll often pass numerous convenience stores, locally known as “konbinis”, which makes sense since there are over 56,000 convenience stores in Japan. Often there will be different chains of convenience store located very close to one another; it’s not uncommon to see stores around the corner from each other or on opposite sides of the street. Given Tokyo’s population density, it’s understandable for competing businesses to be forced closer to each other. However, could there be any relationships between which chains of convenience stores are found near each other?

The goal will be to collect location data from numerous convenience store chains in a Tokyo neighbourhood, to understand if there are any relationships between which chains are co-located with each other. Doing this will require:

  • The ability to query the location of different convenience stores in Tokyo, in order to retrieve each store’s name and location
  • Finding which convenience stores are co-located with each other within a pre-defined radius
  • Using the data on co-located stores to derive association rules
  • Plotting and visualising results for inspection

Let’s start!

For our use case we want to find convenience stores in Tokyo, so first we’ll need to do a little homework on what the common store chains are. A quick Google search tells me that the main stores are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.

Now that we know what we’re searching for, let’s go to OSMnx, a great Python package for searching data in OpenStreetMap (OSM). According to OSM’s schema, we should be able to find the store name in either the ‘brand:en’ or ‘brand’ field.

We can start by importing some useful libraries for getting our data, and defining a function to return a table of locations for a given convenience store chain within a specified area:

import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (i.e., 'Tokyo, Japan')
        tags (dict): key of the entity attribute in OSM (i.e., 'name') and value (i.e., amenity name)
    Returns:
        results (DataFrame): table of latitude and longitude with entity value
    '''

    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf (columns are minx, miny, maxx, maxy)
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0,3], bounding.iloc[0,1], bounding.iloc[0,2], bounding.iloc[0,0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    # geometries_from_bbox is the OSMnx 1.x name; OSMnx 2.x replaces it with features_from_bbox
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    # set_crs returns a new GeoDataFrame, so the result has to be assigned
    point = point.set_crs(crs=4326)
    point = point[point.geometry.within(location)].copy()
    # Making sure we are dealing with points: polygons are replaced by their centroid
    point['geometry'] = point['geometry'].apply(lambda x : x.centroid if isinstance(x, Polygon) else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']

    results = pd.DataFrame({'name' : list(point['name']),
                            'longitude' : list(point['geometry'].x),
                            'latitude' : list(point['geometry'].y)}
                           )

    # Label every row with the tag value that was searched for
    results['name'] = list(tags.values())[0]
    return results

# The tag value is left blank here; pass one chain name per call
convenience_stores = point_finder(place = 'Shinjuku, Tokyo',
                                  tags={"brand:en" : " "})

We can pass each convenience store name and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each convenience store looks like:
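
A minimal sketch of that combining step, assuming a finder function with the same signature as `point_finder`. The `collect_chains` helper, the `fake_finder` stand-in and its coordinates are hypothetical; they are only there so the example runs without querying OpenStreetMap.

```python
import pandas as pd

# Chain names listed earlier in the article
CHAINS = ["FamilyMart", "Lawson", "7-Eleven", "Ministop", "Daily Yamazaki", "NewDays"]

def collect_chains(finder, place, chains, tag_key="brand:en"):
    """Call finder once per chain and stack the results into one table."""
    frames = [finder(place=place, tags={tag_key: chain}) for chain in chains]
    # ignore_index gives a clean 0..n-1 index across all chains
    return pd.concat(frames, ignore_index=True)

def fake_finder(place, tags):
    """Hypothetical stand-in for point_finder: one made-up store per chain."""
    name = list(tags.values())[0]
    return pd.DataFrame({"name": [name], "longitude": [139.70], "latitude": [35.69]})

stores = collect_chains(fake_finder, "Shinjuku, Tokyo", CHAINS)
print(stores["name"].value_counts())
```

With the real finder in place of `fake_finder`, the same `value_counts()` call would give a frequency count per chain.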

Frequency count of convenience stores. Image by author.

Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward when using Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or visualised directly in Jupyter notebooks:

Location map of Shinjuku convenience stores, colour coded by store name. Image by author.
Location map of Shinjuku convenience stores, colour coded by density in a two minute walking radius (168m). Image by author.

Now that we have our data, the next step will be to find nearest neighbours for each convenience store. To do this, we will be using Scikit-learn’s ‘BallTree’ class to find the names of the nearest convenience stores within a two minute walking radius. We aren’t interested in how many stores are considered nearest neighbours, so we’ll just look at which convenience store chains are within the defined radius.

from collections import Counter
from sklearn.neighbors import BallTree

# Convert location to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a balltree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours in a 2 minute walking radius (168 m divided by the Earth's radius in metres)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)

# Replace the neighbour indices with store names
df = pd.DataFrame(is_within)
df.columns = ['indices']
# Drop each store's own index from its list of neighbours
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# Reset to a 0..n-1 index so the positional indices returned by BallTree match the row labels
convenience_stores = convenience_stores.reset_index(drop=True)
# create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()

# replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to original df
convenience_stores['neighbours'] = df['indices']

# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each store's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]

If we want to improve the accuracy of our work, we could replace the haversine distance measure with something more accurate (e.g., walking times calculated using networkx), but we’ll keep things simple.
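
The `r=168/6371000` argument in the radius query converts a distance in metres into radians, because the haversine metric works on angles rather than lengths. Below is a small sketch of that conversion using only the standard library. The 1.4 m/s walking speed is an assumption, chosen because it reproduces the 168 m figure, and the two coordinates are made-up points rather than real stores.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres, as in the radius query

def walking_radius_m(minutes, speed_m_per_s=1.4):
    """Distance covered on foot; the default speed is an assumed value."""
    return minutes * 60 * speed_m_per_s

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

radius_m = walking_radius_m(2)            # about 168 m for a two minute walk
radius_rad = radius_m / EARTH_RADIUS_M    # the unit a haversine BallTree expects for r

# Two hypothetical points 0.001 degrees of latitude apart
d = haversine_m(35.690, 139.700, 35.691, 139.700)
print(round(radius_m), round(d, 1), d <= radius_m)
```

Any pair of stores whose haversine distance is at most `radius_m` would count as neighbours under this definition.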

The neighbour-counting code above gives us `output_df`, a DataFrame where each row corresponds to a location and each column holds a binary count of whether a given convenience store chain is nearby:

Sample DataFrame of convenience store nearest neighbours for each location. Image by author.
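
To make the shape of this table concrete, here is a small sketch of the same encoding on made-up data, using only the standard library. The three neighbour lists are hypothetical and are not taken from the Shinjuku results.

```python
from collections import Counter

# Hypothetical neighbour lists, one per store location
neighbours = [
    ["FamilyMart", "Lawson"],
    ["7-Eleven"],
    ["no-neighbours"],
]

# Column order: every chain name seen in any list, sorted
columns = sorted({name for row in neighbours for name in row})

# One row per location: 1 if the chain is nearby, 0 otherwise
table = [[Counter(row).get(col, 0) for col in columns] for row in neighbours]

print(columns)
for row in table:
    print(row)
```

Because duplicates were removed from each neighbour list beforehand, every count is either 0 or 1.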

We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules with the Apriori algorithm. We set a minimum support of 5%, so that we examine only the rules related to frequent occurrences in our dataset (i.e., co-located convenience store chains). We use the metric ‘lift’ when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and consequent to the support expected under the assumption of independence.
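
As a rough illustration of these metrics, the sketch below computes support, confidence and lift by hand for a single rule on a toy set of transactions. The transactions are invented for the example, and the helper names are hypothetical rather than part of mlxtend.

```python
# Each set lists the chains found near one hypothetical location
transactions = [
    {"7-Eleven", "FamilyMart"},
    {"7-Eleven", "FamilyMart", "Lawson"},
    {"7-Eleven"},
    {"FamilyMart"},
    {"Lawson"},
]

def support(itemset):
    """Share of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_metrics(antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    both = support(antecedent | consequent)
    confidence = both / support(antecedent)
    # Lift compares the observed joint support with the value expected under independence
    lift = both / (support(antecedent) * support(consequent))
    return both, confidence, lift

s, c, l = rule_metrics({"7-Eleven"}, {"FamilyMart"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1, as in this toy case, means the pair appears together more often than independence would predict.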

from mlxtend.frequent_patterns import association_rules, apriori

# Calculate apriori
frequent_set = apriori(output_df, min_support = 0.05, use_colnames = True)
# Create rules
rules = association_rules(frequent_set, metric = 'lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)

This gives us the following results table:

Association rules for convenience store data. Image by author.

We’ll now interpret these association rules to draw some high level takeaways. To interpret this table it’s best to read more about association rules, using these links:

Okay, back to the table.

Support tells us how often different convenience store chains are actually found together. Therefore we can say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. On the other hand, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.

Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.

Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and FamilyMart are already co-located, there’s a high lift value of 1.42, which suggests a strong association with Lawson.

If we had just stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.

An example of why geospatial association rules can be insightful for businesses is in determining new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur.

The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, which we demonstrate with association rules, it would make sense for both of these chains to pay more attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.

In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was done using data extraction from OpenStreetMap, finding nearest neighbour convenience store chains, visualising data on maps, and creating association rules using an Apriori algorithm.

Thanks for reading!
