Map-Matching for Pace Prediction | by João Paulo Figueira | Jan, 2024


How briskly will you drive?

Picture by Julian Hochgesang on Unsplash

When planning future automobile routes, you belief the digital map suppliers for correct pace predictions. You do that when selecting your cellphone as much as put together for a automobile journey or, in knowledgeable setting, when planning routes in your automobile fleet. The forecasted speeds are a vital part of the journey’s price, particularly as they’re one of many basic drivers of power consumption in electrical and combustion autos.

The digital mapping service suppliers acquire data from stay site visitors and historic information to estimate how briskly you may drive alongside a selected highway at any given time. With this knowledge and clever algorithms, they estimate how shortly a median automobile will journey by means of the deliberate route. Some companies will settle for every automobile’s options to tune the route path and the anticipated journey instances.

However what about particular autos and drivers like yours? Do these predictions apply? Your automobiles and drivers might need specific necessities or habits that don’t match the standardized forecasts. Can we do higher than the digital map service suppliers? We’ve an opportunity in the event you hold an excellent file of your historic telematics knowledge.

On this article, we’ll enhance the standard of pace predictions from digital map suppliers by leveraging the historic pace profiles from a telematics database. This database comprises information of previous journeys that we use to modulate the usual pace inferences from a digital map supplier.

Central to that is map-matching, the method by which we “snap” the noticed GPS places to the underlying digital map. This correcting step permits us to convey the GPS measurements consistent with the map’s highway community illustration, thus making all location sources comparable.

The Street Community

A highway community is a mathematical idea that helps most digital mapping functions. Normally applied as a directed multi-graph, every node represents a identified geospatial location, often some noteworthy landmark resembling an intersection or a defining level on a highway bend, and the connecting directed edges characterize straight-line paths alongside the highway. Determine 1 under illustrates the idea.

Determine 1 — The diagram above exhibits the simplified illustration of a highway section as represented by a digital map. Every node, represented by a pink dot, corresponds to a identified geospatial location outlined by latitude, longitude, and altitude coordinates. Every connecting line is an edge, and digital maps use these to characterize the roads (the journey route is omitted right here). Once we ask a digital map supplier for instructions, we get a protracted sequence of those nodes and edges and an estimate of the time it is going to take to journey such roads. (Picture supply: Writer)

Once we request a route from a digital map supplier, we get a sequence of highway community nodes and their connecting edges. The service additionally offers the estimated journey instances and corresponding speeds between all pairs of nodes (in some instances, the pace estimates cowl a spread of nodes). We get the full journey length by including all of the partial instances collectively.

If we get higher estimates for these instances, we will even have higher pace estimates and a greater route prediction total. The supply for these higher estimates is your historic telematics knowledge. However realizing the historic speeds is simply part of the method. Earlier than we will use these speeds, we should be sure that we will challenge them onto the digital map, and for this, we use map-matching.


Map-matching tasks sequences of GPS coordinates sampled from a shifting object’s path to an present highway graph. The matching course of makes use of a Hidden Markov Model to map the sampled places to the almost certainly graph edge sequence. In consequence, this course of produces each the sting projections and the implicit node sequence. You’ll be able to learn a extra detailed clarification in my earlier article on map matching:

After studying the above article, you’ll perceive that the Valhalla map-matching algorithm tasks the sampled GPS places into highway community edges, to not the nodes. The service may return the matched poly-line outlined by way of the highway community nodes. So, we will get each edge projections and the implicit node sequence.

However, when retrieving a route plan from the identical supplier, we additionally get a sequence of highway community nodes. By matching these nodes to the beforehand map-matched ones, we will overlay the identified telematics data to the newly generated route and thus enhance the time and pace estimates with precise knowledge.

Earlier than utilizing the map-matched places to deduce precise speeds, we should challenge them to the nodes and modify the identified journey instances, as illustrated in Determine 2 under.

Determine 2 — The diagram above illustrates the problem of mapping the implicit speeds of sampled GPS places, displayed as inexperienced dots, to the speeds between the map nodes, represented as pink dots. The orange diamonds characterize the map-matched sampled GPS places. On this downside, we all know the common speeds between the inexperienced dots and wish to infer the common speeds between the pink dots. (Picture supply: Writer)

As a prerequisite, we should appropriately sequence each units of places, the nodes, and the map matches. This course of is depicted in Determine 2 above, the place the map matches, represented by the orange diamonds, are sequenced together with the highway community nodes, represented as pink dots. The journey sequence is clearly from left to proper.

We assume the time variations between the GPS places are the identical because the map-matched ones. This assumption, illustrated by Determine 3 under, is crucial as a result of there isn’t a technique to infer what impact in time the map matching has. This assumption simplifies the calculation whereas maintaining an excellent approximation.

Determine 3 — We assume that the time variations between consecutive samples are the identical because the corresponding map-matched ones. (Picture supply: Writer)

Now that we all know the time variations between consecutive orange diamonds, our problem is to make use of this data to deduce the time variations between consecutive pink dots (nodes). Determine 4 under exhibits the connection between the 2 sequences of time variations.

Determine 4 — The diagram above helps us perceive how we interpolate the time variations between highway community nodes (pink dots) utilizing the time variations from the sequence of map-matched places (orange diamonds). (Picture supply: Writer)

We will safely assume that the common speeds between consecutive orange diamonds are fixed. This assumption is crucial for what comes subsequent. However earlier than we proceed, let’s outline some terminology. We’ll use some simplifications because of Medium’s typesetting limitations.

We have to cope with two basic portions: distances and timespans. Utilizing Determine 4 above as a reference, we outline the gap between the orange diamond one and pink dot one as d(n1, m1). Right here, the letter “m” stands for “map-matched,” and the letter “n” stands for node. Equally, the corresponding timespan is t(n1, m1), and the common pace is v(n1, m1).

Allow us to deal with the primary two nodes and see how we will derive the common pace (and the corresponding timespan) utilizing the identified timespans from orange diamonds one to 4. The common pace of journey between the primary two map-matched places is thus:

As a result of the common pace is fixed, we will now compute the primary timespan.

The second timespan is simply t(m2, m3). For the ultimate interval, we will repeat the method above. The entire time is thus:

We should repeat this course of, adapting it to the sequence of nodes and map matches to calculate the projected journey instances between all nodes.

Now that we’ve got seen how one can challenge measured speeds onto a digital map let’s see the place to get the information.

Telematics Database

This text makes use of a telematics database to deduce unknown highway section common speeds. All of the geospatial knowledge within the database is already map-matched to the underlying digital map. This attribute helps us match future service-provided routes to the identified or projected highway section speeds utilizing the abovementioned course of.

Right here, we’ll use a tried-and-true open-source telematics database I’ve been exploring recently and offered in a beforehand revealed article, the Extended Vehicle Energy Dataset (EVED), licensed below Apache 2.0.

We develop the answer in two steps: knowledge preparation and prediction. Within the knowledge preparation step, we traverse all identified journeys within the telematics database and challenge the measured journey instances to the corresponding highway community edges. These computed edge traversal instances are then saved in one other database utilizing most decision H3 indices for quicker searches throughout exploration. On the finish of the method, we’ve got collected traversal time distributions for the identified edges, data that can permit us to estimate journey speeds within the prediction section.

The prediction section requires a supply route expressed as a sequence of highway community nodes, resembling what we get from the Valhalla route planner. We question every pair of consecutive nodes’ corresponding traversal time distribution (if any) from the pace database and use its imply (or median) to estimate the native common pace. By including all edge estimates, we get the supposed outcome, the anticipated complete journey time.

Information Preparation

To arrange the information and generate the reference time distribution database, we should iterate by means of all of the journeys within the supply knowledge. Luckily, the supply database makes this simple by readily figuring out all of the journeys (see the article above).

Allow us to take a look at the code that prepares the sting traversal instances.

Determine 5— The code above exhibits the primary loop for the information preparation script. (Picture supply: Writer)

The code in Determine 5 above exhibits the primary loop of the information preparation code. We use the beforehand created EVED database and save the output knowledge to a brand new pace database. Every file is a time traversal pattern for a single highway community edge. For a similar edge, a set of those samples makes up for a statistical time distribution, for which we calculate the common, median, and different statistics.

The decision on line 5 retrieves a listing of all of the identified journeys within the supply database as triplets containing the trajectory identifier (the desk sequential identifier), the automobile identifier, and the journey identifier. We want these final two objects to retrieve the journey’s indicators, as proven in line 10.

Strains 10 to 16 comprise the code that retrieves the journey’s trajectory as a sequence of latitude, longitude, and timestamps. These places don’t essentially correspond to highway community nodes; they are going to principally be projections into the sides (the orange diamonds in Determine 2).

Now, we will ask the Valhalla map-matching engine to take these factors and return a poly-line with the corresponding highway community node sequence, as proven in strains 18 to 25. These are the nodes that we retailer within the database, together with the projected traversal instances, which we derive within the last strains of the code.

The traversal time projection from the map-matched places to the node places happens in two steps. First, line 27 creates a “compound trajectory” object that merges the map-matched places and the corresponding nodes within the journey sequence. The item shops every map-matched section individually for later becoming a member of. Determine 6 under exhibits the thing constructor (source file).

Determine 6 — The compound trajectory constructor receives the map-matched trajectory and the map nodes as separate arrays of latitudes and longitudes. The code merges each trajectories into a listing of segments and later converts these to a trajectory checklist. (Picture supply: Writer)

The compound trajectory constructor begins by merging the sequence of map-matched factors to the corresponding highway community nodes. Referring to the symbols in Determine 2 above, the code combines the orange diamond sequence with the pink dot sequence in order that they hold the journey order. In step one, listed in Determine 7 under, we create a listing of sequences of orange diamond pairs with any pink dots in between.

Determine 7 — The above perform merges the map-matched trajectory factors with the corresponding highway community node sequence. The perform tries to assign sequential nodes, if any, between every pair of trajectory factors. A listing collects all merged sequences for a last consolidation step. (Picture supply: Writer)

As soon as merged, we convert the trajectory segments to node-based trajectories, eradicating the map-matched endpoints and recomputing the traversal instances. Determine 8 under exhibits the perform that computes the equal between-node traversal instances.

Determine 8 — The conversion from a section to a trajectory requires calculating the equal instances between highway community nodes. (Picture supply: Writer)

Utilizing the symbology of Determine 2, the code above makes use of the traversal instances between two orange diamonds and calculates the instances for all sub-segment traversals, particularly between node-delimited ones. This fashion, we will later reconstruct all between-node traversal instances by means of easy addition.

The ultimate conversion step happens on line 28 of Determine 5 after we convert the compound trajectory to a easy trajectory utilizing the features listed in Determine 9 under.

Determine 9 — The ultimate reconstruction step concatenates the projected trajectory segments. Word how arrays with greater than two parts are clipped. The clipped parts correspond to the map-matched positions and are thus eliminated. Arrays with solely two parts have map-matched positions that coincide with highway community nodes. (Picture supply: Writer)

The ultimate step of the code in Determine 5 (strains 30–32) is to save lots of the computed edge traversal instances to the database for posterior use.

Information High quality

How good is the information that we simply ready? Does the EVED permit for good pace predictions? Sadly, this database was not designed for this objective so that we’ll see some points.

The primary difficulty is the variety of single-edge information within the last database, on this case, just a little over two million. The entire variety of rows is over 5.6 million, so the unusable single-edge information characterize a large proportion of the database. Nearly half the rows are from edges with ten or fewer information.

The second difficulty is journeys with very low frequencies. When querying an ad-hoc journey, we might fall into areas of very low density, the place edge time information are scarce or nonexistent. In such circumstances, the prediction code tries to compensate for the information loss utilizing a easy heuristic: assume the identical common pace as within the final edge. For bigger highway sections, and as we see under, we might even copy the information from the Valhalla route predictor.

The underside line is that a few of these predictions will probably be dangerous. A greater use case for this algorithm could be to make use of a telematics database from fleets that often journey by means of the identical routes. It might be even higher to get extra knowledge for a similar routes.


To discover this time-prediction enhancement algorithm, we’ll use two totally different scripts: one interactive Streamlit utility that permits you to freely use the map and an analytics script that tries to evaluate the standard of the anticipated instances by evaluating them to identified journey instances in a LOOCV kind of method.

Interactive Map

You run the interactive utility by executing the next command line on the challenge root:

streamlit run

The interactive application permits you to specify endpoints of a route for Valhalla to foretell. Determine 10 under exhibits what the consumer interface seems to be like.


Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button