Where the assumptions behind the two-tower model architecture break, and how to go beyond them
Two-tower models are among the most common architectural design choices in modern recommender systems. The key idea is to have one tower that learns relevance and a second, shallow tower that learns observational biases such as position bias.
In this post, we’ll take a closer look at two assumptions behind two-tower models, namely:
- the factorization assumption, i.e. the hypothesis that we can simply multiply the probabilities computed by the two towers (or add their logits), and
- the positional independence assumption, i.e. the hypothesis that the only variable that determines position bias is the position of the item itself, and not the context in which it is shown.
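To make the factorization assumption concrete, here is a minimal sketch of the two ways of combining tower outputs mentioned above. The function names and the numeric logits are hypothetical, chosen only for illustration; a real system would get these values from trained networks. Note that the two variants are not numerically identical: multiplying probabilities and adding logits are two different parameterizations of the same idea.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def click_prob_product(relevance_logit: float, bias_logit: float) -> float:
    """Factorized variant 1: multiply the probabilities of the two towers."""
    return sigmoid(relevance_logit) * sigmoid(bias_logit)


def click_prob_additive(relevance_logit: float, bias_logit: float) -> float:
    """Factorized variant 2: add the logits, then apply a single sigmoid."""
    return sigmoid(relevance_logit + bias_logit)


# Hypothetical tower outputs (assumed values, not taken from any real model).
relevance_logit = 1.2  # relevance tower: depends on user and item only
bias_logit_by_position = {1: 2.0, 2: 0.5, 3: -1.0}  # bias tower: depends on position only

for position, bias_logit in bias_logit_by_position.items():
    p_product = click_prob_product(relevance_logit, bias_logit)
    p_additive = click_prob_additive(relevance_logit, bias_logit)
    print(f"position={position} product={p_product:.3f} additive={p_additive:.3f}")
```

In both variants the bias term depends on the position alone, which is exactly what the positional independence assumption states: the same item shown in the same slot receives the same bias regardless of what surrounds it.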
We’ll see where both of these assumptions break, and how to go beyond these limitations with newer algorithms such as the MixEM model, the Dot Product model, and XPA.
Let’s start with a very brief reminder.
Two-tower models: the story so far
The primary learning objective for ranking models in recommender systems is relevance: we want the model to predict the best possible piece of content given the context. Here, context simply means everything that we’ve learned about the user, for example from their previous engagement or search histories, depending on the application.
However, ranking models usually exhibit certain observation biases, that is, the tendency for users to engage more or less with an impression depending on how it was presented to them. The most prominent observation bias is position bias: the tendency of users to engage more with items that are shown first.
The key idea in two-tower models is to train two “towers”, that is, neural networks, in parallel, the first tower for learning relevance, and…