Use Cases for Learning-based Recommendation Systems

I had to build a recommendation system, so I did some research on use cases as part of the preparation. Netflix really is incredible.

Types of Recommendation Systems

  • With / without personalization
    • When personalizing, two main types of data are used: content information from user profiles and item genres, and user behavior history
  • Start with recommendations from user profiles, then switch to behavior history-based recommendations once data accumulates
    • The latter tends to better reflect user preferences

Popularity Ranking / Newest First

Advantages

  • High effectiveness relative to low implementation cost
  • Effective when item turnover is high

Disadvantages

  • When item turnover is low, the same items keep being displayed
    • In such cases, use trending/rising rankings as a workaround?

Industries

  • Almost all services
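
The workaround above can be made concrete: a popularity ranking and its trending variant differ only in whether counts are restricted to a recent window. A minimal sketch (item IDs and timestamps are made up):

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical event log: (item_id, timestamp) pairs.
events = [
    ("item_a", datetime(2024, 1, 1)),
    ("item_a", datetime(2024, 1, 2)),
    ("item_b", datetime(2024, 1, 9)),
    ("item_b", datetime(2024, 1, 10)),
    ("item_b", datetime(2024, 1, 10)),
    ("item_c", datetime(2024, 1, 10)),
]

def popularity_ranking(events):
    """All-time popularity: rank items by total event count."""
    counts = Counter(item for item, _ in events)
    return [item for item, _ in counts.most_common()]

def trending_ranking(events, now, window=timedelta(days=2)):
    """Trending variant: count only events inside a recent window,
    so a slow-turnover catalog does not pin the same items on top."""
    counts = Counter(item for item, ts in events if now - ts <= window)
    return [item for item, _ in counts.most_common()]

print(popularity_ranking(events))                       # item_b leads overall
print(trending_ranking(events, datetime(2024, 1, 10)))  # item_a drops out
```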

Browsing (Purchase) History Display

Advantages

  • High effectiveness relative to low implementation cost
  • Effective when users frequently re-browse (re-purchase) items they have viewed (purchased) before

Disadvantages

  • Not very useful for infrequently purchased items (electronics, etc.)
  • Consumables must be displayed at the right time to be effective

Industries

  • Video, music sites

Similar Item Display

Advantages

  • Increases user browsing and helps them find desired items
  • Similarity can be computed from content information (item descriptions, category information, etc.) and/or from user behavior history
    • The latter captures an item's atmosphere and concept better than the former
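
The content-based side of this can be sketched with cosine similarity over item feature vectors (the items and vectors below are made up, e.g. category flags plus keyword counts):

```python
import math

# Hypothetical item vectors built from content information
# (e.g. one-hot category flags plus description keyword counts).
item_vectors = {
    "red_tshirt":  [1, 0, 1, 0, 2],
    "blue_tshirt": [1, 0, 1, 0, 1],
    "red_mug":     [0, 1, 0, 1, 2],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(item_id, k=2):
    """Rank the other items by cosine similarity to the given item."""
    query = item_vectors[item_id]
    scored = [(other, cosine(query, vec))
              for other, vec in item_vectors.items() if other != item_id]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

print(most_similar("red_tshirt"))  # the other t-shirt ranks first
```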

Disadvantages

  • For products where one is sufficient, post-purchase display is not very meaningful
  • Focusing solely on click-through rate may increase noisy browsing history from interesting or unusual products that do not lead to purchases

Industries

  • E-commerce sites

Direct Recommendations to Users

  • Recommendations based on user behavior history and profile
  • Often displays items similar to the last viewed/purchased item
    • The similar item display system above can be used

Case Studies from Other Companies

Netflix

  • Over 80% of watched content comes through recommendations
  • Originally an online DVD rental company

Recommendation System Features

Information Used

  • Content information -> User behavior information -> User ratings (5-level) -> User ratings (2-level)

    • Content information (genre, cast, director, etc.)

      • Easy to implement

      • Recommendations lacked persuasiveness (were not appropriate)
    • User behavior information (which titles a user rented together)

      • Better performance than content information

      • Actual recommendation quality could not be gauged

      • Could not handle cases where a rental was a "miss"
    • User ratings (5-level)

      • Could make recommendations reflecting content preferences

      • Individual bias in how users assign ratings

      • Difficult to incorporate into the system (solve with regression?)

    • User ratings (2-level)

      • Recommendations reflecting content preferences

      • Easy to display as "XX% likely to enjoy"

      • Easy to incorporate into the system
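
The "XX% likely to enjoy" display falls out naturally once the 2-level ratings train a probabilistic classifier. A minimal sketch with made-up genre-flag features (logistic regression stands in here; these notes do not specify Netflix's actual model):

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: per-title feature vectors (e.g. genre flags)
# and one user's 2-level ratings (1 = thumbs up, 0 = thumbs down).
X = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)

def match_label(features):
    """Turn the predicted like-probability into an 'XX% match' display string."""
    p = model.predict_proba([features])[0, 1]
    return f"{round(p * 100)}% match"

print(match_label([1, 0, 0]))  # a title resembling ones this user liked
```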

Cold Start Problem

  • Use popularity ranking -> Ask users to select favorite titles during registration + use popularity ranking if they skip

Display Method

  • Rows represent recommendation categories (new releases, genres, popular titles, etc.), columns represent recommendation rank
    • Makes it easier for users to explore content
      • Scroll vertically to find interesting genres, scroll horizontally to find interesting titles
  • Filter recommendation content based on previously watched titles and existing ratings

(image: Netflix_01.png)

Other Topics

  1. Deep Learning
  2. Causality
  3. Bandits & Reinforcement Learning
  4. Fairness
  5. Experience personalization

Deep Learning

  • Became widespread in recommendation systems around 2017
  • Previously collaborative filtering and matrix factorization (MF) were used
  • Simply feeding user information and item matrices into deep learning does not always work, but when done well, accuracy improves
    • Use user behavior history (transition information)
    • Use large amounts of information
    • Accuracy improved when carefully designed with discrete time information (day of week, hourly history) + continuous time information (recent behavior history)

Causality

  • "Watched because it looked interesting" vs. "Watched because it was recommended"
  • Debiasing Recommendations

Bandits & Reinforcement Learning

  • Interest in new content is uncertain and user tendencies change, so recommendations need exploration
  • Detailed design is covered from slide 36 onward of the referenced deck

Fairness

  • Are recommendations biased toward certain genres?
  • By applying weighting coefficients per genre, various genres can be recommended

Experience Personalization

  • Algorithm level, UX level, behavior level

References

Airbnb

  • Accommodation and vacation rental service
  • The following is from a February 2019 article
  • Has a two-stage service: accommodation plans and experience plans at the destination

(image: 1-7mW2UKcdI09OVYwWlpHXLw.png)

Recommendation System Features

Models (3 stages from introduction to present)

  • Adjust model complexity according to data volume

1. Initial Model (Strong Baseline)

  • Few recommendation candidates, data collection just started

  • Data Collection

    • Rank randomly every day to collect data
    • Collect logs of users who booked to rank candidates
  • Labeling

    • Collected 50,000 training examples labeled as booked (positive) and clicked but not booked (negative)
  • Features

    • 25 dimensions: experience plan details/data, reviews, number of bookings, click-through rate, etc.
    • When the service is growing rapidly, convert counts to ratios to prevent model breakdown
    • Since prediction uses only experience-plan information, the same ranking is served to all users (users narrow it down via search criteria; rankings are updated daily)
  • Model

    • GBDT binary classification
    • No feature scaling required, handles missing values directly
  • Evaluation

    • Rank according to model scores and evaluate with AUC and nDCG
  • Model Analysis

    • Observe how the score changes when all features except one are held fixed (a Shapley-value-style feature contribution calculation?)

(image: Airbnb_01.png)
  • Validation

    • A/B test comparing with rule-based random ranking
    • +13% in bookings

2. Personalization Model

  • Goal: Quickly capture user interests and place appropriate content at the top of search results
  • Two different types of personalization:
    • Information from booked accommodation plans
      • Since accommodation plan booking -> experience plan booking, accommodation plan information can be used
    • User click information
      • Interest level in specific categories over the past 15 days (click rate, weighted sum of categories from clicked plans, days since last click, etc.)
      • User available time slots (proportion of time slots from plans the user clicked)
  • Ranking Model Training
    • 50 ranking features from 250k labeled samples
    • Care taken to avoid leakage
      • Time-series-preserving features
      • Data from users who viewed only one plan (= high probability of booking that plan) is not used
    • Train two models: one with personalization features for logged-in users, and one without personalization for logged-out traffic data
      • Personalization features depend on data; useless without it
  • Ranking Model Testing
    • A/B test: stage 1 vs. stage 2
    • +7.9% improvement
  • Implementation Details
    • Created a table keyed by UserID; logged-out users have UserID := 0
    • All rankings computed offline daily, but limited to the 1 million most active users due to computational cost
      • Up to about 1 day of delay
    • This model was used to measure the gain of the personalized model for transitioning to Stage 3

3. Online Scoring

  • Features
    • In addition to previous ones, also uses search plan settings (query feature information), browser information (language, country) for scoring
  • Ranking Model
    • Over 2 million labeled data points
    • 90 features
    • Two GBDT models
      • Model for logged-in users using experience plan features, query features, and user features
      • Model for logged-out users using experience plan features, query features, and traffic data
  • Advantages of Online Scoring Model
    • No need to pre-compute personalized rankings; can recommend for many use cases
  • Ranking Model Testing
    • A/B test: stage 2 vs. stage 3, +5.1%
  • Implementation Details
    • Consists of the following 3 infrastructure components:
      • Obtain model inputs from various sources in real time
      • Deploy the model to production
      • Perform model scoring
    • Storage methods for features varied by infrastructure component

4. Using Business Rules

  • Improve service quality
  • Changed the objective function from (+1=booking, -1=clicked but not booked) to learning with weights (low rating=low score, high rating=high score)
    • A/B test confirmed improvement in booking quality
  • Cold Start Problem
    • Discover new users and recommend via ranking
  • Enhance diversity within Top 8 results
    • Effective when traffic information is scarce
  • When users visit the web page but do not search
    • Considered to have a different purpose; selected Top 18 by ranking score, re-ranked by click rate, which proved effective
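
The Top-18 re-ranking rule for non-searching visitors can be sketched as follows (plan IDs, model scores, and click rates are made up):

```python
# Hypothetical candidates: (plan_id, model_score, click_rate).
candidates = [
    ("plan_a", 0.9, 0.02),
    ("plan_b", 0.8, 0.10),
    ("plan_c", 0.7, 0.05),
    ("plan_d", 0.2, 0.30),
]

def rerank_for_non_searchers(candidates, top_n=18):
    """For visitors who do not search: take the top-N by ranking score,
    then reorder that subset by click rate."""
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    return [plan_id for plan_id, _, _ in
            sorted(top, key=lambda c: c[2], reverse=True)]

# plan_d is cut by the score filter, then the rest sort by click rate.
print(rerank_for_non_searchers(candidates, top_n=3))
```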

Ranking Monitoring and Explanation

  • Track general trends of the ranking algorithm and confirm they are desirable
    • A cheap plan != a great plan, yet cheap plans were being recommended disproportionately often
    • Removing "price" from the model features fixed this with no adverse effects

(image: Airbnb_02.png)

  • Track the rank of a specific experience plan in the market and the features used in the ML model

  • Ranking trends for specific groups (e.g., 5-star experience plans)

(image: Airbnb_03.png)

  • Particularly useful for market managers to send appropriate feedback to hosts

Existing Challenges

  • Loss Function
    • Pairwise loss
  • Labeling
    • Regression based on score functions rather than 0 or 1
  • Real-time Signals
    • Minute-level history rather than daily
  • Addressing positioning bias in training data
  • Testing various models beyond GBDT

Summary

(image: 1-RwPC5rud6rZxvDAtwPlnyA.png)

References

Gunosy

  • Specifically about the news app
  • Over 10,000 new articles per day
  • Needs fast and stable recommendations from a large volume of articles + behavior logs

Recommendation System Features (News-specific)

  • News value decays over time
    • By the time behavior logs accumulate, the news value is low
  • Users' interest cycles change rapidly
  • Word match alone does not guarantee quality
    • "XX passed away", "Earthquake in XX prefecture", "XX won at YY"

Recommendation System Features

Model (Custom Matching-based Model)

  • News articles -> Embedding in vector space
  • Average of vectors from the M most recently viewed news articles -> User vector
    • Each time a new article is clicked, remove the oldest article, add the new article's vector, and recompute the average
  • Score each article by its closeness (density) to the user vector, then sort and recommend
    • This is the core of the system; the detailed scoring is proprietary
    • Scores are configured to decay over time
      • Previously viewed articles are also decayed
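
The rolling-average user vector and time-decayed score can be sketched as below. M, the exponential decay, and the half-life are all hypothetical choices (the real scoring is proprietary), and the article embeddings would come from a separate model:

```python
import numpy as np
from collections import deque

M = 3  # number of recent clicks kept per user (hypothetical value)

class UserProfile:
    """User vector = average of the embeddings of the M most recently read articles."""
    def __init__(self):
        self.recent = deque(maxlen=M)  # the oldest vector drops out automatically

    def click(self, article_vec):
        self.recent.append(np.asarray(article_vec, dtype=float))

    @property
    def vector(self):
        return np.mean(self.recent, axis=0)

def score(article_vec, user_vec, age_hours, half_life=6.0):
    """Closeness to the user vector, decayed by article age.
    Dot product + exponential decay is only a stand-in for the real scoring."""
    return float(np.dot(article_vec, user_vec)) * 0.5 ** (age_hours / half_life)

profile = UserProfile()
for v in ([1, 0], [0, 1], [1, 1], [1, 0]):  # the 4th click evicts the 1st
    profile.click(v)
print(profile.vector)                        # average of the last 3 vectors
```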

Known Issues

Functional Verification

  • Limited data in development environments
  • Difficult to verify list comprehensiveness due to personalization

Offline Experiments

  • A/B testing for parameter tuning is difficult
  • Does not necessarily match online experiments

Handling Transitions

  • Tends to rely on click-based recommendations, but would also like to recommend based on article viewing transitions
  • Practical implementation requires careful data structure design

References

Instagram

  • A system involving Facebook AI

Recommendation System Features

User Account Embedding

  • Uses content information liked by user accounts to embed account IDs like Word2vec

  • Distance is calculated using cosine distance or dot product

(image: Instagram_01.png)

  • Uses FAISS, developed by Facebook, for nearest neighbor search

  • Trains a classifier to predict the topic set of an account using this embedding

    • Narrows down the content presented to the account
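
The similar-account lookup can be sketched with brute-force cosine similarity; random vectors stand in for the trained account embeddings, and at production scale this is exactly the search step FAISS accelerates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical account embeddings (random stand-ins for the trained ones).
account_ids = [f"account_{i}" for i in range(1000)]
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit norm: dot == cosine

def nearest_accounts(query_id, k=5):
    """k accounts most similar to the query account by cosine similarity."""
    q = emb[account_ids.index(query_id)]
    sims = emb @ q                      # cosine similarity to every account
    order = np.argsort(-sims)           # highest similarity first
    return [account_ids[i] for i in order if account_ids[i] != query_id][:k]

print(nearest_accounts("account_0"))
```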

Ranking Model

  • The ranking model uses a small model distilled (approximated) from a large model
  • Trained to optimize nDCG loss

Candidate Generation

  • Finds similar accounts using embeddings of accounts the user liked or saved, then finds media posted/engaged by those accounts
  • Identifies thousands of candidates and narrows down to 500 for the next stage

Ranking Candidates

  • A small model approximating two other models narrows from 500 to 150 candidates using minimal features

  • A lightweight NN using full dense features narrows from 150 to 50 candidates

  • A NN using all features narrows to 25 candidates

(image: Instagram_02.jpg)

  • The NN predicts positive and negative behaviors (liking, saving, requests to see less, etc.); the final score for each piece of content is a weighted linear combination of these predicted signals

    • This combined score produces the ranking
  • Simple heuristic rules were added to increase diversity

    • Penalize posts from the same author by lowering their rank
    • Prevent multiple posts from the same author from appearing

References

Spotify

  • Music has far more items than movies
  • Content duration is short
  • Replay frequency is high

Recommendation System Features

Model (Multi-Armed Bandit)

  • Uses the multi-armed bandit framework to balance exploration and exploitation
    • Exploitation: Recommend based on previously selected music and podcasts
      • Identify and deliver favorite content
    • Exploration: Reveal uncertain user responses to unknown content
  • Training and inference are completed within the algorithm without requiring A/B tests or randomized experiments
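
A minimal epsilon-greedy sketch of that exploration/exploitation trade-off (the arm set, like-probabilities, and epsilon are made up; Spotify's actual bandit formulation is more sophisticated than this):

```python
import random

random.seed(0)

# Hypothetical true like-probabilities per content slot ("arm").
true_rates = {"podcast_a": 0.7, "playlist_b": 0.4, "album_c": 0.1}

counts = {arm: 0 for arm in true_rates}
rewards = {arm: 0.0 for arm in true_rates}

def choose(epsilon=0.1):
    """Epsilon-greedy: usually exploit the best-looking arm, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(list(true_rates))           # explore
    # Unexplored arms get an infinite estimate, so each is tried at least once.
    return max(true_rates,
               key=lambda a: rewards[a] / counts[a] if counts[a] else float("inf"))

for _ in range(5000):
    arm = choose()
    reward = 1.0 if random.random() < true_rates[arm] else 0.0
    counts[arm] += 1
    rewards[arm] += reward

print(max(counts, key=counts.get))  # the well-liked arm should dominate the pulls
```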

System Development and Monitoring

  • Unified model/validation libraries with TensorFlow
  • Accelerated model development with Kubeflow
  • Alerts at specific thresholds

References

Cookpad

  • Recipe recommendation, push notifications, recipe author follow recommendations, search keywords, etc.

Recommendation System (Development) Features

KPI Design Aligned with Business Model

  • Goal design depends on the business model: e-commerce sites use revenue and number of purchasers; ad-driven sites use impressions, click rate, etc.
    • Cookpad needs its own service-specific goal design

Difficulty of Integrating Recommendation Systems into Existing Systems

  • Ease of data acquisition and maintenance
  • Large impact cuts both ways: when something goes wrong, the negative impact has an equally large scope

References

Retty

  • Restaurant (reviews, reservations, etc.) web and app service
  • Similar-restaurant suggestions, popular restaurant recommendations, extraction of useful reviews and photos

Recommendation System Features

Model (Algorithm Selection)

  • Similar restaurants -> Content-based filtering
  • Popular restaurants -> Popularity ranking
  • Based on past trends -> (Item) Collaborative filtering
  • Random recommendations -> Mixed in at a random proportion for evaluation

Presented Information

  • Display recommendation reasons for items
    • Eliminate distrust (not operator-driven ads)
    • Appeal selling points (make appeal points clear)
    • Introduce comparison elements (make item comparison easier)

Exploration

  • Use of bandit algorithms
    • Maximize cumulative reward while probabilistically presenting items
  • Use ElasticNet to avoid overfitting

References

ZOZO

  • Recommending similar items from images sent by users

Recommendation System Features

Model

  • Image -> Object detection -> 512-dimensional feature vector -> Approximate nearest neighbor search
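
The last step of this pipeline can be sketched as follows; sklearn's exact nearest-neighbor search stands in for the approximate index, and random vectors stand in for the detector's 512-dimensional features:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Hypothetical 512-dim feature vectors for catalog items (in the real system
# these come from object detection + a feature extractor; random here).
catalog = rng.normal(size=(10_000, 512)).astype(np.float32)

# An approximate-NN library (Faiss, Annoy, ...) would be used at this scale;
# sklearn's exact search stands in to show the interface.
index = NearestNeighbors(n_neighbors=5).fit(catalog)

# A query image whose features almost match catalog item 42.
query = catalog[42] + rng.normal(scale=0.01, size=512).astype(np.float32)
distances, indices = index.kneighbors(query.reshape(1, -1))
print(indices[0])  # item 42 should come back as the nearest neighbor
```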

References

Mercari

  • Image search

Recommendation System Features

  • Image -> Feature vectorization -> Approximate nearest neighbor search

Others