
[Recsys] Collaborative Filtering Model

๋ฐ์ดํ„ฐ ์ข…๋ฅ˜์— ๋”ฐ๋ฅธ ์ถ”์ฒœ์‹œ์Šคํ…œ ๋ชจ๋ธ ์ข…๋ฅ˜

  1. User-Item Interaction → use Collaborative Filtering Methods

    e.g., ratings or buying behavior

  2. Attribute information about the users and items → use Content-based Methods

    e.g., textual profiles or relevant keywords, a single (not all) user's ratings

์ •์˜

Collaborative filtering models use the collaborative power of the ratings provided by multiple users to make recommendations.

To address some of the limitations of content-based filtering, collaborative filtering uses similarities between users and items simultaneously to provide recommendations. This allows for serendipitous recommendations; that is, collaborative filtering models can recommend an item to user A based on the interests of a similar user B. Furthermore, the embeddings can be learned automatically, without relying on hand-engineering of features.

Explicit and Implicit Feedback

Explicit

Users specify how much they liked a particular movie by providing a numerical rating.

Implicit

If a user watches a movie (an action), the system infers that the user is interested.

Customer preferences are derived from their activities rather than their explicitly specified ratings.

When a customer buys an item (an action), it can be viewed as a preference for the item. However, the act of not buying an item from a large universe of possibilities does not always indicate a dislike.
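As a minimal sketch of the implicit case (pandas assumed; the event log and its column names are hypothetical), raw purchase actions can be collapsed into a unary preference matrix:

```python
import pandas as pd

# Hypothetical purchase log: each row is one "buy" action.
events = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u3"],
    "item": ["A", "B", "A", "C"],
})

# Unary implicit-feedback matrix: 1 = the user acted on the item.
# A 0 only means "no observed action", not a dislike.
implicit = pd.crosstab(events["user"], events["item"]).clip(upper=1)
print(implicit)
```

Note the asymmetry: a 1 is evidence of preference, while a 0 carries much weaker information.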

Disadvantages

  1. The underlying ratings (or feedback) matrices are sparse

    In the movie case, most users would have viewed only a small fraction of the large universe of available movies (see the sketch after this list)

    → Solution: model-based methods such as latent factor models (covered below) retain a high level of coverage even for sparse ratings matrices

  2. Cannot handle fresh items (the cold-start problem)
  3. Hard to include side features for query/item
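As a minimal sketch of the sparsity issue (numpy assumed; the toy matrix mirrors the user-based table shown later, with NaN marking unobserved ratings), we can measure how much of the matrix is actually specified:

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, NaN = unobserved rating.
R = np.array([
    [3.0, 4.0, 4.0, np.nan, 1.0],
    [4.0, 4.0, 4.0, 3.0, np.nan],
    [1.0, 1.0, 2.0, 5.0, np.nan],
])

observed = ~np.isnan(R)
sparsity = 1.0 - observed.mean()
print(f"{observed.sum()} of {R.size} entries observed, sparsity = {sparsity:.0%}")
```

Real ratings matrices are far sparser than this toy case, often with well over 99% of entries missing.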

Two Approaches to Collaborative Filtering

Memory-based Collaborative Filtering

※ Also called nearest-neighbor-based collaborative filtering

These algorithms are based on the fact that similar users display similar patterns of rating behavior and similar items receive similar ratings.

→ The goal is to predict the target user's ratings for items they have not yet rated, based on the ratings of users similar to the target user!

Methods

  1. User-based collaborative filtering

    To measure how strongly an item should be recommended, take all users who have rated that item, keep only the users most similar to the target user, and check how they rated the item.

    The ratings provided by like-minded users of a target user A are used in order to make the recommendations for user A.

    Similarity functions are computed between the rows of the ratings matrix to discover similar users (see the sketch after this list).

    |       | ItemA | ItemB | ItemC | ItemD | ItemE |
    |-------|-------|-------|-------|-------|-------|
    | User1 | 3     | 4     | 4     |       | 1     |
    | User2 | 4     | 4     | 4     | 3     |       |
    | User3 | 1     | 1     | 2     | 5     |       |

    ※ Orange: high similarity

    ※ Reading the table: User1 and User2 have highly similar ratings on ItemA, ItemB, and ItemC

  2. Item-based collaborative filtering

    In order to make recommendations for a target item A, the first step is to determine a set S of items that are most similar to item A. Then, to predict a particular user's rating for item A, the ratings in set S that this user has specified are determined. The weighted average of these ratings is used to compute the user's predicted rating for item A.

    Similarity functions are computed between the columns of the ratings matrix to discover similar items (see the sketch after this list).

    When such methods are applied to the transpose of the ratings matrix, they are referred to as item-based neighborhood models.

    |       | User1 | User2 | User3 | User4 | User5 |
    |-------|-------|-------|-------|-------|-------|
    | ItemA | 5     | 4     | 4     |       | 5     |
    | ItemB | 4     | 4     | 4     | 5     |       |
    | ItemC | 1     | 1     | 2     | 3     |       |

    ※ Orange: high similarity
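To make both variants concrete, here is a minimal sketch (numpy assumed) that runs on the user-based table above. Plain cosine similarity over the commonly rated entries is used for brevity; in practice a mean-centered (Pearson) similarity is more common:

```python
import numpy as np

# Rows = users, columns = items (the user-based table above); NaN = not yet rated.
R = np.array([
    [3.0, 4.0, 4.0, np.nan, 1.0],   # User1
    [4.0, 4.0, 4.0, 3.0, np.nan],   # User2
    [1.0, 1.0, 2.0, 5.0, np.nan],   # User3
])

def cosine(a, b):
    """Cosine similarity over the entries that both vectors have specified."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, u, j, k=2):
    """Predict R[u, j] as a similarity-weighted average over the k most
    similar rows that have an observed value in column j."""
    sims = [(cosine(R[u], R[v]), v)
            for v in range(R.shape[0]) if v != u and not np.isnan(R[v, j])]
    top = sorted(sims, reverse=True)[:k]
    if not top:                      # no neighbor rated item j -> no prediction
        return np.nan
    return np.average([R[v, j] for _, v in top], weights=[s for s, _ in top])

# User-based: neighborhoods over rows (users).
print(predict(R, u=0, j=3))          # User1's predicted rating for ItemD
# Item-based: the same logic on the transpose, so neighborhoods are items.
print(predict(R.T, u=3, j=0))        # ItemD's predicted rating for User1
```

Because the item-based variant is just the same procedure on the transposed matrix, one `predict` function covers both, matching the transpose remark in the item-based section.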

Distinction Between User-based and Item-based

The ratings in user-based CF are predicted using the ratings of neighboring users, whereas the ratings in item-based CF are predicted using the user's own ratings on neighboring (i.e., closely related) items.

In the user-based CF case, neighborhoods are defined by similarities among users (rows of the ratings matrix), whereas in the item-based CF case, neighborhoods are defined by similarities among items (columns of the ratings matrix).

Strengths and Weaknesses of Neighborhood-Based Methods

  1. Advantages
    • Simplicity and an intuitive approach

      It is often easy to justify why a specific item is recommended, and the interpretability of item-based methods is particularly notable.

    • Relatively stable with the addition of new items and users

  2. Disadvantages
    • At least $O(m^2)$ time and space complexity
    • Limited coverage because of sparsity

      For example, if none of John's nearest neighbors have rated Terminator, it is not possible to provide a rating prediction of Terminator for John. On the other hand, we care only about the top-k items for John in most recommendation settings: if none of John's nearest neighbors have rated Terminator, that may itself be evidence that this movie is not a good recommendation for John. Sparsity also makes robust similarity computation difficult when the number of mutually rated items between two users is small.

Model-based Collaborative Filtering

In model-based methods, machine learning and data mining methods are used in the context of predictive models. In cases where the model is parameterized, the parameters of this model are learned within the context of an optimization framework. Some examples of such model-based methods include decision trees, rule-based models, Bayesian methods, and latent factor models. Many of these methods, such as latent factor models, have a high level of coverage even for sparse ratings matrices.

Data Matrix

[Figure 3.1: missing entries (shaded) in (a) the classification data matrix and (b) the ratings matrix]

  1. Classification problem

    We have an $m \times n$ matrix in which the first $(n - 1)$ columns are feature variables (or independent variables), and the last (i.e., $n$th) column is the class variable (or dependent variable). All entries in the first $(n - 1)$ columns are fully specified, whereas only a subset of the entries in the $n$th column is specified. Therefore, a subset of the rows in the matrix is fully specified, and these rows are referred to as the training data. The remaining rows are referred to as the test data. The values of the missing entries need to be learned for the test data. This scenario is illustrated in Figure 3.1(a), where the shaded values represent missing entries in the matrix.

    Columns represent features, and rows represent data instances.

  2. Collaborative Filtering Problem

    Any entry in the ratings matrix may be missing, as illustrated by the shaded entries in Figure 3.1(b). Thus, it can be clearly seen that the matrix completion problem is a generalization of the classification (or regression modeling) problem.

    The specified (observed) entries serve as the training data, and the unspecified (missing) entries serve as the test data.
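A small illustration of the two missing-entry patterns (numpy assumed; NaN plays the role of the shaded cells in Figure 3.1):

```python
import numpy as np

nan = np.nan

# (a) Classification: the first n-1 columns (features) are fully specified;
#     only the last column (the class variable) is missing, and only for test rows.
classification = np.array([
    [1.0, 0.0, 2.0, 1.0],   # training row: class label specified
    [0.0, 1.0, 3.0, 0.0],   # training row: class label specified
    [2.0, 1.0, 1.0, nan],   # test row: class label to be predicted
])

# (b) Collaborative filtering: any entry may be missing, so completing the
#     matrix generalizes the classification set-up above.
ratings = np.array([
    [3.0, nan, 4.0, nan],
    [nan, 4.0, nan, 3.0],
    [1.0, nan, nan, 5.0],
])
```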

Advantages

  1. Space-efficiency:

    Typically, the size of the learned model is much smaller than the original ratings matrix. Thus, the space requirements are often quite low. On the other hand, a user-based neighborhood method might have $O(m^2)$ space complexity, where $m$ is the number of users. An item-based method will have $O(n^2)$ space complexity.

  2. Training speed and prediction speed:

    One problem with neighborhood-based methods is that the pre-processing stage is quadratic in either the number of users or the number of items. Model-based systems are usually much faster in the preprocessing phase of constructing the trained model. In most cases, the compact and summarized model can be used to make predictions efficiently.

  3. Avoiding overfitting:

    Overfitting is a serious problem in many machine learning algorithms, in which the prediction is overly influenced by random artifacts in the data. This problem is also encountered in classification and regression models. The summarization approach of model-based methods can often help in avoiding overfitting. Furthermore, regularization methods can be used to make these models robust.

Types of Model-based CF

  • Decision and Regression Trees for Collaborative Filtering
  • Rule-based Collaborative Filtering
  • …
  • Latent Factor Models
  • …

โ†’ ๋‹ค์Œ์€ Model-based CF ์ค‘ Latent Factor Models์— ๋Œ€ํ•ด ์„ค๋ช…ํ•จ

Latent Factor Models

The goal is to use dimensionality reduction methods to directly estimate the data matrix in one shot.

Dimensionality Reduction

These models leverage well-known dimensionality reduction methods to fill in the missing entries. Dimensionality reduction methods are used commonly in other areas of data analytics to represent the underlying data in a small number of dimensions. The key idea in dimensionality reduction methods is that the reduced, rotated, and completely specified representation can be robustly estimated from an incomplete data matrix. Once the completely specified representation has been obtained, one can rotate it back to the original axis system in order to obtain the fully specified representation. Under the covers, dimensionality reduction methods leverage the row and column correlations to create the fully specified and reduced representation. For example, user-based neighborhood methods leverage user-wise correlations, whereas item-based neighborhood methods leverage item-wise correlations.

Intuition for Understanding Latent Factor Models

(1) Geometric Intuition


โ†’ ์ดํ›„ ์„ค๋ช…์€ ์ดํ•ดํ•  ์ˆ˜ ์—†์—ˆ์Œ. ๋‹ค์Œ์— ๋‹ค์‹œ ํ™•์ธํ•ด๋ด์•ผ๊ฒ ๋‹ค

(2) Low-Rank Intuition

\[R = UV^T\]
  • $U$ is an $m \times k$ matrix
  • $V$ is an $n \times k$ matrix
  • The rank of both the row space and the column space of $R$ is (at most) $k$

→ I could not understand the explanation that followed; revisit this later

Matrix Factorization (MF)

Matrix factorization methods provide a neat way to leverage all row and column correlations in one shot to estimate the entire data matrix.

Most dimensionality reduction methods can also be expressed as matrix factorizations.

The goal of matrix factorization is to find the optimal feature vectors that minimize the difference between the estimated ratings and the actual ratings.

Basic Principles of MF


  1. Formula

$ R \approx UV^T $

  • $R \in \mathbb{R}^{m \times n}$, where $m$ is the number of users (or queries) and $n$ is the number of items

  • Latent Vector/Component: Each column of $U$ (or $V$)

    A latent vector may often be an arbitrary vector of positive and negative values, and it becomes difficult to give it a semantic interpretation. However, it does represent a dominant correlation pattern in the ratings matrix.

  • Latent Factor: Each row of $U$ (or $V$)
    • User Factor: Each row $\bar{u_i}$ of $U$

      It contains $k$ entries corresponding to the affinity of user $i$ towards the $k$ concepts in the ratings matrix.

    • Item Factor: Each row $\bar{v_j}$ of $V$

      It represents the affinity of the $j$th item towards these $k$ concepts.


  2. Each Rating Value ($r_{ij}$)
\[r_{ij} \approx \bar{u_i} \cdot \bar{v_j}\]
  • $\bar{u_i}= (u_{i1}\dots u_{ik})$
  • $\bar{v_j}= (v_{j1} \dots v_{jk})$
\[r_{ij} \approx \sum^k_{s=1}u_{is} \cdot v_{js} = \sum^k_{s=1}(\text{Affinity of user } i \text{ to concept } s) \times (\text{Affinity of item } j \text{ to concept } s)\]
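As a minimal sketch of how $U$ and $V$ can actually be learned (numpy assumed; plain SGD over the observed entries only, with an L2 penalty `lam` added in the spirit of the regularization remarks elsewhere in this post; all hyperparameter values are arbitrary):

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, lam=0.02, seed=0):
    """Learn U (m x k) and V (n x k) so that u_i . v_j approximates every
    observed rating r_ij; missing entries (NaN) are simply skipped."""
    m, n = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    observed = [(i, j) for i in range(m) for j in range(n)
                if not np.isnan(R[i, j])]
    for _ in range(steps):
        for i, j in observed:
            err = R[i, j] - U[i] @ V[j]             # r_ij - u_i . v_j
            ui = U[i].copy()                        # use pre-update value below
            U[i] += lr * (err * V[j] - lam * U[i])  # gradient step + L2 penalty
            V[j] += lr * (err * ui - lam * V[j])
    return U, V

# Toy matrix from the sparsity sketch above; NaN entries get predicted.
R = np.array([
    [3.0, 4.0, 4.0, np.nan, 1.0],
    [4.0, 4.0, 4.0, 3.0, np.nan],
    [1.0, 1.0, 2.0, 5.0, np.nan],
])
U, V = factorize(R)
print(np.round(U @ V.T, 1))   # observed entries are fitted, missing ones filled in
```

Each update nudges $\bar{u_i}$ and $\bar{v_j}$ to reduce the error on one observed rating, which is exactly the goal stated above: minimize the difference between the estimated and the actual ratings.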

Complexity

Matrix factorization typically gives a more compact representation than learning the full matrix. The full matrix has $O(nm)$ entries, while the embedding matrices $U, V$ have $O((n+m)k)$ entries, where the embedding dimension $k$ is typically much smaller than $m$ and $n$.

As a result, matrix factorization finds latent structure in the data, assuming that observations lie close to a low-dimensional subspace. In a toy example like the one above, the values of $m$, $n$, and $k$ are so low that the advantage is negligible. In real-world recommendation systems, however, matrix factorization can be significantly more compact than learning the full matrix.
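For a concrete sense of scale (hypothetical sizes): with $m = 10^6$ users, $n = 10^5$ items, and $k = 50$, the full matrix has $nm = 10^{11}$ entries, while $U$ and $V$ together have $(n + m)k = 5.5 \times 10^7$ entries, about 1,800 times fewer.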

Advantages and Disadvantages of MF

Advantages

  • Handle sparse and incomplete data

    MF can fill in the missing values and predict ratings for unseen items or users

  • Reduce the dimensionality and complexity of the data

    MF can compress a large matrix into smaller matrices that capture the essential information and reduce the noise

  • Discover latent features and patterns that are not obvious or explicit in the data

    MF can reveal hidden similarities and preferences among users and items, which can enhance the quality and diversity of the recommendations

Disadvantages

  • Overfitting and underfitting

    To avoid these problems, matrix factorization needs to balance the trade-off between fitting the data and regularizing the feature vectors

  • Sensitive to the choice of parameters
  • Limited by the linearity and independence assumptions

    Matrix factorization assumes that the rating is a linear combination of the features, and that the features are independent of each other. However, in some cases, the rating may depend on nonlinear or interactive features, or on external factors such as context, time, or social influence. To address these limitations, matrix factorization may need to incorporate additional information or techniques, such as nonlinear functions, feature engineering, or hybrid models
