garret

[Recommendation] Neural Collaborative Filtering

Paper Review

_Sun_ 2023. 1. 21. 18:31

Abstract

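Collaborative filtering commonly relies on matrix factorization (MF) to model the interaction between user and item features. MF learns user-item interactions by representing users and items in a shared latent space and expressing their relation as the inner product of their latent features.
This paper points out the limitation of the linear MF model and proposes Neural network-based Collaborative Filtering (NCF), which is built on a non-linear neural architecture.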

1. Introduction

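Matrix Factorization (MF)
- Projects users and items into a shared latent space and models their interaction as the inner product of their latent vectors
- The most popular approach among latent factor model-based recommenders
- The inner product combines the user and item latent factors linearly, which may not be sufficient to capture the complex structure of user-item interactions.
This paper focuses on implicit feedback, which reflects user preference indirectly
- Implicit feedback: watching videos, purchasing products, clicking items, etc.
- Pros: unlike explicit feedback (e.g., ratings, reviews), implicit feedback can be tracked automatically, so it is much easier for content providers to collect.
- Cons: user satisfaction is not observed and negative feedback is naturally scarce, so it is harder to utilize.
Contributions of the paper
- Presents a neural network architecture to model the latent features of users and items, and devises NCF, a general framework for collaborative filtering based on neural networks
- Shows that MF can be interpreted as a special case of NCF, and uses a multi-layer perceptron to give NCF a high level of non-linearity
- Runs experiments on two real-world datasets to demonstrate the effectiveness of NCF and the promise of deep learning for collaborative filtering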

2. Preliminaries

2.1 Learning from Implicit Data

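User-item interaction matrix $Y \in \mathbb{R}^{M \times N}$

$$
y_{ui}=\begin{cases}
1, & \text{if interaction (user } u \text{, item } i \text{) is observed;}\\
0, & \text{otherwise}
\end{cases}
$$

$M$ : number of users
$N$ : number of items
$y_{ui}$ only records whether an interaction was observed; it does not express how much user $u$ prefers item $i$.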
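
As a minimal sketch of the definition above, the binary matrix $Y$ can be built from a log of observed (user, item) events. The function name `build_interaction_matrix` and the toy event log are hypothetical and only serve the illustration.

```python
def build_interaction_matrix(interactions, num_users, num_items):
    """Return an M x N nested list with 1 for observed (u, i) pairs, else 0."""
    Y = [[0] * num_items for _ in range(num_users)]
    for u, i in interactions:
        Y[u][i] = 1  # repeated events for the same pair still yield 1
    return Y

# Hypothetical event log (clicks, purchases, views), given as (user, item) indices.
observed = [(0, 1), (0, 2), (1, 0), (2, 2), (0, 1)]
Y = build_interaction_matrix(observed, num_users=3, num_items=3)
for row in Y:
    print(row)
```

Every unobserved pair stays at 0, which is why a 0 cannot be read as a dislike: it may simply be an item the user never saw.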

$$
\hat{y_{ui}} = f(u,i|\Theta)
$$

$\hat{y_{ui}}$ : predicted score for $y_{ui} $
$\Theta $ : model parameters
$f$ : the function that maps model parameters to the predicted score

2.2 Matrix Factorization

Inner product formulation

$$\hat{y_{ui}}=f(u,i|p_u,q_i) =p^T_uq_i =\sum_{k=1}^{K}p_{uk}q_{ik}$$

$p_u $ : latent vector for user u
$q_i $ : latent vector for item i
$K $ : dimension of the latent space

MF

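two-way interaction of user and item latent factors
a linear model of latent factors

Limitation of MF

Jaccard similarities between users, computed from their interactions: $s_{23}(0.66)>s_{12}(0.5)>s_{13}(0.4) $. These fix the geometric relation of p1, p2, p3 in the latent space.
For a new user u4: $s_{41}(0.6)>s_{43}(0.4)>s_{42}(0.2) $
By similarity, u4 is closest to u1, then u3, then u2.
Geometrically, however, placing p4 closest to p1 makes p4 closer to p2 than to p3.
Therefore MF falls short of accurately describing the complex relations between users and items.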
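
The similarity values quoted in this example can be reproduced with a short sketch, assuming the $s_{ij}$ are Jaccard coefficients between the users' item sets. The item sets below are an assumption chosen to match the quoted numbers, since the original figure is not part of this text.

```python
def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two item sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Assumed item sets per user (chosen so that the quoted similarities come out).
users = {
    "u1": {1, 2, 3, 5},
    "u2": {2, 3},
    "u3": {2, 3, 4},
    "u4": {1, 3, 4, 5},
}

s = {(a, b): jaccard(users[a], users[b]) for a in users for b in users if a != b}

# Rank the existing users by their similarity to the new user u4.
ranking_for_u4 = sorted(["u1", "u2", "u3"], key=lambda v: s[("u4", v)], reverse=True)
print(ranking_for_u4)
```

The data ranks u3 above u2 for u4, while a latent space that already places p2 between p1 and p3 cannot respect that order once p4 sits next to p1.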

3. Neural Collaborative Filtering

3.1 General framework

Input Layer
- Binarized sparse vectors obtained by one-hot encoding the identities of the user and the item
  - User feature vectors: $v^U_u $
  - Item feature vectors: $v^I_i $
Embedding layer
- A fully connected layer that projects the sparse input to a dense vector
- Yields the user and item latent vectors
Neural CF layers
- Multi-layer neural architecture
- Maps the latent vectors to the predicted score
- Each layer can be customized to discover certain latent structures of user-item interactions
- The dimension of the last hidden layer X determines the model's capability
Output layer
- predicted score $\hat{y_{ui}}$
- Training minimizes the pointwise loss between $\hat{y_{ui}} $ and $y_{ui} $
- Alternative training scheme: pairwise learning (Bayesian Personalized Ranking, margin-based loss)

NCF’s Predictive model

$$\hat{y_{ui}}=f(P^Tv^U_u,Q^Tv^I_i|P,Q,\Theta_f)$$

$P\in \mathbb{R}^{M \times K} $ : user latent matrix
$Q\in \mathbb{R}^{N \times K} $ : item latent matrix
$\Theta_f $ : model parameters of the interaction function $f$

$$f(P^Tv^U_u,Q^Tv^I_i)=\phi_{out}(\phi_X(...\phi_2(\phi_1(P^Tv^U_u,Q^Tv^I_i))...))$$

$\phi_{out} $ : output layer mapping function
$\phi_x $ : mapping function of the x-th neural CF layer
$X $ : total number of neural CF layers
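
Because $v^U_u$ is one-hot, the product $P^Tv^U_u$ simply selects row $u$ of $P$, which is why the embedding layer can be implemented as a table lookup. The sketch below checks this on a toy matrix; `one_hot`, `project`, and the values in `P` are hypothetical.

```python
def one_hot(index, size):
    """Binarized sparse vector with a single 1 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def project(P, v):
    """Compute P^T v for an M x K nested list P and a length-M vector v."""
    num_rows, num_cols = len(P), len(P[0])
    return [sum(P[m][k] * v[m] for m in range(num_rows)) for k in range(num_cols)]

# Toy user latent matrix with M = 3 users and K = 2 latent dimensions.
P = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

p_1 = project(P, one_hot(1, size=3))
print(p_1, P[1])  # both are the latent vector of user 1
```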

Loss Function

Squared Loss

$$L_{sqr} =\sum_{(u,i)\in \mathcal Y \cup \mathcal Y^-}w_{ui}(y_{ui}-\hat{y}_{ui})^2$$

$\mathcal{Y}$ : set of observed interactions in Y
$\mathcal{Y^-}$: set of unobserved interactions, set of negative instances
$w_{ui} $ : weight of training instance (u,i)
Implicit data has binary targets (0 or 1), so the squared loss, which assumes that observations are generated from a Gaussian distribution, is not a good fit.

Binary cross-entropy loss (log loss)

$$L=-\sum_{(u,i)\in \mathcal Y}\log \hat{y_{ui}} - \sum_{(u,i)\in \mathcal Y^-}\log (1-\hat{y_{ui}})
=-\sum_{(u,i)\in \mathcal Y \cup \mathcal Y^-}\left[ y_{ui}\log \hat{y_{ui}} + (1-y_{ui})\log (1-\hat{y_{ui}})\right]$$

Training finds the parameters that minimize L.
optimization : Stochastic Gradient Descent (SGD)
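
The two forms of the log loss can be checked numerically. This is a small sketch with made-up predictions; the `eps` clipping is an added numerical safeguard and not part of the formula.

```python
import math

def log_loss(pairs, eps=1e-12):
    """Binary cross-entropy summed over (y, y_hat) pairs, with y in {0, 1}."""
    total = 0.0
    for y, y_hat in pairs:
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total

# Hypothetical predicted scores.
observed_preds = [0.9, 0.8]   # pairs in the observed set (label 1)
negative_preds = [0.2, 0.1]   # sampled negative pairs (label 0)

# First form: separate sums over observed and negative instances.
two_sum_form = (-sum(math.log(p) for p in observed_preds)
                - sum(math.log(1.0 - p) for p in negative_preds))

# Second form: one sum over the union, using the labels.
unified_form = log_loss([(1, p) for p in observed_preds] +
                        [(0, p) for p in negative_preds])

print(two_sum_form, unified_form)
```

For a label of 1 only the $\log \hat{y_{ui}}$ term survives, and for a label of 0 only the $\log(1-\hat{y_{ui}})$ term, so both forms agree.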

3.2 Generalized Matrix Factorization (GMF)

MF can be interpreted as a special case of the NCF framework.

Mapping function of the 1st neural CF layer

$$\phi_1(p_u,q_i)=p_u \odot q_i$$

$p_u= P^Tv^U_u $ : user latent vector
$q_i = Q^Tv^I_i $ : item latent vector
$\odot $ : element-wise product of vectors
$\phi_1 $ : mapping function of the first neural CF layer, here the element-wise product

$$\hat{y_{ui}}=a_{out}(h^T(p_u \odot q_i))$$

$a_{out} $: activation function
- The paper uses the sigmoid function
h : edge weights of the output layer
- $h^T=[h_1,...,h_K] $
- Learning non-uniform values for h gives each term of the inner product a different weight. With an identity $a_{out}$ and h set to a vector of ones, the model reduces exactly to MF.
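
A minimal sketch of the GMF output, using toy vectors that are purely illustrative. It shows that an identity activation with a uniform h of ones gives the plain MF inner product, while a non-uniform h with a sigmoid gives the generalized score.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gmf_score(p_u, q_i, h, activation=sigmoid):
    """a_out(h^T (p_u ⊙ q_i)): weighted sum of the element-wise product."""
    return activation(sum(h_k * p * q for h_k, p, q in zip(h, p_u, q_i)))

# Hypothetical latent vectors with K = 3.
p_u = [0.5, 1.0, -0.2]
q_i = [1.0, 0.3, 0.5]

# Identity activation and h = 1 recover the MF inner product p_u^T q_i.
mf = gmf_score(p_u, q_i, h=[1.0, 1.0, 1.0], activation=lambda x: x)

# Non-uniform h weights each latent dimension differently.
gmf = gmf_score(p_u, q_i, h=[0.5, 2.0, 1.0])

print(mf, gmf)
```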

3.3 Multi-Layer Perceptron (MLP)

Problem with simply concatenating the user and item vectors
- Concatenation alone does not account for any interaction between user and item latent features, which is insufficient for modeling the collaborative filtering effect
To address this, hidden layers are added on top of the concatenated vector using a standard MLP

MLP model structure

$\phi_x $ : mapping function of the x-th layer (non-linear)
$W_x $: weight matrix for x-th layer
$b_x $: bias vector for x-th layer
$a_x $ : activation function for x-th layer; ReLU is used here
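
The symbols above can be put together in a small forward-pass sketch. It assumes each hidden layer computes $a_x(W_x^Tz+b_x)$ on the output $z$ of the previous layer, and that the final score is a sigmoid over a weighted sum with output weights h, mirroring the GMF output layer. All weights are hand-picked toy values, not trained ones.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_relu(z, W, b):
    """ReLU(W^T z + b), where W is a len(z) x len(b) nested list."""
    return [max(0.0, sum(z[r] * W[r][c] for r in range(len(z))) + b[c])
            for c in range(len(b))]

def mlp_score(p_u, q_i, layers, h):
    z = list(p_u) + list(q_i)      # first layer: concatenate user and item vectors
    for W, b in layers:            # hidden layers with ReLU activation
        z = dense_relu(z, W, b)
    return sigmoid(sum(h_k * z_k for h_k, z_k in zip(h, z)))

# Toy inputs and weights (hypothetical values).
p_u, q_i = [1.0, -1.0], [0.5, 2.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # 4 inputs -> 2 units
b2 = [0.0, 0.0]

score = mlp_score(p_u, q_i, layers=[(W2, b2)], h=[1.0, 1.0])
print(score)
```

Unlike the fixed element-wise product of GMF, the hidden layers here mix every user dimension with every item dimension through learned weights.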

3.4 Fusion of GMF and MLP, NeuMF

GMF uses a linear kernel to model the interactions between latent features
MLP uses a non-linear kernel to learn the interactions between latent features
NeuMF (Neural Matrix Factorization) fuses GMF and MLP
- It combines the linearity of MF with the non-linearity of DNNs

NeuMF structure

$p^G_u, p^M_u $ : user embeddings for GMF and MLP
$q^G_i,q^M_i $ : item embeddings for GMF and MLP
GMF and MLP learn separate embeddings, and their last hidden layers are concatenated at the end.
Why GMF and MLP use separate embeddings
- The optimal embedding size can differ between GMF and MLP.
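
A minimal sketch of the fusion, assuming the score is a sigmoid over h applied to the concatenation of the GMF vector and the last MLP hidden layer. The embedding sizes are deliberately different (2 for GMF, 1 for MLP) to show why separate embeddings are convenient; all values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_relu(z, W, b):
    """ReLU(W^T z + b), where W is a len(z) x len(b) nested list."""
    return [max(0.0, sum(z[r] * W[r][c] for r in range(len(z))) + b[c])
            for c in range(len(b))]

def neumf_score(p_g, q_g, p_m, q_m, mlp_layers, h):
    phi_gmf = [p * q for p, q in zip(p_g, q_g)]   # GMF part: element-wise product
    z = list(p_m) + list(q_m)                     # MLP part: concatenation ...
    for W, b in mlp_layers:                       # ... followed by hidden layers
        z = dense_relu(z, W, b)
    fused = phi_gmf + z                           # concatenate both last layers
    return sigmoid(sum(h_k * f_k for h_k, f_k in zip(h, fused)))

# Toy embeddings: GMF uses size 2, MLP uses size 1.
p_g, q_g = [1.0, 2.0], [0.5, 0.5]
p_m, q_m = [1.0], [-1.0]
mlp_layers = [([[2.0], [1.0]], [0.0])]            # 2 inputs -> 1 unit

score = neumf_score(p_g, q_g, p_m, q_m, mlp_layers, h=[1.0, 1.0, 1.0])
print(score)
```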

Pre-training

1. Train GMF and MLP from random initializations until convergence
  - Adaptive Moment Estimation (Adam) is used for this step
2. Use the model parameters from step 1 to initialize NeuMF
  - NeuMF is then optimized with vanilla SGD
  - At the output layer, the weights of the two models are concatenated: $h \leftarrow \begin{bmatrix}\alpha h^{GMF} \\ (1-\alpha)h^{MLP}\end{bmatrix}$
  - $h^{GMF}$ , $h^{MLP}$ : h vectors of the pretrained GMF and MLP
  - $\alpha$ : hyper-parameter that determines the trade-off between the two pre-trained models
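
The output-layer initialization can be sketched in a few lines. The function name and the weight values are hypothetical, and `alpha=0.5` is used here only as an example that weights both pretrained models equally.

```python
def init_neumf_output_weights(h_gmf, h_mlp, alpha=0.5):
    """Concatenate the pretrained output weights, scaled by alpha and 1 - alpha."""
    return [alpha * w for w in h_gmf] + [(1.0 - alpha) * w for w in h_mlp]

# Hypothetical output weights of a pretrained GMF (2 dims) and MLP (3 dims).
h_gmf = [0.4, -0.2]
h_mlp = [1.0, 0.6, 0.2]

h = init_neumf_output_weights(h_gmf, h_mlp, alpha=0.5)
print(h)
```

The resulting vector has one weight per unit of the fused last layer, so its length is the GMF dimension plus the size of the last MLP hidden layer.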

4. Experiments

4.1 Experimental Settings

Datasets

MovieLens : explicit feedback data converted to implicit data
Pinterest

Evaluation Protocols

Leave-one-out evaluation
- For each user, the latest interaction is held out as the test item and the remaining interactions are used for training
Ranked list quality metrics
- Hit Ratio (HR) : whether the test item appears in the top-K list
- Normalized Discounted Cumulative Gain (NDCG) : a ranking metric that gives higher scores to hits at top positions; the closer to 1, the better
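
Both metrics are easy to compute per user when exactly one test item is held out. The sketch below assumes that setting, in which the ideal DCG is 1 and NDCG reduces to the discounted gain of the single hit. The item ids and the ranking are made up.

```python
import math

def hit_ratio_at_k(ranked_items, test_item, k=10):
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if test_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, test_item, k=10):
    """1 / log2(position + 2) for a hit at 0-based position, else 0.0."""
    for position, item in enumerate(ranked_items[:k]):
        if item == test_item:
            return 1.0 / math.log2(position + 2)
    return 0.0

# Hypothetical ranking produced by a model for one user, best first.
ranked = ["i7", "i3", "i9", "i1"]

hr = hit_ratio_at_k(ranked, "i9", k=3)     # hit at 0-based position 2
ndcg = ndcg_at_k(ranked, "i9", k=3)        # 1 / log2(4)
miss = hit_ratio_at_k(ranked, "i1", k=3)   # ranked 4th, outside the top 3
print(hr, ndcg, miss)
```

Averaging these per-user values over all users gives the reported HR@K and NDCG@K.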

The NCF methods (GMF, MLP, NeuMF) are compared with the following baselines

ItemPop : ranks items by popularity, measured by the number of interactions
ItemKNN : standard item-based CF method
BPR : optimizes the MF model with a pairwise ranking loss
eALS : MF method for item recommendation

Parameter settings

Implemented in Keras
batch size : tested over [128, 256, 512, 1024]
learning rate : tested over [0.0001, 0.0005, 0.001, 0.005]
predictive factors (size of the last hidden layer) : [8, 16, 32, 64]
e.g., with predictive factors = 8, the neural CF layers follow 32→16→8 and the embedding size is 16
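
The 32→16→8 example suggests a tower structure in which each layer halves the previous one and the first layer receives the concatenated user and item embeddings. The helper below is a sketch under that assumption; `tower_layers` and the default of three hidden layers are inferred from the example, not taken from the original code.

```python
def tower_layers(predictive_factors, num_hidden_layers=3):
    """Return (layer sizes from bottom to top, embedding size per user/item)."""
    sizes = [predictive_factors * 2 ** j
             for j in range(num_hidden_layers - 1, -1, -1)]
    # The first layer takes [user embedding; item embedding], so each is half of it.
    embedding_size = sizes[0] // 2
    return sizes, embedding_size

for factors in [8, 16, 32, 64]:
    sizes, embedding_size = tower_layers(factors)
    print(factors, sizes, embedding_size)
```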

4.2 Performance Comparison

Top-10 item recommendation performance

Top-K item recommendation performance

HR and NDCG are compared per dataset and per number of predictive factors.
NeuMF outperforms the baselines.

Utility of Pre-training

With pre-training, NeuMF achieves better performance in most cases.

4.3 Log Loss with Negative Sampling

Comparison of GMF, MLP, and NeuMF on MovieLens

The most effective updates happen within the first 10 iterations; training longer risks overfitting
Performance : NeuMF > MLP > GMF

Performance with respect to the sampling ratio for negative instances

Sampling a suitable number of negative instances per positive instance improves performance.
Performance : NeuMF > MLP, GMF, BPR
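
A minimal sketch of negative sampling with a configurable ratio, assuming negatives are drawn uniformly (with replacement) from the items a user has not interacted with. The function name, the toy data, and the ratio of 4 are illustrative choices.

```python
import random

def sample_negatives(user_items, num_items, ratio, rng):
    """Return (user, item, label) triples with `ratio` negatives per positive.

    Assumes every user has at least one item they did not interact with.
    """
    samples = []
    for u, items in user_items.items():
        for i in items:
            samples.append((u, i, 1))            # observed interaction
            for _ in range(ratio):
                j = rng.randrange(num_items)
                while j in items:                # resample until unobserved
                    j = rng.randrange(num_items)
                samples.append((u, j, 0))        # sampled negative instance
    return samples

# Hypothetical observed interactions: user -> set of item ids.
user_items = {0: {1, 2}, 1: {0}}
samples = sample_negatives(user_items, num_items=6, ratio=4, rng=random.Random(0))
print(len(samples), samples[:5])
```

Raising `ratio` adds more label-0 instances to the log loss per observed interaction, which is the knob varied in this experiment.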

4.4 Is Deep Learning Helpful?

MLP performance with different numbers of hidden layers

Experiments confirm that stacking more hidden layers generally improves performance

Conclusion and Future work

NCF and its three instantiations: GMF, MLP, and NeuMF

Future work
- extend NCF to model auxiliary information (user reviews, knowledge bases, temporal signals)
- apply the model to multi-media items (images and videos)
- explore the potential of recurrent neural networks and hashing methods

Reference

NCF paper : https://arxiv.org/pdf/1708.05031.pdf
NCF github : https://github.com/hexiangnan/neural_collaborative_filtering
