
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction


Huifeng Guo¹, Ruiming Tang², Yunming Ye¹, Zhenguo Li², Xiuqiang He²
¹ Shenzhen Graduate School, Harbin Institute of Technology, China
² Noah's Ark Research Lab, Huawei

Abstract. Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expertise feature engineering. In this paper, we show that it is possible to derive an end-to-end learning model that emphasizes both low- and high-order feature interactions. The proposed model, DeepFM, combines the power of factorization machines for recommendation and deep learning for feature learning in a new neural network architecture. Compared to the latest Wide & Deep model from Google, DeepFM has a shared input to its wide and deep parts, with no need of feature engineering besides raw features.

Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of DeepFM over the existing models for CTR prediction, on both benchmark data and commercial data.

Introduction

The prediction of click-through rate (CTR) is critical in recommender systems, where the task is to estimate the probability that a user will click on a recommended item. In many recommender systems the goal is to maximize the number of clicks, so the items returned to a user should be ranked by estimated CTR; in other application scenarios, such as online advertising, it is also important to improve revenue, so the ranking strategy can be adjusted to CTR × bid across all candidates, where "bid" is the benefit the system receives if the item is clicked by a user. In either case, it is clear that the key is in estimating CTR correctly, and it is important for CTR prediction to learn the implicit feature interactions behind user click behaviors.
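To make the two ranking strategies concrete, here is a minimal sketch; the candidate items, CTR estimates, and bids are invented for illustration:

```python
# A minimal sketch of the two ranking strategies described above.
# The candidate items, estimated CTRs, and bids are made-up examples.
candidates = [
    {"item": "app_A", "ctr": 0.12, "bid": 0.50},  # bid: benefit if clicked
    {"item": "app_B", "ctr": 0.08, "bid": 1.20},
    {"item": "app_C", "ctr": 0.15, "bid": 0.30},
]

# Recommender-system setting: maximize clicks, rank by estimated CTR.
by_ctr = sorted(candidates, key=lambda c: c["ctr"], reverse=True)

# Online-advertising setting: maximize expected revenue, rank by CTR * bid.
by_revenue = sorted(candidates, key=lambda c: c["ctr"] * c["bid"], reverse=True)

print([c["item"] for c in by_ctr])      # ['app_C', 'app_A', 'app_B']
print([c["item"] for c in by_revenue])  # ['app_B', 'app_A', 'app_C']
```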

[Footnote: This work was done while Huifeng Guo was an intern at Noah's Ark Research Lab, Huawei.]

[Figure 1: Wide & deep architecture of DeepFM. The wide and deep components share the same input raw feature vector, which enables DeepFM to learn low- and high-order feature interactions simultaneously from the raw input features.]

By our study in a mainstream apps market, we found that people often download apps for food delivery at meal-time, suggesting that the (order-2) interaction between app category and time-stamp can be used as a signal for CTR. As a second observation, male teenagers like shooting games and RPG games, which means that the (order-3) interaction of app category, user gender and age is another signal for CTR. In general, such interactions of features behind user click behaviors can be highly sophisticated, where both low- and high-order feature interactions should play important roles.

According to the insights of the Wide & Deep model [Cheng et al., 2016] from Google, considering low- and high-order feature interactions simultaneously brings additional improvement over the cases of considering either alone. The key challenge is in effectively modeling feature interactions. Some feature interactions can be easily understood, and thus can be designed by experts (like the instances above). However, most other feature interactions are hidden in data and difficult to identify a priori (for instance, the classic association rule "diaper and beer" is mined from data, instead of being discovered by experts), and can only be captured automatically by machine learning. Even for easy-to-understand interactions, it seems unlikely for experts to model them exhaustively, especially when the number of features is large.

Despite their simplicity, generalized linear models, such as FTRL [McMahan et al., 2013], have shown decent performance in practice. However, a linear model lacks the ability to learn feature interactions, and a common practice is to manually include pairwise feature interactions in its feature vector, as sketched below.
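A minimal sketch of this manual practice, with hypothetical feature names (not code from the paper):

```python
# A sketch of manually engineered pairwise cross features for a linear
# model. Feature names and values are hypothetical illustrations.
from itertools import combinations

def add_pairwise_crosses(features: dict) -> dict:
    """Augment a feature dict with all pairwise conjunction features."""
    crossed = dict(features)
    for (k1, v1), (k2, v2) in combinations(sorted(features.items()), 2):
        crossed[f"{k1}={v1} AND {k2}={v2}"] = 1
    return crossed

raw = {"app_category": "food_delivery", "hour": "12", "gender": "male"}
print(add_pairwise_crosses(raw))
# Adds e.g. "app_category=food_delivery AND hour=12": 1 -- each such cross
# must appear in training data to receive its own weight, which is why this
# approach generalizes poorly to rare or unseen feature combinations.
```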

Such a method is hard to generalize to model high-order feature interactions or those that never or rarely appear in the training data [Rendle, 2010]. Factorization Machines (FM) [Rendle, 2010] model pairwise feature interactions as inner products of latent vectors between features and show very promising results. While in principle FM can model high-order feature interactions, in practice usually only order-2 feature interactions are considered due to high complexity.

As a powerful approach to learning feature representations, deep neural networks have the potential to learn sophisticated feature interactions. Some ideas extend CNN and RNN for CTR prediction [Liu et al., 2015; Zhang et al., 2014], but CNN-based models are biased to the interactions between neighboring features, while RNN-based models are more suitable for click data with sequential dependency.
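To make the FM idea above concrete before turning to the deep-learning line of work, here is a naive sketch of order-2 interactions as inner products of latent vectors, following the definition of [Rendle, 2010]; variable names and sizes are our own illustration:

```python
import numpy as np

def fm_pairwise(x: np.ndarray, V: np.ndarray) -> float:
    """Naive order-2 FM term: sum over i < j of <V_i, V_j> * x_i * x_j."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += float(V[i] @ V[j]) * x[i] * x[j]
    return total

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 0.0])        # two active one-hot features
V = rng.normal(scale=0.1, size=(4, 3))    # one k=3 latent vector per feature
print(fm_pairwise(x, V))                  # here this is just <V_0, V_2>
```

Because each pair's weight is the inner product of two independently learned vectors, FM can estimate interactions between features that rarely or never co-occur in training, which the manual cross features above cannot.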

[Zhang et al., 2016] studies feature representations and proposes the Factorization-machine supported Neural Network (FNN). This model pre-trains FM before applying DNN, and is thus limited by the capability of FM. Feature interaction is studied in [Qu et al., 2016], by introducing a product layer between the embedding layer and the fully-connected layer, and proposing the Product-based Neural Network (PNN). As noted in [Cheng et al., 2016], PNN and FNN, like other deep models, capture little of the low-order feature interactions, which are also essential for CTR prediction. To model both low- and high-order feature interactions, [Cheng et al., 2016] proposes an interesting hybrid network structure (Wide & Deep) that combines a linear ("wide") model and a deep model. In this model, two different inputs are required for the "wide part" and the "deep part", respectively, and the input of the "wide part" still relies on expertise feature engineering.

One can see that existing models are biased to low- or high-order feature interactions, or rely on feature engineering.

In this paper, we show it is possible to derive a learning model that is able to learn feature interactions of all orders in an end-to-end manner, without any feature engineering besides raw features. Our main contributions are summarized as follows:

- We propose a new neural network model, DeepFM (Figure 1), that integrates the architectures of FM and deep neural networks (DNN). It models low-order feature interactions like FM and models high-order feature interactions like DNN. Unlike the Wide & Deep model [Cheng et al., 2016], DeepFM can be trained end-to-end without any feature engineering.

- DeepFM can be trained efficiently because its wide part and deep part, unlike [Cheng et al., 2016], share the same input and also the embedding vector. In [Cheng et al., 2016], the input vector can be of huge size as it includes manually designed pairwise feature interactions in the input vector of its wide part, which also greatly increases its complexity.

- We evaluate DeepFM on both benchmark data and commercial data, which shows consistent improvement over existing models for CTR prediction.

Our Approach

Suppose the data set for training consists of $n$ instances $(\chi, y)$, where $\chi$ is an $m$-fields data record usually involving a pair of user and item, and $y \in \{0, 1\}$ is the associated label indicating user click behaviors ($y = 1$ means the user clicked the item, and $y = 0$ otherwise). $\chi$ may include categorical fields (e.g., gender, location) and continuous fields (e.g., age). Each categorical field is represented as a vector of one-hot encoding, and each continuous field is represented as the value itself, or as a vector of one-hot encoding after discretization. Then, each instance is converted to $(x, y)$, where $x = [x_{\text{field}_1}, x_{\text{field}_2}, \ldots, x_{\text{field}_j}, \ldots, x_{\text{field}_m}]$ is a $d$-dimensional vector, with $x_{\text{field}_j}$ being the vector representation of the $j$-th field of $\chi$. Normally, $x$ is high-dimensional and extremely sparse.
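A minimal sketch of this data representation; the field vocabularies and the record are invented for illustration:

```python
import numpy as np

# Hypothetical per-field vocabularies; in practice these are built from data.
fields = {
    "gender":   ["male", "female"],
    "location": ["cn", "us", "uk"],
    "age":      ["0-18", "19-35", "36+"],   # continuous field, discretized
}

def encode(record: dict) -> np.ndarray:
    """Concatenate one-hot encodings of each field into one d-dim vector x."""
    parts = []
    for name, vocab in fields.items():
        one_hot = np.zeros(len(vocab))
        one_hot[vocab.index(record[name])] = 1.0
        parts.append(one_hot)
    return np.concatenate(parts)

x = encode({"gender": "male", "location": "us", "age": "19-35"})
print(x)  # [1. 0. 0. 1. 0. 0. 1. 0.] -- d=8, exactly one 1 per field (sparse)
```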

The task of CTR prediction is to build a prediction model $\hat{y} = \text{CTR\_model}(x)$ to estimate the probability of a user clicking a specific app in a given context.

DeepFM

We aim to learn both low- and high-order feature interactions. To this end, we propose a Factorization-Machine based neural network (DeepFM). As depicted in Figure 1, DeepFM consists of two components, the FM component and the deep component, which share the same input. For feature $i$, a scalar $w_i$ is used to weigh its order-1 importance, and a latent vector $V_i$ is used to measure its impact of interactions with other features. $V_i$ is fed into the FM component to model order-2 feature interactions, and into the deep component to model high-order feature interactions. All parameters, including $w_i$, $V_i$, and the network parameters ($W^{(l)}$, $b^{(l)}$ below), are trained jointly for the combined prediction model:

$$\hat{y} = \text{sigmoid}(y_{FM} + y_{DNN}), \quad (1)$$

where $\hat{y} \in (0, 1)$ is the predicted CTR, $y_{FM}$ is the output of the FM component, and $y_{DNN}$ is the output of the deep component.

FM Component

[Figure 2: The architecture of FM.]

The FM component is a factorization machine, which is proposed in [Rendle, 2010] to learn feature interactions for recommendation.
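A compact NumPy sketch of the combined prediction in Equation (1). The layer sizes, initialization, and single hidden layer are our illustrative assumptions, and the deep component is simplified as noted in the comments; this sketches the idea, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deepfm_forward(x, w, V, W1, b1, w_out, b_out):
    """Forward pass of Eq. (1): y_hat = sigmoid(y_FM + y_DNN).

    x: (d,) sparse input; w: (d,) order-1 weights; V: (d, k) latent vectors
    shared by both components; W1, b1, w_out, b_out: deep-component weights.
    """
    # FM component: order-1 term plus order-2 interactions, using Rendle's
    # O(kd) identity for sum_{i<j} <V_i, V_j> x_i x_j.
    xv = x @ V                                       # (k,) summed embeddings
    y_fm = w @ x + 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))

    # Deep component: a feed-forward network over the same embeddings.
    # (For brevity this feeds the *summed* embedding; the paper feeds the
    # concatenation of the per-field embedding vectors instead.)
    a = np.maximum(0.0, W1 @ xv + b1)                # one relu hidden layer
    y_dnn = w_out @ a + b_out                        # scalar output unit

    return sigmoid(y_fm + y_dnn)                     # Eq. (1)

rng = np.random.default_rng(0)
d, k, h = 8, 4, 16                                   # made-up sizes
x = np.zeros(d); x[[0, 3, 6]] = 1.0                  # three active features
w = rng.normal(scale=0.1, size=d)
V = rng.normal(scale=0.1, size=(d, k))
W1, b1 = rng.normal(scale=0.1, size=(h, k)), np.zeros(h)
w_out, b_out = rng.normal(scale=0.1, size=h), 0.0
print(deepfm_forward(x, w, V, W1, b1, w_out, b_out))  # a CTR estimate in (0, 1)
```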

[Footnote: In all figures of this paper, a Normal Connection in black refers to a connection with a weight to be learned; a Weight-1 Connection, red arrow, is a connection with weight 1 by default; Embedding, blue dashed arrow, means a latent vector to be learned; Addition means adding all inputs together; Product, including Inner- and Outer-Product, means the output of the unit is the product of two input vectors; Sigmoid Function is used as the output function in CTR prediction; Activation Functions, such as relu and tanh, are used for non-linearly transforming the signal; the yellow and blue circles in the sparse-features layer represent one and zero in the one-hot encoding of the input, respectively.]

Besides linear (order-1) interactions among features, FM models pairwise (order-2) feature interactions as the inner product of the respective feature latent vectors. It can capture order-2 feature interactions much more effectively than previous approaches, especially when the dataset is sparse.
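Written out, following the definition of [Rendle, 2010], the FM component output is

$$y_{FM} = \langle w, x \rangle + \sum_{j_1=1}^{d} \sum_{j_2=j_1+1}^{d} \langle V_{j_1}, V_{j_2} \rangle \, x_{j_1} \cdot x_{j_2}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product and $d$ is the dimension of $x$. The first term reflects order-1 feature importance; the second sums an interaction term over all feature pairs. Because a pair's weight is $\langle V_{j_1}, V_{j_2} \rangle$ rather than an independent parameter, $V_{j_1}$ receives training signal whenever feature $j_1$ appears in any record, so interactions between features that rarely or never co-occur can still be estimated; this is the source of the sparse-data advantage noted above.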

