DRN: A Deep Reinforcement Learning Framework for News Recommendation

Transcription of DRN: A Deep Reinforcement Learning Framework for News Recommendation

DRN: A Deep Reinforcement Learning Framework for News Recommendation
Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, Zhenhui Li
Pennsylvania State University, University Park, USA; Microsoft Research Asia, Beijing, China

In this paper, we propose a novel deep reinforcement learning framework for news recommendation. Online personalized news recommendation is a highly challenging problem due to the dynamic nature of news features and user preferences. Although some online recommendation models have been proposed to address the dynamic nature of news recommendation, these methods have three major issues. First, they only try to model the current reward (e.g., Click Through Rate).

Second, very few studies consider using user feedback other than click / no click labels (e.g., how frequently a user returns) to help improve recommendation. Third, these methods tend to keep recommending similar news to users, which may cause users to get bored. Therefore, to address the aforementioned challenges, we propose a Deep Q-Learning based recommendation framework which can model future reward explicitly. We further consider the user return pattern as a supplement to the click / no click label in order to capture more user feedback information. In addition, an effective exploration strategy is incorporated to find new attractive news for users. Extensive experiments conducted on the offline dataset and the online production environment of a commercial news recommendation application have shown the superior performance of our methods.

KEYWORDS: reinforcement learning, deep Q-Learning, news recommendation

1 INTRODUCTION

The explosive growth of online content and services has provided tons of choices for users.

For instance, one of the most popular online services, news aggregation services such as Google News [15], can provide an overwhelming volume of content, far more than the amount that users can digest. Therefore, personalized online content recommendation is necessary to improve user experience.

Several groups of methods have been proposed to solve the online personalized news recommendation problem, including content based methods [19, 22, 33], collaborative filtering based methods [11, 28, …], and hybrid methods [12, 24, 25].

Recently, as an extension and integration of previous methods, deep learning models [8, 45, 52] have become the new state-of-the-art methods due to their capability of modeling complex user-item (i.e., news) interactions. However, these methods can not effectively address the following three challenges in news recommendation.

First, the dynamic changes in news recommendation are difficult to handle. The dynamic change of news recommendation is twofold. First, news becomes outdated very fast. In our dataset, the average time between the time that one piece of news is published and the time of its last click is only a few hours. Therefore, news features and the news candidate set are changing rapidly.

Second, users' interest in different news might evolve over time. For instance, Figure 1 displays the categories of news that one user has read over 10 weeks. During the first few weeks, this user prefers to read about Politics (green bar in Figure 1), but his interest gradually moves to Entertainment (purple bar in Figure 1) and Technology (grey bar in Figure 1) over time. Therefore, it is necessary to update the model periodically. Although there are some online recommendation methods [11, 24] that can capture the dynamic change of news features and user preference through online model updates, they only try to optimize the current reward (e.g., Click Through Rate), and hence ignore what effect the current recommendation might bring to the future.

An example showing the necessity of considering the future is given below. When a user Mike requests news, the recommendation agent foresees that he has almost the same probability of clicking on two pieces of news: one about a thunderstorm alert, and the other about the basketball player Kobe Bryant. However, according to Mike's reading preferences, the features of the news, and the reading patterns of other users, our agent speculates that, after reading about the thunderstorm, Mike will not need to read news about this alert anymore, but he will probably read more about basketball after reading the news about Kobe. This suggests that recommending the latter piece of news will introduce a larger future reward.
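To make the intuition concrete, here is a minimal numeric sketch. The click rewards, expected follow-up rewards, and discount factor below are illustrative assumptions, not values from the paper; the point is only that an item's one-step value combines its immediate reward with a discounted estimate of the reward it leads to, so two items with the same click probability can still differ sharply in value.

    GAMMA = 0.9  # discount factor on future reward (illustrative)

    def one_step_value(immediate_reward, expected_future_reward, gamma=GAMMA):
        """Current reward plus discounted expected future reward."""
        return immediate_reward + gamma * expected_future_reward

    # Both stories have the same immediate click reward, but the basketball
    # story is expected to trigger more follow-up reading.
    thunderstorm_value = one_step_value(0.5, expected_future_reward=0.1)  # 0.59
    kobe_value = one_step_value(0.5, expected_future_reward=0.6)          # 1.04
    print(thunderstorm_value < kobe_value)  # True: recommend the Kobe story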

Therefore, considering future rewards will help to improve recommendation performance in the long run.

Second, current recommendation methods [23, 35, 36, 43] usually only consider the click / no click labels or ratings as users' feedback. However, how soon one user will return to this service [48] will also indicate how satisfied this user is with the recommendation, yet there has been little work trying to incorporate the user return pattern to help improve recommendation.

[Figure 1: Distribution of clicked categories of an active user in ten weeks; user interest is evolving over time. The legend lists the categories Auto, Business, Politics, Education, Entertainment, Military, Real estate, Technology, Society, Sports, Travel, and Others.]

The third major issue of current recommendation methods is their tendency to keep recommending similar items to users, which might decrease users' interest in similar topics. In the literature, some reinforcement learning methods have already been proposed to add some randomness (i.e., exploration) into the decision in order to find new items. Current reinforcement learning methods usually apply the simple ε-greedy strategy [31] or Upper Confidence Bound (UCB) [23, 43] (mainly for Multi-Armed Bandit methods).
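For readers unfamiliar with these two strategies, the toy sketch below shows how each picks an item from a candidate set; it is illustrative Python, not the paper's implementation, and the drawbacks the authors have in mind are discussed in the next paragraph.

    import math
    import random

    def epsilon_greedy(estimated_reward, epsilon=0.1):
        """With probability epsilon recommend a random item (possibly totally
        unrelated to the user's interests); otherwise recommend the current best."""
        if random.random() < epsilon:
            return random.randrange(len(estimated_reward))
        return max(range(len(estimated_reward)), key=lambda i: estimated_reward[i])

    def ucb(estimated_reward, pull_counts, total_pulls, c=1.0):
        """Recommend the item with the highest optimistic (UCB) score; the score
        for an item only becomes reliable after it has been tried several times."""
        def optimistic_score(i):
            if pull_counts[i] == 0:
                return float("inf")  # untried items are explored first
            return estimated_reward[i] + c * math.sqrt(math.log(total_pulls) / pull_counts[i])
        return max(range(len(estimated_reward)), key=optimistic_score)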

However, both strategies could harm the recommendation performance to some extent in a short period. The ε-greedy strategy may recommend totally unrelated items to the customer, while UCB can not get a relatively accurate reward estimation for an item until this item has been tried several times. Hence, it is necessary to do more effective exploration.

Therefore, in this paper, we propose a deep reinforcement learning framework that can help to address these three challenges in online personalized news recommendation. First, in order to better model the dynamic nature of news characteristics and user preference, we propose to use a Deep Q-Learning (DQN) [31] framework. This framework can consider current reward and future reward simultaneously.
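For reference, "considering current reward and future reward simultaneously" corresponds to the generic deep Q-learning target, written here in standard notation rather than the paper's own:

    y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-)

Here r_t is the immediate reward (e.g., a click), \gamma is a discount factor weighting the future, and the max term is the estimated value of the best action from the next state; setting \gamma = 0 recovers a method that optimizes only the current reward, such as Click Through Rate.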

Some recent attempts at using reinforcement learning in recommendation either do not model the future reward explicitly (MAB-based works [23, 43]), or use discrete user logs to represent the state and hence can not be scaled to large systems (MDP-based works [35, 36]). In contrast, our framework uses a DQN structure and can easily scale up. Second, we consider user return as another form of user feedback information, by maintaining an activeness score for each user. Different from existing work [48] that can only consider the most recent return interval, we consider multiple historical return intervals to better measure the user feedback. In addition, different from [48], our model can estimate user activeness at any time (not just when the user returns).
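The transcription does not reproduce the paper's actual activeness formula, so the sketch below is a hypothetical illustration of the stated requirements only: a per-user score that reflects multiple historical return intervals and can be read at any time, not just when the user returns. All names and constants are assumptions.

    import math

    class UserActiveness:
        """Toy activeness score: decays continuously between visits and is
        bumped (with a cap) every time the user returns to the service."""

        def __init__(self, initial=0.5, decay_per_hour=0.05, return_bonus=0.32, cap=1.0):
            self.score = initial
            self.decay_per_hour = decay_per_hour
            self.return_bonus = return_bonus
            self.cap = cap
            self.last_update_hour = 0.0

        def value_at(self, hour):
            """Activeness at an arbitrary time, without mutating state, so it can
            be queried whenever an experience replay update needs it."""
            elapsed = hour - self.last_update_hour
            return self.score * math.exp(-self.decay_per_hour * elapsed)

        def record_return(self, hour):
            """A return shortens the most recent interval, so the score goes up."""
            self.score = min(self.cap, self.value_at(hour) + self.return_bonus)
            self.last_update_hour = hour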

This property enables the experience replay update used in our framework. Third, we propose to apply a Dueling Bandit Gradient Descent (DBGD) method [16, 17, 49] for exploration, by choosing random item candidates in the neighborhood of the current recommender. This exploration strategy can avoid recommending totally unrelated items and hence maintain better recommendation accuracy.

[Figure 2: Deep reinforcement recommendation system. The labels in the figure include user, news, click / no click, user activeness, Action 1 ... Action m, and explore.]

Our deep reinforcement recommender system is shown in Figure 2. We follow the common terminology of reinforcement learning [37] to describe the system. In our system, the user pool and the news pool make up the environment, and our recommendation algorithms play the role of the agent.
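The following is a rough sketch of the Dueling Bandit Gradient Descent idea as it is used for exploration here; the function names, the comparison step, and the update rule are illustrative assumptions, not the paper's code. The current model's weights are perturbed to obtain an "explore" model in the neighborhood of the current recommender, both models' recommendations are shown (in practice, interleaved), and the perturbation direction is kept only if it wins on observed feedback.

    import numpy as np

    def dbgd_step(weights, recommend, observe_feedback, alpha=0.05, eta=0.5):
        """One exploration step in the spirit of DBGD.

        recommend(weights)      -> list of items recommended under these weights
        observe_feedback(items) -> scalar feedback (e.g., clicks) for a shown list
        """
        # Explore model: a small random perturbation of the current weights.
        direction = np.random.randn(*weights.shape)
        direction /= np.linalg.norm(direction)
        explore_weights = weights + alpha * direction

        exploit_items = recommend(weights)
        explore_items = recommend(explore_weights)

        # Compare feedback from the two recommendation lists.
        if observe_feedback(explore_items) > observe_feedback(exploit_items):
            weights = weights + eta * alpha * direction  # move toward the winner
        return weights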

