
Robots Learning How and Where to Approach People

Omar A. Islas Ramírez (a), Harmish Khambhaita (b), Raja Chatila (a), Mohamed Chetouani (a), Rachid Alami (b)

Abstract

Robot navigation in human environments has been in the eyes of researchers for the last few years. Robots operating under these circumstances have to take human awareness into consideration for safety and acceptance. However, navigation has often been treated as going towards a goal point or avoiding people, without considering the robot engaging a person or a group of people in order to interact with them. This paper presents two navigation approaches based on the use of inverse reinforcement learning (IRL) from exemplar situations. This allows us to implement two path planners that take into account social norms for navigation towards isolated people.


For the first planner, we learn an appropriate way to approach a person in an open area without static obstacles; this information is used to generate the robot's path plan. As for the second planner, we learn the weights of a linear combination of continuous functions that we use to generate a costmap for the approach behavior. This costmap is then combined with others, such as a costmap with higher cost around obstacles, and finally a path is generated with Dijkstra's algorithm.

Keywords: Human Aware Navigation, Inverse Reinforcement Learning, Approaching People

1 Introduction

We approach people every day and interact with them, and it is an intuitive situation when one gathers with their friends or family. In this intuitive behavior, we know that certain motions or situations are not socially acceptable and we try to avoid them.
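As a rough, non-authoritative illustration of the second planner's pipeline described in the abstract (a weighted combination of continuous feature costmaps, layered with an obstacle costmap, then Dijkstra's algorithm on the result), the sketch below uses made-up grid dimensions, features, weights, and obstacle layout; none of these are the features or learned weights used in the paper.

```python
import heapq
import numpy as np

# --- Hypothetical feature-based approach costmap (all values are illustrative) ---
H, W = 50, 50                              # grid size in cells
yy, xx = np.mgrid[0:H, 0:W]

person = (25, 40)                          # person's cell (row, col), illustrative
dist_to_person = np.hypot(yy - person[0], xx - person[1])

# Two toy continuous features: normalized distance to the person and a crude
# stand-in for "behind the person" (the paper's real features are not shown here).
f1 = dist_to_person / dist_to_person.max()
f2 = np.where(xx > person[1], 1.0, 0.0)

w = np.array([1.0, 2.0])                   # placeholder weights; IRL would learn these
approach_cost = w[0] * f1 + w[1] * f2

# Obstacle layer: high cost around a rectangular obstacle, then layered combination.
obstacle_cost = np.zeros((H, W))
obstacle_cost[20:30, 15:20] = 100.0
costmap = approach_cost + obstacle_cost

def dijkstra(costmap, start, goal):
    """Shortest path on a 4-connected grid where entering a cell costs costmap[cell]."""
    H, W = costmap.shape
    dist = np.full((H, W), np.inf)
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist[cell]:
            continue
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < H and 0 <= nc < W:
                nd = d + costmap[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Walk predecessors back from the goal to recover the path.
    path, cell = [goal], goal
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]

path = dijkstra(costmap, start=(5, 5), goal=(25, 35))
print(len(path), "cells from start to goal")
```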

What do we do exactly? This is a simple question, but when we refer to a robot, we have to model and formalize its behavior, and implement it from the path planner to the entire navigation framework. In this paper, two navigation strategies to approach a human were implemented using low-level information about the human's position and orientation. The first one is a path planner that takes into account only a relative human polar frame, as in Figure 1(a), and the second one is a costmap layer [9] based on the same variables that can take obstacles into account, shown in Figure 1(b). The main difference compared to other works in human aware navigation [16, 4, 17] is that instead of a human operator giving a goal, it is our algorithm that provides the goal to reach and the appropriate path to it.

(a) ISIR-CNRS, Université Pierre et Marie Curie, Paris, France. (b) Laboratory for Analysis and Architecture of Systems, Toulouse, France.

Figure 1: (a) Proposed path to approach the person. Violet line: MDP resolution in a deterministic or most-probable-transition case. Green line: fitted curve treated with least squares and Bézier curves. (b) Layered costmap navigation with the IRL-learned layer.

This work is partially performed within the SPENCER project (Social situation-aware perception and action for cognitive robots), which aims to deploy a fully autonomous mobile robot to assist passengers at the Amsterdam Schiphol Airport. In the approach developed in this paper, the robot learns a policy from exemplary trajectories of humans approaching people to interact with them. Such trajectories define social norms as a reference for robot behavior. The rest of the paper is organized as follows.

Section 2 refers to related works. Given that our scenario is based on learning from demonstrations using IRL techniques, we define our model in Section 3. This model is applied to demonstrations given by an expert; in our case, these demonstrations are paths generated by a robot controlled by a person. These demonstrations are the input of the IRL algorithm. The policy learned from the IRL output is used to generate path plans in Section 4. Lastly, Section 5 provides experimental results before a conclusion.

2 Related Works

In robot navigation, path planners usually minimize time or distance. However, this is often not the case for social path planning, because we need to respect the private and social spaces of a person or group of people.

This topic is handled by Human Aware Navigation [6]. Some authors [5, 16] have taken proxemics into account as costs and constraints in the path planner to obtain acceptable paths, with hard-coded proxemics values derived from sociology. However, these values are not necessarily true in all situations, as they could depend on the velocities of the people, as commented in [10]. Other works that deal with the subject of approaching humans [14, 3] focus on tackling the problem of a task planner, considering pedestrians' intentions, such as people interested in approaching the robot. As for the navigation part, they look for the robot's path to intersect a person while he/she moves. Shomin's work [15] considers approaching people in order to interact in collaborative tasks; nonetheless, they used hard-coded waypoints in order to navigate.

In our work, we focus on the way the robot shall move in order to reach an engagement, given previously generated demonstrations. A main difference with these related works is that we find the final position from the demonstrations instead of hardcoding it. The Inverse Reinforcement Learning method enables a robot to learn a policy using a discrete and finite MDP (Markov Decision Process) in which the states are derived from the robot's relative position and orientation with respect to the human. Lately, IRL has been shown to teach machines to act as humans do. For example, in the Human Aware Robotics domain, recent works address robot navigation in crowds [2, 17] and other social spaces [4]. These examples tackle navigation from point A to point B while avoiding people, not approaching them.

The closest work to ours is [13], where they develop a method based on IRL for enabling a robot to move among people and in their vicinity (4m x 4m) in a human-like manner. We specifically address the problem of how to approach people to interact with them; therefore, our algorithm is able to provide a proper goal to be reached by the robot and a social path to reach this goal. This requires a specific model representing the space around the humans and appropriate trajectories for homing in on them.

3 Modeling Steps

In this section, we first recall the inverse reinforcement learning problem based on the MDP. We then introduce the components of the MDP which compose our model.

3.1 MDP and IRL

A finite Markov Decision Process is classically defined by the following five elements: a finite set of states S; a finite set of actions A; a transition probability function P(s_t | a_{t-1}, s_{t-1}), which is the probability of reaching state s_t by taking action a_{t-1} in state s_{t-1} (the transition matrix T(S, A, S) is composed of all such probabilities and its size is |S| x |A| x |S|); a reward function R(s_t, a_{t-1}) ∈ ℝ that depends on the state-action pair; and a discount factor γ ∈ [0, 1) which reflects the importance of short-term vs. long-term rewards.

Solving the MDP consists of finding an optimal policy, which provides for every state the action that should be selected in order to maximize the total reward. Reinforcement Learning (RL) is a part of machine learning in which the learner is not told which actions to take, as in most forms of machine learning; instead, it must discover which actions yield the most reward by trying them. Inverse Reinforcement Learning (IRL), on the other hand, deals with the problem of finding the reward from either an existing policy or from a demonstrated sequence (as in our case).
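For reference, the MDP-solving objective just mentioned is usually stated through the Bellman optimality equation. This is the standard textbook formulation written in the notation above, not a result specific to this paper:

```latex
% Optimal state-action value and the greedy optimal policy (standard formulation)
\begin{align}
  Q^{*}(s, a) &= R(s, a) + \gamma \sum_{s'} P(s' \mid a, s)\, \max_{a'} Q^{*}(s', a'), \\
  \pi^{*}(s)  &= \arg\max_{a} Q^{*}(s, a).
\end{align}
```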

We assume that the expert from which we want to learn can be modeled by an MDP. Our problem is defined by the tuple ⟨S, A, T, R, D, γ⟩, which is an MDP plus the added variable D, which represents the demonstrations given by the expert. Several IRL algorithms are available in the literature [11, 8, 18, 12, 1]. Since we want to find a reward function based on the state-action pairs, we can represent a state-action pair as a vector of features φ(s, a) = [f_1(s, a), f_2(s, a), ..., f_n(s, a)], where f_i is the i-th function of the state-action pair. Thus, we can represent our reward function as a linear combination of these features, R(s, a) = w^T φ(s, a), where w is the vector of weights. In general, learning the reward function is accomplished as follows. At the very first step, a random reward is created; in this case, a random weight vector w.
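As a minimal sketch of the linear reward R(s, a) = w^T φ(s, a) and of the general loop just described (start from a random weight vector, then update it so that the induced behavior matches the demonstrations), the code below uses a hypothetical feature map, placeholder demonstrations, and a simple feature-expectation-matching update; it is not the specific IRL algorithm, feature set, or state space used in the paper.

```python
import numpy as np

N_FEATURES = 4  # hypothetical number of state-action features

def phi(state, action):
    """Hypothetical feature map phi(s, a). In the paper the features are built from
    the robot's position/orientation relative to the human; these are placeholders."""
    dist, angle = state
    return np.array([dist, np.cos(angle), np.sin(angle), float(action)])

def reward(w, state, action):
    # Linear reward R(s, a) = w^T phi(s, a)
    return w @ phi(state, action)

def feature_expectations(trajectories, gamma=0.95):
    """Discounted empirical feature expectations of a set of (state, action) trajectories."""
    mu = np.zeros(N_FEATURES)
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            mu += (gamma ** t) * phi(s, a)
    return mu / len(trajectories)

# Placeholder demonstrations: each step is ((distance_to_person, bearing), action_id).
demos = [[((2.0, 0.6), 0), ((1.4, 0.3), 1), ((0.9, 0.1), 0)]]

rng = np.random.default_rng(0)
w = rng.standard_normal(N_FEATURES)   # random initial weight vector, as described above
mu_expert = feature_expectations(demos)

# One schematic update step: push the weights towards the expert's feature expectations.
# A real IRL loop would re-solve the MDP under reward(w, ., .) at every iteration to get
# the current policy's feature expectations instead of this zero stand-in.
mu_policy = np.zeros(N_FEATURES)
w = w + 0.1 * (mu_expert - mu_policy)

print("reward of the first demonstrated step:", reward(w, *demos[0][0]))
```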

