Recurrent Neural Network for Text Classification with Multi-Task Learning

Transcription of Recurrent Neural Network for Text Classification with Multi-Task Learning

Recurrent Neural Network for Text Classification with Multi-Task Learning

Pengfei Liu, Xipeng Qiu, Xuanjing Huang
Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
School of Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China

Abstract: Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from insufficient training data. In this paper, we use the multi-task learning framework to jointly learn across multiple related tasks. Based on recurrent neural network, we propose three different mechanisms of sharing information to model text with task-specific and shared layers.

The entire network is trained jointly on all these tasks. Experiments on four benchmark text classification tasks show that our proposed models can improve the performance of a task with the help of other related tasks.

1 Introduction

Distributed representations of words have been widely used in many natural language processing (NLP) tasks. Following this success, there is a rising interest in learning distributed representations of longer spans of text, such as phrases, sentences, paragraphs and documents [Socher et al., 2013; Le and Mikolov, 2014; Kalchbrenner et al., 2014; Liu et al., 2015a]. The primary role of these models is to represent the variable-length sentence or document as a fixed-length vector.

A good representation of the variable-length text should fully capture the semantics of natural language.

Deep neural network (DNN) based methods usually need a large-scale corpus due to their large number of parameters, and it is hard to train a network that generalizes well with limited data. However, it is extremely expensive to build large-scale resources for some NLP tasks. To deal with this problem, these models often involve an unsupervised pre-training phase; the final model is then fine-tuned with respect to a supervised training criterion using gradient-based optimization. Recent studies have demonstrated significant accuracy gains in several NLP tasks [Collobert et al., 2011] with the help of word representations learned from large unannotated corpora.

Most pre-training methods are based on unsupervised objectives, such as word prediction, for training [Collobert et al., 2011; Turian et al., 2010; Mikolov et al., 2013]. This unsupervised pre-training is effective for improving the final performance, but it does not directly optimize the desired task.

Multi-task learning utilizes the correlation between related tasks to improve classification by learning tasks in parallel. Motivated by the success of multi-task learning [Caruana, 1997], several neural network based NLP models [Collobert and Weston, 2008; Liu et al., 2015b] utilize multi-task learning to jointly learn several tasks with the aim of mutual benefit. The basic multi-task architecture of these models is to share some lower layers to determine common features.

After the shared layers, the remaining layers are split into the multiple specific tasks.

In this paper, we propose three different models of sharing information with recurrent neural network (RNN). All the related tasks are integrated into a single system which is trained jointly. The first model uses just one shared layer for all the tasks. The second model uses different layers for different tasks, but each layer can read information from the other layers. The third model not only assigns one specific layer for each task, but also builds a shared layer for all the tasks; besides, we introduce a gating mechanism to enable the model to selectively utilize the shared information. The entire network is trained jointly on all these tasks.
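For a concrete picture of the first sharing scheme (a single layer shared by all tasks), the following is a minimal PyTorch-style sketch, not the authors' implementation; the class name, layer sizes and the two-task setup are illustrative assumptions.

```python
# Minimal sketch of the "single shared layer" idea: one LSTM encoder shared
# by every task, plus one task-specific softmax (linear) head per task.
# Class name, sizes and the two-task setup are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes_per_task):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)         # word embeddings
        self.shared_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, c) for c in num_classes_per_task]
        )

    def forward(self, token_ids, task_id):
        x = self.lookup(token_ids)                # (batch, T, embed_dim)
        _, (h_T, _) = self.shared_lstm(x)         # final state of the shared layer
        return self.heads[task_id](h_T[-1])       # task-specific class scores

# Two hypothetical tasks, e.g. a 2-class and a 5-class classification problem.
model = SharedEncoderMultiTask(vocab_size=10000, embed_dim=100,
                               hidden_dim=100, num_classes_per_task=[2, 5])
logits = model(torch.randint(0, 10000, (8, 20)), task_id=0)
```

Joint training would then alternate mini-batches from the different tasks, so every batch updates the shared LSTM but only the selected task head; the second and third schemes replace the single shared LSTM with per-task layers that exchange information, as described in the rest of the paper.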

Experimental results on four text classification tasks show that the joint learning of multiple related tasks together can improve the performance of each task relative to learning them separately. Our contributions are two-fold. First, we propose three multi-task architectures for RNN; although the idea of multi-task learning is not new, our work is novel in integrating RNN into the multi-task learning framework, which learns to map arbitrary text into semantic vector representations with both task-specific and shared layers. Second, we demonstrate strong results on several text classification tasks; our multi-task models outperform most state-of-the-art baselines.

2 Recurrent Neural Network for Specific-Task Text Classification

The primary role of the neural models is to represent the variable-length text as a fixed-length vector. These models generally consist of a projection layer that maps words, sub-word units or n-grams to vector representations (often trained beforehand with unsupervised methods), and then combine them with different architectures of neural networks.

There are several kinds of models to model text, such as the Neural Bag-of-Words (NBOW) model, recurrent neural network (RNN) [Chung et al., 2014], recursive neural network (RecNN) [Socher et al., 2012; 2013] and convolutional neural network (CNN) [Collobert et al., 2011; Kalchbrenner et al., 2014]. These models take as input the embeddings of words in the text sequence, and summarize its meaning with a fixed-length vectorial representation.

Among them, recurrent neural networks (RNN) are one of the most popular architectures used in NLP problems, because their recurrent structure is very suitable to process variable-length text.

2.1 Recurrent Neural Network

A recurrent neural network (RNN) [Elman, 1990] is able to process a sequence of arbitrary length by recursively applying a transition function to its internal hidden state vector h_t of the input sequence.

The activation of the hidden state h_t at time-step t is computed as a function f of the current input symbol x_t and the previous hidden state h_{t-1}:

    h_t = \begin{cases} 0 & t = 0 \\ f(h_{t-1}, x_t) & \text{otherwise} \end{cases}    (1)

It is common to use the state-to-state transition function f as the composition of an element-wise nonlinearity with an affine transformation of both x_t and h_{t-1}. A simple strategy for modeling a sequence is to map the input sequence to a fixed-sized vector using one RNN, and then to feed the vector to a softmax layer for classification or other tasks [Cho et al., 2014].
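As one concrete reading of Eq. (1) and of this sequence-to-vector strategy, here is a small NumPy sketch that uses a tanh-based transition function and feeds the final hidden state h_T to a softmax layer; the function and parameter names are placeholders rather than anything defined in the paper.

```python
# Direct reading of Eq. (1): h_0 = 0 and h_t = f(h_{t-1}, x_t), with f an
# affine transformation of x_t and h_{t-1} followed by an element-wise tanh.
# The final state h_T is then fed to a softmax layer for classification.
import numpy as np

def rnn_classify(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """x_seq: (T, input_dim) array of word vectors; all names are illustrative."""
    h = np.zeros(W_hh.shape[0])                   # h_0 = 0 (the t = 0 case)
    for x_t in x_seq:                             # recurrence over time steps
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # h_t = f(h_{t-1}, x_t)
    scores = W_hy @ h + b_y                       # fixed-size vector -> class scores
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                        # softmax over the labels

# Toy example: a length-5 sequence of 10-dim inputs, 16-dim state, 3 classes.
rng = np.random.default_rng(0)
probs = rnn_classify(rng.standard_normal((5, 10)),
                     W_xh=rng.standard_normal((16, 10)),
                     W_hh=rng.standard_normal((16, 16)), b_h=np.zeros(16),
                     W_hy=rng.standard_normal((3, 16)), b_y=np.zeros(3))
```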

Unfortunately, a problem with RNNs with transition functions of this form is that during training, components of the gradient vector can grow or decay exponentially over long sequences [Hochreiter et al., 2001; Hochreiter and Schmidhuber, 1997]. This problem with exploding or vanishing gradients makes it difficult for the RNN model to learn long-distance correlations in a sequence.

The long short-term memory network (LSTM) was proposed by [Hochreiter and Schmidhuber, 1997] to specifically address this issue of learning long-term dependencies. The LSTM maintains a separate memory cell inside it that updates and exposes its content only when deemed necessary. A number of minor modifications to the standard LSTM unit have been made; while there are numerous LSTM variants, here we describe the implementation used by Graves [2013]. We define the LSTM units at each time step t to be a collection of vectors in R^d: an input gate i_t, a forget gate f_t, an output gate o_t, a memory cell c_t and a hidden state h_t, where d is the number of LSTM units. The entries of the gating vectors i_t, f_t and o_t are in [0, 1].

[Figure 1: Recurrent Neural Network for Classification (inputs x_1 ... x_T, hidden states h_1 ... h_T, and a softmax output y).]

The LSTM transition equations are the following:

    i_t = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1}),        (2)
    f_t = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1}),        (3)
    o_t = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t),            (4)
    \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1}),               (5)
    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,          (6)
    h_t = o_t \odot \tanh(c_t),                               (7)

where x_t is the input at the current time step, \sigma denotes the logistic sigmoid function and \odot denotes elementwise multiplication. Intuitively, the forget gate controls the amount by which each unit of the memory cell is erased, the input gate controls how much each unit is updated, and the output gate controls the exposure of the internal memory state.
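Eqs. (2)-(7) can be transcribed almost line by line into code. The sketch below is one possible NumPy rendering of a single LSTM step, including the peephole terms V_i, V_f, V_o; the dictionary layout of the weights is an assumption made for compactness, not anything prescribed by the paper.

```python
# One LSTM step transcribed from Eqs. (2)-(7), including the peephole terms
# V_i, V_f, V_o on the memory cell. The dict-based weight layout is only a
# compact convention for this sketch; shapes: W_* (d, input_dim), U_*, V_* (d, d).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, V, W_c, U_c):
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + V['i'] @ c_prev)  # Eq. (2)
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + V['f'] @ c_prev)  # Eq. (3)
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev)                      # Eq. (5)
    c_t = f_t * c_prev + i_t * c_tilde                               # Eq. (6)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + V['o'] @ c_t)     # Eq. (4), peephole on c_t
    h_t = o_t * np.tanh(c_t)                                         # Eq. (7)
    return h_t, c_t
```

Running this step over a sequence with h_0 = c_0 = 0 yields the hidden states h_1, ..., h_T used below for classification.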

2.2 Task-Specific Output Layer

In a single specific task, a simple strategy is to map the input sequence to a fixed-sized vector using one RNN, and then to feed the vector to a softmax layer for classification or other tasks. Given a text sequence x = {x_1, x_2, ..., x_T}, we first use a lookup layer to get the vector representation (embedding) x_i of each word x_i.
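As a hedged illustration of this single-task pipeline (lookup layer, LSTM, softmax output layer, as in Figure 1), the sketch below wires the pieces together in PyTorch and runs one supervised gradient step; the model name, sizes and toy batch are assumptions, not the paper's experimental setup.

```python
# Illustrative single-task classifier: lookup layer -> LSTM over the word
# embeddings x_1..x_T -> softmax output layer on the final hidden state h_T.
# All names and hyperparameters are placeholders, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleTaskTextClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=100, num_classes=2):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)   # embeddings x_i
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_classes)    # softmax layer (logits)

    def forward(self, token_ids):                 # token_ids: (batch, T)
        x = self.lookup(token_ids)                # (batch, T, embed_dim)
        _, (h_T, _) = self.lstm(x)                # final hidden state
        return self.output(h_T[-1])               # class scores

# One supervised training step on a toy batch of 8 sequences of length 20.
model = SingleTaskTextClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 10000, (8, 20))
labels = torch.randint(0, 2, (8,))
loss = F.cross_entropy(model(tokens), labels)    # softmax + negative log-likelihood
optimizer.zero_grad()
loss.backward()
optimizer.step()
```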

