
QUEMDISSE? Reported speech in Portuguese

Cláudia Freitas, Bianca Freitas, Diana Santos
PUC-Rio & Linguateca, PUC-Rio, Linguateca & University of [...]

This paper presents some work on direct and indirect speech in Portuguese using corpus-based methods: we report on a study whose aim was to identify (i) Portuguese verbs used to introduce reported speech and (ii) syntactic patterns used to convey reported speech, in order to enhance the performance of a quotation extraction system, dubbed quemdisse?. In addition, (iii) we present a Portuguese corpus annotated with reported speech, using the lexicon and rules provided by (i) and (ii), and discuss the process of their annotation and what was [...]

Keywords: quotation verbs; reported speech; corpus annotation; Portuguese




1. Introduction and Motivation

A considerable amount of language activity involves reporting what others have said. In certain contexts, such as journalistic discourse, the use of reported speech is crucial. (Bergler et al., 2004) found that there are pieces of news in which over 90% of the sentences include a [...]

In natural language processing (NLP), automatic identification of reported speech is called Quotation Extraction (QE), which aims to identify quotations in text and relate them to their authors. This is a task often associated with, or subsidiary to, sentiment analysis, and it is the distinctive subtask for opinion-oriented information extraction (Pang and Lee, 2008). Its focus is to identify what is said, who said it, and the speaker's and the writer's judgments on what was [...]

[...] contexts for the exploration of reported speech have recently been provided by research in Digital Humanities; see (Mambrini et al., 2012). Through reported speech, one gives voice to different characters, whether in a piece of news or in fiction. Consequently, both the presence and the absence of particular characters indicate choices by the text author. By means of quotation identification, it is possible to measure which characters are given voice and, by contrast, which ones are silenced. (Smith et al., 2014), researching female characters in popular movies in 11 countries, established, among other findings, that only 23% of these women have lines in action movies. Exploring large amounts of texts (either news or fictional works) and who they quote or silence may provide us with new findings about our [...]

[...] a relatively regular structure of reported speech (although quite different in different languages, and even across varieties; see (Santos, 1998) for some discussion), rule-based approaches to QE are often extremely successful. However, purely formal marks that indicate the presence of a quotation, such as quotes in English, are not unique to this purpose, hence recognizing the specific verbs that are used in these contexts is highly relevant. Additionally, not all reported speech has the aforementioned formal marks.
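As a rough illustration of why formal marks alone are not enough, the following minimal Python sketch combines one quotation pattern with a small verb lexicon. It is not the quemdisse? system: the verb list, the single pattern, and the function name `extract_quotations` are all assumptions made up for this example, and the pattern only covers the newspaper layout "quote", verb Speaker.

```python
import re

# Hypothetical mini-lexicon of Portuguese reporting verb forms (illustrative only,
# not the lexicon built in the paper).
REPORTING_VERBS = {"disse", "afirmou", "declarou", "explicou"}

# One assumed pattern: a quoted span, a comma, a verb, and a capitalized speaker name.
PATTERN = re.compile(
    r'"(?P<quote>[^"]+)"\s*,\s*(?P<verb>\w+)\s+'
    r'(?P<speaker>[A-ZÀ-Ý]\w+(?:\s+[A-ZÀ-Ý]\w+)*)'
)


def extract_quotations(text):
    """Return (speaker, verb, quote) triples for quoted spans followed by a lexicon verb."""
    results = []
    for match in PATTERN.finditer(text):
        verb = match.group("verb").lower()
        # Quotes alone are ambiguous (titles, scare quotes), so the verb must
        # belong to the reporting-verb lexicon for the span to count as a quotation.
        if verb in REPORTING_VERBS:
            results.append((match.group("speaker"), verb, match.group("quote")))
    return results


sample = (
    '"A economia vai crescer", afirmou Maria Silva. '
    'O romance "Dom Casmurro", publicou Garnier em 1899.'
)
print(extract_quotations(sample))
```

In the sample, both sentences contain a quoted span followed by a verb and a capitalized name, but only the first verb is in the lexicon, so the book title is not reported as a quotation. Indirect quotations carry no quotation marks at all and would need different patterns.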

Indirect quotations, constituting almost half of the reported speech in English news (Pareti et al., 2013), are more difficult to identify, and not always covered by QE systems. On the other hand, 96% of the clues for reported speech in English found by (Pareti et al., 2013) are verbs, which makes us conclude that a lexicon of reported speech verbs [1] is of great value for quotation [...]

2. Discussion

As with everything linguistic, there is no easy way to decide what is and what is not a given phenomenon. Furthermore, different languages give more or less (and always different) attention to whatever one chooses to [...]

[...] norms, in addition, are cultural norms, not absolute laws, and different written cultures (especially those influenced by translation) are especially prone to change and to experimenting in different [...]

So, the presentation of direct speech in Portuguese is a case in point, from a rigid separation between oral scenes and narrative text to Saramago's prose and free indirect speech in lusophone literature, to the influence (and victory) of a completely different graphic form of conveying it (the anglophone one, using quotes), which is used overall in Portuguese-speaking countries in newspaper [...]

[...], the punctuation of direct speech is one of the areas in which American English differs most from British English, see (Jones, 1996); see also Hofstader's famous complaint when reading the quintessential American Salinger in British clothing (Hofstader, 1997).

News is probably the text genre where Portuguese is most influenced by globalization (and global English) and, therefore, where an anglophone style is more pronounced and influential. In fact, it is probably uncontroversial to say that in Brazil and in Portugal it changed completely into English style: using quotes as direct speech, or rather, direct [...]

[...] focus and raison d'être of direct speech in narrative fiction (or even non-fiction) is obviously different: while a fiction text tries to reproduce or create an oral exchange [2], a news text is, on the other hand, interested in assigning responsibility for an utterance to other (identified) actors. Instead of a dialogue or a conversation among several speakers (whose turns are indicated by long dashes in fiction), we have a report for responsibility assignment. It is not the colour, the dialect or the emotion that is at stake, but what was said for objective [...] fact, it is this accountability issue that makes the study of reporting and (direct and indirect) written speech relevant for a wider audience than literary experts or language learners. In a time of extremely short-lived fame, "who said that" is mostly relevant if it is modified by, or served as, "has just said so", and one needs automatic reporting [...]

On the other hand, it should be stressed that our interest is also linguistic, in the sense that we wish to define and study a lexical field, that of language-talk-speech, that contains the words related to this (central) property of mankind. Not only from a lexicographic perspective, but also as a semantic and contrastive topic, given the repeated statements that Portuguese differs widely from English in this respect (Caldas-Coulthard, 1996), and, interestingly, also from Arabic, as noted in the translator's preface in (Jarouche, 2013).

[1] There is a larger class of speech verbs, but not all of these are used to report speech, hence the full name reported speech [...] addition, a particular speech verb can sometimes be used to report speech, and other times not: for example, falar in Ele falou alto ('he spoke loudly') versus falou que viria ('he said he would come').

[2] Although a fictive one, see (Brumme and Espunya, 2012) for more on this [...]

3. Previous Work

In their study of human tagging in English, (Bruce and Wiebe, 1999) already showed that attribution was hard, and in (Wiebe et al., 2003) the problem of automated opinion mining is further [...]

Drury and colleagues compiled a large quotation corpus in the financial domain (Drury et al., 2011; Drury and Almeida, 2012) and used it to identify trends in that [...]

[...] on another English corpus with 18,000 citations, (Pareti et al., 2013) described several machine learning experiments to identify indirect and mixed quotes. The authors did not use a specific verb lexicon in their work; instead, they developed a classifier to detect verbs that introduce [...]

In order to build a quote extraction system, (Sagot et al., 2010) focused on creating a lexicon of reported speech verbs in French, dealing primarily with direct quotes introduced by appositional clauses and headed by a quotation verb. Their work heavily influenced our initial attempts in this [...]

As far as Portuguese is concerned, there are a number of works in this area, too. (Sarmento and Nunes, 2009) proposed the VERBATIM system, using a lexicon of 35 reported speech verbs and 19 lexico-syntactic patterns, while (Fernandes et al., 2011) used machine learning techniques to identify quotations and correctly assign their authors to them for the GloboQuotes corpus, created specifically for this [...]

The most relevant system to date is probably the EMM News explorer (Pouliquen et al.

