Transcription of Analysing Film Content: A Text-Based Approach
Analysing Film Content: A Text-Based Approach

Andrew Vassiliou

Submitted for the Degree of Doctor of Philosophy from the University of Surrey

Department of Computing, School of Electronics & Physical Sciences, University of Surrey, Guildford, Surrey, GU2 7XH, UK. July 2006.

A. Vassiliou 2006.

Summary

The aim of this work is to bridge the semantic gap with respect to the analysis of film content. Our novel approach is to systematically exploit collateral texts for films, such as audio description scripts and screenplays. We ask three questions: first, what information do these texts provide about film content and how do they express it? Second, how can machine-processable representations of film content be extracted automatically from these texts? Third, how can these representations enable novel applications for analysing and accessing digital film data?
To answer these questions we have analysed collocations in corpora of audio description scripts (AD) and screenplays (SC), developed and evaluated an information extraction system, and outlined novel applications based on information extracted from AD and SC scripts. We found that the language used in AD and SC contains repeating word patterns that are idiosyncratic compared with general language. The existence of these idiosyncrasies means that the generation of information extraction templates and algorithms can be mainly automatic. We also found four types of event that are commonly described in audio description scripts and screenplays for Hollywood films: Focus_of_Attention, Change_of_Location, Non-verbal_Communication and Scene_Change events. We argue that information about these events will support novel applications for automatic film content analysis.
These findings form our main contributions. Another contribution of this work is the extension and testing of an existing, mainly automated method to generate templates and algorithms for information extraction; with no further modifications, these performed with around 55% precision and 35% recall. Also provided is a database containing information about four types of events in 193 films, which was extracted automatically. Taken as a whole, this work can be considered to contribute a new framework for analysing film content which synthesises elements of corpus linguistics, information extraction, narratology and film theory.

Key words: Film Content, Information Extraction, Film Corpora, Collocation, Semantic Gap, Text Analysis, Corpus Linguistics.

Acknowledgments

First and foremost I would like to thank Dr Andrew Salway for all his guidance and support, and the long meetings that were always useful, interesting and never long enough.
I would also like to thank Professor Khurshid Ahmad for the opportunity to do a PhD and for his help with all things computational/corpus linguistics and local grammars. To Craig Bennett, Yew Cheng Loi, Ana Jakamovska and James Mountstephens: thanks for the welcome distractions and for allowing me to bounce ideas off you. A big thanks to the TIWO project group members, Yan Xu and Elia Tomadaki, for their help and guidance. Also to Haitham Trabousli, who spent a great deal of time and patience explaining new concepts to me, a heartfelt thank you. To Paschalis Loucaides, Bouha Kamzi and James William Russell Green, for their proofreading efforts: thanks guys. To all the people who took the time to do the evaluation needed in this work: thank you! I would also like to thank Dr. David Pitt for his time, effort and many ideas.
A special thank you to Darren Johnson, Jamie Lakritz, Dominic Fernando, Vidsesh Lingabavan and Matthew Knight, who implemented my work and showed me it has a use. I would also like to warmly thank my parents, sister and family in general for their moral support and guidance. I am the person I am today because of them. Thank you!

Contents

1 Introduction
    Films, Narrative and Collateral
    Problem Statement: Film Video Data and the Semantic
    Thesis Overview
2 The Analysis of Video
    Approaches to Video Content
    Analysis of Film
    Discussion: How Far are we across the Semantic Gap?
3 Collocation Analysis of Film Corpora
    Theory and Techniques Chosen
    Overall
    Detailed Method, Results and Examples
    Collocations FSA as Templates for Information Extraction
    Conclusions
4 Automatic Extraction of Film Content from Film Scripts
    Information Extraction
    Design and
    Evaluation and Collecting a Gold Standard Data
    Towards Novel Applications for Accessing Film Video Data
5 Contributions, Conclusions and Future Opportunities
    Claims and Contributions
    Summary and
    Opportunities for Future
References
Websites
Film Scripts
APPENDIX A Collocations
APPENDIX B Local Grammar Finite State Automata Graphs
APPENDIX C Local Grammar FSA used in Information Extraction
APPENDIX D Instructions for Gathering Gold Standard Event Data
APPENDIX E

Abbreviations

A list of abbreviations used throughout the thesis report.

AD Audio Description - A standard for describing what is happening in films to the visually impaired through a separate soundtrack complementary to the film's soundtrack.
SC Screenplay - The directing script of a Hollywood film, which appears in many forms: first, second, third, early and final drafts, post-production and pre-production scripts, etc.
FSA Finite State Automaton (plural: Finite State Automata) - A way of formally representing events and states. In our case they are used to represent restricted phrases of language that contain paths that can be followed to make a coherent phrase.
LSP Language for Special Purpose - A language used by experts in a specific domain that exhibits jargon and idiosyncrasies in that domain.
LG Local Grammar - A constrained set of words that could concurrently be used in a specific statement or context.
IE Information Extraction - A technology dedicated to the extraction of structured information from texts to fill pre-defined templates.
COL Change of Location Event - Developed as a template for IE of information pertaining to characters changing location from one area to another in a film.
FOA Focus of Attention Event - Developed as a template for IE of information pertaining to characters focussing their attention on other characters and on objects in a film.
NVC Non-Verbal Communication Event - Developed as a template for IE of information pertaining to a character communicating non-verbally using a body part.
ScCh Scene Change Event - Template that provides information about when a film's scene is changing.

1 Introduction

The amount of digitised film and video data available to us through online repositories of movies, DVDs and online video search engines is substantial. The challenge of automatically extracting the meaningful video content that a user requires is still unresolved, and thus few services accommodate queries related to high-level semantics (for example, character behaviours and interactions, emotions, plot points).
A non-trivial semantic gap exists between what a computer system can extract from video data and human users' interpretation of that same data. For visual data: "The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation." Smeulders et al. [[89] pg 8]. The semantic gap is particularly pertinent for video content, where the information that existing audio-visual analysis techniques can extract from video data and a user's interpretation of that content are not in agreement. This work aims to help bridge that semantic gap, for film content, through the analysis of texts that represent film: film scripts.
The digitisation of film data has made film more accessible, more processable and easier to analyse than ever before. However, the annotation, extraction and retrieval of semantic content from film data remain non-trivial tasks. A user may want to query high-level concepts in film content, such as emotional scenes (happy, sad) and atmospheres (suspense, dark) in a film, a specific action or dialogue scene, a scene in an ice hockey game where a specific player started a fight, or an automatic summary of a sitcom they missed last week. How closely a computer system can match the viewer's interpretation of film content is of great concern in this work. Thus, there is a need for machine-processable representations of film content.
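To make the idea of a machine-processable representation more concrete, the following is a minimal illustrative sketch in Python. It is not the system developed in this work. The record layout (`FilmEvent` and its field names), the single toy pattern and the sample sentence are all assumptions made for illustration; only the four event type names are taken from the summary above. The sketch shows how one restricted phrase pattern of the form "<character> looks at <target>" could fill a Focus_of_Attention record from script-like text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# The four event types named in the summary of this work.
EVENT_TYPES = (
    "Focus_of_Attention",
    "Change_of_Location",
    "Non-verbal_Communication",
    "Scene_Change",
)


@dataclass
class FilmEvent:
    """One extracted event. Field names are illustrative assumptions."""
    event_type: str
    character: Optional[str]
    target: Optional[str]
    source_text: str

    def __post_init__(self) -> None:
        # Reject anything other than the four event types listed above.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")


# Hypothetical toy pattern: "<Name> looks/stares/gazes at <something>".
# A real local grammar would cover far more phrasings than this.
FOA_PATTERN = re.compile(
    r"\b(?P<character>[A-Z][a-z]+) (?:looks|stares|gazes) at "
    r"(?P<target>[^.,;]+)"
)


def extract_focus_of_attention(text: str) -> List[FilmEvent]:
    """Return one Focus_of_Attention record per match of the toy pattern."""
    events = []
    for match in FOA_PATTERN.finditer(text):
        events.append(
            FilmEvent(
                event_type="Focus_of_Attention",
                character=match.group("character"),
                target=match.group("target").strip(),
                source_text=match.group(0),
            )
        )
    return events


if __name__ == "__main__":
    # Invented sample text, not taken from any script in the corpora.
    sample = "Anna looks at the photograph. Later, Tom stares at Anna."
    for event in extract_focus_of_attention(sample):
        print(event)
```

Records of this kind can be stored, queried and compared across films, which is what makes them more useful to a retrieval system than the raw script text. A single regular expression is used here only to keep the sketch short.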