VTT RESEARCH NOTES 2451
Data Mining Tools for Technology and Competitive Intelligence
ESPOO 2008

Approximately 80% of scientific and technical information can be found in patent documents alone, according to a study carried out by the European Patent Office. Patents are also a unique source of information since they are collected, screened and published according to internationally agreed standards. In addition to being an extremely valuable source of technology intelligence, patent documents offer business competitive intelligence. Being aware of the state of the art in relevant technology areas is crucial for a company's innovation process. Knowledge of developed techniques and products forestalls overlapping R&D projects and thereby prevents unnecessary investment. Equally important is the recognition of other actors operating in the field. Benchmarking and evaluating a competitor's R&D and market strategies aids in managing one's own processes and locating possible parties for collaboration or cross-licensing. Since the patent system was established, more than 60 million patent applications have been published.
It would be impossible to find and analyze relevant documents manually. This publication describes the results and observations obtained in a study testing four sophisticated patent analysis and visualization tools. The tools were tested with two cases, evaluating their ability to offer technology and business intelligence from patent documents for companies' daily business.

Laura Ruotsalainen

This publication is available from
VTT, Box 1000, FI-02044 VTT, Finland
Phone internat. +358 20 722 4520

ISBN 978-951-38-7240-3 (soft back ed.)
ISBN 978-951-38-7241-0 (URL: )
ISSN 1235-0605 (soft back ed.)
ISSN 1455-0865 (URL: )
Copyright VTT 2008

PUBLISHER
VTT Technical Research Centre of Finland, Vuorimiehentie 5, Box 1000, FI-02044 VTT, Finland
phone internat. +358 20 722 111, fax +358 20 722 7001

Cover figure by Tuomo Hokkanen. Top left: Aureka's ThemeMap. Registered trademark of Thomson Reuters. Top right: OmniViz's ThemeMap. Software developed by BioWisdom. Bottom left: Visualization made with STN AnaVist. Registered trademark of the American Chemical Society. Bottom right: Thomson Data Analyzer's FactorMap. Registered trademark of Thomson Reuters.

Technical editing Leena Ukskoski

Edita Prima Oy, Helsinki 2008

Ruotsalainen, Laura. Data Mining Tools for Technology and Competitive Intelligence.
Espoo 2008. VTT Tiedotteita Research Notes 2451. 63 p.

Keywords: patent data, text mining, data mining, patent mining, patent mapping, competitive intelligence, technology intelligence, visualization

Abstract

Approximately 80% of scientific and technical information can be found in patent documents alone, according to a study carried out by the European Patent Office. Patents are also a unique source of information since they are collected, screened and published according to internationally agreed standards. In addition to being an extremely valuable source of technology intelligence, patent documents offer business competitive intelligence by revealing a competitor's strengths and strategies. Information gained from patents can also help in locating partners for cross-licensing and collaboration. Since the patent system was established, more than 60 million patent applications have been published.
It would be impossible to find and analyze relevant documents manually.

The need for analysis and evaluation tools for patents has been acknowledged by many solution providers. New solutions are continuously coming onto the market: tools for reading and evaluating individual patents, and tools for analyzing sets of patent documents. Solutions of the latter type can be further divided roughly into two groups: tools for retrieving patent documents and preparing basic statistics on them, and tools for visualization and progressive analysis of patents. The former group deals only with data in a structured form, whereas the latter also analyzes unstructured text and other data.

In this study, four efficient tools for analyzing patent documents were tested: Thomson Reuters' Aureka and Thomson Data Analyzer, BioWisdom's OmniViz, and STN's STN AnaVist. All four tools analyze structured and unstructured data alike.
They all visualize the results of clustering the text fields of patent documents, and they either provide basic statistics graphs themselves or contain filters for producing them with other solutions. The tools were tested with two cases, evaluating their ability to offer technology and business intelligence from patent documents for companies' daily business.

Being aware of the state of the art in relevant technology areas is crucial for a company's innovation process. Knowledge of developed techniques and products forestalls overlapping R&D projects and thereby prevents unnecessary investment. Equally important is the recognition of other actors operating in the field. Benchmarking and evaluating a competitor's R&D and market strategies aids in managing one's own processes and locating possible parties for collaboration or cross-licensing.
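
As an illustration of the kind of basic statistics that can be prepared from structured patent fields, the following minimal Python sketch counts documents per assignee and per year. The record layout, the field names (`assignee`, `year`, `ipc`) and the sample values are assumptions made for this example only; nothing here reflects how any of the tools tested is implemented.

```python
from collections import Counter

# Hypothetical patent records; field names and values are illustrative only.
RECORDS = [
    {"assignee": "Company A", "year": 2005, "ipc": "H04L"},
    {"assignee": "Company B", "year": 2005, "ipc": "G06F"},
    {"assignee": "Company A", "year": 2006, "ipc": "H04L"},
    {"assignee": "Company A", "year": 2006, "ipc": "G06F"},
    {"assignee": "Company C", "year": 2007, "ipc": "H04L"},
]


def top_values(records, field, n=3):
    """Return the n most frequent values of a structured field with their counts."""
    return Counter(r[field] for r in records).most_common(n)


def yearly_trend(records):
    """Return (year, document count) pairs sorted by year."""
    counts = Counter(r["year"] for r in records)
    return sorted(counts.items())


if __name__ == "__main__":
    print(top_values(RECORDS, "assignee"))  # top-assignee list
    print(yearly_trend(RECORDS))            # documents per year
```

The same counting approach applies to any other structured field, such as inventor or classification code, by passing a different field name to `top_values`.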
This study took the point of view of a patent analyst with a basic understanding of patent data but no special knowledge of data mining techniques or the tools tested. All the tools evaluated are very useful for the task and quite easy to adopt for daily work. Each had strengths and weaknesses relative to the others. In conclusion, OmniViz and Thomson Data Analyzer are tools for sophisticated and diversified mathematical analysis of the data. Aureka and AnaVist are convenient for easily visualizing basic statistics and top lists of the data and for making stylish patent maps. The unique features of OmniViz, compared to the other tools tested, are the ability to visualize clustered data from many different points of view and the ability to evaluate some attributes with patent map animations.
Thomson Data Analyzer offers efficient tools for comparing different subsets of the data and for identifying unique values of an attribute. Aureka is the only tool that allows citation analyses, and it has the most illustrative patent map. STN AnaVist is superior in retrieving basic statistics quickly and smoothly.

The results obtained with all four tools were very much alike, even though different databases were used for retrieving the data. The lists of top assignees and inventors were consistent, as were the yearly trends and both the technological and geographical business areas. Only the relative rankings and the numbers of documents varied. However, the conclusions drawn from the results, and the business decisions made with them, would all be similar regardless of the tool used.

Preface

This research was carried out as a diploma work for the training programme Patentit-Teollisuus-Tekniikka.
The training programme dealt with intellectual property rights and was arranged by Helsinki University of Technology's Lifelong Learning Institute Dipoli. The work was made possible by the support of Thomson Reuters, BioWisdom Ltd. and the American Chemical Society, who offered free trials of the tools evaluated and access to the data used in the study. The offers and assistance are gratefully acknowledged.

October 2008
Laura Ruotsalainen

Contents

Preface
Terminology
1. Introduction
2. Technology and Competitive Intelligence from Patent Documents
   Patent Data
3. Study
   Data
   Test Databases and Data Sets Used for Testing
   Tools for Analysis
4. Analysis with Technology-based Data
   Landscape
   Closer Inspection of Specific Technology
   Yearly Trends in
   Comparing the Patent Portfolio of Two Companies in the Technology
   Patenting Around One Significant Invention
5. Analysis with Company-based Data
   Landscape
   Yearly Trends in Co-operation
6. References

Terminology

EPO  European Patent Office
PCT  International patent application system, based on the Patent Cooperation Treaty
IPC  International Patent Classification

1. Introduction

Approximately 80% of scientific and technical information can be found in patent documents alone, according to a study carried out by the European Patent Office. Patent documents are also a unique source of information since they are collected, screened and published according to internationally agreed standards. In addition to being an extremely valuable source of technology intelligence, patent documents offer business competitive intelligence by revealing a competitor's strengths and strategies. Information gained from patents can help in locating partners for cross-licensing and collaboration.