Transcription of Facilitating Information Sharing
Principles of Data Management
Facilitating Information Sharing
Keith Gordon

2007 Keith Gordon. The right of Keith Gordon to be identified as author of this work has been asserted by him in accordance with Sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted by the Copyright Designs and Patents Act 1988, no part of this publication may be reproduced, stored or transmitted in any form or by any means, except with the prior permission in writing of the publisher, or in the case of reprographic reproduction, in accordance with the terms of the licences issued by the Copyright Licensing Agency. Enquiries for permission to reproduce material outside those terms should be directed to the British Computer Society, Publishing and Information Products, First Floor, Block D, North Star House, North Star Avenue, Swindon, SN2

ISBN 978-1-902505-84-8

British Cataloguing in Publication Data. A CIP catalogue record for this book is available at the British Library.

All trademarks, registered names etc. acknowledged in this publication are to be the property of their respective owners.

Disclaimer: The views expressed in this book are those of the author and do not necessarily reflect the views of the British Computer Society except where explicitly stated as such. Although every care has been taken by the authors and the British Computer Society in the preparation of the publication, no warranty is given by the authors or the British Computer Society as publisher as to the accuracy or completeness of the information contained within it and neither the authors nor the British Computer Society shall be responsible or liable for any loss or damage whatsoever arising by virtue of such information or any instructions or advice contained within this publication or by any of the aforementioned.

Contents

Figures and tables
About the author
Foreword
Glossary
Preface
Introduction

1 Data and the Enterprise
    Information is a key business resource
    The relationship between information and data
    The importance of the quality of data
    The common problems with data
    An enterprise-wide view of data
    Managing data is a business issue
    Summary

2 Database Development
    The database architecture of an information system
    An overview of the database development process
    Conceptual data modelling (from a project-level perspective)
    Relational data analysis
    The roles of a data model
    Physical database design
    Summary

3 What is Data Management?
    The problems encountered without data management
    Data management responsibilities
    Roles within data management
    The benefits of data management
    The relationship between data management and enterprise architecture
    Summary

4 Corporate Data Modelling
    Why develop a corporate data model?
    More data modelling concepts
    The nature of a corporate data model
    How to develop a corporate data model
    Corporate data model principles
    A final thought
    Summary

5 Data Definition and Naming Conventions
    The elements of a data definition
    Data naming conventions
    Summary

6 Metadata
    What is metadata?
    Metadata for data management
    Metadata for content management
    Metadata for describing data values
    Summary

7 Data Quality
    What is data quality?
    Issues associated with poor-quality data
    The causes of poor-quality data
    The dimensions of data quality
    Data model quality
    Improving data quality
    Summary

8 Data Accessibility
    Data security
    Data integrity
    Data recovery
    Summary

9 Database Administration
    Database administration responsibilities
    Performance monitoring and tuning
    Summary

10 Repository Administration
    Repositories, data dictionaries, encyclopaedias, catalogs and directories
    Repository features
    The repository as a centralised source of information
    Metadata models
    Summary

11 The Management of Data Management
    Techniques and skills for data administration
    Techniques and skills for database administration
    Techniques and skills for repository administration
    The positioning of data management within the enterprise
    Summary

12 Industry Trends and their Effects on Data Management
    The use of packages
    Distributed data and databases
    Data warehousing and data mining
    Object orientation and databases
    Multimedia and databases
    Data and web technology
    Summary

A Comparison of Data Modelling Notations
B Hierarchical and Network Databases
C Generic Data Models
D An Example of a Data Naming Convention
E Metadata Models
F A Data Mining Example
G HTML and XML
H XML and Relational Databases

References
Further reading
Index

Figures and tables

Figure: The relationship between data and information
Figure: A model of a database system
Figure: The three level schema architecture
Figure: A simplified view of the database development process
Figure: A conceptual data model diagram
Figure: A portion of an SQL create script
Figure: The EMPLOYEE entity
Figure: The attributes of the EMPLOYEE entity
Figure: The ADDRESS entity
Figure: The resident at relationship
Figure: The QUALIFICATION and EMPLOYEE QUALIFICATION entities
Figure: The GRADE and EMPLOYEE GRADE entities
Figure: The DEPARTMENT and ASSIGNMENT entities
Figure: The one-to-one managed by relationship
Figure: The many-to-many managed by relationship
Figure: The resolution of the many-to-many managed by relationship
Figure: The EMPLOYEE NEXT OF KIN entity
Figure: A relation shown as a table
Figure: The human resources paper record
Figure: The un-normalised EMPLOYEE relation
Figure: The first normal form relations
Figure: Diagram of the first normal form relations
Figure: The second normal form relations
Figure: Diagram of the second normal form relations
Figure: The third normal form relations
Figure: Diagram of the third normal form relations
Figure: Data management activities
Figure: Data management deliverables
Figure: The business drives; data management steers
Figure: The relationship between data management and information management
Figure: The human resources conceptual data model developed in Chapter 2
Figure: Entity subtypes
Figure: An exclusive arc
Figure: Subtyping instead of using an exclusive arc
Figure: A first top-down starter model
Figure: A second top-down starter model
Figure: A third top-down starter model
Figure: The supply-chain model
Figure: The improved supply-chain model
Figure: More than one type of support?
Figure: The combined support model
Figure: The trade-off triangle
Figure: A data definition with validation criteria and valid operations
Figure: A data definition with valid values
Figure: An entity definition
Figure: The dimensions of data quality
Figure: The five dimensions of data model quality
Figure: Total Quality data Management methodology
Figure: Table privilege statements
Figure: A function privilege statement
Figure: A database object privilege statement
Figure: A view statement and an associated table privilege statement
Figure: A user-specific view statement and an associated table privilege statement
Figure: The role of directories or catalogs
Figure: The relationship between a CASE tool and its encyclopaedia or data dictionary
Figure: The architecture of a repository
Figure: The scope of a repository
Figure: The repository procurement process
Figure: A repository as a centralised source of information
Figure: The data administration skill set
Figure: Business-based data management
Figure: Independent data management in the IT/IS department
Figure: Data management within systems development
Figure: Dispersed IT/IS based data management
Figure: Distributed data management
Figure: Vertical fragmentation
Figure: Hybrid fragmentation
Figure: A typical data warehouse architecture
Figure: A multidimensional data model
Figure: A typical relational schema for a data warehouse
Figure: A simple conceptual data model
Figure: The ODL schema definitions for the simple conceptual data model
Figure: Structured type declarations
Figure: Table declarations using structured types and collections
Figure: Revised simple conceptual data model with entity subtypes
Figure: Creating structured types
Figure: Creating tables based on the structured types
Figure: A data model in Ellis Barker notation
Figure: Ellis Barker data model with attribute annotation and unique identifiers
Figure: A data model in Chen notation
Figure: A data model in Information Engineering notation
Figure: A data model in IDEF1X notation
Figure: An object class model in UML notation
Figure: Comparison of the relationship notations
Figure: Comparison of the overall data model notations
Figure: Conceptual data model
Figure: Relational database occurrences
Figure: Hierarchical database schema
Figure: Data definition statements for a hierarchical database
Figure: Hierarchical database occurrences
Figure: Hierarchical database records in sequence
Figure: Network database schema
Figure: Data definition statements for a network database
Figure: Network database occurrences
Figure: The generic to specific continuum
Figure: The cost-balance of flexible design
Figure: A metadata model describing conceptual data model concepts
Figure: A conceptual data model snippet
Figure: A metadata model describing physical SQL database concepts
Figure: A metadata model showing mapping between elements
Figure: The ICL Data Dictionary System
Figure: An example of an HTML document
Figure: The HTML document rendered in Mozilla Firefox
Figure: An example of an XML document
Figure: The tree structure of the XML document
Figure: Specimen data for XML representation examples
Figure: The employee table represented as a valid XML document
Figure: The employee table represented as XML without a root element
Figure: An example SQL query to create an XML document
Figure: An example of an XML document
Figure: An edge table created by shredding an XML document
Figure: A query on an XML document
Figure: The result of the query on an XML document

Table: Restricted terms used in the naming of entity types
Table: Restricted terms used in the naming of domains
Table: Restricted terms used in the naming of attributes
Table: Restricted terms used in the naming of relationships
Table: Examples of formal attribute names
Table: algorithm: Step 1 results
Table: algorithm: Step 2 results
Table: algorithm: Step 3 results
Table: algorithm: Step 4 results
Table: algorithm: Step 5 results
Table: algorithm: Step 6 results

About the author

Keith Gordon was a professional soldier for 38 years, [...] of technical, educational and managerial appointments and gained a Higher National Certificate in Electrical and Electronic Engineering, a Certificate in Education from the University of London Institute of Education, a BA from the Open University and an MSc from Cranfield Institute of Technology.
From 1992 until his retirement in 1998, he was first a member of and then head of the Army's data management [...]. He is now an independent consultant and lecturer specialising in [...] courses, he is also a tutor for the Open University. He is a Chartered Member of the British Computer Society and a Member of the Chartered Institute of Personnel and Development. He holds the Diploma in Business Systems Development specialising in Data Management from the Information Systems Examination Board (ISEB) [...]. He is the secretary of the Data Management Specialist Group of the British Computer Society and is both a founder member and current committee member of the UK chapter of DAMA International, the worldwide association of data management [...].

Foreword

The author of this book is a soldier through and through but he also has a comprehensive understanding of the principles of data management and is a highly skilled professional educator. This rather unusual blend of experience makes this book very [...].

Data management can be seen as a chore best left to people with no imagination but Keith Gordon taught me that it can be a matter of life [...]. We all know that any collective enterprise must have records that are both reasonably accurate and readily accessible.
In a commercial operation, failures in data management can lead to bankruptcy. In a public service it can put the lives of thousands of people at risk and waste public money on a grand scale. For a soldier in the heat of battle, any weakness in the availability, quality or timeliness of information can lead to a poor decision that may result in [...].

What has this to do with the principles of data management? It serves as a reminder that a computer application is only as good as the data on which it [...].

It is common for the development of computer systems to start from the desired facilities and work backwards to identify the objects involved and so to the data by which these objects are described. One bad result of this approach is that the data resource gets skewed by the design of specific facilities that it is required to [...]. When the business decides that these facilities have to be changed, the data resource must be modified. Does this matter?