Example: air traffic controller

REVIEW Seriation and Matrix Reordering Methods: …

REVIEWS eriation and Matrix Reordering methods : An historical OverviewInnar Liiv Department of Informatics, Tallinn University of Technology, Tallinn, EstoniaReceived 11 October 2009; revised 18 January 2010; accepted 19 January 2010 online 11 March 2010 in Wiley InterScience ( ).Abstract: Seriation is an exploratory combinatorial data analysis technique to reorder objects into a sequence along aone-dimensional continuum so that it best reveals regularity and patterning among the whole series. Unsupervised learning,using Seriation and Matrix Reordering , allows pattern discovery simultaneously at three information levels: local fragmentsof relationships, sets of organized local fragments of relationships, and an overall structural pattern. This paper presents anhistorical overview of Seriation and matrixreordering methods , several applicationsfrom the following disciplines are includedin the retrospective REVIEW : archaeology and anthropology; cartography, graphics, and information visualization; sociology andsociometry; psychology and psychometry;ecology; biology and bioinformatics; cellular manufacturing; and operations research.

REVIEW Seriation and Matrix Reordering Methods: An Historical Overview Innar Liiv∗ Department of Informatics, Tallinn University of Technology, Tallinn, Estonia

Tags:

  Methods, Overview, Matrix, Historical, Seriation, Reordering, An historical overview, Seriation and matrix reordering methods

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of REVIEW Seriation and Matrix Reordering Methods: …

1 REVIEWS eriation and Matrix Reordering methods : An historical OverviewInnar Liiv Department of Informatics, Tallinn University of Technology, Tallinn, EstoniaReceived 11 October 2009; revised 18 January 2010; accepted 19 January 2010 online 11 March 2010 in Wiley InterScience ( ).Abstract: Seriation is an exploratory combinatorial data analysis technique to reorder objects into a sequence along aone-dimensional continuum so that it best reveals regularity and patterning among the whole series. Unsupervised learning,using Seriation and Matrix Reordering , allows pattern discovery simultaneously at three information levels: local fragmentsof relationships, sets of organized local fragments of relationships, and an overall structural pattern. This paper presents anhistorical overview of Seriation and matrixreordering methods , several applicationsfrom the following disciplines are includedin the retrospective REVIEW : archaeology and anthropology; cartography, graphics, and information visualization; sociology andsociometry; psychology and psychometry;ecology; biology and bioinformatics; cellular manufacturing; and operations research.

2 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 70 91, 2010 Keywords: Seriation ; Matrix Reordering ; Bertin s matrices; co-clustering; biclustering; combinatorial data analysis1. INTRODUCTIOND ifferent traditions and disciplines struggle for the lead-ing position toward providing the best insights about thedata. Seriation , which is the focus of this paper, can be posi-tioned at the intersection of data mining, information visual-ization, and network science. All those areas share similaressential goals, but have minimal overlap in mainstreamresearch progress. Prominent authors in the discipline ofinformation visualization [1] (p. 351) have identified thatthe data mining community gives minimal attention toinformation visualization, but believe that there are hope-ful signs that the narrow bridge between data mining andinformation visualization will be expanded in the comingyears . Shneiderman [2] has pointed out that most bookson data mining have only a brief discussion of informationvisualization and vice versa and that the process of com-bining statistical methods with visualization tools will takesome time because of the conflicting philosophies of thepromoters.

3 A Seriation and Matrix visualization result contains theclustering of data with additional information about howone cluster is related to another, what the bridging objectsare, and what the transition of the objects is like inside thecluster. From the perspective of association rules, a result ofCorrespondence to:Innar Liiv can also be interpreted as a chain of associationsbetween objects, which is a non-redundant and optimalpresentation of a possibly very lengthy list of associationrules. Moreover, it also introduces structural context tothose ath [3] (p. 212) considered such Matrix permutationapproaches to have a great advantage in contrast to thecluster algorithms, because no information of any kind islost, and because the number of clusters does not have tobe presumed; it is easily and naturally visible . Murtagh[4], Arabie and Hubert [5,6] referred to similar advantagescalling such an approach a non-destructive data analysis ,emphasizing the essential property that no transformationor reduction of the data itself takes described the procedure [7] (p.)

4 6) as simplifyingwithout destroying and was convinced (p. 7) that simplifi-cation was no more than regrouping similar things . Seriation is closely related to clustering, although theredoes not exist an agreement across the disciplines aboutdefining their distinction. In this paper, Seriation is con-sidered different from clustering as shown in Fig. 1. Thelinking element between those two methods is a clusteringwith optimal leaf ordering [8 10], which, from the per-spective of clustering, is a procedure to order the clustersat each level so that the objects on the edge of each clusterare adjacent to that object outside the cluster to which it isnearest [8]. From the perspective of Seriation , the result is 2010 Wiley Periodicals, : Seriation and Matrix Reordering Methods71 SERIATIONCLUSTERINGWITH OPTIMALLEAF ORDERINGCLUSTERINGTHE FOCUS OF THIS PAPERFig. 1 Seriation and optimal ordering and rearrangement, but an additionalgrouping procedure is necessary to identify cluster bound-aries and clusters [11 13].

5 The goal of this paper is to present an historical overviewof Seriation and Matrix Reordering methods in order to cross-fertilize and align research activities and applications indifferent the following section, a definition and an example ofseriation will be presented, along with the introduction ofthe three most common forms of data representations foundin related Seriation : A DefinitionSeriation has the longest roots among the disciplines ofarchaeology and anthropology, where, for the moment, ithas reached also the maturest level of research. A recentmonograph about Seriation by O Brien and Lyman [14]includes an extended discussion of the terminology, anda consensual definition by Marquardt [15] (p. 258):[ Seriation is] a descriptive analytic technique, the pur-pose of which is to arrange comparable units in a singledimension (that is, along a line) such that the position ofeach unit reflects its similarity to other Brien and Lyman [14] (p.)

6 60) point out that nowherein Marquardt s definition is the termtimementioned ,which perfectly coincides with our interpretation and , to make the definition more general and compat-ible with the rest of this paper and set the scene for theconstruction of our own definition, we suggest to: understand and interpret the phrase ofcomparableunitsasunits from the same modeaccording toTucker s terminology [16], making it less ambiguouswhether it is allowed or not to arrange column-conditional and other explicitly non-comparable unitsalong a continuum; give more emphasis to simultaneous pattern discov-ery at several information levels from local patternsto global. It would make the definition compatiblewith the requirements set to such Matrix permutationsby Bertin [7] (p. 12). He saw information as a rela-tionship, which can exist among elements, subsets,or sets, and was convinced that the eye perceivesthe three levels of informations spontaneously [7](p.

7 181).Consequently, we are able to construct a definition ofseriation to reflect our emphasis and focus as follows:Seriationis an exploratory data analysis technique toreorder objects into a sequence along a one-dimensionalcontinuum so that it best reveals regularity and patterningamong the whole higher-mode Seriation can be simultaneously per-formed on more than one set of entities, however, entitiesfrom different sets are not mixed in the sequence and pre-serve a separate one-dimensional kind of definition is compatible with less rigorousdefinitions and highly subjective goals ( , to maximizethe human visual perception of patterns and the overalltrend), but encourages the interpretation from the perspec-tive of minimum description length principle [17,18] anddata compressibility [19-21], [22, p. 531].For didactic purposes, examples in this section will onlyuse binary values, however, we consider and discuss severalcommon value types of the data, where applicable.

8 Thescope is additionally limited to entity-to-entity and entity-to-attribute data tables, or using Tucker s [16] terminologyand Carroll Arabie [23] taxonomy, we are concentratingon two-way one-mode (N N; square tables, where rowsand columns refer to the same entities) and two-mode(N M; rectangular tables, where rows and columns referto two different sets of entities) data tables. It should beemphasized that such a scope definition does not restrictthe one-mode data table to be symmetric, or make entity-to-entity data table to be exclusively one-mode, , therecan be relations between entities from different sets, us look at the following example of Seriation alongwith the introduction of the three most common formsof how data are presented in the related literature andthroughout this paper: a Matrix , a double-entry table withlabels, and a color-coded graphical plot. We may often usethe wordmatrixin the paper interchangeably to refer to allof those example dataset is first presented as a graph in Fig.

9 2,where nodes (vertices) are, for example, companies and arcsStatistical Analysis and Data Mining Analysis and Data Mining, Vol. 3 (2010)0307040105060208 Fig. 2 An example the nodes represent a value stream in the will first construct an asymmetric adjacency matrixthat reflects the structure of such a directed graph ( , anon-diagonal entryaijis the number of arcs from nodeito nodej; see refs [24 26] for a discussion about thedifferences between node-link diagrams ( , Fig. 2) andmatrix-based representation):A= 1010001001000001001000101001001001011001 010011010010001001000001 Another way to present the same structure is by usinga double-entry table with node labels, where the positiveelements have been shaded and formatted differently forbetter visual perception (Table 1).Inspired by Czekanowski [27] and Bertin [28], it is oftenreasonable to present the Matrix with a graphical plot, wherenumerical values are color coded.

10 With binary data, themost typical way is to use filled and empty cells to denote ones and zeros , respectively. Using such an approach,we can visualize the above structure as presented in Fig. table of the example 3 A graphical Bertin plot for the example such plain matrices, tables and plots, it is stillrather complicated to identify the underlying relationshipsin the data, find patterns and an overall trend. Objects insuch an adjacency Matrix are ordered arbitrarily, typicallyin the order of data acquisition/generation, or just sortedalphabetically by labels or names. Changing the order ofrows and columns, therefore, does not change the structure:there aren!(orn!m!in case of a two-mode Matrix )permutations of the same Matrix that will explicitly reflectthe identical structure of the system under observation. Thegoal of Seriation is to find such a permutation, , to reorderthe objects from the same mode in a sequence so thatit best reveals regularity and patterning among the wholeseries.


Related search queries