

Enterprise Data Analysis and Visualization: An Interview Study

Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer

Abstract. Organizations rely on data analysts to model customer engagement, streamline operations, improve production, inform business decisions, and combat fraud. Though numerous analysis and visualization tools have been built to improve the scale and efficiency at which analysts can work, there has been little research on how analysis takes place within the social and organizational context of companies. To better understand the enterprise analysts' ecosystem, we conducted semi-structured interviews with 35 data analysts from 25 organizations across a variety of sectors, including healthcare, retail, marketing and finance. Based on our interview data, we characterize the process of industrial data analysis and document how organizational features of an enterprise impact it.






We describe recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools. Finally, we discuss design implications and opportunities for visual analysis research.

Index Terms: data, analysis, visualization, enterprise.

Organizations gather increasingly large and complex data sets each year. These organizations rely on data analysis to model customer engagement, streamline operations, improve production, inform sales and business decisions, and combat fraud. Within organizations, an increasing number of individuals with varied titles such as "business analyst", "data analyst" and "data scientist" perform such analyses. These analysts constitute an important and rapidly growing user population for analysis and visualization tools.

Enterprise analysts perform their work within the context of a larger organization. Analysts often work as a part of an analysis team or business unit. Little research has observed how existing infrastructure, available data and tools, and administrative and social conventions within an organization impact the analysis process within the enterprise.

Understanding how these issues shape analytic workflows can inform the design of future tools.

To better understand the day-to-day practices of enterprise analysts, we conducted semi-structured interviews with 35 analysts from sectors including healthcare, retail, finance, and social networking. We asked analysts to walk us through the typical tasks they perform, the tools they use, the challenges they encounter, and the organizational context in which analysis takes place.

In this paper, we present the results and analysis of these interviews. We find that our respondents are well-described by three archetypes that differ in terms of skill set and typical workflows. We find that these archetypes vary widely in programming proficiency, reliance on information technology (IT) staff and diversity of tasks, and vary less in statistical proficiency. We then discuss how organizational features of an enterprise, such as the relationship between analysts and IT staff or the diversity of data sources, affect the analysis process.

We also describe how collaboration takes place between analysts. We find that analysts seldom share resources such as scripts and intermediate data products. In response, we consider possible impediments to sharing and collaboration.

Next, we characterize the analysis process described to us by our respondents. Our model includes five high-level tasks: discovery, wrangling, profiling, modeling and reporting. We find that discovery and wrangling, often the most tedious and time-consuming aspects of an analysis, are underserved by existing visualization and analysis tools.

Sean Kandel is with Stanford University. Andreas Paepcke is with Stanford University. Joseph M. Hellerstein is with UC Berkeley. Jeffrey Heer is with Stanford University. Manuscript received 31 March 2012; accepted 1 August 2012; posted online 14 October 2012; mailed on 5 October 2012.
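To make the five-task model concrete, the sketch below chains the tasks in the order named above. Only the task names come from the text. The one-line interpretation given to each task, the sample data in `SOURCES`, and every function body are assumptions made for illustration; they are not the workflow of any interviewee.

```python
from statistics import mean

# Hypothetical raw sources that an analyst might locate during discovery.
SOURCES = {
    "sales_log": ["2012-01-03,120", "2012-01-04,n/a", "2012-01-05,95"],
    "notes": ["free text that is not needed for this analysis"],
}


def discovery(sources):
    """Locate the data needed for the analysis (assumed: pick one known source)."""
    return sources["sales_log"]


def wrangling(lines):
    """Parse raw lines into (date, amount) records; unparsable amounts become None."""
    records = []
    for line in lines:
        date, raw = line.split(",")
        is_number = raw.replace(".", "", 1).isdigit()
        records.append((date, float(raw) if is_number else None))
    return records


def profiling(records):
    """Check data quality: keep usable records and count the missing ones."""
    usable = [r for r in records if r[1] is not None]
    return usable, len(records) - len(usable)


def modeling(records):
    """Summarize the data (a plain mean stands in for a real model)."""
    return mean(amount for _, amount in records)


def reporting(estimate, missing):
    """Communicate the result together with a data-quality caveat."""
    return f"mean amount = {estimate:.1f} ({missing} record(s) dropped as missing)"


def run_analysis(sources):
    """Run the five tasks in sequence on the given sources."""
    lines = discovery(sources)
    records = wrangling(lines)
    usable, missing = profiling(records)
    return reporting(modeling(usable), missing)


print(run_analysis(SOURCES))
```

In this toy version the tasks run once in a fixed order. Nothing in the text above says real analyses are strictly linear, so the fixed sequence is a simplification of the sketch, not a claim about the model.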

We discuss recurring pain points within each task as well as difficulties in managing workflows across these tasks. Example pain points include integrating data from distributed data sources, visualizing data at scale and operationalizing workflows. These challenges are typically more acute within large organizations with a diverse and distributed set of data sources.

We conclude with a discussion of future trends and the implications of our interviews for future visualization and analysis tools. We argue that future visual analysis tools should leverage existing infrastructures for data processing to enable scale and limit data migration. One avenue for achieving better interoperability is through systems that specify analysis or data processing operations in a high-level language, enabling retargeting across tools or platforms. We also note that the current lack of reusable workflows could be improved via less intrusive methods for recording data provenance.

Related Work

Many researchers have studied analysts and their processes within intelligence agencies [5,18,24,25,30].

This work characterizes intelligence analysts' process, discusses challenges within the process, and describes collaboration among analysts. Although there is much overlap in the high-level analytic process of intelligence and enterprise analysts, these analysts often work on different types of data with different analytic goals and therefore perform different low-level tasks. For example, enterprise analysts more often perform analysis on structured data than on documents and emails.

Others have characterized tasks and challenges within the analysis process [1,21,26,34,35]. Amar et al. [1] propose a set of low-level analysis tasks based on the activities of students in an Information Visualization class. Their taxonomy largely includes tasks subsumed by our profile and model tasks and does not address the other tasks we have identified. Russell et al. [34] characterize high-level sensemaking activities necessary for analysis.

We instead identify specific tasks performed by enterprise analysts. Sedlmair et al. [35] discuss difficulties evaluating visualization tools in large corporations, including acquiring and integrating data. We discuss common challenges within these subtasks. Kwon and Fisher [26] discuss challenges novices encounter when using visual analytic tools. In contrast, our subjects are largely expert users of their tools.

Fink et al. [9] performed an ethnographic study of cyber-security analysts. They find that visual analytic tools in this domain have limited interoperability with other tools, lack support for performing ad hoc transformations of data, and typically do not scale to the necessary volume and diversity of data. We find similar issues across domains.

Researchers have articulated the importance of capturing provenance to manage analytic workflows [2,11,12,15]. Such systems often include logs of automatically recorded interaction histories and manual annotations such as notes.

Here, we discuss the difficulty of recording provenance in enterprise workflows, which typically span multiple tools and evolving, distributed databases.

Research projects have demonstrated benefits for collaborative analysis and developed tools to foster such collaboration. Isenberg et al. [20,21] discuss design considerations for supporting synchronous, co-located collaboration. Similar to intelligence analysts [25], we have found that most enterprise analysts collaborate asynchronously. We discuss how and when these analysts collaborate. Others [7,14,37] discuss design considerations to support work parallelization and communication in asynchronous social data analysis. We discuss the types of resources that analysts must share to enable collaboration and the impediments they face.

Researchers have also advocated the use of visualization across more phases of the analysis life-cycle [22]. Our analysis corroborates this suggestion.

Examples include visualizations to assist schema mapping for data integration [13,33] and visual analytics for data de-duplication [6,23]. Our interviews indicate that such tools are sorely needed, and that visualization might be further applied to tasks such as discovery and data manipulation.

Methods

We conducted semi-structured interviews with enterprise analysts to better understand their process and needs. We use the term "analyst" to refer to anyone whose primary job function includes working with data to answer questions that inform business decisions.

Participants

We interviewed 35 analysts (26 male / 9 female) from 25 organizations. Our interviewees held a number of job titles, including "data analyst", "data scientist", "software engineer", "consultant", and "chief technical officer". The organizations were from 15 sectors including healthcare, retail, social networking, finance, media, marketing, and insurance (see Figure 1 for the complete list).

The organizations ranged in size from startups with fewer than 10 employees to corporations with tens of thousands of employees. The analysts ranged from graduates in their first year of work to Chief Data Scientists with 10-20 years of experience.

We recruited interviewees by emailing contacts at organizations within our personal and professional networks. In some cases, we emailed analysts directly. In others, we emailed individuals who forwarded us to analysts within their organization. This recruitment strategy introduces potential bias in our results. For example, the majority of our interviewees were based in Northern California. Also, many of the analysts were sophisticated programmers. To be clear, our research goal is to characterize the space of analytic workflows and challenges, not to quantify the prevalence of any specific activity. Other methods, such as surveys or analyzing online job postings, would be better suited for quantifying our findings.

Interviews

We conducted semi-structured interviews with 1 to 4 analysts at a time. We began each interview with a quick introduction describing the purpose of the interview: to understand analysts' day-to-day work practices and any challenges they face.

