
REDD: A Public Data Set for Energy Disaggregation Research

J. Zico Kolter
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge

J. Johnson
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge

Energy and sustainability issues raise a large number of problems that can be tackled using approaches from data mining and machine learning, but progress on such problems has been slow due to the lack of publicly available data. In this paper we present the Reference Energy Disaggregation Data Set (REDD), a freely available data set containing detailed power usage information from several homes, which is aimed at furthering research on energy disaggregation (the task of determining the component appliance contributions from an aggregated electricity signal). We discuss past approaches to disaggregation and how they have influenced our design choices in collecting data, we describe the hardware and software setups for the data collection, and we present initial benchmark disaggregation results using a well-known Factorial Hidden Markov Model (FHMM).

INTRODUCTION

Energy and sustainability problems represent one of the greatest challenges facing society.




More than 83% of the world's energy comes from (unsustainable) fossil fuels, with renewable energy from wind, solar, geothermal, and biomass making up only approximately 2% of the total [11]. Meanwhile, the demand for energy is constantly growing: worldwide energy production grew by 46% in the 20 years from 1987 to 2007 [11]. The simple physical limits of our current energy resources, as well as the environmental and climate impact of burning massive amounts of fossil fuels, make a research focus on issues of sustainability […]. […], there are numerous problems in sustainability that are fundamentally data analysis and prediction tasks, areas where techniques from data mining and machine learning can prove […].

Despite the importance of sustainability research and the relevance of data mining and machine learning techniques, there has been relatively little work in these areas, at least compared to other application areas such as computational […] or machine vision. We argue that this situation is at least partly due to the scarcity of publicly available data for such domains. For example, although there are vast amounts of data relevant to energy domains (the energy consumption of each individual building and household in the country, the loading of each electrical transmission and distribution line), the majority of this data is unavailable to researchers. Furthermore, there is significant evidence that publicly available data sets have spurred previous application areas in machine learning and data mining: biological applications have been aided greatly by the data sharing mandates of biological journals and government organizations [16, 12]; many early successes in natural language processing were spawned by the now-classic Wall Street Journal corpus [10]; and machine vision research has been aided greatly by common benchmark data sets such as MNIST digit recognition [9], CalTech 101 [3], and the PASCAL challenge [2].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. […] 2011, August 2011, San Diego, CA, USA. Copyright 2011 ACM 978-1-4503-0840-3 …

[Figure 1: An example of energy consumption over the course of a day for one of the houses in […]. The plot shows power in watts (0 to 7000 W) against time of day (00:00 to 00:00) for lighting, electronics, refrigerator, bathroom GFI, dishwasher, microwave, kitchen outlets, and washer/dryer.]

Despite some initial progress towards this same goal for energy and sustainability domains [17], there are currently few such data sets geared to the ML and data mining […].

In this paper, we present our work on developing a public data set of this type, termed the Reference Energy Disaggregation Data Set (REDD). The data is specifically geared towards the task of energy disaggregation: determining the component devices from an aggregated electricity signal. The data consists of whole-home and circuit/device-specific electricity consumption for a number of real houses over several months' time. For each monitored house, we record (1) the whole-home electricity signal (current monitors on both phases of power and a voltage monitor on one phase), sampled at a high frequency (15 kHz); (2) up to 24 individual circuits in the home, each labeled with its category of appliance or appliances, recorded at […] Hz; and (3) up to 20 plug-level monitors in the home, recorded at 1 Hz, with a focus on logging electronic devices where multiple devices are grouped on a single circuit.
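To make the storage side of the 15 kHz choice concrete, here is a back-of-envelope estimate of raw whole-home data volume. Only the 15 kHz rate and the three whole-home channels (two current monitors, one voltage monitor) come from the text; the 2-byte sample width, the absence of compression, and the premise that every house-day carries high-frequency data are assumptions of this sketch, not figures from the paper.

```python
# Back-of-envelope estimate of raw whole-home data volume.
# ASSUMPTIONS (not from the paper): 2 bytes per sample, no compression,
# and every house-day includes high-frequency whole-home data.

SAMPLE_RATE_HZ = 15_000      # whole-home sampling rate stated in the text
CHANNELS = 3                 # two current monitors + one voltage monitor
BYTES_PER_SAMPLE = 2         # assumed 16-bit samples (hypothetical)
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bytes_per_day(rate_hz: int = SAMPLE_RATE_HZ,
                      channels: int = CHANNELS,
                      bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Raw bytes one house produces in one day of continuous logging."""
    return rate_hz * channels * bytes_per_sample * SECONDS_PER_DAY


def total_gigabytes(house_days: int) -> float:
    """Total raw volume in GB (10**9 bytes) for the given number of house-days."""
    return house_days * raw_bytes_per_day() / 1e9


if __name__ == "__main__":
    print(f"per house-day: {raw_bytes_per_day() / 1e9:.2f} GB")
    print(f"119 house-days: {total_gigabytes(119):.0f} GB")
```

Under these assumptions one house produces about 7.8 GB per day, and 119 house-days come to roughly 0.93 TB, the same order of magnitude as the "more than 1 terabyte" of raw data reported for REDD. The real figure depends on the actual sample format and on the lower-rate circuit and plug channels, which this sketch ignores.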

An example of this type of data is shown in Figure 1. As of the time of writing (June 15, 2011), we have 10 homes monitored, with a total of 119 days of data (combined over all homes), 268 unique monitors, and more than 1 terabyte of raw data. To the best of our knowledge, REDD is the largest publicly available data set for disaggregation in which the true loads of each house are identified. The entirety of the data, as well as code for parsing the data and running basic algorithms, is publicly available on the web: […].

While we present some basic results on disaggregation here, the focus of this paper is the data set itself: the design decisions that went into the data collection, as well as the hardware and software system. We begin by presenting a brief overview of existing work on disaggregation and discuss how it influenced our choices of which data to collect from each home and at what frequency. We then describe the software and hardware systems we have built for this task, and discuss their strengths and limitations.

Finally, we present brief results on the data, and highlight several directions for future algorithmic […].

ENERGY DISAGGREGATION

Energy disaggregation, also referred to as non-intrusive load monitoring (NILM),¹ is the task of using an aggregate energy signal, such as that coming from a whole-home power monitor, to make inferences about the different individual loads of the system. The value of this technology is that information about individual appliances is much more useful to consumers than simply total electricity usage; studies have shown that user feedback of this type can induce behavior changes that improve user efficiency by 15% [1, 13]. Disaggregation technology is also seen as an intermediate between existing electricity meters (which merely record whole-home power usage at some frequency) and a fully energy-aware home appliance network, where each device reports its consumption to a central location; an oft-stated goal of disaggregation research is to push energy awareness to a ubiquitous level, paving the way for more detailed energy monitoring in the […].

Research on energy disaggregation began with the work of Hart et al. [6] in the 1980s and 1990s. The initial approaches looked for sharp edges (corresponding to device on/off events) in both the real and reactive power signals, and clustered devices according to these changes in consumption. Later work has explored a number of different directions: using more complex device models with multiple states, integrating frequency analysis and other features of the AC waveforms, and making use of external features such as time of day or weather conditions. A recent review of numerous existing techniques for energy disaggregation can be found in [18]. In this paper, we highlight some of the key distinctions that have characterized past work in energy disaggregation and how they have informed our choices for […].

¹ Some authors make subtle distinctions between energy disaggregation and NILM, but for our purposes we treat these terms as synonymous.

Past work has spanned a broad range in terms of the frequency of energy measurements used for disaggregation: some work has used average power measurements over periods as long as an hour [8], while others have analyzed the harmonics of AC waveforms using MHz resolutions [14, 5].²

Most approaches fall somewhere in between these two extremes, with many studies either using power readings on the order of a 1 Hz rate or AC current measurements on the order of several […]. Although higher-frequency measurements can be sub-sampled to produce lower-frequency data, for our purposes of data collection it makes sense to collect data at the highest frequency possible, up to the feasibility of storing the data. We chose 15 kHz monitoring (for the whole-home data) as a trade-off between these two considerations.

Real / reactive power. Past work has also differed in whether the methods consider only the real power signal or both the real and reactive power.³ This decision is connected to the point above, since real and reactive power can both be computed from measurements of the AC waveform, but reactive power is a common enough quantity to merit its own distinction. For REDD, since we are collecting the AC waveform itself, we can easily compute both real and reactive power.

Use of external features. Some past approaches use external features such as time of day, day of year, or weather information, whereas others use only the power signal itself.
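The edge-based approach of Hart et al. and the point that real and reactive power both follow from the sampled AC waveform fit together in a short pipeline: compute per-cycle P and Q from voltage and current samples, then look for large steps and group them by size. This is a minimal sketch under stated assumptions, not Hart's algorithm or REDD's released code. It assumes 60 Hz mains (250 samples per cycle at 15 kHz) and the unsigned power-triangle definition Q = sqrt(S² − P²). The 50 W edge threshold, the 30-unit grouping radius, and the two loads in the demo are hypothetical. Per-cycle averaging is also one simple way to sub-sample a 15 kHz signal to a lower rate.

```python
import numpy as np


def power_per_cycle(v, i, fs=15_000.0, f_line=60.0):
    """Real power P and reactive power Q for each complete line cycle.

    Assumes fs / f_line is (close to) an integer number of samples per cycle
    and uses the power triangle Q = sqrt(S^2 - P^2), with S = Vrms * Irms.
    That Q is unsigned and ignores harmonic distortion.
    """
    n = int(round(fs / f_line))                  # samples per cycle (250 here)
    cycles = len(v) // n
    v = np.asarray(v[: cycles * n], dtype=float).reshape(cycles, n)
    i = np.asarray(i[: cycles * n], dtype=float).reshape(cycles, n)
    p = np.mean(v * i, axis=1)                                   # real power (W)
    s = np.sqrt(np.mean(v**2, axis=1) * np.mean(i**2, axis=1))   # apparent power (VA)
    q = np.sqrt(np.clip(s**2 - p**2, 0.0, None))                 # reactive power (VAR)
    return p, q


def detect_edges(p, q, threshold=50.0):
    """Return (index, dP, dQ) for every step in P of at least `threshold` watts."""
    dp, dq = np.diff(p), np.diff(q)
    return [(int(k) + 1, float(dp[k]), float(dq[k]))
            for k in np.flatnonzero(np.abs(dp) >= threshold)]


def group_edges(edges, radius=30.0):
    """Greedily cluster edges by (|dP|, |dQ|), so a load's on and off edges match."""
    centres, groups = [], []
    for edge in edges:
        point = np.abs(np.array(edge[1:]))
        for k, c in enumerate(centres):
            if np.linalg.norm(point - c) <= radius:
                groups[k].append(edge)
                centres[k] = c + (point - c) / len(groups[k])  # running mean
                break
        else:
            centres.append(point)
            groups.append([edge])
    return groups


# Synthetic demo (hypothetical loads): 120 V rms, 60 Hz, 4 s sampled at 15 kHz.
fs, f = 15_000.0, 60.0
t = np.arange(int(4 * fs)) / fs
v = 120 * np.sqrt(2) * np.sin(2 * np.pi * f * t)
on_a = (t >= 1.0) & (t < 3.0)   # load A: 10 A rms, lagging by 60 degrees
on_b = (t >= 1.5) & (t < 2.5)   # load B: 10 A rms, resistive
# The aggregate current is simply the sum of the individual load currents.
i = (on_a * 10 * np.sqrt(2) * np.sin(2 * np.pi * f * t - np.pi / 3)
     + on_b * 10 * np.sqrt(2) * np.sin(2 * np.pi * f * t))
p, q = power_per_cycle(v, i, fs, f)
edges = detect_edges(p, q)
groups = group_edges(edges)
```

In the demo, load A contributes about 600 W and 1039 VAR and load B about 1200 W and no reactive power, so the four detected edges fall into two groups, one per load. Loads with ramps, multiple states, or simultaneous switching would defeat this simple grouping, which is where the multi-state device models mentioned above come in.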

All data in REDD is recorded with UTC time stamps, along with general geographical information (only up to a city level, for privacy reasons), so that it can be associated with such external features.

Supervised / unsupervised. Most approaches to energy disaggregation have been supervised, in that the system is trained on individual device power signals (or is given manually identified device change-points in a whole-home energy signal). Alternatively, some recent work has advocated unsupervised approaches that consider the whole-home signal without labels and automatically separate the different signals [7]. To facilitate supervised approaches and to aid in evaluating all approaches, REDD includes as much supervised information as possible: we monitor each individual circuit in the home (especially important for large loads that cannot easily be monitored by a plug-level monitor), along with as many large plug loads as is feasible.

Training / testing. A key distinction (which has not been greatly considered in past energy disaggregation work) is generalizing from training data to test data.

The vast majority of previous disaggregation approaches (at least those with rigorous quantitative evaluation) have typically evaluated the algorithms on the same devices (but in different conditions) as they were trained on; that is, they attempt to build a model that can disaggregate a given appliance even in new conditions, but do not attempt to build models that explicitly generalize across multiple different devices of the same category. In our own […]

² The work of Patel et al. is substantially different from most other approaches to disaggregation, as they use high-frequency measurements to look for transients of the voltage signal of the home, and not necessarily the […].

³ As readers in the data mining community may not be familiar with this terminology, briefly: real power corresponds to the power that is actually consumed by an appliance, whereas reactive power corresponds to current that flows through a circuit but is put back into the system, typically via an inductive load in the appliance.
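The supervised setting and the training/testing distinction can be combined in one small experiment: learn a device's power levels from submetered circuit data, but evaluate leave-one-house-out, so the model is always tested on a physical device it never saw. This is an illustrative sketch, not the evaluation protocol of this paper. The 1-D k-means level estimator, the function names, and the synthetic refrigerator data (on-power of 120, 150 and 180 W in three hypothetical houses) are all assumptions made for the example.

```python
from collections import defaultdict

import numpy as np


def fit_power_levels(readings, k=2, iters=50):
    """1-D k-means over one submetered device's power readings (W).

    The sorted centres act as per-state power levels, e.g. [off, on] for k=2.
    """
    x = np.asarray(readings, dtype=float)
    centres = np.quantile(x, np.linspace(0.0, 1.0, k))   # deterministic init
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return np.sort(centres)


def leave_one_house_out(samples):
    """Yield (held_out_house, train, test) from (house_id, example) pairs."""
    by_house = defaultdict(list)
    for house, example in samples:
        by_house[house].append(example)
    for held_out in sorted(by_house):
        train = [ex for h, exs in by_house.items() if h != held_out for ex in exs]
        yield held_out, train, by_house[held_out]


# Hypothetical refrigerator readings from three houses: same appliance
# category, different physical devices, hence different "on" power.
rng = np.random.default_rng(0)
fridges = [(h, np.concatenate([rng.normal(2, 1, 300), rng.normal(on, 5, 200)]))
           for h, on in [(1, 120.0), (2, 150.0), (3, 180.0)]]

errors = {}
for house, train, test in leave_one_house_out(fridges):
    learned_on = fit_power_levels(np.concatenate(train))[1]  # from other houses
    true_on = fit_power_levels(test[0])[1]                   # held-out device
    errors[house] = abs(learned_on - true_on)
```

In this toy setup the "on" level learned from the other two houses matches house 2 closely but misses houses 1 and 3 by roughly 45 W, a small-scale version of the gap between re-identifying a known appliance and generalizing across devices of the same category.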

