Example: confidence

Automatic Database Management System Tuning …

Automatic Database Management System Tuning ThroughLarge-scale machine LearningDana Van AkenAndrew PavloGeoffrey J. GordonBohan ZhangCarnegie Mellon University Carnegie Mellon University Carnegie Mellon UniversityPeking Management System (DBMS) configuration Tuning is anessential aspect of any data-intensive application effort. But thisis historically a difficult task because DBMSs have hundreds ofconfiguration knobs that control everything in the System , suchas the amount of memory to use for caches and how often datais written to storage. The problem with these knobs is that theyare not standardized ( , two DBMSs use a different name for thesame knob), not independent ( , changing one knob can impactothers), and not universal ( , what works for one application maybe sub-optimal for another). Worse, information about the effectsof the knobs typically comes only from (expensive) overcome these challenges, we present an automated approachthat leverages past experience and collects new information to tuneDBMS configurations: we use a combination of supervised and un-supervised machine learning methods to (1) select the most impact-ful knobs, (2) map unseen Database workloads to previous work-loads from which we can transfer experience, and (3) recommendk

Automatic Database Management System Tuning Through Large-scale Machine Learning Dana Van Aken Andrew Pavlo Geoffrey J. Gordon Bohan Zhang Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University Peking University

Tags:

  Database, System, Management, Machine, Automatic, Through, Tuning, Automatic database management system tuning, Automatic database management system tuning through

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of Automatic Database Management System Tuning …

1 Automatic Database Management System Tuning ThroughLarge-scale machine LearningDana Van AkenAndrew PavloGeoffrey J. GordonBohan ZhangCarnegie Mellon University Carnegie Mellon University Carnegie Mellon UniversityPeking Management System (DBMS) configuration Tuning is anessential aspect of any data-intensive application effort. But thisis historically a difficult task because DBMSs have hundreds ofconfiguration knobs that control everything in the System , suchas the amount of memory to use for caches and how often datais written to storage. The problem with these knobs is that theyare not standardized ( , two DBMSs use a different name for thesame knob), not independent ( , changing one knob can impactothers), and not universal ( , what works for one application maybe sub-optimal for another). Worse, information about the effectsof the knobs typically comes only from (expensive) overcome these challenges, we present an automated approachthat leverages past experience and collects new information to tuneDBMS configurations: we use a combination of supervised and un-supervised machine learning methods to (1) select the most impact-ful knobs, (2) map unseen Database workloads to previous work-loads from which we can transfer experience, and (3) recommendknob settings.

2 We implemented our techniques in a new tool calledOtterTune and tested it on three DBMSs. Our evaluation shows thatOtterTune recommends configurations that are as good as or betterthan ones generated by existing tools or a human INTRODUCTIONThe ability to collect, process, and analyze large amounts of datais paramount for being able to extrapolate new knowledge in busi-ness and scientific domains [35, 25]. DBMSs are the critical com-ponent of data-intensive ( Big Data ) applications [46]. The per-formance of these systems is often measured in metrics such asthroughput ( , how fast it can collect new data) and latency ( ,how fast it can respond to a request).Achieving good performance in DBMSs is non-trivial as they arecomplex systems with many tunable options that control nearly allaspects of their runtime operation [24]. Such configuration knobsallow the Database administrator (DBA) to control various aspectsof the DBMS s runtime behavior.

3 For example, they can set howmuch memory the System allocates for data caching versus thetransaction log buffer. Modern DBMSs are notorious for havingPermission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from 17, May 14 - 19, 2017, Chicago, IL, USA 2017 Copyright held by the owner/author(s). Publication rights licensed to 978-1-4503-4197-4/17/05.. $ : configuration knobs [22, 47, 36].

4 Part of what makes DBMSsso enigmatic is that their performance and scalability are highly de-pendent on their configurations. Further exacerbating this problemis that the default configurations of these knobs are notoriously an example, the default MySQL configuration in 2016 assumesthat it is deployed on a machine that only has 160 MB of RAM [1].Given this, many organizations resort to hiring expensive expertsto configure the System s knobs for the expected workload. But asdatabases and applications grow in both size and complexity, opti-mizing a DBMS to meet the needs of an application has surpassedthe abilities of humans [11]. This is because the correct configura-tion of a DBMS is highly dependent on a number of factors that arebeyond what humans can reason attempts at Automatic DBMS configuration tools havecertain deficiencies that make them inadequate for general purposedatabase applications.

5 Many of these Tuning tools were createdby vendors, and thus they only support that particular company sDBMS [22, 33, 37]. The small number of Tuning tools that do sup-port multiple DBMSs still require manual steps, such as having theDBA (1) deploy a second copy of the Database [24], (2) map depen-dencies between knobs [49], or (3) guide the training process [58].All of these tools also examine each DBMS deployment indepen-dently and thus are unable to apply knowledge gained from previ-ous Tuning efforts. This is inefficient because each Tuning effort cantake a long time and use a lot of this paper, we present a technique to reuse training data gath-ered from previous sessions to tune new DBMS deployments. Thecrux of our approach is to train machine learning (ML) modelsfrom measurements collected from these previous tunings, and usethe models to (1) select the most important knobs, (2) map previ-ously unseen Database workloads to known workloads, so that wecan transfer previous experience, and (3) recommend knob settingsthat improve a target objective ( , latency, throughput).

6 Reusingpast experience reduces the amount of time and resources it takesto tune a DBMS for a new application. To evaluate our work, weimplemented our techniques using Google TensorFlow [50] andPython sscikit-learn[39] in a Tuning tool, calledOtterTune,and performed experiments for two OLTP DBMSs (MySQL, Post-gres) and one OLAP DBMS (Vector). Our results show that Ot-terTune produces a DBMS configuration for these workloads thatachieves 58 94% lower latency compared to their default settingsor configurations generated by other Tuning advisors. We also showthat OtterTune generates configurations in under 60 min that arewithin 94% of ones created by expert remainder of this paper is organized as follows. Sect. 2 be-gins with a discussion of the challenges in Database Tuning . We thenprovide an overview of our approach in Sect. 3, followed by a de-scription of our techniques for collecting DBMS metrics in Sect.

7 4,1009 Buffer pool size (MB)5001000150020002500 Log file size (MB)20040060080099th %-tile (sec) (a)Dependencies50010001500200025003000 Buffer pool size (MB) %-tile (sec)(b)Continuous SettingsConfig #1 Config #2 Config # %-tile (sec)Workload #1 Workload #2 Workload #3(c)Non-Reusable Configurations20002004200820122016 Release date0200400600 Number of knobsMySQLP ostgres(d) Tuning ComplexityFigure 1: Motivating Examples Figs. 1a to 1c show performance measurements for the YCSB workload running on MySQL ( ) using differentconfiguration settings. Fig. 1d shows the number of tunable knobs provided in MySQL and Postgres releases over the knobs that have the most impact in Sect. 5, and rec-ommending settings in Sect. 6. In Sect. 7, we present our experi-mental evaluation. Lastly, we conclude with related work in Sect. MOTIVATIONT here are general rules or best practice guidelines availablefor Tuning DBMSs, but these do not always provide good resultsfor a range of applications and hardware configurations.

8 Althoughone can rely on certain precepts to achieve good performance on aparticular DBMS, they are not universal for all applications. Thus,many organizations resort to hiring expensive experts to tune theirsystem. For example, a 2013 survey found that 40% of engagementrequests for a large Postgres service company were for DBMS tun-ing and knob configuration issues [36].One common approach to Tuning a DBMS is for the DBA to copythe Database to another machine and manually measure the perfor-mance of a sample workload from the real application. Based onthe outcome of this test, they will then tweak the DBMS s configu-ration according to some combination of Tuning guidelines and in-tuition based on past experiences. The DBA then repeats the exper-iment to see whether the performance improves [47]. Such a trial-and-error approach to DBMS Tuning is tedious, expensive, andinefficient because (1) many of the knobs are not independent [24],(2) the values for some knobs are continuous, (3) one often cannotreuse the same configuration from one application to the next, and(4) DBMSs are always adding new now discuss these issues in further detail.

9 To highlight theirimplications, we ran a series of experiments using MySQL ( )that execute variations of the YCSB workload with different knobsettings. We present the details of our operating environment forthese experiments in Sect. :DBMS Tuning guides strongly suggest that aDBA only change one knob at a time. This is wise but woefullyslow given the large number of knobs. It is also not entirely helpfulbecause changing one knob may affect the benefits of another. Butit is difficult enough for humans to understand the impact of oneknob let alone the interactions between multiple ones. The differ-ent combinations of knob settings means that finding the optimalconfiguration isNP-hard [49]. To demonstrate this point, we mea-sured the performance of MySQL for different configurations thatvary the size of its buffer pool1and the size of its log in Fig. 1a show that the DBMS achieves better performancewhen both the buffer pool and log file sizes are large.

10 But in gen-eral, the latency is low when the buffer pool size and log file sizeare balanced. If the buffer pool is large and the log file size issmall, then the DBMS maintains a smaller number of dirty pagesand thus has to perform more flushes to Settings:Another difficult aspect of DBMS tuningis that there are many possible settings for knobs, and the differ-1 MySQL Knob:innodb_buffer_pool_size2 MySQL Knob:innodb_log_file_sizeences in performance from one setting to the next could be irreg-ular. For example, the size of the DBMS s buffer pool can be anarbitrary value from zero to the amount of DRAM on the some ranges, a GB increase in this knob could be incon-sequential, while in other ranges, a GB increase could causeperformance to drop precipitously as the DBMS runs out of phys-ical memory. To illustrate this point, we ran another experimentwhere we increase MySQL s buffer pool size from 10 MB to 3 results in Fig.


Related search queries