
www.it-ebooks - cnblogs.com

Spark for Python Developers

A concise guide to implementing Spark big data analytics for Python developers and building a real-time and insightful trend tracker data-intensive app

Amit Nandi

BIRMINGHAM

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied.




Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2015

Production reference: 1171215

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

Credits

Author: Amit Nandi
Reviewers: Manuel Ignacio Franco Galeano, Rahul Kavale, Daniel Lemire, Chet Mancini, Laurence Welch
Commissioning Editor: Amarabha Banerjee
Acquisition Editor: Sonali Vernekar
Content Development Editor: Merint Thomas Mathew
Technical Editor: Naveenkumar Jain
Copy Editor: Roshni Banerjee
Project Coordinator: Suzanne Coutinho
Proofreader: Safis Editing
Indexer: Priya Sane
Graphics: Kirk D'Penha
Production Coordinator: Shantanu N. Zagade
Cover Work: Shantanu N. Zagade

About the Author

Amit Nandi studied physics at the Free University of Brussels in Belgium, where he did his research on computer generated holograms. Computer generated holograms are the key components of an optical computer, which is powered by photons running at the speed of light. He then worked with the university Cray supercomputer, sending batch jobs of programs written in Fortran. This gave him a taste for computing, which kept growing. He has worked extensively on large business reengineering initiatives, using SAP as the main enabler. For the last 15 years, he has focused on start-ups in the data space, pioneering new areas of the information technology landscape.

He is currently focusing on large-scale data-intensive applications as an enterprise architect, data engineer, and software developer. He understands and speaks seven human languages. Although Python is his computer language of choice, he aims to be able to write fluently in seven computer languages.

I want to express my profound gratitude to my parents for their unconditional love and strong support in all my [...].

This book arose from an initial discussion with Richard Gall, an acquisition editor at Packt Publishing. Without this initial discussion, this book would never have happened. So, I am grateful to him. The follow-up discussions and the contractual terms were agreed with Rebecca Youe. I would like to thank her for her support. I would also like to thank Merint Mathew, a content editor who helped me bring this book to the finish line. I am thankful to Merint for his subtle persistence and tactful support during the write-ups and revisions of this book.

We are standing on the shoulders of giants. I want to acknowledge some of the giants who helped me shape my thinking. I want to recognize the beauty, elegance, and power of Python as envisioned by Guido van Rossum. My respectful gratitude goes to Matei Zaharia and the team at Berkeley AMP Lab and Databricks for developing a new approach to computing with Spark and Mesos. Travis Oliphant, Peter Wang, and the team at [...] are doing a tremendous job of keeping Python relevant in a fast-changing computing landscape.

Thank you to you, the reader.

About the Reviewers

Manuel Ignacio Franco Galeano is a software developer from Colombia. He holds a computer science degree from the University of Quindío. At the time of publication of this book, he was studying for his MSc in computer science at University College Dublin, Ireland. He has a wide range of interests that include distributed systems, machine learning, and microservices. He is looking for a way to apply machine learning techniques to audio data in order to help people learn more about [...].

Rahul Kavale works as a software developer at TinyOwl Ltd. He is interested in multiple technologies ranging from building web applications to solving big data problems. He has worked in multiple languages, including Scala, Ruby, and Java, and has worked on Apache Spark, Apache Storm, Apache Kafka, Hadoop, and Hive. He enjoys writing Scala. Functional programming and distributed computing are his areas of interest. He has been using Spark since its early stage for varying use cases. He has also helped with the review for the Pragmatic Scala book.

Daniel Lemire has a BSc and MSc in mathematics from the University of Toronto and a PhD in engineering mathematics from the École Polytechnique and the Université de Montréal. He is a professor of computer science at the Université du Québec. He has also been a research officer at the National Research Council of Canada and an entrepreneur. He has written over 45 peer-reviewed publications, including more than 25 journal articles. He has held competitive research grants for the last 15 years. He has been an expert on several committees with funding agencies (NSERC and FQRNT). He has served as a program committee member on leading computer science conferences (for example, ACM CIKM, ACM WSDM, ACM SIGIR, and ACM RecSys). His open source software has been used by major corporations such as Google and Facebook. His research interests include databases, information retrieval, and high-performance programming. He blogs regularly on computer science.

Chet Mancini is a data engineer at Intent Media, Inc in New York, where he works with the data science team to store and process terabytes of web travel data to build predictive models of shopper behavior. He enjoys functional programming, immutable data structures, and machine learning. He writes and speaks on topics surrounding data engineering and information [...]. He is a contributor to Apache Spark and other libraries in the Spark ecosystem. Chet has a master's degree in computer science from Cornell University.

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit [...].

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at [...] and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [...] for more details.

At [...], you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Do you need instant solutions to your IT questions?

PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at [...], you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface
Chapter 1: Setting Up a Spark Virtual Environment
    Understanding the architecture of data-intensive applications
        Infrastructure layer
        Persistence layer
        Integration layer
        Analytics layer
        Engagement layer
    Understanding Spark
        Spark libraries
        PySpark in action
        The Resilient Distributed Dataset
    Understanding Anaconda
    Setting up the Spark powered environment
        Setting up an Oracle VirtualBox with Ubuntu
        Installing Anaconda with Python
        Installing Java 8
        Installing Spark
        Enabling IPython Notebook
    Building our first app with PySpark
    Virtualizing the environment with Vagrant
    Moving to the cloud
        Deploying apps in

