Intro to Apache Spark - Stanford University

Transcription of Intro to Apache Spark - Stanford University

Intro to Apache Spark

Getting Started (installs + intros, while people arrive: 20 min). Best to download the slides to your laptop, and be sure to complete the course survey. In addition to these slides, all of the code samples are available on GitHub gists.

Online Course Materials: by the end of the day, participants will be comfortable with the following:
• open a Spark Shell
• use of some ML algorithms
• explore data sets loaded from HDFS, etc.
• review Spark SQL, Spark Streaming, Shark
• review advanced topics and BDAS projects
• follow-up courses and certification
• developer community resources, events, etc.
• return to workplace and demo use of Spark

Intro: Success Criteria

Intros: what is your background? Who needs to use AWS instead of laptops? PEM key, if needed? See the tutorial "Connect to Your Amazon EC2 Instance from Windows Using PuTTY".

Intro: Preliminaries

Installation (01: Getting Started, hands-on lab: 20 min). Let's get started using Apache Spark, in just four easy steps. (For class, please copy from the USB sticks.)

Step 1: Install Java JDK 6/7 on MacOSX or Windows. Follow the license agreement instructions, then click the download for your OS; you need the JDK rather than the JRE (for Maven, etc.). (For class, please copy from the USB sticks.)

Step 1: Install Java JDK 6/7 on Linux. This is much simpler on Linux:

sudo apt-get -y install openjdk-7-jdk

Step 2: Download Spark. Open the Spark downloads page in a browser, click the archive file to download and unpack it, then change into the newly created directory. (For class, please copy from the USB sticks.)
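A quick way to confirm that Steps 1 and 2 line up, once the interactive shell from the next step is running (a sketch of ours, assuming the stock spark-shell, which binds a SparkContext to sc):

scala> System.getProperty("java.version")     // confirms which JDK the shell picked up
scala> sc.parallelize(1 to 5).count()         // tiny smoke test: should return 5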

Step 3: Run Spark Shell. We'll run Spark's interactive shell:

./bin/spark-shell

then from the scala> REPL prompt, let's create some data:

val data = 1 to 10000

Step 4: Create an RDD. Create an RDD based on that data:

val distData = sc.parallelize(data)

then use a filter to select values less than 10:

distData.filter(_ < 10).collect()

Checkpoint: what do you get for results? (See the gist file #file-01-repl-txt; a sketch of the expected exchange follows below.)

Optional Downloads: Python. For Python, check out Anaconda by Continuum Analytics for a full-featured platform.

Optional Downloads: Maven. The Java builds later also require Maven, which you can download from the Apache Maven site.
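For the Step 4 checkpoint, the REPL exchange should look roughly like this (the res numbering and exact formatting vary by Spark version, but the filtered values are deterministic):

scala> val data = 1 to 10000
scala> val distData = sc.parallelize(data)
scala> distData.filter(_ < 10).collect()
res0: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)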

Spark Deconstructed (03: Getting Started, lecture: 20 min). Let's spend a few minutes on this Scala code.

Spark Deconstructed: Log Mining Example

// load error messages from a log into memory
// then interactively search for various patterns

// base RDD
val lines = sc.textFile("hdfs://...")

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()

// action 1
messages.filter(_.contains("mysql")).count()

// action 2
messages.filter(_.contains("php")).count()

[diagram: a driver and three workers] We start with Spark running on a cluster, submitting code to be evaluated on it. The following slides repeat the same program, highlighting which part runs on the driver and which parts run on the workers.
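One point worth noting before stepping through the diagrams: filter and map are transformations, so the lines defining errors and messages only record lineage; no work happens on the cluster until the first count() action runs. A quick way to see this in the shell (our own illustration, not from the slides; the WARN filter is hypothetical):

val warnings = lines.filter(_.startsWith("WARN"))   // transformation: returns immediately, launches no tasks
warnings.count()                                    // action: this is what actually triggers a Spark job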

At the scala> prompt, inspect the lineage of the transformed RDD:

scala> messages.toDebugString
res5: String =
MappedRDD[4] at map at <console>:16 (3 partitions)
MappedRDD[3] at map at <console>:16 (3 partitions)
FilteredRDD[2] at filter at <console>:14 (3 partitions)
MappedRDD[1] at textFile at <console>:12 (3 partitions)
HadoopRDD[0] at textFile at <console>:12 (3 partitions)

At this point, take a look at the transformed RDD operator graph. [diagram: the driver and three workers, each worker holding one HDFS block (block 1, block 2, block 3)]
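Reading that lineage bottom-up, each entry corresponds to one step of the program (our annotation, not part of the slide):

HadoopRDD[0]   - sc.textFile("hdfs://...") reading the raw HDFS blocks
MappedRDD[1]   - textFile's internal map that extracts the line text
FilteredRDD[2] - lines.filter(_.startsWith("ERROR"))
MappedRDD[3]   - errors.map(_.split("\t"))
MappedRDD[4]   - the final .map(r => r(1)) that yields messages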

Spark Deconstructed: Log Mining Example. [diagram: when action 1 runs, the driver ships tasks to the three workers and each worker reads its own HDFS block (block 1, block 2, block 3)]

Spark Deconstructed: Log Mining Example. [diagram: each worker processes its block and caches the resulting data (cache 1, cache 2, cache 3), then returns its results for action 1 to the driver]

Spark Deconstructed: Log Mining Example. [diagram: for action 2, each worker processes from its cache (cache 1, cache 2, cache 3); no HDFS reads are needed this time]
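That difference is easy to observe in your own session: the first action over messages pays for the HDFS read and fills the cache, while later actions are served from the executors' memory. A rough way to see it from the shell (the timing helper is ours, not from the course material; absolute numbers depend entirely on your data and cluster):

def timed[A](body: => A): (A, Long) = {
  val start = System.currentTimeMillis()               // crude wall-clock timing is enough here
  val result = body
  (result, System.currentTimeMillis() - start)
}

timed { messages.filter(_.contains("mysql")).count() }  // first action: reads HDFS, populates the cache
timed { messages.filter(_.contains("php")).count() }    // second action: runs against the cached partitions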

Spark Deconstructed: Log Mining Example. [diagram: the same program once more, with the remaining parts of the code discussed against the driver, workers, and caches]

Looking at the RDD transformations and actions from another perspective: [diagram: an RDD flows through a chain of transformations that produce further RDDs; an action on the final RDD produces a value back in the driver]
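To tie that picture back to the distData example from Step 4: transformations return new RDDs and defer work, while actions return plain values to the driver (a minimal sketch of ours):

val squares = distData.map(n => n * n)   // transformation: yields another RDD, nothing runs yet
squares.take(5)                          // action: launches a job and returns Array(1, 4, 9, 16, 25) to the driver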

Spark Deconstructed: mapping each part of the program onto that picture.

The base RDD is where the data enters:

// base RDD
val lines = sc.textFile("hdfs://...")

The transformed RDDs build up the lineage (still no execution):

// transformed RDDs
val errors = lines.filter(_.startsWith("ERROR"))
val messages = errors.map(_.split("\t")).map(r => r(1))
messages.cache()

An action on the final RDD produces a value:

// action 1
messages.filter(_.contains("mysql")).count()

Simple Spark Apps (04: Getting Started, lab: 20 min)

Simple Spark Apps: WordCount

void map (String doc_id, String text):
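The WordCount pseudocode above is cut off in this excerpt. For reference, a minimal sketch of the same computation as a Spark job (our own illustration with a hypothetical input path, not necessarily the course's exact solution):

// WordCount in Spark: split each line into words, pair each word with 1, then sum the counts per word
val f  = sc.textFile("hdfs:///path/to/input.txt")   // hypothetical input path
val wc = f.flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
wc.collect()                                        // or wc.saveAsTextFile(...) for larger outputs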

