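Build OS: CentOS 7
Spark version: 2.3.0
IntelliJ IDEA version: 2019.1
A JDK must already be installed on the build host; I have JDK 8 installed.
I. Download the source package. Download URL: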
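II. Unpack the Spark source package and edit pom.xml
1. Change the default Maven repository. Using the Aliyun mirror speeds up dependency downloads; its address is http://maven.aliyun.com/nexus/content/groups/public. In pom.xml, the repository and plugin repository entries become:
<repository>
  <id>central</id>
  <name>Maven Repository</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<pluginRepository>
  <id>central</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</pluginRepository>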
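A quick way to confirm that the edit took effect is to count how many lines of pom.xml mention the mirror. The following is only a minimal sketch: the function name count_mirror_refs is hypothetical, and it assumes each URL sits on its own line, in which case a count of 2 would indicate that both the repository and the plugin repository entries were changed.

```shell
# Hypothetical helper: count the lines in a POM file that mention the Aliyun mirror.
count_mirror_refs() {
  # -c prints the number of matching lines, -F treats the pattern as a fixed string.
  # "|| true" keeps a count of zero from being reported as a failure.
  grep -cF 'maven.aliyun.com/nexus/content/groups/public' "$1" || true
}

# Run from the Spark source root; skipped when no pom.xml is present.
if [ -f pom.xml ]; then
  echo "mirror references in pom.xml: $(count_mirror_refs pom.xml)"
fi
```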
2. Raise the Maven JVM memory settings (in Spark 2.3.0 this variable is defined in the build/mvn script).
_COMPILE_JVM_OPTS="-Xmx4g -Xms4g -XX:ReservedCodeCacheSize=1024m"
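As an alternative to editing the script, the same limits can usually be supplied through the MAVEN_OPTS environment variable, which the Spark build documentation describes. As far as I know, build/mvn only falls back to its built-in defaults when MAVEN_OPTS is unset, but treat that as an assumption to verify against your copy of the script. A sketch using the same values as above:

```shell
# Same JVM limits as above, supplied via the environment instead of editing build/mvn.
export MAVEN_OPTS="-Xmx4g -Xms4g -XX:ReservedCodeCacheSize=1024m"

# Show the value that a subsequent Maven invocation in this shell will inherit.
echo "MAVEN_OPTS=${MAVEN_OPTS}"
```

The export only lasts for the current shell session, so it has to be repeated (or added to a shell profile) before later builds.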
3. Build
# Simplest build
./build/mvn -DskipTests clean package
# With Hadoop, YARN, and Hive support
./build/mvn -Pyarn -Phadoop-2.7 -Phive-thriftserver -DskipTests clean package
When the build completes, the output ends with:
[INFO] Spark Project Parent POM ........................... SUCCESS [ 2.563 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 2.017 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 2.030 s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 2.520 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 5.194 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 3.779 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 4.370 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 6.703 s]
[INFO] Spark Project Core ................................. SUCCESS [02:02 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 15.699 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 14.218 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 40.059 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:31 min]
[INFO] Spark Project SQL .................................. SUCCESS [03:37 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:33 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 6.483 s]
[INFO] Spark Project Hive ................................. SUCCESS [02:42 min]
[INFO] Spark Project REPL ................................. SUCCESS [ 13.618 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 8.528 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 18.741 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 27.792 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 7.543 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 11.258 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 9.873 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 32.471 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 2.612 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:26 min
[INFO] Finished at: 2020-03-17T05:23:34-04:00
[INFO] Final Memory: 94M/3027M
[INFO] ------------------------------------------------------------------------
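If the build output is saved to a file (for example by piping the build command through tee), the reactor summary can be checked mechanically instead of by eye. This is a rough sketch: the file name build.log and the function name non_success_modules are hypothetical, and the pattern assumes the summary line layout shown above.

```shell
# Hypothetical helper: print the names of reactor modules that did not report SUCCESS.
# Summary lines look like "[INFO] <module name> ....... SUCCESS [ 2.563 s]".
non_success_modules() {
  # Keep only summary lines whose status is FAILURE or SKIPPED, then strip the
  # dotted filler plus status, and finally the leading "[INFO] " tag.
  grep -E '^\[INFO\] .+ \.{3,} (FAILURE|SKIPPED)' "$1" \
    | sed -E 's/ \.{3,} .*$//; s/^\[INFO\] //'
}

# Example: ./build/mvn -DskipTests clean package 2>&1 | tee build.log
# The check below only runs when such a log file exists.
if [ -f build.log ]; then
  non_success_modules build.log
fi
```

An empty result means every module in the summary reported SUCCESS.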
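To build a distributable package, run the make-distribution.sh script, which lives under dev/ in Spark 2.3.0. The -Psparkr profile adds R support and needs R installed on the build host; omit it otherwise:
./dev/make-distribution.sh --name hadoop2.7 --tgz -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn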
Running the command above generates the release package in the dist directory; it is the same as the package downloaded from the Spark website.
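A simple sanity check on the result is to look for a few files that a Spark distribution normally contains. The sketch below is only illustrative: the function name looks_like_spark_dist is hypothetical, and the three paths it tests (RELEASE, jars/, bin/spark-submit) are my assumption about what the distribution script produces, not an exhaustive definition.

```shell
# Hypothetical helper: succeed if a directory has the basic layout of a Spark distribution.
looks_like_spark_dist() {
  [ -f "$1/RELEASE" ] && [ -d "$1/jars" ] && [ -f "$1/bin/spark-submit" ]
}

# Run from the Spark source root after the distribution step.
if looks_like_spark_dist dist; then
  echo "dist looks complete"
else
  echo "dist is missing or incomplete"
fi
```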
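Build and distribution reference:
IntelliJ IDEA has powerful code editing, reading, and debugging features, so it is used here as the tool for reading and debugging the code.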
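1. Open the built source tree in IntelliJ IDEA.
Note: the Scala SDK version that IntelliJ IDEA uses must match the Scala version the Spark source depends on; otherwise errors may show up later. In pom.xml:
<scala.version>2.11.8</scala.version>
<scala.binary.version>2.11</scala.binary.version>
In IntelliJ IDEA, the configured Scala SDK should therefore be 2.11.8.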
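To look up the Scala version without opening pom.xml, the property can be pulled out on the command line. This is a minimal sketch: the function name pom_scala_version is hypothetical, and it assumes the first scala.version element in the file is the default one (later occurrences may belong to optional profiles).

```shell
# Hypothetical helper: print the first <scala.version> value found in a POM file.
pom_scala_version() {
  # The pattern matches <scala.version> only, not <scala.binary.version>.
  sed -n 's:.*<scala\.version>\(.*\)</scala\.version>.*:\1:p' "$1" | head -n 1
}

# Run from the Spark source root; skipped when no pom.xml is present.
if [ -f pom.xml ]; then
  echo "Scala version in pom.xml: $(pom_scala_version pom.xml)"
fi
```

The printed value is the one to select for the Scala SDK in the IDE.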
2. Once IDEA finishes importing the project, all of Spark's example code is under the examples project.
3. Run SparkPi
Runtime error 1: Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
Fix:
Menu -> File -> Project Structure -> Modules -> spark-examples_2.11 -> Dependencies, then add the jars directory as a dependency: {spark dir}/assembly/target/scala-2.11/jars/
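The missing class scala/collection/Seq belongs to the Scala standard library, so this fix can only work if the jars directory actually contains a scala-library jar. A small sketch to check that, assuming the usual scala-library-<version>.jar naming; the function name find_scala_library is hypothetical.

```shell
# Hypothetical helper: list Scala standard library jars in a directory.
find_scala_library() {
  # Errors from a missing directory are silenced; grep filters by file name.
  ls "$1" 2>/dev/null | grep -E '^scala-library-.*\.jar$'
}

# Run from the Spark source root after the build.
find_scala_library assembly/target/scala-2.11/jars \
  || echo "no scala-library jar found; build the project first"
```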
Runtime error 2:
org.apache.spark.SparkException: A master URL must be set in your configuration
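Fix:
// Specify the master in code
val spark = SparkSession
  .builder
  .appName("Spark Pi")
  .master("local[*]")
  .getOrCreate()
Run it again.
That completes the environment for reading and debugging the source code; you can now happily Read The Fucking Source Code.
Reposted from: http://necmb.baihongyu.com/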