Enabling Spark dynamic allocation
devmoons
2021. 11. 9. 12:05
Versions
- Hadoop 2.6.5
- Spark 2.1.0
I added only these two lines to spark-defaults.conf and ran a job:
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
The job failed with an error I did not recognize, logged by the ResourceManager:
2021-11-08 17:22:17,230 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:758)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:845)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:826)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
    at java.lang.Thread.run(Thread.java:748)
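The YARN cluster (ResourceManager and NodeManagers) needs additional setup. With spark.shuffle.service.enabled=true, Spark expects its external shuffle service to be running inside every NodeManager, so the service has to be registered as a YARN auxiliary service.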
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
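The two properties above are easy to get subtly wrong (for example, replacing mapreduce_shuffle instead of appending to it), so a small check can help before touching the NodeManagers. The following is a minimal sketch, not part of the original setup: the function names `load_properties` and `check_shuffle_service` and the inline `SAMPLE` document are illustrative assumptions, and a real yarn-site.xml would be read from the Hadoop configuration directory.

```python
import xml.etree.ElementTree as ET

# Class name taken from the yarn-site.xml snippet above.
SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_shuffle_service(xml_text):
    """Return a list of problems; an empty list means both settings are present."""
    props = load_properties(xml_text)
    problems = []
    raw_services = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw_services.split(",")]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle missing from yarn.nodemanager.aux-services")
    class_key = "yarn.nodemanager.aux-services.spark_shuffle.class"
    if props.get(class_key) != SHUFFLE_CLASS:
        problems.append("spark_shuffle class is not YarnShuffleService")
    return problems


# Hypothetical sample mirroring the snippet above.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(check_shuffle_service(SAMPLE) or "yarn-site.xml looks OK")
```

To check a real file, pass its contents instead of `SAMPLE`, for example `check_shuffle_service(open(path).read())` with `path` pointing at the cluster's yarn-site.xml.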
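Copy the shuffle service jar into the YARN library directory on every NodeManager host (the NodeManagers must be restarted afterwards to load it):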
sudo cp /path/to/spark/yarn/spark-2.1.0-yarn-shuffle.jar /path/to/hadoop/share/hadoop/yarn/lib/
Then keep the spark-defaults.conf settings shown earlier:
...
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
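Since dynamic allocation on YARN depends on the shuffle service flag being set alongside it, the pairing can be checked mechanically. This is a rough sketch under stated assumptions: `parse_spark_defaults` and `dynamic_allocation_problems` are hypothetical helper names, and the parser handles only the simple `key=value` and `key value` line forms, not the full Java properties syntax (escapes, line continuations).

```python
import re

# A key ends at the first '=', ':' or whitespace; the rest of the line is the value.
LINE_RE = re.compile(r"([^\s=:]+)\s*[=:\s]\s*(.*)")


def parse_spark_defaults(text):
    """Parse simple spark-defaults.conf lines into a dict (simplified parser)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match:
            conf[match.group(1)] = match.group(2).strip()
    return conf


def is_true(conf, key):
    """True when the key is present and set to 'true' (case-insensitive)."""
    return conf.get(key, "").lower() == "true"


def dynamic_allocation_problems(conf):
    """Return a list of problems with the dynamic allocation settings."""
    problems = []
    if is_true(conf, "spark.dynamicAllocation.enabled") and not is_true(
        conf, "spark.shuffle.service.enabled"
    ):
        problems.append(
            "spark.dynamicAllocation.enabled=true requires "
            "spark.shuffle.service.enabled=true"
        )
    return problems


if __name__ == "__main__":
    sample = "spark.dynamicAllocation.enabled=true\nspark.shuffle.service.enabled=true\n"
    print(dynamic_allocation_problems(parse_spark_defaults(sample)) or "settings look OK")
```

The check only confirms that the two flags agree with each other; it cannot tell whether the shuffle service is actually running on the NodeManagers.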
References
- https://support.huaweicloud.com/intl/en-us/devg-mrs/mrs_06_0225.html
- https://stackoverflow.com/questions/53458237/enabling-dynamic-allocation-on-spark-on-yarn-mode
- https://m.blog.naver.com/slykid/221260047113
- https://spark.apache.org/docs/2.1.0/configuration.html (spark.dynamicAllocation.enabled)