Software Engineering Note

Enabling Spark dynamic allocation

devmoons 2021. 11. 9. 12:05

Versions

- hadoop 2.6.5

- spark 2.1.0

 

At first I added only the following to spark-defaults.conf and ran a job:

spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true

 

This produced an unfamiliar error in the ResourceManager log:

2021-11-08 17:22:17,230 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:758)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:845)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:826)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
    at java.lang.Thread.run(Thread.java:748)

 

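The YARN cluster (ResourceManager and NodeManagers) needs additional setup: Spark's external shuffle service has to be registered as an auxiliary service on every NodeManager.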

 

yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
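
One way to confirm that the change actually landed on a NodeManager is to parse yarn-site.xml and check both properties. The following is a minimal sketch using only Python's standard library. The function names `read_properties` and `check_spark_shuffle` and the `SAMPLE` text are illustrative assumptions, not part of Hadoop or Spark.

```python
import xml.etree.ElementTree as ET

SHUFFLE_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_spark_shuffle(xml_text):
    """Return a list of problems; an empty list means the config looks right."""
    props = read_properties(xml_text)
    problems = []
    raw_services = props.get("yarn.nodemanager.aux-services", "")
    services = [s.strip() for s in raw_services.split(",")]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle missing from yarn.nodemanager.aux-services")
    cls = props.get("yarn.nodemanager.aux-services.spark_shuffle.class")
    if cls != SHUFFLE_CLASS:
        problems.append("spark_shuffle.class is not set to YarnShuffleService")
    return problems


# Hypothetical sample mirroring the two properties shown above.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(check_spark_shuffle(SAMPLE))
```

In practice the text would come from the real yarn-site.xml on each NodeManager host rather than from `SAMPLE`.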

 

Copy the shuffle service jar onto every NodeManager host:

sudo cp /path/to/spark/yarn/spark-2.1.0-yarn-shuffle.jar /path/to/hadoop/share/hadoop/yarn/lib/
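
To double-check the copy on each host, a small script can list the shuffle jars present in the YARN lib directory. This is only a sketch: `find_shuffle_jars` is a hypothetical helper, the glob pattern assumes the jar keeps its `spark-<version>-yarn-shuffle.jar` name, and the demo runs against a temporary directory because the real path is installation-specific.

```python
import tempfile
from pathlib import Path


def find_shuffle_jars(lib_dir):
    """Return sorted names of Spark YARN shuffle jars found directly in lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("spark-*-yarn-shuffle.jar"))


if __name__ == "__main__":
    # Throwaway directory standing in for the NodeManager lib directory.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "spark-2.1.0-yarn-shuffle.jar").touch()
        (Path(tmp) / "hadoop-yarn-common-2.6.5.jar").touch()
        print(find_shuffle_jars(tmp))
```

An empty result means the jar is missing. More than one match may indicate a jar left over from a different Spark version, which is worth cleaning up before restarting the NodeManager.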

 

Then restart the NodeManagers so they load the new auxiliary service, and keep the spark-defaults.conf settings shown earlier:

...
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
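
Since both flags must be on together, it can help to check the file programmatically before submitting a job. Below is a rough sketch that reads spark-defaults.conf text and reports which of the two settings are missing or not `true`. The names `parse_spark_defaults`, `missing_settings`, and `REQUIRED` are made up for illustration, and the parser only approximates the Java properties format (first `=`, `:`, or whitespace as separator, `#` comments), so treat it as a sanity check rather than a faithful reimplementation.

```python
import re

# The two settings this post relies on.
REQUIRED = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}


def parse_spark_defaults(text):
    """Parse spark-defaults.conf text into a dict (simplified properties format)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=', ':' or whitespace, whichever comes first.
        parts = re.split(r"\s*[=:\s]\s*", line, maxsplit=1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        conf[key] = value.strip()
    return conf


def missing_settings(text):
    """Return the required keys that are absent or not set to 'true'."""
    conf = parse_spark_defaults(text)
    return sorted(k for k, v in REQUIRED.items() if conf.get(k, "").lower() != v)


if __name__ == "__main__":
    sample = "spark.dynamicAllocation.enabled=true\nspark.shuffle.service.enabled=true\n"
    print(missing_settings(sample))
```

An empty list means both settings are present. The same check works for the whitespace-separated style (`spark.shuffle.service.enabled true`) that spark-defaults.conf also accepts.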

 
