Software Engineering Note
Spark dynamic allocation support
Versions
- hadoop 2.6.5
- spark 2.1.0
With only the following two lines added to spark-defaults.conf,
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
running a job produced an error I could not make sense of:
2021-11-08 17:22:17,230 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at LAUNCHED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:758)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:845)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:826)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
    at java.lang.Thread.run(Thread.java:748)
Additional setup is required on the YARN cluster (resource manager, node manager).
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
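To catch a typo in these two properties before restarting anything, a small check like the one below may help. It is only a sketch: the function names and the sample XML are made up for illustration, and it assumes the usual yarn-site.xml layout of `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET

AUX_KEY = "yarn.nodemanager.aux-services"
CLASS_KEY = "yarn.nodemanager.aux-services.spark_shuffle.class"
EXPECTED_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_shuffle_service(xml_text):
    """Return a list of problems; an empty list means both properties look right."""
    props = read_properties(xml_text)
    problems = []
    services = [s.strip() for s in props.get(AUX_KEY, "").split(",") if s.strip()]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle missing from " + AUX_KEY)
    if props.get(CLASS_KEY) != EXPECTED_CLASS:
        problems.append(CLASS_KEY + " is not " + EXPECTED_CLASS)
    return problems


# Hypothetical sample mirroring the two properties from this post.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(check_shuffle_service(SAMPLE))
```

For a real cluster, the contents of the actual yarn-site.xml on each node manager would be passed in instead of `SAMPLE`.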
Copy the jar
sudo cp /path/to/spark/yarn/spark-2.1.0-yarn-shuffle.jar /path/to/hadoop/share/hadoop/yarn/lib/
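Since the jar has to be present on every node manager, it can be worth confirming the copy landed. The sketch below assumes the jar keeps a name of the form `spark-<version>-yarn-shuffle.jar`, as in the command above; `find_shuffle_jars` and `demo` are hypothetical helpers, and the demo runs against a temporary directory rather than a real Hadoop lib directory.

```python
import tempfile
from pathlib import Path


def find_shuffle_jars(lib_dir):
    """List jar names in lib_dir that look like the Spark YARN shuffle jar."""
    return sorted(p.name for p in Path(lib_dir).glob("spark-*-yarn-shuffle.jar"))


def demo():
    """Show the check before and after a jar appears in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        before = find_shuffle_jars(tmp)
        # Stand-in for the real copy step; creates an empty file with the same name.
        (Path(tmp) / "spark-2.1.0-yarn-shuffle.jar").touch()
        after = find_shuffle_jars(tmp)
    return before, after


if __name__ == "__main__":
    print(demo())
```

On a node manager, `find_shuffle_jars` would be pointed at the Hadoop yarn lib directory used in the copy command.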
Restart the node managers so they load the new auxiliary service, then keep the spark-defaults.conf settings shown above:
...
spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
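Both settings need to be present together, so a quick sanity check on the file can be sketched as follows. The parser here is deliberately simplified and is an assumption of this example: it treats each non-comment line as a key and a value separated by `=` or whitespace, and `parse_conf` and `missing_settings` are illustrative names, not part of Spark.

```python
import re

REQUIRED = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}


def parse_conf(text):
    """Parse spark-defaults.conf style text into a dict (simplified rules)."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split once on '=' (with optional spaces) or on a run of whitespace.
        parts = re.split(r"\s*=\s*|\s+", line, maxsplit=1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def missing_settings(text):
    """Return the required keys that are absent or not set to the expected value."""
    conf = parse_conf(text)
    return [
        key
        for key, expected in REQUIRED.items()
        if conf.get(key, "").lower() != expected
    ]


if __name__ == "__main__":
    sample = "spark.dynamicAllocation.enabled=true\nspark.shuffle.service.enabled=true\n"
    print(missing_settings(sample))
```

An empty list means both keys are set to true; any key returned is one to add or correct.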
References
- https://support.huaweicloud.com/intl/en-us/devg-mrs/mrs_06_0225.html
- https://stackoverflow.com/questions/53458237/enabling-dynamic-allocation-on-spark-on-yarn-mode
- https://m.blog.naver.com/slykid/221260047113
- https://spark.apache.org/docs/2.1.0/configuration.html (spark.dynamicAllocation.enabled)