Study on Spark performance tuning strategy based on Skewed Partitioning and Locality-aware Partitioning

Guikun Cao, Haiyuan Yu, Liujia Chang, Heng Zhao

Abstract


Apache Spark is a large-scale data processing engine widely used in a variety of big data analysis tasks. However, data
skew and poor data locality can degrade the performance of Spark applications. This paper investigates Spark performance tuning
strategies based on Skewed Partitioning and Locality-aware Partitioning. First, the influence of data skew and data locality problems
in Spark is analyzed; then a performance tuning method combining Skewed Partitioning and Locality-aware Partitioning is proposed.
Experimental results show that, compared with the traditional HashPartitioner, this method can significantly improve the efficiency
of Spark jobs when processing large data sets.
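The abstract does not specify the paper's exact partitioning algorithm. As a hedged illustration only, a minimal sketch of one common skew-mitigation idea behind Skewed Partitioning is key salting: records with known "hot" keys are fanned out over several partitions instead of landing in one. All names here (`skew_aware_partition`, `_stable_hash`, `fanout`) are hypothetical and not taken from the paper, and locality-aware placement (assigning partitions near the nodes that hold the input blocks) is not modeled.

```python
import hashlib
from collections import Counter

def _stable_hash(s: str) -> int:
    # Deterministic hash; Python's built-in hash() is salted per process.
    return int(hashlib.md5(s.encode("utf-8")).hexdigest(), 16)

def skew_aware_partition(key: str, num_partitions: int,
                         hot_keys: set, salt: int = 0, fanout: int = 4) -> int:
    """Assign `key` to a partition. Keys known to be skewed ("hot") are
    salted so their records spread over up to `fanout` partitions."""
    if key in hot_keys:
        return _stable_hash(f"{key}#{salt % fanout}") % num_partitions
    return _stable_hash(key) % num_partitions

# Simulate a skewed distribution: one key dominates the data set.
records = ["hot"] * 1000 + [f"k{i}" for i in range(100)]
hot = {"hot"}

# Plain hash partitioning sends every "hot" record to one partition.
plain = Counter(_stable_hash(k) % 8 for k in records)
# Salting by record index spreads "hot" over up to `fanout` partitions.
salted = Counter(skew_aware_partition(k, 8, hot, salt=i)
                 for i, k in enumerate(records))

print("plain max partition size:", max(plain.values()))
print("salted max partition size:", max(salted.values()))
```

In Spark itself this logic would live in a custom `org.apache.spark.Partitioner` subclass; salting a join key also requires replicating the matching rows on the other side of the join with each salt value, which is a cost the `fanout` parameter trades against balance.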

Keywords


Skewed Partitioning; Locality-aware Partitioning; Performance tuning; Spark; Data skew


References


[1] Wei Jin, Peng Liu, Ru Li. A review of data skew problems based on Spark [J]. Computer Science, 2021, 48(2): 89-97.

[2] Wanhang Xie, Hongwei Yuan, Fanyi Liu, et al. Overview of Spark SQL Optimization Algorithm [J]. Computer Engineering and Design, 2020, 41(1): 7-13.

[3] Weichao Guo, Yong Yang, Bo Pan, et al. Research review of Spark framework in Big Data analysis [J]. Computer Engineering and Design, 2019, 40(5): 1052-1060.

[4] Fei Hu. Research on Optimization of large-scale Data Processing Architecture Based on Spark [J]. Modern Computer, 2018(22): 72-74.

[5] Yichen Fang, Liping Zhang. Review of data analysis and processing methods based on Spark [J]. Well Logging Technology, 2018, 42(1): 69-75.

[6] Hongjie Chen, Yu Huang. A review of data analysis technology based on Apache Spark [J]. Computer and Digital Engineering, 2017, 45(7): 1321-1329.




DOI: https://doi.org/10.18686/esta.v10i4.572



Copyright (c) 2023 Guikun Cao, Haiyuan Yu, Liujia Chang, Heng Zhao