A study on communication optimization of distributed gradient descent algorithms based on large-scale machine learning
Abstract
In recent years, the rapid development of new-generation information technology has produced an unprecedented growth in the volume of data. Machine learning algorithms are increasingly used to process these large data sets and to build systems that solve problems whose complexity makes hand-crafted algorithmic solutions infeasible, such as autonomous vehicles, speech recognition, and recommendation systems. The growing complexity of machine learning models, combined with the ever-larger amounts of data collected, makes training on a single machine prohibitively expensive or even impossible. Exploiting the computing power of distributed systems is a straightforward solution to this problem, and today powerful computer clusters are used to train complex deep neural networks on large data sets. However, in large-scale cluster environments, the commonly used distributed synchronous stochastic gradient descent (SGD) algorithm requires frequent communication between nodes to keep the gradients (parameters) consistent, so communication bandwidth has become a key constraint for distributed machine learning systems.
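To make the communication pattern described above concrete, the following minimal sketch (an illustration under assumed tooling, not the optimization method proposed in this paper) shows one step of synchronous distributed SGD using PyTorch's torch.distributed all-reduce: each worker computes a local gradient and then exchanges it with every other worker before the parameter update, so every step transmits data proportional to the number of model parameters.

# Illustrative sketch of one synchronous distributed SGD step (hedged example;
# not the communication-optimization method of this paper). It assumes the
# process group has already been initialized, e.g. via
# torch.distributed.init_process_group(backend="gloo").
import torch
import torch.distributed as dist

def synchronous_sgd_step(model, loss_fn, batch, lr=0.01):
    inputs, targets = batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()  # compute local gradients on this worker's mini-batch

    world_size = dist.get_world_size()
    for param in model.parameters():
        # All-reduce sums the gradient across all workers; the traffic per
        # step is proportional to the total number of model parameters,
        # which is why bandwidth becomes the bottleneck at large scale.
        dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
        param.grad /= world_size  # average so every worker holds the same gradient

    with torch.no_grad():
        for param in model.parameters():
            param -= lr * param.grad  # identical update on every worker
    model.zero_grad()

Reducing this per-step traffic, for example by compressing or quantizing gradients or by synchronizing less frequently, is the general direction of the communication optimization studied in this line of work.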
Keywords
distributed systems; machine learning; distributed optimization; communication efficiency
DOI: https://doi.org/10.18686/esta.v10i2.384
Copyright (c) 2023 Hao Wu