ATPO-LSTM: Adaptive Two-Phase Optimization with Entropy-Driven Genetic Algorithm for High-Dimensional LSTM Hyperparameter Tuning

Authors

  • Xuanrui Zhang

DOI:

https://doi.org/10.54691/9hw7f627

Keywords:

Hyperparameter optimization, genetic algorithm, LSTM networks, time series forecasting, adaptive mutation.

Abstract

To address the high-dimensional, non-convex hyperparameter optimization problem that long short-term memory (LSTM) networks face in time series forecasting, this paper proposes ATPO-LSTM, an adaptive two-phase optimization framework built on an entropy-driven genetic algorithm. The method switches dynamically between Cauchy and Gaussian mutation according to the Shannon entropy of the population, combines tournament selection with simulated annealing in a hybrid selection strategy, and introduces a dimension-aware fitness evaluation that incorporates computational-complexity constraints, effectively mitigating the premature convergence and diversity loss that traditional evolutionary algorithms suffer in hyperparameter spaces of more than 20 dimensions. Experiments on six industrial datasets spanning energy, finance, and healthcare show that ATPO-LSTM reduces mean absolute error (MAE) by 18.7% (p < 0.01) and converges 23% faster than particle swarm optimization (PSO-LSTM). In a production deployment on a regional power grid, accurate load forecasting delivered 12.6% cost savings. Theoretical analysis establishes the global convergence of the algorithm and shows that its computational complexity remains linear in search spaces of up to 50 dimensions. The results offer a new paradigm for efficient hyperparameter optimization of industrial LSTM models.
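
As a rough illustration of the mechanism described above, the Python sketch below shows one way an entropy-driven switch between Cauchy and Gaussian mutation could be realized. It is a minimal sketch based solely on the abstract: the function names (shannon_entropy, mutate), the binning of fitness values, and the 1.5-bit threshold are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def shannon_entropy(fitness, bins=10):
        """Shannon entropy (in bits) of the population's fitness distribution.

        Low entropy means fitness values cluster in a few bins, which serves
        here as a proxy for diversity loss in the population.
        """
        hist, _ = np.histogram(fitness, bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]  # drop empty bins to avoid log(0)
        return -np.sum(p * np.log2(p))

    def mutate(individual, fitness, entropy_threshold=1.5, scale=0.1, rng=None):
        """Entropy-driven mutation switch (threshold is an assumption).

        When entropy falls below the threshold, heavy-tailed Cauchy steps
        encourage exploration; otherwise Gaussian steps refine solutions.
        """
        if rng is None:
            rng = np.random.default_rng()
        if shannon_entropy(fitness) < entropy_threshold:
            step = rng.standard_cauchy(size=individual.shape) * scale
        else:
            step = rng.normal(scale=scale, size=individual.shape)
        return individual + step

    # Usage on a hypothetical population of 50 candidates, 20 hyperparameters:
    rng = np.random.default_rng(0)
    population = rng.uniform(-1.0, 1.0, size=(50, 20))
    fitness = rng.uniform(size=50)  # placeholder fitness values
    child = mutate(population[0], fitness, rng=rng)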

Published

2025-10-29

Section

Articles

How to Cite

Zhang, Xuanrui. 2025. “ATPO-LSTM: Adaptive Two-Phase Optimization With Entropy-Driven Genetic Algorithm for High-Dimensional LSTM Hyperparameter Tuning”. Scientific Journal of Intelligent Systems Research 7 (10): 60-70. https://doi.org/10.54691/9hw7f627.