The Influence of Regularization Intensity on the Bias Variance of Linear Regression

Authors

  • Wenhao Wang

DOI:

https://doi.org/10.54691/xx3cer44

Keywords:

Linear Regression; Regularization; Hyperparameter Optimization; Gradient Descent; Mean Square Error.

Abstract

This study proposes an L2 regularization-based framework for improving the generalization ability of linear regression models. Through a comparative analysis of ordinary least squares (OLS) and ridge-like models on synthetic data, we investigate regularization's role in the bias-variance trade-off. The experimental protocol involves: (1) generating linear data (y = 3X + 5 + ϵ) with Gaussian noise (σ = 2); (2) estimating OLS parameters via the normal equations; and (3) implementing gradient descent with regularization terms (λ ∈ {0.0, 0.01, 0.1, 1.0}), using the correction term 2λθ_j in each weight update. Results show that the λ = 0.1 model achieves the best mean squared error (MSE = 4.21), 15.3% better than OLS (MSE = 4.97), with parameters (intercept = 5.12, coefficient = 2.98) closer to the true values. Visual analysis confirms the regularized model's superior robustness at the edges of the feature distribution, in contrast to OLS's tendency to overfit. The proposed grid search and gradient correction methods provide an interpretable framework for lightweight model optimization, extendable to elastic nets and deep neural networks in high-dimensional scenarios.
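The three-step protocol in the abstract can be sketched in Python. This is an illustrative reconstruction, not the paper's code: the sample size, feature range, learning rate, iteration count, and the choice to leave the intercept unpenalized are all assumptions not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step (1): synthetic linear data, y = 3X + 5 + eps with Gaussian noise (sigma = 2).
# Sample size and feature range are assumptions for illustration.
n = 200
X = rng.uniform(-3, 3, size=n)
y = 3 * X + 5 + rng.normal(0, 2, size=n)

# Design matrix with an intercept column.
A = np.column_stack([np.ones(n), X])

# Step (2): OLS via the normal equations, theta = (A^T A)^{-1} A^T y.
theta_ols = np.linalg.solve(A.T @ A, A.T @ y)

# Step (3): gradient descent with an L2 penalty; the weight-update
# correction term is 2 * lam * theta_j, as described in the abstract.
def ridge_gd(A, y, lam, lr=0.01, n_iter=5000):
    theta = np.zeros(A.shape[1])
    m = len(y)
    for _ in range(n_iter):
        grad = (2.0 / m) * A.T @ (A @ theta - y)  # gradient of the MSE term
        penalty = 2.0 * lam * theta               # L2 correction, 2*lam*theta_j
        penalty[0] = 0.0  # intercept left unpenalized (an assumption; not stated in the abstract)
        theta -= lr * (grad + penalty)
    return theta

# Grid search over the regularization strengths used in the study.
for lam in [0.0, 0.01, 0.1, 1.0]:
    theta = ridge_gd(A, y, lam)
    mse = np.mean((A @ theta - y) ** 2)
    print(f"lambda={lam:<5} intercept={theta[0]:.2f} coef={theta[1]:.2f} MSE={mse:.2f}")
```

With λ = 0, the gradient-descent solution converges to the OLS estimate; larger λ shrinks the slope coefficient toward zero, trading a small increase in bias for lower variance.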



Published

2025-08-27

Section

Articles

How to Cite

Wang, Wenhao. 2025. “The Influence of Regularization Intensity on the Bias Variance of Linear Regression”. Scientific Journal of Intelligent Systems Research 7 (8): 42-48. https://doi.org/10.54691/xx3cer44.