Research on Construction of Offline Data Warehouse for Ship Shore Power based on DolphinScheduler and Hive

Authors

  • Zewen Zhang
  • Xin Zhang
  • Taizhi Lv

DOI:

https://doi.org/10.54691/23cqqy15

Keywords:

Ship Shore Power; Offline Data Warehouse; Hadoop; Hive; DolphinScheduler.

Abstract

As shore power systems become more networked, intelligent, and data-driven, the industry must handle rapidly growing volumes of data. This paper presents the construction of an offline data warehouse for ship shore power based on DolphinScheduler and Hive. First, MySQL serves as the backend database, and Sqoop synchronizes business data from it to HDFS, preserving data reliability and integrity. Second, a Flume-Kafka-Flume architecture collects and buffers user behavior data in real time, providing input for subsequent analysis. Third, HQL statements written in Hive clean, merge, and analyze the shore power data and compute key indicators such as electricity consumption and usage trends. Fourth, Superset is integrated to visualize the analysis results through a web interface. Fifth, DolphinScheduler runs the tasks on a timed schedule and enforces the dependencies among them so that the pipeline operates smoothly. The system relies on the HDFS replication mechanism for reliability, scales by adding nodes dynamically, and benefits from the fault tolerance of the YARN scheduler. It thereby reduces the time and computational cost of data analysis for the shore power industry and helps the industry derive greater value from its data.
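The dependency control mentioned in the abstract can be illustrated with a minimal sketch. It is not DolphinScheduler's API and not the authors' workflow definition: the task names below are hypothetical labels for the five stages the abstract lists, and the ordering is computed with Python's standard-library `graphlib` (Python 3.9 or later) purely to show how a scheduler can derive a valid run order from declared dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions, not taken from the paper).
# Each task maps to the set of tasks that must finish before it starts.
PIPELINE = {
    "sqoop_mysql_to_hdfs": set(),
    "flume_logs_to_hdfs": set(),
    "hive_clean_and_merge": {"sqoop_mysql_to_hdfs", "flume_logs_to_hdfs"},
    "hive_compute_indicators": {"hive_clean_and_merge"},
    "superset_refresh": {"hive_compute_indicators"},
}


def execution_order(pipeline):
    """Return one valid run order in which every task follows its dependencies.

    Raises graphlib.CycleError if the declared dependencies are circular.
    """
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
for step, task in enumerate(order, start=1):
    print(step, task)
```

In this sketch the two ingestion tasks have no prerequisites, so their relative order is not fixed; only the constraints declared in `PIPELINE` are guaranteed.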

Published

2024-09-20
