Fog computing is a distributed computing paradigm that allows flexible allocation of processes and data to computers close to edge devices, user terminals, or IoT sensors in order to achieve low latency, save network bandwidth, and preserve security and privacy. Applications on a fog computing infrastructure can flexibly change their configuration and behavior depending on processing demands and environmental conditions. However, assuring system qualities such as performance, availability, and reliability becomes complicated because they are not statically determined at design time. Drone-based image processing systems, for instance, can be made more reliable by offloading computation tasks to fog nodes through a wireless communication network. The desirable computation mode (e.g., on-drone processing or offloading) can change depending on environmental uncertainties such as network connection reliability and workload intensity. Therefore, system quality design needs to take such environmental uncertainties into account in addition to system configurations. In this study, we use Stochastic Reward Nets to model such complex stochastic system behaviors, analyze system performance and availability quantitatively, and optimize system design.
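To illustrate the kind of quantity such a model yields, here is a minimal sketch of a two-state Markov reward model for the drone offloading example. It is not the Stochastic Reward Net used in the study: the wireless link is assumed to alternate between "up" and "down" with exponential failure and recovery rates, and the function names, rates, and throughput figures are all hypothetical.

```python
def link_steady_state(failure_rate, recovery_rate):
    """Steady-state probabilities (up, down) of a two-state Markov chain.

    Assumption: the link fails at `failure_rate` and recovers at
    `recovery_rate` (both per unit time, exponentially distributed).
    The balance equation pi_up * failure_rate = pi_down * recovery_rate
    gives the closed form below.
    """
    total = failure_rate + recovery_rate
    return recovery_rate / total, failure_rate / total


def expected_throughput(failure_rate, recovery_rate, offload_rate, local_rate):
    """Expected reward rate: images processed per unit time.

    Hypothetical rewards: the drone processes `offload_rate` images per
    unit time while the link is up (offloading mode) and `local_rate`
    while it is down (on-drone processing).
    """
    p_up, p_down = link_steady_state(failure_rate, recovery_rate)
    return p_up * offload_rate + p_down * local_rate


if __name__ == "__main__":
    # Illustrative numbers only.
    print(link_steady_state(0.1, 0.9))
    print(expected_throughput(0.1, 0.9, offload_rate=10.0, local_rate=2.0))
```

A Stochastic Reward Net generalizes this idea: the state space is generated from a Petri net description, and rewards are attached to its markings rather than to two hand-written states.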
Recent advances in machine learning algorithms, together with increased computing power and the availability of big data, continue to expand the applications of machine learning systems. Autonomous vehicles, for instance, use deep learning to recognize traffic signs, obstacles, and pedestrians in images captured by the camera. However, machine learning is not perfect, as it can produce errors on real-world input data. Engineers need to prepare for erroneous outputs from machine learning functions and adopt a suitable system architecture that takes reliability and safety into account. In this study, we leverage the idea of N-version programming, a well-known software fault-tolerance technique, to improve the reliability of machine learning systems. We propose different types of N-version machine learning architectures and develop reliability models to assess the reliability of these architectures.
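As a rough illustration of why redundancy can help, the sketch below computes the probability that a strict majority of N model versions is correct. It assumes that every version has the same accuracy and that versions err independently, which is a strong simplification and not the reliability model developed in the study; real versions trained on similar data tend to make correlated errors. The function name is hypothetical.

```python
import math


def majority_correct_probability(n_versions, accuracy):
    """Probability that a strict majority of versions is correct.

    Assumptions (illustrative only): all versions share the same
    per-input `accuracy` and fail independently of one another.
    """
    needed = n_versions // 2 + 1  # smallest strict majority
    return sum(
        math.comb(n_versions, k)
        * accuracy ** k
        * (1.0 - accuracy) ** (n_versions - k)
        for k in range(needed, n_versions + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, majority_correct_probability(n, 0.9))
```

Under these assumptions, adding versions raises the majority-vote reliability whenever the per-version accuracy exceeds 0.5. Once errors are correlated the gain shrinks, which is why the diversity of versions matters in an N-version architecture.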
As many IoT application systems require long, continuous operation of software, operational software reliability has become one of the critical concerns for dependable systems. It is well known that long-running software often suffers from deterioration of performance and reliability over time, a phenomenon referred to as software aging. Software aging is typically caused by software bugs, which are not easily detected in software systems consisting of many interdependent components. Therefore, system monitoring and statistical analysis are key to finding the trend and root causes of software aging phenomena. By predicting a potential aging trend and failure time, we can effectively apply preventive software maintenance techniques such as software rejuvenation and software life-extension. In this study, we use stochastic models to evaluate the effectiveness of preventive software maintenance techniques and find the optimal maintenance schedules.
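The search for an optimal maintenance schedule can be sketched with a simple age-based rejuvenation model. This is an illustrative renewal-reward calculation rather than the stochastic models used in the study: the time to aging-related failure is assumed to follow a Weibull distribution, rejuvenation is triggered after T hours of uptime, and every parameter value and function name is hypothetical.

```python
import math


def survival(t, shape, scale):
    """Weibull survival function: probability of no aging failure by time t."""
    return math.exp(-((t / scale) ** shape))


def availability(interval, shape, scale, repair_time, rejuvenation_time,
                 steps=2000):
    """Steady-state availability when rejuvenating after `interval` hours.

    One cycle ends either with a failure (followed by `repair_time`) or
    with a planned rejuvenation (followed by `rejuvenation_time`).
    The mean uptime per cycle is the integral of the survival function
    over [0, interval], approximated here with the trapezoidal rule.
    """
    h = interval / steps
    area = 0.5 * (survival(0.0, shape, scale) + survival(interval, shape, scale))
    for i in range(1, steps):
        area += survival(i * h, shape, scale)
    mean_uptime = area * h
    p_survive = survival(interval, shape, scale)
    mean_downtime = (repair_time * (1.0 - p_survive)
                     + rejuvenation_time * p_survive)
    return mean_uptime / (mean_uptime + mean_downtime)


def best_interval(candidates, shape, scale, repair_time, rejuvenation_time):
    """Grid search for the candidate interval with the highest availability."""
    return max(
        candidates,
        key=lambda t: availability(t, shape, scale, repair_time,
                                   rejuvenation_time),
    )


if __name__ == "__main__":
    # Hypothetical parameters: shape > 1 models an increasing failure rate.
    params = dict(shape=2.0, scale=100.0, repair_time=10.0,
                  rejuvenation_time=1.0)
    t_best = best_interval(range(1, 301), **params)
    print(t_best, availability(t_best, **params))
```

Rejuvenating too often wastes time on planned downtime, while rejuvenating too rarely lets costly failures dominate, so the availability curve peaks at an intermediate interval when the failure rate increases with age and rejuvenation is cheaper than repair.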
Last update: 2024.1.19