Introductory Chapter: Introduction to Dependability Engineering

Introduction
Cloud computing presents challenges that must be overcome, such as planning infrastructures that maintain availability when failure events and repair activities occur [1]. Cloud infrastructure planning that addresses dependability is an essential activity because it supports business continuity and client satisfaction. Redundancy mechanisms such as cold standby, warm standby, and hot standby can be allocated to components of the cloud infrastructure to maintain the availability levels agreed in service level agreements (SLAs). Mathematical formalisms, both state-space based, such as stochastic Petri nets, and combinatorial, such as reliability block diagrams [2], can be adopted to evaluate the dependability of cloud infrastructures when different redundancy mechanisms are allocated to their components [3]. Chapter 1 shows how stochastic Petri nets and reliability block diagrams are applied to the dependability evaluation of cloud infrastructures with different redundancy mechanisms.
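To make the combinatorial view concrete, the following is a minimal sketch of how a reliability block diagram combines component availabilities in series and in parallel. It assumes independent components and perfect failover, so the parallel block approximates hot standby only; cold and warm standby generally need a state-space model such as a stochastic Petri net. The function names and all MTTF/MTTR figures are hypothetical and are not taken from Chapter 1.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability of one repairable component."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*avail):
    """Series block: the system is up only if every component is up."""
    result = 1.0
    for a in avail:
        result *= a
    return result


def parallel(*avail):
    """Parallel block: the system is down only if every component is down.

    Assumes independent failures and perfect switching (hot standby).
    """
    unavailability = 1.0
    for a in avail:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


# Hypothetical figures, for illustration only.
server = availability(mttf_hours=1000.0, mttr_hours=10.0)
switch = availability(mttf_hours=5000.0, mttr_hours=5.0)

no_redundancy = series(server, switch)
with_hot_standby = series(parallel(server, server), switch)

print(f"without redundancy:   {no_redundancy:.6f}")
print(f"with hot standby pair: {with_hot_standby:.6f}")
```

Duplicating the server raises the availability of that block, after which the non-redundant switch becomes the limiting component.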
International research projects involve large distributed teams made up of multiple institutions. Chapter 2 considers the research artifacts that such teams produce, which need to work together to demonstrate and ship the project results. In these settings, however, the project itself is almost never in the core interest of the partners in the consortium. This leads to a weak integration incentive and, consequently, to last-minute efforts. These in turn result in Big Bang Integration, which imposes huge stress on the consortium and produces only non-sustainable results. In contrast, industry has been profiting from the introduction of agile development methods backed by "continuous delivery," "continuous integration," and "continuous deployment" [4]. Chapter 2 identifies the shortcomings of this approach for research projects and shows how to overcome them so that all three continuous methodologies can be adopted in that setting. It also presents both a conceptual framework and a tooling framework that realize the approach as "continuous anything." As a result, integration becomes a core element of the project plan. The approach distributes and shares the responsibility for integration work among all partners, while clearly holding individuals responsible for dedicated software components. Through a high degree of automation, it keeps the overall integration work low and still provides immediate feedback on the quality of the software. Overall, the concept proved useful and beneficial in several EU-funded research projects, where it significantly lowered integration effort, improved the quality of the software components, and enhanced collaboration as a whole.
Software fault injection (SFI) is an acknowledged method for assessing the dependability of software systems. After reviewing the state of the art of SFI, Chapter 3 addresses the challenge of integrating it more deeply into software development practice. It presents a well-defined development methodology incorporating SFI (fault injection driven development, FIDD), which begins by systematically constructing a dependability and failure cause model [5], from which relevant injection techniques, points, and campaigns are derived [6]. The possibilities and challenges of end-to-end automation of such campaigns are analyzed. The suggested approach can substantially improve the accessibility of dependability assessment in everyday software engineering practice.
Availability quantification and prediction for IT infrastructure in data centers are of paramount importance for online business enterprises. Chapter 4 presents comprehensive availability models for practical case studies, demonstrating state-space stochastic reward net models for the quantitative assessment of system availability in typical data center systems [7]. Stochastic reward net models are presented for a virtualized server system, a data center network based on the DCell topology, and a conceptual data center for disaster tolerance. The systems are then evaluated through various analyses and metrics of interest, including steady-state availability, downtime and downtime cost, and sensitivity analysis.
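The metrics named above can be illustrated on the simplest possible case. The sketch below assumes a single repairable component with constant failure rate and repair rate (a two-state Markov model), which is far simpler than the stochastic reward nets of Chapter 4. The rates, the cost per hour, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def steady_state_availability(failure_rate, repair_rate):
    """A = mu / (lambda + mu) for a two-state up/down Markov model."""
    return repair_rate / (failure_rate + repair_rate)


def annual_downtime_hours(avail):
    """Expected hours per year spent in the down state."""
    return (1.0 - avail) * HOURS_PER_YEAR


def annual_downtime_cost(avail, cost_per_hour):
    """Expected yearly cost of downtime at a flat hourly cost."""
    return annual_downtime_hours(avail) * cost_per_hour


def sensitivity_to_failure_rate(failure_rate, repair_rate):
    """Partial derivative dA/d(lambda) = -mu / (lambda + mu)^2."""
    return -repair_rate / (failure_rate + repair_rate) ** 2


# Hypothetical rates in events per hour, for illustration only.
lam, mu = 0.001, 0.1
A = steady_state_availability(lam, mu)
downtime = annual_downtime_hours(A)
cost = annual_downtime_cost(A, cost_per_hour=1000.0)
dA_dlam = sensitivity_to_failure_rate(lam, mu)

print(f"availability = {A:.6f}")
print(f"downtime     = {downtime:.2f} h/year")
print(f"cost         = {cost:.2f} per year")
print(f"dA/dlambda   = {dA_dlam:.4f}")
```

The sign of the derivative confirms the expected direction: increasing the failure rate lowers availability, and its magnitude indicates how strongly.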
The majority of transistors in a modern microprocessor are used to implement static random access memories (SRAM) [8]. It is therefore important to analyze the reliability of SRAM blocks and, during SRAM design, to build in design margins that achieve an adequate lifetime. The two main wear-out mechanisms that increase a transistor's threshold voltage are bias temperature instability (BTI) and hot carrier injection (HCI). BTI and HCI can degrade a transistor's driving strength and thereby weaken circuit performance. In a microprocessor, first-level (L1) caches are frequently accessed, which makes them especially vulnerable to BTI and HCI. In Chapter 5, cache lifetimes under BTI and HCI are studied for different cache configurations, namely cache size, associativity, cache line size, and replacement algorithm. As a case study, the failure probability (reliability) and the hit rate (performance) of the L1 cache in a LEON3 microprocessor are analyzed while the microprocessor runs a set of benchmarks [9]. The results provide insights that help cache designers achieve better performance-reliability trade-offs.
The vast amounts of data to be processed by today's applications demand higher computational power [10]. To meet application requirements and achieve reasonable application performance, it becomes increasingly profitable, or even necessary, to exploit any available hardware parallelism. For both new and legacy applications, successful parallelization often comes at a high cost [11]. Chapter 6 proposes a set of methods that employ an optimistic semiautomatic approach, which enables programmers to exploit parallelism on modern hardware architectures. The methods, which include an LLVM-based tool, help programmers identify the most promising parallelization targets and understand the key types of parallelism, reducing the manual effort needed for parallelization. One contribution of this work is an efficient profiling method that determines the control and data dependences needed for parallelism discovery or other types of code analysis. A method is also presented for detecting code sections where parallel design patterns might be applicable and for suggesting relevant code transformations. The approach efficiently reports detailed runtime data dependences. It accurately identifies opportunities for parallelism and the appropriate type of parallelism to use, whether task-based or loop-based.
Quality of service is the ability to provide different priorities to applications, users, or data flows, or to guarantee a certain level of performance to a data flow [12,13]. Chapter 7 uses timed Petri nets to model techniques that provide quality of service in packet-switched networks and illustrates the behavior of the developed models through the performance characteristics of simple examples. These performance characteristics are obtained by discrete event simulation of the analyzed models [14,15].
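As a small illustration of what "different priorities" can mean in practice, the following sketch implements strict-priority scheduling, one common quality-of-service technique. It is not the timed Petri net model used in Chapter 7; the class name, packet labels, and priority values are hypothetical.

```python
import heapq
from itertools import count


class StrictPriorityQueue:
    """Serves the lowest priority number first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, next(self._arrival), packet))

    def dequeue(self):
        _priority, _arrival, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


# Hypothetical traffic: priority 0 is delay-sensitive, 2 is best effort.
queue = StrictPriorityQueue()
queue.enqueue("bulk-1", priority=2)
queue.enqueue("voice-1", priority=0)
queue.enqueue("bulk-2", priority=2)
queue.enqueue("voice-2", priority=0)

order = []
while len(queue) > 0:
    order.append(queue.dequeue())

print(order)
```

Strict priority can starve low-priority flows under sustained high-priority load, which is one reason performance characteristics of such schemes are usually studied by modeling and simulation.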
Condition monitoring systems are usually employed in structural health monitoring [16,17]. The reliability analysis of more complicated structures usually relies on finite element method (FEM) models. The random fields (material properties and loads) have to be represented by random variables assigned to random field elements, and adequate distribution functions and covariance matrices should be determined for the chosen set of random variables [18]. This procedure is called discretization of a random field. Chapter 8 presents the discretization of random fields for material properties using the spatial averaging method for a one-dimensional homogeneous random field and the midpoint method. The second part of Chapter 8 deals with the discretization of random fields representing distributed loads. In particular, the discretization of a distributed load imposed on a Bernoulli beam is presented in detail. A numerical example demonstrates very good agreement between the reliability indices computed with stochastic finite element method (SFEM) and first-order reliability method (FORM) analyses and the results obtained from analytical formulae.
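The midpoint method mentioned above can be sketched in a few lines: each random field element is represented by the field value at its midpoint, so the covariance matrix of the resulting random variables is the field covariance evaluated between midpoints. The sketch assumes a one-dimensional homogeneous field with an exponential correlation function, which is an illustrative choice and not necessarily the one used in Chapter 8. All names and numbers are hypothetical.

```python
import math


def midpoint_covariance(length, n_elements, sigma, corr_length):
    """Discretize a 1D homogeneous random field by the midpoint method.

    Returns the element midpoints and the covariance matrix
    C[i][j] = sigma^2 * exp(-|x_i - x_j| / corr_length),
    where the exponential correlation function is an assumption.
    """
    h = length / n_elements
    midpoints = [(i + 0.5) * h for i in range(n_elements)]
    cov = [
        [sigma ** 2 * math.exp(-abs(xi - xj) / corr_length) for xj in midpoints]
        for xi in midpoints
    ]
    return midpoints, cov


# Hypothetical beam of length 4 split into 4 random field elements.
midpoints, cov = midpoint_covariance(
    length=4.0, n_elements=4, sigma=2.0, corr_length=1.5
)

print(midpoints)
for row in cov:
    print(["%.3f" % value for value in row])
```

The diagonal holds the point variance of the field, and the off-diagonal terms decay with the distance between midpoints. Spatial averaging would instead use the variance of the element average, which is smaller than the point variance.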
The electric arc furnace (EAF)-based process route in modern steelmaking for the production of plates and special quality bars requires a series of stations for secondary metallurgy treatment (a ladle furnace (LF) and potentially a vacuum degasser) before the final casting of slabs and blooms in the corresponding continuous casting machines. Since every steel grade has its own melting characteristics, the melting (liquidus) temperature generally differs from grade to grade and plays an important role in the final casting temperature, which has to exceed the melting temperature by an amount called superheat. The superheat is adjusted at the LF station by the operator, who decides mostly on the basis of personal experience. However, since the ladle has to pass through downstream processes, the liquid steel loses temperature, not only because of the duration of the processes until casting but also because of the ladle refractory history. In Chapter 9, simulation software was developed to reproduce the phenomena that occur in a melt shop and influence downstream superheats. Data science models were deployed to check the potential of controlling casting temperatures by adjusting liquid steel exit temperatures at the LF [19].
The electricity industry worldwide is turning increasingly to renewable sources of energy to generate electricity [20,21]. In developing countries, rural electrification is key to encouraging youth and skilled personnel to stay in rural areas for production and income generation activities. The current lack of a grid network discourages skilled personnel from living in rural areas, so they migrate to urban areas instead. Tanzania, like other countries, has diverse renewable energy resources that need to be developed for electricity generation. Most of these sources are found in rural areas where there is no reliable electricity because the grid network has not been extended, owing to low population density. The government of Tanzania has put in place a policy that encourages small power producers (up to 10 MW) to develop and install electricity generation using renewable energy resources. Energy produced by a small power producer would be sold directly to the community or to the government-owned company for grid integration. Chapter 10 discusses three major environmentally friendly renewable energy sources found in Tanzania: wind energy, solar energy, and hydropower. In addition, the government is setting strategies for empowering people in rural areas, particularly women and youth, through organizations such as local cooperatives and by applying a bottom-up approach, so that livelihoods in rural areas are enhanced through the effective participation of rural people and rural communities in managing their own social, economic, and environmental affairs.
Reliability is a key criterion in every system, and engineering is no exception [22]. Reliability in power systems or electric grids can be generally defined as the amount of time the system is available (capable of fully supplying the demand) compared to the amount of time it is unavailable (incapable of supplying the demand) [23]. For systems with high uncertainties, such as renewable energy-based power systems, achieving a high level of reliability is a formidable challenge because of the increased penetration of intermittent renewable sources, such as wind and solar [24]. Careful and accurate planning is of the utmost importance for achieving high reliability in renewable energy-based systems [25]. Chapter 11 assesses reliability issues in wind-based power systems and provides a case study that proposes a solution to enhance system reliability.
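One way to read the definition above is as the fraction of time periods in which supply fully covers demand. The sketch below computes that fraction for an hourly series. The generation and demand figures, the function name, and the fixed backup capacity are hypothetical; the backup is included only to show how the metric responds and is not the solution proposed in Chapter 11.

```python
def supply_availability(generation_mw, demand_mw):
    """Fraction of periods in which generation fully covers demand."""
    if not demand_mw or len(generation_mw) != len(demand_mw):
        raise ValueError("series must be non-empty and of equal length")
    served = sum(1 for g, d in zip(generation_mw, demand_mw) if g >= d)
    return served / len(demand_mw)


# Hypothetical hourly wind output and demand in MW, for illustration only.
wind = [12.0, 3.5, 0.0, 8.0, 15.0, 6.0]
demand = [6.0, 6.0, 5.0, 7.0, 9.0, 6.5]

wind_only = supply_availability(wind, demand)

# Illustrative assumption: 3 MW of firm backup capacity always available.
backup_mw = 3.0
with_backup = supply_availability([g + backup_mw for g in wind], demand)

print(f"wind only:   {wind_only:.3f}")
print(f"with backup: {with_backup:.3f}")
```

The gap between the two figures shows how sensitive the metric is to the intermittency of the wind resource, which is why planning matters for renewable energy-based systems.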