An all-pairs sequential computation over a large number of elements may become prohibitively expensive. Interactions between pairs of elements happen in the order specified by a precedence graph [4]; fortunately, some of the interactions between pairs of elements are independent of each other and can, therefore, be executed in parallel.

It is possible to specify an all-pairs computation as a generic parallel algorithm that implements process control and communication in a problem-independent manner. Such a generic algorithm can be glued together with domain-specific sequential code in order to derive particular all-pairs parallel computations. In Section 2 we propose a generic all-pairs pipeline algorithm for parallel computations on clusters of workstations. In Section 3 we use this generic algorithm to derive specific parallel algorithms for three different problems: n-body simulation, bubble sort, and Gaussian elimination.

In Section 4 we describe implementations of the algorithms on a homogeneous cluster of workstations using PVM, the Parallel Virtual Machine software package. In the same section, we present performance measurements on a cluster of Intel Pentium II workstations.

Concluding remarks are presented in Section 5.

Figure: An all-pairs pipeline in progress (a master node that integrates, and a chain of pipeline nodes through which elements e10, e11, ... travel and interact).

An all-pairs computation on a system of n elements can be parallelized by means of a master and several pipelined nodes. The master sends all elements left-to-right through the pipeline. Each node first retains its own subsystem of elements and makes them interact with each other, thus performing a sequential all-pairs computation on its retained elements. After that, the node continues to receive elements from its left neighbor, makes them interact with its own retained elements, and then sends them to its right neighbor.

Then the master integrates the system of received elements, possibly performing additional problem-specific modifications on the elements. After that, the master may send the whole system through the pipeline again; the whole process is repeated for a predefined number of steps. This type of computation can be efficient on a cluster of workstations provided that the interaction between pairs of elements is computationally intensive and the individual elements are not very large.

The only assumption made in the generic algorithm is that procedure interact operates on, and possibly updates, two individual elements, ei and ej, while procedure integrate can update the state of the whole system, s.
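A minimal sequential sketch of this contract (Python here; the names interact and integrate are taken from the paper, while the list representation of the system and the toy instantiation are our assumptions):

```python
def all_pairs(system, interact, integrate, steps=1):
    """Generic all-pairs computation: every unordered pair (ei, ej)
    interacts once per step, then integrate updates the whole system."""
    for _ in range(steps):
        n = len(system)
        for i in range(n):
            for j in range(i + 1, n):
                # interact may update both elements
                system[i], system[j] = interact(system[i], system[j])
        system = integrate(system)
    return system

# Toy instantiation: each interaction increments both partners,
# so every element is incremented (n - 1) times per step.
def interact(ei, ej):
    return ei + 1, ej + 1

def integrate(s):
    return s  # problem-specific post-processing would go here
```

With three elements, each participates in two interactions per step, so `all_pairs([0, 0, 0], interact, integrate)` yields `[2, 2, 2]`.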

The generic algorithm allows any desired domain-specific element type, without restrictions. As shown in the figure, after processing its own retained elements, the node continues to receive transient elements through its left channel.

The node performs an application-dependent interaction between each transient element, ej, and every retained element, ei, possibly changing the states of the interacting elements. After handling all transient elements, the node sends, through its top channel, all retained elements to the master. The master thus receives back all retained elements through a net of channels that connect each pipeline node to the master.

In order to make this communication more efficient, the master employs a whole net of bottom channels, one for each individual pipeline node. At the end of each cycle, the master performs an application-dependent integration of the whole system (see Figure 4).

Figure 4. All-pairs pipeline: pipeline node and master node.

Given this generic parallel algorithm, one can derive a parallel algorithm for a particular problem by specifying the particular type of its elements and the problem-specific, sequential procedures interact and integrate.

Finally, procedure compute is invoked on the master to solve concrete problem instances. We achieve this by defining the type of the elements for each specific problem, and by defining concrete sequential versions of the procedures interact and integrate. These sequential versions are linked together with the generic parallel algorithm in order to obtain a specific parallel algorithm.

Complete specifications of all derived algorithms can be found in [9]. From the generic all-pairs pipeline, we derive a parallel n-body simulation algorithm that is similar to the one presented in [4, Chapter 6]. Starting again with the generic all-pairs pipeline, we derive a parallel bubble sort algorithm. For the derivation of bubble sort, we introduce a new generic parameter, a relation less that is capable of comparing any two generic elements.

Then we define interact to swap elements ei and ej if less(ei, ej) is true:

procedure interact(var ei, ej: element);
begin
  if less(ei, ej) then swap(ei, ej);
end;

This version of interact makes smaller elements move towards the end of the pipeline, while large elements are kept closer to the beginning of the pipeline.
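An executable rendering of this interact (Python as a stand-in for the paper's Pascal-like pseudocode; the enclosing sequential all-pairs sweep is our assumption) shows that one full pass over all pairs yields the descending order:

```python
def interact(ei, ej, less):
    """Swap the pair when less(ei, ej) holds, so smaller elements
    drift toward the end of the pipeline."""
    if less(ei, ej):
        ei, ej = ej, ei
    return ei, ej

def all_pairs_sort(elements, less):
    """Sequential all-pairs sweep: after every pair (i, j) with
    i < j has interacted, the system is sorted in descending order."""
    e = list(elements)
    for i in range(len(e)):
        for j in range(i + 1, len(e)):
            e[i], e[j] = interact(e[i], e[j], less)
    return e
```

Because less is a generic parameter, the same code sorts numbers, strings, or any comparable element type, mirroring the genericity claimed for the derived bubble sort.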

As a result, the whole system becomes sorted in descending order. The derived bubble sort algorithm is a specialized yet generic version of the all-pairs pipeline, because it can be used to sort elements of any type. Starting again with the generic all-pairs pipeline, we derive a parallel Gaussian elimination algorithm for solving a system of linear equations. The original all-pairs elements are specified to represent rows of the extended matrix: given two rows ei and ej, procedure interact eliminates the variable xj from ei by scaling ej and subtracting it from ei.
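A sketch of that elimination step in Python (the augmented-row representation and the pivot index k are our assumptions; the paper's complete specification is in [9]):

```python
def interact(ei, ej, k):
    """Eliminate variable x_k from row ei using pivot row ej:
    ei := ei - (ei[k] / ej[k]) * ej.  Each row is one row of the
    extended (augmented) matrix."""
    factor = ei[k] / ej[k]
    return [a - factor * b for a, b in zip(ei, ej)]

def eliminate(rows):
    """Sequential forward elimination over all ordered row pairs."""
    rows = [list(r) for r in rows]
    for j in range(len(rows)):            # pivot row
        for i in range(j + 1, len(rows)):
            rows[i] = interact(rows[i], rows[j], j)
    return rows
```

For the system x + y = 3, 2x + y = 4, eliminating x from the second row leaves [0, -1, -2], i.e. y = 2; back-substitution (the sequential post-processing mentioned below) then recovers x = 1.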

This sequential post-processing does not affect the parallel performance significantly. Like bubble sort, Gaussian elimination requires a single pipeline step.

We believe that this approach simplifies the development and debugging of cluster computing applications. Note that the cluster implementations of n-body simulation, Gaussian elimination, and bubble sort all use the same generic parallel implementation of the all-pairs pipeline. Each specific cluster algorithm defines its domain-specific sequential components: element type, functions interact, integrate, and functions to initialize and finalize the system of elements.

The number of operations per node is inversely proportional to its position in the pipeline.

The images exhibit gene expression patterns. We annotate each image using ontology terms; the output of this process is images with annotations. We then use image scaling and filtering methods to rescale and denoise the images. The output of this process is standardised and denoised images, which can be represented as 2-dimensional arrays.

The resulting features of the wavelet transforms are 2-dimensional arrays. Dimensionality reduction can be achieved by feature selection, feature extraction, or both. Feature selection selects a subset of the most significant features for constructing classifiers.
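As an illustration of feature selection (our sketch; the paper does not specify its selection criterion here), a simple variance-based filter keeps the k most significant features:

```python
def select_features(samples, k):
    """Keep the k features with the highest variance across samples.
    samples: list of equal-length feature vectors.  Returns the
    indices of the selected features, most significant first."""
    n = len(samples)
    dims = len(samples[0])
    variances = []
    for d in range(dims):
        col = [s[d] for s in samples]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    # rank feature indices by variance, descending
    ranked = sorted(range(dims), key=lambda d: variances[d], reverse=True)
    return ranked[:k]
```

A classifier is then trained only on the selected columns, shrinking the input dimensionality while keeping the most discriminative features.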

Feature extraction transforms the original features to reduce dimensionality and obtain representative feature vectors for building classifiers.

Parallelising the image pattern recognition task

1) Overview of the parallel approach

It is well known that the speedup of an application solving large computational problems is mainly gained by parallelisation at the hardware level, the software level, or both.

Hardware parallelism focuses on the signal and circuit levels and is normally constrained by manufacturers. Software parallelism at the component and system levels can be classified into two types: automatic parallelisation of existing sequential applications without modification, and construction of parallel programming models that use various software technologies to describe parallel algorithms and then match applications with the underlying hardware platforms.

Figure 1. An image pattern recognition task.

Mostly, it is hard to transform a sequential algorithm directly into a parallel one. While parallel programming models address how to develop parallel applications, and can therefore maximally exploit parallelisation to obtain high performance, they require more development effort to parallelise specific applications. There are two common approaches: data parallelism and task parallelism.

Data parallelism means that workloads are distributed across different computing nodes and the same task can be executed on different subsets of the data simultaneously. Task parallelism means that the tasks are independent and can be executed purely in parallel; a task is invoked without any internal status to be maintained between these instances.

Figure 2. The processes of the training stage.

In terms of the task graph shown in Figure 2, both approaches are applied here. The extent of parallelisation is determined by the dependencies of each individual part of the algorithms and tasks, once we get the image samples from the preprocessing.

Therefore, the data can be parallelised, and we can also decompose a process itself into parts and execute these in parallel. In this case, the decomposition of the algorithms mainly focuses on feature selection and extraction. A number of communication models have been developed [7][8]. The Message Passing Interface (MPI) has been developed for HPC parallel applications with distributed memory architectures, has become the de facto standard, and has a set of implementations. Each node of the graph is a functional module, a process or a subtask, and an edge between nodes is a data flow, which shows a true data dependency between them.
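The data-parallel scheme just described — the same task run simultaneously on different subsets of the samples — can be sketched with Python's multiprocessing pool standing in for the cluster nodes (on a real cluster this role is played by MPI or PVM; the toy per-subset task is our assumption):

```python
from multiprocessing import Pool

def process_subset(subset):
    """Stand-in for the per-node work, e.g. feature selection or
    distance calculation on one subset of image samples."""
    return sum(subset)  # toy computation

def data_parallel(samples, n_nodes):
    # Partition the samples into one subset per node ...
    subsets = [samples[i::n_nodes] for i in range(n_nodes)]
    # ... run the same task on every subset simultaneously ...
    with Pool(n_nodes) as pool:
        partials = pool.map(process_subset, subsets)
    # ... and combine the partial results on the master.
    return sum(partials)
```

For example, `data_parallel(list(range(100)), 4)` splits the hundred samples over four workers and combines the partial sums into 4950.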

A directed acyclic graph for the training stage is shown in Figure 2. The left-hand side shows the atomic processes of the training stage; the right-hand side shows a higher-level abstraction of the task. Given the nature of the algorithms used in this case, the parallel approaches used are mainly data parallelism and a typical form of task parallelism, pipelining.

Figure 3.

However, buying and maintaining physical resources is costly and time-consuming.

We therefore make use of IaaS in the form of cloud computing to perform our evaluation.

2) Overview of Cloud Computing

Cloud computing is an evolution of various forms of distributed computing systems: from original distributed computing and parallel cluster computing to service-oriented computing. Cloud computing extends these various distributed computing forms by introducing a business model in which resources are on-demand, self-service and pay-by-use.

Cloud computing sells itself as a cost-effective and reliable platform; it focuses on the delivery of guaranteed services.

Figure 4.

We have developed and then deployed our application onto rented compute infrastructure. Unlike traditional physical resource leasing, Amazon EC2 uses virtualisation techniques. Table 1 lists standard virtual instances from Amazon as an example.

Therefore, the feature selection can be decomposed into subtasks executed with subsets of image samples on different nodes; the parallel form of the algorithm is presented in Figure 4. The sample images can be partitioned into subsets, and the distance calculations between subsets of training samples and the unclassified testing sample are executed at different stages on multiple computing nodes. The execution on multiple computing nodes includes any of these situations: distributed data, distributed processes and distributed storage.

B. Experimentation and Evaluation in the Cloud

1) Performance metrics

We have used speedup as a performance indicator. We launch instances by specifying the instance types and the virtual images that we have created for the image pattern recognition task. In this study, we have used 12 small instances for experimentation; at this moment Amazon EC2 limits users to a maximum of 20 instances running concurrently. The specific configuration is listed in the second row of Table 1. We report averages over 30 independent runs for each scenario; the choice of 30 runs is based on statistics. The evaluation results are shown in Figure 5, Figure 6 and Figure 7. To validate the cost-effectiveness of Infrastructure-as-a-Service via cloud provision, we calculated the cost shown in Figure 8.

Figure 6. Average execution time of the task with increasing number of images, for different numbers of virtual nodes running concurrently.
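The speedup reported in these experiments is the standard ratio of single-node to n-node execution time; a small helper (our sketch, not code from the paper) makes the metric explicit:

```python
def speedup(t_sequential, t_parallel):
    """Speedup S(n) = T(1) / T(n): how many times faster the task
    runs on n nodes than on a single node."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_nodes):
    """Efficiency E(n) = S(n) / n, the fraction of ideal speedup."""
    return speedup(t_sequential, t_parallel) / n_nodes
```

For instance, a run that takes 120 s on one node and 20 s on 8 nodes has a speedup of 6.0 and an efficiency of 0.75; communication overhead is what keeps efficiency below 1.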

Figure 5 shows the speedup. The X-axis represents the input size. The result demonstrates that the speedup increases with the increase of the number of computing nodes.

Figure 7. Average communication time vs. number of nodes, with different numbers of images.

The speedup also increases with the increase of input size when the number of machines is fixed. This fully embodies the advantage of parallel computing when processing heavy loads, i.e. large input sizes.

Figure 8. Average cost for performing the task using Amazon EC2 in US Dollars, with different numbers of images, when using 1, 2, 4, 8 and 12 virtual nodes running concurrently.

Figure 5. Average speedup for experiments with different numbers of images when using 2, 4, 8 and 12 nodes running concurrently.

Figure 6 shows the execution time under different input sizes and different numbers of computing nodes.

With the increase of input size, the execution time increases; with the increase of the number of nodes, the execution time decreases. Figure 7 shows the relationship between communication time and number of nodes. In Figure 7, the communication time at two nodes has a spike.

This is mainly caused by latency in the Cloud during the execution, as we have confirmed across various similar experiments.


