Computer science exam service for international students | Parallel Computation Exam COMP322101

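In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:

A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down into a series of instructions
Instructions from each part execute simultaneously on different processors
An overall control/coordination mechanism is employed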
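
The four steps above can be sketched in C. This is a minimal illustration, not a real parallel program: the function names (chunk_bounds, sum_chunk, decomposed_sum) are hypothetical, and the loop over chunks runs serially here. Each iteration is independent, though, so each chunk could be handed to its own processor.

```c
#include <stddef.h>

/* Compute the half-open range [*begin, *end) of chunk k when n items are
   split into `parts` contiguous chunks whose sizes differ by at most one. */
static void chunk_bounds(size_t n, size_t parts, size_t k,
                         size_t *begin, size_t *end)
{
    size_t base = n / parts;   /* minimum chunk size */
    size_t extra = n % parts;  /* the first `extra` chunks get one more item */
    *begin = k * base + (k < extra ? k : extra);
    *end = *begin + base + (k < extra ? 1 : 0);
}

/* Sum one chunk: the series of instructions a single worker would execute. */
static long sum_chunk(const int *data, size_t begin, size_t end)
{
    long s = 0;
    for (size_t i = begin; i < end; ++i)
        s += data[i];
    return s;
}

/* Coordination step: combine the partial results of all chunks.
   The loop over k is serial in this sketch; its iterations are independent. */
long decomposed_sum(const int *data, size_t n, size_t parts)
{
    long total = 0;
    for (size_t k = 0; k < parts; ++k) {
        size_t b, e;
        chunk_bounds(n, parts, k, &b, &e);
        total += sum_chunk(data, b, e);
    }
    return total;
}
```

Splitting ten items into three parts with this scheme gives chunks of sizes 4, 3 and 3, so no worker holds more than one item beyond any other.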

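The computational problem should be able to:

Be broken apart into discrete pieces of work that can be solved simultaneously.
Execute multiple program instructions at any moment in time.
Be solved in less time with multiple compute resources than with a single compute resource.

The compute resources are typically:

A single computer with multiple processors/cores
An arbitrary number of such computers connected by a network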

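Main reasons for using parallel programming

Save time and/or money

In theory, devoting more resources to a task shortens its time to completion, with potential cost savings. Parallel computers can be built from cheap commodity components.

Solve larger / more complex problems

Many problems are so large and/or complex that solving them with a serial program is impractical or impossible, especially given limited computer memory.

Provide concurrency

A single compute resource can only do one thing at a time. Multiple compute resources can do many things simultaneously.

Example: collaborative networks provide a global venue where people from around the world can meet and conduct work "virtually".

Take advantage of non-local resources

Use compute resources on a wide area network, or even the Internet, when local compute resources are scarce or insufficient.

Example: SETI@home (setiathome.berkeley.edu) has over 1.7 million users in nearly every country in the world (May 2018).

Make better use of underlying parallel hardware

Modern computers, even laptops, are parallel in architecture, with multiple processors/cores. Parallel software is specifically intended for parallel hardware with multiple cores, threads, etc. In most cases, serial programs run on modern computers "waste" potential computing power.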

The following is a sample exam question on parallel computation in C:

Many parallel algorithms require, at some stage, variables distributed across multiple processing units to be reduced to a single value by a binary operation. This reduced value must then be made accessible to all processing units. For instance, in a series of vector operations, it may happen that the result of the scalar product of two vectors must then be made available to all processing units for the next stage in the calculations.
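
As a concrete instance of such a reduction, the scalar product mentioned above can be sketched for a shared memory system using an OpenMP reduction clause. This is a minimal sketch under stated assumptions: the function name dot is hypothetical, and the code assumes a compiler with OpenMP support (for example the -fopenmp flag in GCC). Without OpenMP the pragma is ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Scalar product of x and y, both of length n.
   With OpenMP enabled, each thread accumulates a private partial sum and
   the runtime combines the partial sums with '+' when the loop ends.
   The parallel region ends with an implicit barrier, so the value returned
   is the completed reduction. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Here the binary operation is addition and the reduced value ends up in one shared variable, which every thread can read in the next stage of the calculation.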

(a) For shared memory systems, where the processing units are threads, all threads can read the memory location containing the reduced value. However, they should not begin subsequent calculations until the reduction calculation has been completed.

(i) For a GPU, suppose the reduction had been completed by threads within a single work group. Why is it beneficial to use local, rather than global, memory for intermediate calculations in this situation?

(ii) Still for a GPU, how would you ensure the result of the reduction performed by a single work group, in local memory, has been completed, and can be read by all threads for the subsequent calculations? Explain your answer.
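
The setting of part (a) can be pictured with a serial C model of a tree reduction inside one work group. This is only an illustrative sketch, not GPU code: the array local_mem stands in for the group's local memory, group_size is assumed to be a power of two, and the function name group_reduce is hypothetical. The comment marks where a work-group barrier, such as barrier(CLK_LOCAL_MEM_FENCE) in OpenCL, would sit in a real kernel.

```c
#include <stddef.h>

/* Serial model of a tree reduction inside one work group.
   local_mem  : stand-in for the group's local-memory scratch array
   group_size : number of work items, assumed to be a power of two
   On return, local_mem[0] holds the sum of the original entries. */
float group_reduce(float *local_mem, size_t group_size)
{
    for (size_t stride = group_size / 2; stride > 0; stride /= 2) {
        /* On a GPU, every work item `id` would run this body concurrently. */
        for (size_t id = 0; id < stride; ++id)
            local_mem[id] += local_mem[id + stride];
        /* A work-group barrier belongs here in a real kernel: no work item
           may start the next stride, or read local_mem[0] after the final
           stride, until all of the writes above have completed. */
    }
    return local_mem[0];
}
```

Each stride halves the number of active work items, so a group of 8 items finishes in 3 strides, and all intermediate traffic stays in the scratch array.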

(b) For distributed memory systems, where processing units are processes, the issue becomes communicating the result of the reduction to all processes. Suppose that, after the reduction, the reduced value is known only to one process, e.g. rank 0 for MPI.

(i) What form of collective communication should be used to send the reduced value to all processes? You do not need to give the actual MPI function name, but may do so if you like.

(ii) Someone suggests using point-to-point instead of collective communication, and you rightly point out that this will likely be slower than using collective communication. Justify this claim by estimating how the communication time t_comm varies with the total number of processes numProcs for both methods. You should assume that the collective communication uses a binary tree.

(iii) Given barriers are not used in the binary tree, how might the necessary synchronisation be achieved?

(iv) In fact, MPI already provides a function MPI_Allreduce that both reduces, and distributes the final answer to all processes. One possible implementation is essentially a combination of binary trees. An example is given in Fig. 1 for numProcs=4. Redraw Fig. 1 for the case numProcs=2, for which there will be 2 levels rather than 3, and therefore 4 nodes in total.

(v) How many communications are there in total?

(vi) Returning to Fig. 1, note that in the final row of communications, some processes send two partial sums whereas others send none. How would you alter this final exchange of partial sums to make the communication better balanced, i.e. so processes send at most one partial sum? Use the given rank numbers in your answer.
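
The communication patterns discussed in part (b) can be explored with a small serial simulation in C. The sketch below makes several assumptions that are not in the question: every message takes the same time, rank 0 sends its point-to-point messages one after another, and numProcs is a power of two for the all-reduce. All function names are hypothetical. The butterfly (recursive-doubling) exchange shown is one common balanced pattern; it is not necessarily the scheme drawn in Fig. 1, which is not reproduced here.

```c
#include <stddef.h>

#define MAX_PROCS 64

/* Sequential steps for rank 0 to send one value to every other process
   using point-to-point messages only: numProcs - 1 sends in a row. */
int linear_bcast_steps(int numProcs)
{
    return numProcs - 1;
}

/* Steps for a binary-tree broadcast: the number of processes holding the
   value doubles at each step, giving ceil(log2(numProcs)) steps. */
int tree_bcast_steps(int numProcs)
{
    int steps = 0;
    for (int have = 1; have < numProcs; have *= 2)
        ++steps;
    return steps;
}

/* Butterfly all-reduce over simulated ranks. vals[r] is the value held by
   rank r; numProcs must be a power of two and at most MAX_PROCS.
   In each step, rank r exchanges its partial sum with rank (r XOR mask),
   so every rank sends exactly one message per step.
   On return every vals[r] holds the global sum.
   Returns the total number of messages sent. */
int butterfly_allreduce(double *vals, int numProcs)
{
    double recv[MAX_PROCS];
    int messages = 0;
    for (int mask = 1; mask < numProcs; mask *= 2) {
        for (int r = 0; r < numProcs; ++r) {  /* all sends of this step */
            recv[r] = vals[r ^ mask];
            ++messages;
        }
        for (int r = 0; r < numProcs; ++r)    /* local combine on each rank */
            vals[r] += recv[r];
    }
    return messages;
}
```

Under these assumptions the point-to-point cost grows linearly with numProcs while the tree cost grows logarithmically, for example 7 steps against 3 for numProcs=8.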

(c) Notice that Fig. 1 is a task graph. Assume that each task (node) corresponds to the same amount of time, including those on the top and bottom rows.

(i) What is the work and span of the task graph given in Fig. 1? What is the maximum performance as predicted by the work-span model?

(ii) Suppose there are p = 2^m processes. What is the work, span, and prediction of the work-span model now, for arbitrary m?

(iii) It has been assumed that each task takes the same time to execute. Suppose each task now takes a different, but known, time to execute. Describe in general terms how you would modify the definition of work and span, and the prediction of the work–span model, for this situation. You do not need to derive expressions or perform actual calculations, but should explain your answer.
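
For reference, the quantities in part (c) can be computed for an arbitrary task graph. In the usual formulation, the work is the sum of all task times, the span is the largest total time along any path through the graph (the critical path), and the work-span model bounds the speedup by work divided by span. The C sketch below is a minimal illustration, not a solution for Fig. 1: the function name work_and_span and the limit MAX_TASKS are hypothetical, and the tasks are assumed to be numbered in topological order.

```c
#include <stddef.h>

#define MAX_TASKS 256

/* Work and span of a task graph with known, possibly different, task times.
   Assumption: tasks are numbered in topological order, so from[e] < to[e]
   for every edge e, and ntasks <= MAX_TASKS.
   time     : execution time of each task
   from, to : endpoints of the nedges dependency edges
   Stores the work in *work and returns the span. */
double work_and_span(const double *time, size_t ntasks,
                     const size_t *from, const size_t *to, size_t nedges,
                     double *work)
{
    double finish[MAX_TASKS];  /* earliest possible finish time of each task */
    double span = 0.0;
    *work = 0.0;
    for (size_t i = 0; i < ntasks; ++i) {
        double start = 0.0;    /* latest finish among the predecessors of i */
        for (size_t e = 0; e < nedges; ++e)
            if (to[e] == i && finish[from[e]] > start)
                start = finish[from[e]];
        finish[i] = start + time[i];
        *work += time[i];
        if (finish[i] > span)
            span = finish[i];
    }
    return span;
}
```

For a diamond-shaped graph with task times 1, 2, 5, 1 and edges 0→1, 0→2, 1→3, 2→3, this gives work 9 and span 7, so the model predicts a speedup of at most 9/7 however many processes are used.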


Copyright 2024 © TopMaskdaixie. All Rights Reserved
