Multiplication of a matrix and its transpose (C programming)

1. Multiplication of a matrix and its transpose: Consider a matrix A of size N by M and its transpose A^T of size M by N. Your task is to design and implement a parallel algorithm that multiplies a matrix by its transpose, i.e., C = A A^T, for distributed-memory multicomputers in which the processors are organized as a one-dimensional linear array.


In the parallel algorithm design you must consider efficiency, i.e., try to minimize computation and communication costs and balance the workload among all processors. Since the resulting N by N matrix C is symmetric, i.e., cij = cji, your algorithm needs to calculate only the elements in the upper (or lower) triangle of the matrix. (In other words, you must not calculate both cij and cji, as they are the same.)
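
One possible way to meet the load-balancing requirement is sketched below. It is an illustration under stated assumptions, not the required design: the rows of A are split into p contiguous blocks, p is odd, and the linear array is treated as a ring (wraparound between the last and first process). At step s, rank r multiplies its own row block by row block (r + s) mod p, so every rank computes one diagonal block and (p - 1)/2 off-diagonal blocks. When (r + s) mod p is smaller than r, the computed block lies in the lower triangle and stands for its transposed counterpart in the upper triangle. The function names are hypothetical. In an MPI version, the block needed at step s + 1 could be requested with MPI_Irecv (and the current one forwarded with MPI_Isend) before computing step s, then completed with MPI_Wait.

```c
#include <assert.h>

/* First global row of the block owned by `rank` when n rows are split as
 * evenly as possible over p ranks. block_start(n, p, p) returns n, so
 * rank r owns rows [block_start(n, p, r), block_start(n, p, r + 1)). */
int block_start(int n, int p, int rank)
{
    assert(p >= 1 && n >= p && rank >= 0 && rank <= p);
    return (int)((long long)rank * n / p);
}

/* Number of off-diagonal blocks each rank computes (odd p assumed). */
int ring_steps(int p)
{
    assert(p >= 1 && p % 2 == 1);
    return (p - 1) / 2;
}

/* Row block of A that `rank` pairs with its own block at `step`;
 * step 0 is the rank's own diagonal block. */
int partner_block(int p, int rank, int step)
{
    assert(rank >= 0 && rank < p && step >= 0 && step <= ring_steps(p));
    return (rank + step) % p;
}

/* Counts how many (rank, step) pairs compute the block product for the
 * unordered block pair {i, j}; a valid schedule gives exactly 1. */
int count_block_owners(int p, int i, int j)
{
    int owners = 0;
    for (int r = 0; r < p; r++) {
        for (int s = 0; s <= ring_steps(p); s++) {
            int k = partner_block(p, r, s);
            if ((r == i && k == j) || (r == j && k == i))
                owners++;
        }
    }
    return owners;
}
```

For example, with N = 10 and p = 3 the row blocks start at rows 0, 3, and 6, and each rank computes its diagonal block plus one off-diagonal block. An even p would need a different treatment of the "halfway" block, which this sketch does not cover.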


In the implementation:
a. You must use MPI non-blocking send/recv communication functions to overlap computation and communication.
b. You can assume N ≥ p, where p is the number of processes organized as a one-dimensional linear array.
c. Your program must produce correct results for any p greater than or equal to one.
d. For simplicity, you may restrict p to be either an odd or an even number to achieve the best possible load balancing.
e. Your program must ask for the matrix sizes N and M as user-defined parameters, and must print the results in row-wise order, as shown in the example below.
c00 c01 c02 c03 c04
    c11 c12 c13 c14
        c22 c23 c24
            c33 c34
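
The row-wise output format in item e can be produced along the following lines. This is a minimal sketch, assuming C is stored as a full N by N row-major array of doubles on the process that prints; the function names, the column width FIELD_WIDTH, and the two-decimal format are illustrative choices, not part of the assignment.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_WIDTH 8  /* hypothetical column width, chosen for illustration */

/* Formats row i of the upper triangle of the n-by-n row-major matrix c into
 * buf: columns 0..i-1 are left blank, columns i..n-1 hold the values.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_upper_row(const double *c, int n, int i, char *buf, size_t cap)
{
    assert(c && buf && cap > 0 && i >= 0 && i < n);
    size_t used = 0;
    buf[0] = '\0';
    for (int j = 0; j < n; j++) {
        int w = (j < i)
            ? snprintf(buf + used, cap - used, "%*s", FIELD_WIDTH, "")
            : snprintf(buf + used, cap - used, "%*.2f", FIELD_WIDTH,
                       c[(size_t)i * n + j]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;  /* output error or buffer too small */
        used += (size_t)w;
    }
    return (int)used;
}

/* Prints the upper triangle row by row. Returns 0 on success, or -1 if a
 * row does not fit in the fixed-size line buffer used by this sketch. */
int print_upper(FILE *out, const double *c, int n)
{
    char line[4096];
    for (int i = 0; i < n; i++) {
        if (format_upper_row(c, n, i, line, sizeof line) < 0)
            return -1;
        fprintf(out, "%s\n", line);
    }
    return 0;
}
```

A call such as print_upper(stdout, c, n) on the process that holds the gathered result prints one matrix row per line, with blanks where the lower-triangle entries would be.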


After the parallel computation, your main program must perform a self-check, i.e., first perform a sequential computation using the same data set and then compare the two results.
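
The self-check could look roughly like the sketch below. It assumes A is stored row-major as an N by M array of doubles and that both results are full N by N row-major arrays in which only the upper triangle is filled; the function names and the absolute tolerance are assumptions made for illustration. A tolerance is used because the parallel and sequential versions may add terms in a different order and so differ by rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential reference: fills only the upper triangle of c (n-by-n,
 * row-major) with c[i][j] = sum over k of a[i][k] * a[j][k], for j >= i.
 * a is n-by-m, row-major. Entries below the diagonal are left untouched. */
void aat_upper_seq(const double *a, int n, int m, double *c)
{
    assert(a && c && n > 0 && m > 0);
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < m; k++)
                sum += a[(size_t)i * m + k] * a[(size_t)j * m + k];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Returns the number of upper-triangle entries whose parallel and
 * sequential values differ by more than tol in absolute value. */
int count_mismatches(const double *c_par, const double *c_seq, int n, double tol)
{
    int bad = 0;
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            double d = c_par[(size_t)i * n + j] - c_seq[(size_t)i * n + j];
            if (d < 0.0)
                d = -d;
            if (d > tol)
                bad++;
        }
    }
    return bad;
}
```

The main program would report success when count_mismatches returns 0. For matrices with large entries, a relative tolerance may be more appropriate than the absolute one used here.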

