Menu

[Solved]Consider Special Case Matrix Multiplication C C B B C N X N Matrices Performed Using 2n3 F Q37168777

We consider a special case of matrix multiplication:
C := C + A*B
where A, B, and C are n x n matrices. This can be performed using2n3 floating point operations (n3 adds, n3 multiplies), as in thefollowing pseudocode:

for i = 1 to n for j = 1 to n

for k = 1 to n
C(i,j) = C(i,j) + A(i,k) * B(k,j)

end end

end

In this assignment, we will design and implement a C programthat performs computations on large matrices. The size of a matrixis large enough so that the execution of program causes paging.Read p. 225 to p. 228, and p. 413 to p. 417 of Patterson &Hennessy textbook.

Purpose:
Different choices of copying the matrix may have different impactson the program runtime. You are required to notice such impacts andeventually propose a design that efficiently leverages themechanisms described below to achieve the best performance

Requirements:

1. Complete the given C program and use multiple optimizationmechanisms to improve the execution runtime.

  1. You are required to use all the mechanisms discussed in theclass and recitation

  2. You are only allowed to use standard libraries and intrinsiclibrary to implement your program

  3. You are not allowed to create new threads in yourimplementation.

  4. When using gcc to compile your code, you are allowed to useoptimization level3(-O3).

Mechanisms:

  1. Caching: You are required to try different cache block size inyour code and use the block size that gives you minimum runtimewhen integrated with other techniques (SIMD and superscalarmechanism)

  2. SIMD: You are required to make use of Single InstructionMultiple Data (SIMD). It means performing a single operation onmultiple data points simultaneously.

  3. Superscalar mechanisms: A superscalar processor executes morethan one instruction during a clock cycle by simultaneouslydispatching multiple instructions to different components on theprocessor.

Experiments:

  • Per attempted optimization, you can have different size ofmatrix (for example, n = 128, 256, 512, 1024, 2048), differentblock sizes (m = 2^x), different number of unrolled instructions,etc.

  • You need to measure the runtime of each experiment.

  • You need to verify the correctness of the attemptedoptimization.

#Implement the opitmized_dgemm in the C code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <string.h>
#include <math.h>
#include <immintrin.h>
#include <x86intrin.h>

#define ALIGN __attribute__ ((aligned (32)))

#define SIZE 1024
double ALIGN a[SIZE * SIZE];
double ALIGN b[SIZE * SIZE];
double ALIGN c[SIZE * SIZE];
double ALIGN c1[SIZE * SIZE];

// naïve matrix multiplication void dgemm(int n)
void dgemm(int n){
   int i,j,k;
   for(i=0; i<n; i++){
       for(j=0; j<n; j++){
           double cij =0;
           for(k=0; k<n;k++)
              cij = cij + a[i*n+k] * b[k*n+j];
           c1[i*n+j] =cij;
       }
   }
}

void optimized_dgemm(int n){

}

void main(int argc, char** argv){
   int i, j;
   time_t t;
   struct timeval start, end;
   double elapsed_time;
   int check_correctness = 0;
   int correct = 1;

   if(argc > 1){
       if(strcmp(argv[1], “corr”) ==0){
          check_correctness = 1;
       }
   }
/* Initialize random number generator */
   srand((unsigned) time(&t));
/* Populate the arrays with random values */
   for(i=0; i< SIZE; i++){
       for(j=0; j< SIZE; j++){
           a[i* SIZE +j] =(double)rand() / (RAND_MAX + 1.0);
           b[i* SIZE +j] =(double)rand() / (RAND_MAX + 1.0);
           c[i* SIZE +j] =0.0;
           c1[i* SIZE +j] =0.0;
       }
   }
   gettimeofday(&start, NULL);
/* Call you optimized function optimized_dgemm */optimized_dgemm(SIZE);
   gettimeofday(&end, NULL);
}

void dgemm_unrolling(int n){
   int i,j,k;
   for(i=0; i<n; i++){
       for(j=0; j<n; j++){
           double cij =0;
           for(k=0; k<n;k+=4){
              double s1 = a[i*n+k] * b[k*n+j];
              double s2 = a[i*n+(k+1)] * b[(k+1)*n+j];
              double s3 = a[i*n+(k+2)] * b[(k+2)*n+j];
              double s4 = a[i*n+(k+3)] * b[(k+3)*n+j];
              cij += s1 + s2 + s3 + s4;
           }
           c[i*n+j] =cij;
       }
   }
}

Expert Answer


Answer to We consider a special case of matrix multiplication: C := C + A*B where A, B, and C are n x n matrices. This can be per… . . .

OR


Leave a Reply

Your email address will not be published. Required fields are marked *