[Solved]Consider Special Case Matrix Multiplication C C B B C N X N Matrices Performed Using 2n3 F Q37168777
We consider a special case of matrix multiplication:
C := C + A*B
where A, B, and C are n x n matrices. This can be performed using2n3 floating point operations (n3 adds, n3 multiplies), as in thefollowing pseudocode:
for i = 1 to n for j = 1 to n
for k = 1 to n
C(i,j) = C(i,j) + A(i,k) * B(k,j)
end end
end
In this assignment, we will design and implement a C programthat performs computations on large matrices. The size of a matrixis large enough so that the execution of program causes paging.Read p. 225 to p. 228, and p. 413 to p. 417 of Patterson &Hennessy textbook.
Purpose:
Different choices of copying the matrix may have different impactson the program runtime. You are required to notice such impacts andeventually propose a design that efficiently leverages themechanisms described below to achieve the best performance
Requirements:
1. Complete the given C program and use multiple optimizationmechanisms to improve the execution runtime.
-
You are required to use all the mechanisms discussed in theclass and recitation
-
You are only allowed to use standard libraries and intrinsiclibrary to implement your program
-
You are not allowed to create new threads in yourimplementation.
-
When using gcc to compile your code, you are allowed to useoptimization level3(-O3).
Mechanisms:
-
Caching: You are required to try different cache block size inyour code and use the block size that gives you minimum runtimewhen integrated with other techniques (SIMD and superscalarmechanism)
-
SIMD: You are required to make use of Single InstructionMultiple Data (SIMD). It means performing a single operation onmultiple data points simultaneously.
-
Superscalar mechanisms: A superscalar processor executes morethan one instruction during a clock cycle by simultaneouslydispatching multiple instructions to different components on theprocessor.
Experiments:
-
Per attempted optimization, you can have different size ofmatrix (for example, n = 128, 256, 512, 1024, 2048), differentblock sizes (m = 2^x), different number of unrolled instructions,etc.
-
You need to measure the runtime of each experiment.
-
You need to verify the correctness of the attemptedoptimization.
#Implement the opitmized_dgemm in the C code
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include <string.h>
#include <math.h>
#include <immintrin.h>
#include <x86intrin.h>
#define ALIGN __attribute__ ((aligned (32)))
#define SIZE 1024
double ALIGN a[SIZE * SIZE];
double ALIGN b[SIZE * SIZE];
double ALIGN c[SIZE * SIZE];
double ALIGN c1[SIZE * SIZE];
// naïve matrix multiplication void dgemm(int n)
void dgemm(int n){
int i,j,k;
for(i=0; i<n; i++){
for(j=0; j<n; j++){
double cij =0;
for(k=0; k<n;k++)
cij = cij + a[i*n+k] * b[k*n+j];
c1[i*n+j] =cij;
}
}
}
void optimized_dgemm(int n){
}
void main(int argc, char** argv){
int i, j;
time_t t;
struct timeval start, end;
double elapsed_time;
int check_correctness = 0;
int correct = 1;
if(argc > 1){
if(strcmp(argv[1], “corr”) ==0){
check_correctness = 1;
}
}
/* Initialize random number generator */
srand((unsigned) time(&t));
/* Populate the arrays with random values */
for(i=0; i< SIZE; i++){
for(j=0; j< SIZE; j++){
a[i* SIZE +j] =(double)rand() / (RAND_MAX + 1.0);
b[i* SIZE +j] =(double)rand() / (RAND_MAX + 1.0);
c[i* SIZE +j] =0.0;
c1[i* SIZE +j] =0.0;
}
}
gettimeofday(&start, NULL);
/* Call you optimized function optimized_dgemm */optimized_dgemm(SIZE);
gettimeofday(&end, NULL);
}
void dgemm_unrolling(int n){
int i,j,k;
for(i=0; i<n; i++){
for(j=0; j<n; j++){
double cij =0;
for(k=0; k<n;k+=4){
double s1 = a[i*n+k] * b[k*n+j];
double s2 = a[i*n+(k+1)] * b[(k+1)*n+j];
double s3 = a[i*n+(k+2)] * b[(k+2)*n+j];
double s4 = a[i*n+(k+3)] * b[(k+3)*n+j];
cij += s1 + s2 + s3 + s4;
}
c[i*n+j] =cij;
}
}
}
Expert Answer
Answer to We consider a special case of matrix multiplication: C := C + A*B where A, B, and C are n x n matrices. This can be per… . . .
OR

