Asst. Prof. at Ankara Yildirim Beyazit University in
Computer Engineering Department.

Parallel Minimum Norm Solution of Sparse Block Diagonal Column Overlapped Underdetermined Systems
Underdetermined systems of equations in which the minimum norm solution needs to be
computed arise in many applications, such as geophysics, signal processing, and
biomedical engineering. In this article, we introduce a new parallel algorithm for
obtaining the minimum 2norm solution of an underdetermined system of equations.
The proposed algorithm is based on the Balance scheme, which was originally developed
for the parallel solution of banded linear systems.
The proposed scheme assumes a
generalized banded form where the coefficient matrix has column overlapped block
structure in which the blocks could be dense or sparse.
More


Solving Sparse Underdetermined Linear Least Squares Problems on Parallel Computing Platforms
Computing the minimum 2norm solution is essential for many areas such as geophysics, signal processing and biomedical engineering.
In this work, we present a new parallel algorithm for solving sparse underdetermined linear systems where the coefficient matrix is block diagonal with overlapping columns.
The proposed parallel approach handles the diagonal blocks independently and the reduced system involving the shared unknowns.
Experimental results show the effectiveness of the proposed scheme on various parallel computing platforms.
Slides 

A Novel Partitioning Method for Accelerating the Block Cimmino Algorithm
We propose a novel blockrow partitioning method in order to improve the convergence rate of the block Cimmino algorithm for solving general sparse linear systems of equations.
The convergence rate of the block Cimmino algorithm depends on the orthogonality among the block rows obtained by the partitioning method.
The proposed method takes numerical orthogonality among block rows into account by proposing a row innerproduct graph model of the coefficient matrix.
In the graph partitioning formulation defined on this graph model, the partitioning objective of minimizing the cutsize directly corresponds to minimizing the sum of interblock inner products between block rows thus leading to an improvement in the eigenvalue spectrum of the iteration matrix. This in turn leads to a significant reduction in the number of iterations required for convergence.
Extensive experiments conducted on a large set of matrices confirm the validity of the proposed method against a stateoftheart method.
Paper


Minimizing Communication Through Computational Redundancy in Parallel Iterative Solvers
Sparse matrix vector multiplication (SpMxV) of the form y = Ax is a kernel operation in iterative linear solvers used in scientific applications. In these solvers, the SpMxV operation is performed repeatedly with the same sparse matrix through iterations until convergence. Depending on the matrix and its decomposition, parallel SpMxV operation necessitates communication among processors in the parallel environment. The communication can be reduced by intelligent decomposition. However, we can further decrease the communication through data replication and redundant computation. The communication occurs due to the transfer of xvector entries in rowparallel SpMxV computation. The input vector x of the next iteration is computed from the output vector of the current iteration through linear vector operations.
More

Numerically aware partitioning for Block CimminoNumerically aware rowblock partitioning code segment for Block Cimmino Iterative method in ABCD Solver. This is the C++ implementation of the work that is published in SIAM SISC 2018. You need to do patch the code I attached and modify the config_file.info as part_type 5. 
ParBaMiN: Parallel Balance Scheme Algorithm for Minimum Norm Solution of Underdetermined Systems
A distributed parallel algorithm for solving minimum 2norm solution of an underdetermined system on distributed memory high performance computing (HPC) platforms. This software is implemented in C++ programming languages and it uses MPI library for communication among processors.
Each processor concurrently and independently applies QR factorization on the submatrices. Sparse QR factorization of SuiteSparse (www.suitesparse.com) package is used in local sparse QR factorization operations.
The performance of the proposed method may increase by adding the parallelism mechanisms of multithreaded SuiteSparseQR and multithreaded BLAS for the local QR factorizations.
This algorithm can be considered as an scalable extension of any multithreaded general sparse QR factorization algorithm
to distributed memory architectures for computing minimum 2norm solution of underdetermined linear least squares problems.
ParBaMiN 
ParBaMiN_matlab: A Sequential MATLAB program for Parallel Balance Scheme Algorithm for Minimum Norm Solution of Underdetermined Systems
A simple MATLAB program for the solving Minimum Norm Solution of Sparse Block Diagonal Column Overlapped Underdetermined Systems.
It is a sequential version of Parallel Minimum Norm Solution of Sparse Block Diagonal Column Overlapped Underdetermined Systems.
Sparse QR factorization with Householder reflection of spqr routine in SuiteSparseQR package is used in local sparse QR factorization operations.
ParBaMiN_matlab 
EoCoE: Energy oriented Centre of ExcellenceThe project coordinates a panEuropean network covering 21 teams in 8 countries with a total budget of 5.5 M€. EoCoE assists the energy transition via targeted support to four renewable energy pillars: Meteo, Materials, Water and Fusion. These four pillars are anchored within a strong transversal multidisciplinary basis providing highend expertise in applied mathematics and HPC. 
Courses Given: 
Fall 20182019:
