DESIGN OF HYBRID QRD ARCHITECTURE USING MIMO-OFDM SYSTEMS

Aakash.K¹, Manohar.G.M², Revathi.T³, Sowmiya.T⁴
¹Department of Electronics and Communication Engineering, Aksheyaa College of Engineering, Puluvakkam, Kancheepuram, Tamilnadu, India
²Department of Electronics and Communication Engineering, Aksheyaa College of Engineering, Puluvakkam, Kancheepuram, Tamilnadu, India
³Department of Electronics and Communication Engineering, Aksheyaa College of Engineering, Puluvakkam, Kancheepuram, Tamilnadu, India
⁴Department of Electronics and Communication Engineering, Aksheyaa College of Engineering, Puluvakkam, Kancheepuram, Tamilnadu, India

Abstract
QR decomposition has been widely used in many signal processing applications to solve linear inverse problems. However, QR decomposition is computationally expensive, and its sequential implementations fail to meet the requirements of many time-sensitive applications. We propose a deeply pipelined reconfigurable architecture that can be dynamically configured at runtime. The input matrix is first partitioned into sub-matrices, which are then processed successively. This paper proposes a fully parallel VLSI architecture under fixed precision for the inverse computation of a real square matrix using QR decomposition with hybrid QRD Gram-Schmidt orthogonalization. The hybrid QRD algorithm is stable and accurate to integral multiples of machine precision under fixed precision for a well-conditioned non-singular matrix. For the typical matrices (4x4) found in MIMO communication systems, the proposed architecture achieves a latency of 38 clock cycles.

Keywords: Architecture, FPGA, QR decomposition, Gram-Schmidt.

1. INTRODUCTION

Due to the significant performance gains provided by MIMO, it is being widely adopted in most current and next-generation wireless communication systems. To exploit the full potential of these gains, the computationally efficient design of a wireless baseband receiver has become challenging. Signal processing circuits in a MIMO receiver have to be designed for high data throughput and low latency, owing to their application in real-time wireless systems. The computational accuracy of signal detection has a direct consequence on the throughput and reliability achieved in the receiver.

QR decomposition has been widely used in many signal processing applications, such as MIMO systems [1], beamforming [2] and image recovery [3], to calculate the inverse of matrices or to solve linear systems. However, its inherent computational complexity makes it unlikely to satisfy the requirements of many time-sensitive designs, especially when the system operates on a large-scale dataset.

The Gram-Schmidt process, the Householder transformation and the Givens rotation are the most popular algorithms for QR decomposition [4]. Among these, the Householder transformation and the Givens rotation are considered numerically stable algorithms, while the Gram-Schmidt process provides an opportunity to perform successive orthogonalizations. Parallel designs have previously been investigated to accelerate QR decomposition on traditional multi-core systems [5], [6], GPUs [7] and reconfigurable computing platforms [8].

In this paper, we propose a deeply pipelined reconfigurable architecture for QR decomposition, which can be dynamically configured to perform the Gram-Schmidt process. To process large data sets, the input matrix is partitioned into multiple columns and rows of sub-matrices. The sub-matrix columns and rows are processed successively.

The rest of this paper is organized as follows: Section 2 offers a brief overview of the channel model for MIMO systems and the need for matrix computation. Section 3 explains QR decomposition based on hybrid QRD using the Gram-Schmidt algorithm. Section 4 presents the proposed architecture for hybrid QR decomposition. Section 5 discusses the implementation and the latency achieved. Section 6 concludes with a summary of the key results of this paper.

2. MIMO SYSTEMS

The block diagram of a MIMO system with N transmit and M receive antennae is shown in Fig. 1.

The channel model under a flat fading channel condition is given by Eq. 1.

\[ Y = HX + Z \quad (1) \]

where Y is an (M x T) complex received matrix, H is an (M x N) complex channel matrix, X is an (N x T) transmitted matrix whose elements are taken from a complex modulation constellation, and Z is an (M x T) complex additive white Gaussian noise matrix. Here, T is the number of symbol periods over which data is transmitted. A good summary of these techniques can be found in [2]. A typical MIMO system has M, N ≤ 4, because the performance gain achieved beyond four antennae is insignificant compared with the disproportionate increase in receiver complexity.
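The channel model of Eq. 1 can be sketched in a few lines of Python. This is an illustrative simulation only: the function names, the Rayleigh-distributed (complex Gaussian) entries of H, and the QPSK constellation are assumptions made for the example and are not taken from the paper.

```python
import random


def complex_gaussian(rng, sigma=1.0):
    """Draw one circularly symmetric complex Gaussian sample with variance sigma**2."""
    s = sigma / 2 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def simulate_flat_fading(m, n, t, noise_sigma, seed=0):
    """Return (Y, H, X, Z) with Y = H X + Z for N = n transmit and M = m receive antennae."""
    rng = random.Random(seed)
    # Assumed constellation: unit-energy QPSK symbols.
    qpsk = [complex(re, im) / 2 ** 0.5 for re in (-1, 1) for im in (-1, 1)]
    h = [[complex_gaussian(rng) for _ in range(n)] for _ in range(m)]   # M x N channel
    x = [[rng.choice(qpsk) for _ in range(t)] for _ in range(n)]        # N x T symbols
    z = [[complex_gaussian(rng, noise_sigma) for _ in range(t)]
         for _ in range(m)]                                             # M x T noise
    hx = matmul(h, x)
    y = [[hx[i][j] + z[i][j] for j in range(t)] for i in range(m)]      # M x T received
    return y, h, x, z


# Example: a 4x4 system observed over T = 3 symbol periods.
Y, H, X, Z = simulate_flat_fading(m=4, n=4, t=3, noise_sigma=0.1)
```

The matrix dimensions in the comments mirror the definitions of Y, H, X and Z given above.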

3. PROPOSED HYBRID QR DECOMPOSITION

QR decomposition is one of the most popular matrix factorization methods. The QR decomposition of an m x n matrix A has the form given by Eq. 2.

\[ A = QR \quad (2) \]

where Q is an m x m orthogonal matrix such that Q^T Q = I, and R is an m x n upper triangular matrix. This amounts to finding an orthonormal basis for the range of A, ran(A) [9], [10]. QR decomposition can be used to solve full-rank least squares problems. Gram-Schmidt orthogonalization [9], [10] is a direct method to compute Q and R. The original matrix A and the decomposed matrices Q and R for m = n = 4 are represented by Eqs. 3-5.

\[ A = \begin{bmatrix}
    a_{11} & a_{12} & a_{13} & a_{14} \\
    a_{21} & a_{22} & a_{23} & a_{24} \\
    a_{31} & a_{32} & a_{33} & a_{34} \\
    a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix} \quad (3) \]

\[ Q = \begin{bmatrix}
    q_{11} & q_{12} & q_{13} & q_{14} \\
    q_{21} & q_{22} & q_{23} & q_{24} \\
    q_{31} & q_{32} & q_{33} & q_{34} \\
    q_{41} & q_{42} & q_{43} & q_{44}
\end{bmatrix} \quad (4) \]

\[ R = \begin{bmatrix}
    r_{11} & r_{12} & r_{13} & r_{14} \\
    0 & r_{22} & r_{23} & r_{24} \\
    0 & 0 & r_{33} & r_{34} \\
    0 & 0 & 0 & r_{44}
\end{bmatrix} \quad (5) \]
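Given factors Q and R of this form, the inverse of a non-singular square matrix A and the solution of a linear system Ax = b follow from the orthogonality of Q. The vectors x and b are introduced here only for illustration:

\[ A^{-1} = (QR)^{-1} = R^{-1} Q^{T} \]

\[ Ax = b \;\Rightarrow\; Rx = Q^{T} b \]

Since R is upper triangular, the second relation can be solved by back substitution, so no general matrix inversion is needed.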

The input matrix A is subdivided into sub-matrices, where \( a_{i1} \) is the first column vector of matrix A, \( a_{i2} \) is the second column vector of matrix A, \( a_{i3} \) and \( a_{i4} \) are the first column vectors of the matrix Q, and \( a_{i5} \) is the second column vector of the matrix Q.

To find the Q matrix, we need to find u, v and w, where \( u_1 \) is the first column of the input matrix A, \( u_2 \) is the second column of the input matrix A, \( w_{11} \) and \( w_{22} \) are the first column vectors of the matrix Q, and \( w_{12} \) is the second column vector of the matrix Q.

To find the R matrix, we need to find \( r_{jj} \), where \( r_{jj} = u_j w_j \).

The decomposition is verified by multiplying the matrices Q and R, which recovers the input matrix A. The sub-matrices A1, A2 and A3 can be decomposed by the same method.
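The procedure described in this section can be sketched as a classical Gram-Schmidt routine. This is a floating-point illustration in standard textbook form, not the fixed-precision hardware datapath of the paper. The function name and the example matrix are assumptions chosen for the sketch.

```python
import math


def gram_schmidt_qr(a):
    """Classical Gram-Schmidt QR of a real square matrix given as a list of rows."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of A
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i in range(j):
            # Projection coefficient of column j onto the i-th orthonormal vector.
            r[i][j] = sum(q_cols[i][k] * cols[j][k] for k in range(n))
            v = [v[k] - r[i][j] * q_cols[i][k] for k in range(n)]
        # The diagonal entry is the norm of what remains after the projections.
        r[j][j] = math.sqrt(sum(x * x for x in v))
        if r[j][j] == 0.0:
            raise ValueError("matrix is singular; Gram-Schmidt breaks down")
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]  # back to rows
    return q, r


# Illustrative 4x4 input (strictly diagonally dominant, hence non-singular).
A = [[4.0, 1.0, 2.0, 0.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 5.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
Q, R = gram_schmidt_qr(A)
```

Multiplying Q by R reproduces A up to rounding error, and every entry of R below the diagonal stays zero, matching the structure of Eq. 5.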

4. ARCHITECTURE FOR HYBRID QR DECOMPOSITION

In the hybrid QR decomposition block diagram, the input matrix is divided into m x n sub-matrices [8]. Here we take a 4x4 matrix as the input and divide it into four 2x2 sub-matrices. Of these, we solve for one 2x2 sub-matrix.
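The partitioning step can be illustrated as follows. This is a minimal sketch with a hypothetical helper name. The matrix entries are placeholders that encode their own row and column index so that the blocks are easy to read.

```python
def partition_blocks(a, block):
    """Split a square matrix (list of rows) into block x block sub-matrices.

    Returns a grid where grid[br][bc] is the sub-matrix at block row br
    and block column bc.
    """
    n = len(a)
    if n % block != 0:
        raise ValueError("matrix size must be a multiple of the block size")
    return [[[row[c:c + block] for row in a[r:r + block]]
             for c in range(0, n, block)]
            for r in range(0, n, block)]


# Entry "ij" stands for a_ij of the 4x4 input matrix.
A = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34],
     [41, 42, 43, 44]]
blocks = partition_blocks(A, 2)
print(blocks[0][0])  # [[11, 12], [21, 22]]
print(blocks[1][1])  # [[33, 34], [43, 44]]
```

Each of the four 2x2 blocks can then be handed to the decomposition stage independently.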

To find the Q matrix, we need to find u, v and w, where \( u_1 \) is the first column of the input matrix A and \( u_2 \) is the second column of the input matrix A. \( w_v \) is the square root of \( v_{11} + v_{22} \), where \( v_1 \) is the first column of the matrix Q and \( v_2 \) is the second column of the matrix Q.

To find the R matrix, we need to find \( r_{jj} \), where \( r_{jj} = u_j w_j \). Because R is upper triangular, the entry below the diagonal is zero, i.e. \( r_{21} = 0 \). The R matrix is then obtained from the product of u and w.
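As a worked illustration of these steps on a single 2x2 sub-matrix, consider the following example. The numeric values are chosen here for convenience and do not come from the paper. The symbols \( a_j \), \( q_j \) and \( r_{ij} \) follow standard Gram-Schmidt notation rather than the u, v, w notation used above.

\[ A = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}, \quad a_1 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

\[ r_{11} = \|a_1\| = 5, \quad q_1 = \frac{a_1}{r_{11}} = \begin{bmatrix} 0.6 \\ 0.8 \end{bmatrix} \]

\[ r_{12} = q_1^{T} a_2 = 2.2, \quad a_2 - r_{12} q_1 = \begin{bmatrix} -0.32 \\ 0.24 \end{bmatrix}, \quad r_{22} = 0.4, \quad q_2 = \begin{bmatrix} -0.8 \\ 0.6 \end{bmatrix} \]

\[ QR = \begin{bmatrix} 0.6 & -0.8 \\ 0.8 & 0.6 \end{bmatrix} \begin{bmatrix} 5 & 2.2 \\ 0 & 0.4 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix} = A \]

The zero in the lower-left position of R is the entry \( r_{21} \), and the product QR reproduces A exactly.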

Multiplying the matrices Q and R recovers the matrix A. Working on sub-matrices makes the decomposition easier to compute and requires less time.

5. IMPLEMENTATION AND RESULT

Our design is implemented in Verilog HDL on Xilinx ISE 9.1i. Our architecture uses 4-bit input values, which feed the multipliers, subtractors, dividers and square-root units of the Gram-Schmidt process. In the conventional QR decomposition method, finding the orthogonal matrix and the upper triangular matrix is more complex. Hybrid QR decomposition reduces the clock latency relative to other methods. Fig. 3 shows that the clock latency is reduced to 38 clocks. Ref. [11] reports a clock latency of 88 clocks, and Ref. [12] reports 67 clocks. Hence our design has a lower latency than these methods.
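The latency figures quoted above can be turned into relative reductions with a short calculation. The helper name is hypothetical. The clock counts are the ones stated in the text.

```python
def latency_reduction_percent(reference_clocks, proposed_clocks):
    """Percentage by which the proposed latency is lower than a reference latency."""
    return 100.0 * (reference_clocks - proposed_clocks) / reference_clocks


proposed = 38  # clocks, this design
references = {"Ref. [11]": 88, "Ref. [12]": 67}  # clocks, as quoted in the text

for name, clocks in references.items():
    pct = latency_reduction_percent(clocks, proposed)
    print(f"{name}: {clocks} -> {proposed} clocks ({pct:.1f}% lower)")
```

This gives a reduction of about 56.8% relative to Ref. [11] and about 43.3% relative to Ref. [12]. These percentages compare clock counts only and do not account for differences in clock frequency or technology.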

An RTL (Register Transfer Level) view of the Q-matrix computation is shown in Fig. 4. The Q matrix is computed from the input matrix A using the Gram-Schmidt algorithm.

Table 1 shows the comparison results. The parameters compared are the clock latency, order, technology and algorithm used in this paper, in Ref. [11] and in Ref. [12].

![Fig. 3 Comparison of clock latencies](image)

**Table 1 Comparison of implementation results**

![Fig. 4 RTL view of finding Q matrix.](image)

![Fig. 5 RTL view of Q and R matrix](image)

6. CONCLUSION

This paper has presented an implementation of hybrid QR decomposition based on the Gram-Schmidt algorithm for MIMO-OFDM systems. Hybrid QR decomposition divides the input matrix into m x n sub-matrices. The hybrid QR decomposition architecture reduces the hardware cost and lowers the clock latency compared with conventional QR decomposition. The proposed architecture is implemented in ModelSim-Altera 6.3g_p1 (Quartus II 8.1) and verified with Xilinx ISE 9.1i.

REFERENCES


[8]. Xinying Wang, Philip and Joseph Zambreno, "A Reconfigurable Architecture for QR Decomposition Using a Hybrid Approach," 2014 IEEE Computer Society Annual Symposium on VLSI.


[11]. Kuang-Hao Lin, Robert Chang, Chien-Lin Huang, Feng-Chi Chen, and Shin-Chun Lin, "Implementation of QR Decomposition for MIMO-OFDM Detection Systems."