University of Washington Tacoma CES Projects

Singular Value Decomposition using a Field Programmable Gate Array

I was honoured to have been selected to be a recipient of the Chancellor’s Undergraduate Research Scholar award for the 2013 summer and autumn quarters at UWT. The research focused on implementing SVD in a FPGA.
Sponsor: Jie Sheng, Ph.D.

Problem Statement

The research is based on the masters work: " AN FPGA-BASED SINGULAR VALUE DECOMPOSITION PROCESSOR" by Weiwei Ma. Singular Value Decomposition (SVD) is a particular factorization of an n x m matrix into three matrices into the form M = U(SIGMA)V*. Applications that take advantage of the SVD include signal processing, bioinformatics, and digital image processing. In implementing the SVD on a Field Programmable Gate Array (FPGA), the n x m matrix must be organized as a mesh-connected network of processors. Each one of those processors represents a 2 x 2 matrix, such that the proposed n x m matrix is diagonalized by a sequence of 2 x 2 SVDs.

Implement the algorithm using an Altera DE2 board featuring Altera Cyclone II 2C35 FPGA. Additionally, simulate the mesh-connected network of processors in Matlab and Java.

Where to begin?

What is SVD?

The first step in approaching the research was to understand precisely what the problem was that needed to be solved. While the mathematical definition of the SVD listed above is complete, it doesn't necessarily give a lot of insight into what is being accomplished.

An image, like the one on the right can be represented by a matrix of grey-scale values 0-255. The picture of the mathematician Gauss is a 340 x 280 pixel image with 256 shades of grey. When the singular value decomposition of the image is calculated, three resultant matrices are produced. One of which, is the matrix containing the singular values on the diagonal and all other values are zero. The diagonal singular values are listed in order of greatest to least. It turns out that the smaller singular values in the SVD come from the "boring" sections of the image, and they may be ignored. [3]

So, what does this mean? Imagine transmitting the data for this image. What if only a fraction of the data could be transmitted while preserving the integrity of the original? The image where k = 2 represents the image as reproduced from 2 singular values. It can be seen that when k = 32, there is almost no noticeable difference between the original image and the one reproduced from only 32 singular values.

The applications of SVD include signal processing, bioinformatics, and digital image processing. However, this example provided a little context to me on what is trying to be achieved and how it can be applied.

If you're interested in finding out more about SVD, try this link to a video that takes a light-hearted approach to explaining what it is and how it can be used: “It had to be U” – the SVD Song on (First played at the 9th Tartu Conference on Multivariable Statistics and 20th International Workshop on Matrices and Statistics in Tartu, Estonia 28 June, 2011.)

What is an FPGA?

A field programmable gate array is an integrated circuit designed to be programmed by the end user for very specific applications. FPGAs are composed of large sections of logic gates and RAM blocks. (see FPGA on wiki) The advantage of using a FPGA over using a standard microprocessor is that an FPGA is programmed for very specific operations without having to compete with numerous background tasks that always run on a GPOS. Additionally, while concurrency can be simulated on microprocessors through multi-threading, FPGAs can achieve true concurrency by running various processes simultaneously through the numerous logic gates and memory components programmed into the design.

Phase I - Implementing SVD on a 2 x 2 matrix

Brent, Luk, Van Loan [4] proposed a mesh-connected network of processors for handling SVD on a larger scale. However, to achieve the SVD through a network of processors, the SVD of a single 2 x 2 matrix must be first developed. They proposed an iterative process, based on multiple processors concurrently providing an SVD for their corresponding 2 x 2 matrix. Forsythe and Henrici [6] proposed the diagonalization of a n x n matrix by a sequence of 2 x 2 SVD’s.

The math involved is fairly straightforward on the handy TI-83. However, how can this be implemented on an FPGA that is based on mathematical operations through logic gates? Three trig functions needed to be developed for sin, cos, and atan.

To further break down the problem being solved, the left and right rotation angles must be written in terms of the input values {(a, b), (c, d)} that compose the 2 x 2 matrix input.

Verilog Library of Programmable Modules (LPM’s) are extremely accurate but utilize a large percentage of the system resources. Which is to say that there are built in Libraries within the FPGA's development software that already facilitate trigonometric functions. However, it is noted that the single 2 x 2 matrix will take multiple trig functions to return a desired SVD. Additionally, concurrency is the goal, which means that multiple calculations should be occurring simultaneously. A single ATAN function in the LPM took up 20% of the system's resources alone. Therefore, COordinate Rotation DIgital Computer (CORDIC) modules needed to be developed for this specific application. The CORDIC algorithm works generally by a series of successive approximations that iteratively bring you closer to the true value through vector rotations. The number of iterations and the resolution of the binary value being used determine the margin of error produced from the calculations. The image below shows the generic CORDIC algorithm.

The CORDIC functions were developed in the FPGA for sin, cos, and atan. Input values were hard-coded into the FPGA to begin with to test the algorithm. Using the input values of {(1, 2), (3, 4)}, the input values were scaled by multiplying them by 2^16, which amounts to simply shifting the values to the left 16 places in binary. Scaled values were taken advantage of in all levels of the algorithm to speed up calculation time. The singular values returned from the FPGA were:
    [ A = -0.365859985       B = 0.000061035 ]
    [ C = 0.0000610351      D = 5.463378906 ]
The results produced by Matlab were:
    [ A = 5.4650      B = 0.0000 ]
    [ C = 0.0000      D = 0.3660 ]
The results produced by the FPGA were not normalized. The results should be all positive, and arranged in order of greatest -> least. This is fairly easily corrected by shifting around and inverting the rotation angles applied to the original matrix and running the algorithm again. The state machine for the 2x2 SVD algorithm can be seen below.

Phase II - Implementing SVD in a mesh-connected network of processors

With the functions in place to handle the SVD of a single 2 x 2 matrix, the next step is to begin integrating numerous instances of those processors and link them together. They are be linked in a mesh-connected network of processors. The general flow of information works iteratively again where the diagonal elements first calculate the left and right rotation angles and calculate the SVD locally. Those angles are propagated outward to neighbouring processors. The SVD for those neighbouring elements are calculated while the rotation angles are propagated further outwards until all elements have completed one iteration of the SVD. The flow of signals is represented below for a 8 x 8 matrix of values.

The next step is to trade the resultant data between neighbouring processors and repeat. To first understand the processes, I evaluated and adapted a Matlab script from the masters work this research was based on: " AN FPGA-BASED SINGULAR VALUE DECOMPOSITION PROCESSOR" by Weiwei Ma. Furthermore, I wrote a JAVA program that accomplished much the same effect, only it provided me a little more flexibility in altering various aspects of the algorithm to evaluate the results.

Unfortunately, what was not known when this research began was that the thesis it was based on did not provide a normalized output for the singular value matrix. After an additional quarter trying to further understand the theory behind the algorithm and produce a normalized output, I was unsuccessful. The singular value matrix produced could easily be normalized at the end by making all values positive and restructuring the matrix to be organized in order of greatest -> least. However, the matrices U and V* must also be produced in the process of arriving at the singular value matrix. As such, the singular values cannot be arbitrarily altered without also altering the two matrices U and V*.


Using a counter on the FPGA, the number of clock cycles to complete processes within the state machines that composed the 2 x 2 SVD algorithm was recorded. The maximum clock frequency of the FPGA is 50 MHz, which amounts to 0.02 micro-seconds per clock cycle. The process to produce the SVD of the 2 x 2 matrix took 227 clock cycles or 4.54 micro-seconds. I am confident that given more time to research the algorithm to interface the mesh-connected network of processors, I could produce the desired results. Although, I would have quickly run into issues with the resources taken up on a single FPGA. The 2 x 2 SVD algorithm, in the end, took up 20% of the system's resources. This percentage could have been decreased at the cost of accuracy of results. The resources used could also be decreased by sharing resources like the state machines that implement the CORDIC functions. That comes at a cost of slower run-time because true concurrency would not be taken advantage of.


     (2)  Eigenvalues and Singular Values (
     (3)  Linear Algebra – A Modern Introduction 3rd Ed. (David Poole)
     (4)  R.P. Brent, F. T. Luk, F. T. and C. Van Loan, Computation of the Singular Value Decomposition using mesh-connected processors,
            Journal of VLSI and computer systems, vol. 1, no. 3, pp. 242-270, 1985.
     (5)  J. Volder, “Binary Computation Algorithm for Coordinate Rotation and function Generation”, Convair Report IAT-1 148
            Aeroelectrics Group, June 1956.
     (6)  G. E. Forsythe and P. Henrici, “The Cyclic Jacobi Method for Computing the Principal Values of a complex Matrix”,
            Trans. Of the American Mathematical Society, 94(l):l-23, January, 1960.
     (7)  “It had to be U” – the SVD Song on (First played at the 9th Tartu Conference on Multivariable Statistics and
            20th International Workshop on Matrices and Statistics in Tartu, Estonia 28 June, 2011.)

My LinkedIn Page:

Faculty Sponsor:

Jie Sheng, Ph.D.