\item Size of LUTs reduced significantly (60 MB to 200 KB)
\item Size of LUTs reduced significantly (60 MB to 200 KB)
\item Significantly improves execution time (factor 3.5)
\item Significantly improves execution time (factor 3.5)
\item Improved BLER performance
\item Improved BLER performance (all simulation results have been updated)
\end{itemize}
\end{itemize}
\end{itemize}
\end{itemize}
...
@@ -555,7 +555,7 @@ In this section, the performance in terms of BLER and decoding latency of the cu
...
@@ -555,7 +555,7 @@ In this section, the performance in terms of BLER and decoding latency of the cu
In all simulations, we assume AWGN, QPSK modulation and 8-bit input LLRs, i.e. in the range $-127$ to $+127$. The DLSCH coding procedure in 38.212 is used to encode/decode the TB, and an error is declared if the TB CRC check fails. Results are averaged over at least $10\,000$ channel realizations.
In all simulations, we assume AWGN, QPSK modulation and 8-bit input LLRs, i.e. in the range $-127$ to $+127$. The DLSCH coding procedure in 38.212 is used to encode/decode the TB, and an error is declared if the TB CRC check fails. Results are averaged over at least $10\,000$ channel realizations.
The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the current LDPC decoder implementation to the reference implementation developed by Kien. This reference implementation is called \textit{LDPC Ref} and uses the min-sum algorithm with 2 layers and 16-bit processing. Our current optimized decoder implementation is referred to as \textit{LDPC Opt}. Moreover, reference results provided by Huawei are also shown.
The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the current LDPC decoder implementation to the reference implementation developed by Kien. This reference implementation is called \textit{LDPC Ref} and uses the min-sum algorithm with 2 layers and 16-bit processing. Our current optimized decoder implementation is referred to as \textit{LDPC OAI}. Moreover, reference results provided by Huawei are also shown.
\begin{figure}[ht]
\begin{figure}[ht]
\centering
\centering
...
@@ -574,67 +574,57 @@ The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the curren
...
@@ -574,67 +574,57 @@ The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the curren
@@ -642,12 +632,20 @@ The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the curren
...
@@ -642,12 +632,20 @@ The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the curren
\label{fig:bler-bg2-15}
\label{fig:bler-bg2-15}
\end{figure}
\end{figure}
From Figure \ref{fig:bler-bg2-15} it can be observed that the reference decoder significantly outperforms the current implementation for a low to medium number of iterations. The reason is the implementation of 2 layers in the reference decoder, which results in faster convergence for punctured codes and hence requires fewer iterations to achieve a given BLER target. Note that there is a large performance loss of nearly 6 dB at BLER $10^{-2}$ between the Huawei reference and the current optimized decoder implementation with 5 iterations.
From Figure \ref{fig:bler-bg2-15} it can be observed that the reference decoder significantly outperforms the current implementation for a low to medium number of iterations. The reason is the implementation of 2 layers in the reference decoder, which results in faster convergence for punctured codes and hence requires fewer iterations to achieve a given BLER target. Note that there is a large performance loss of about 4 dB at BLER $10^{-2}$ between the Huawei reference and the current optimized decoder implementation with 5 iterations.
Moreover, there is a gap of about 1.5 dB between the results provided by Huawei and the current decoder with 20 iterations. The reason is the min-sum approximation algorithm used in both the reference decoder and the current implementation. The gap can be closed by using a tighter approximation such as normalized min-sum or the lambda-min approach. The gap also closes for higher code rates, which can be observed from Figure \ref{fig:bler-bg2-r23}: there, it is only about 0.6 dB for 50 iterations.
Moreover, there is a gap of about 1.5 dB between the results provided by Huawei and the current decoder with 20 iterations. The reason is the min-sum approximation algorithm used in both the reference decoder and the current implementation. The gap can be closed by using a tighter approximation such as normalized min-sum or the lambda-min approach. The gap also closes for higher code rates, which can be observed from Figure \ref{fig:bler-bg2-r23}: there, it is only about 0.6 dB for 50 iterations.
Concerning the LDPC decoder provided by MATLAB, the performance appears to be rather inconsistent. For 5 iterations, the MATLAB decoder outperforms the optimized decoder, most likely due to a tighter approximation used in the check node processing. However, it is inferior to the reference algorithm, which suggests that the MATLAB decoder is not optimized for punctured LDPC codes, i.e. it uses no layered processing. For 50 iterations, the MATLAB LDPC decoder shows strange behavior: the slope of the BLER curve is not as expected. This suggests that there might be some internal decoder problems with the NR base graph 2.
The MATLAB results denoted \texttt{MATLAB NMS} are obtained with the function \texttt{nrLDPCDecode} provided by the MATLAB 5G Toolbox R2019b. The following options are passed to the function: \texttt{'Termination','max','Algorithm','Normalized min-sum','ScalingFactor',1}. Furthermore, the 8-bit input LLRs are adapted to fit the dynamic range of \texttt{nrLDPCDecode}, as shown in Listing \ref{ldpc_matlab}.
\begin{lstlisting}[frame=single,caption={Input adaptation for MATLAB LDPC Decoder},label=ldpc_matlab]
maxLLR = max(abs(softbits));
rxLLRs = round((softbits/maxLLR)*127);
% adjust range to fit the tanh used in the decoder code
softbits = rxLLRs/3.4;
\end{lstlisting}
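As a quick check of the resulting dynamic range, based only on the constants in Listing \ref{ldpc_matlab}: the normalized LLRs are integers in $[-127, 127]$, so after the division by $3.4$ the decoder input $x$ and its quantization step $\Delta$ (both symbols introduced here for illustration) satisfy
\[
|x| \le \frac{127}{3.4} \approx 37.35, \qquad \Delta = \frac{1}{3.4} \approx 0.294 .
\]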
A scaling factor (SF) of 1 has been chosen to compare the results more easily with \textit{LDPC OAI}, since the resulting check node processing is the same. However, the MATLAB normalized min-sum algorithm uses layered processing and floating point operations. Thus, for the same number of iterations, the performance is significantly better than that of \textit{LDPC OAI}, especially for a small number of iterations.
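To make the equivalence for SF $=1$ explicit, the check node updates can be sketched in their commonly used textbook form. The notation is introduced here for illustration and is not taken from the decoder code: $L_{v\to c}$ is the message from variable node $v$ to check node $c$, $N(c)$ is the set of variable nodes connected to $c$, and $\alpha$ is the scaling factor. The min-sum update is
\[
L_{c\to v} = \Bigl(\prod_{v'\in N(c)\setminus\{v\}} \mathrm{sign}(L_{v'\to c})\Bigr) \min_{v'\in N(c)\setminus\{v\}} |L_{v'\to c}| ,
\]
and the normalized min-sum update scales this result,
\[
L^{\mathrm{NMS}}_{c\to v} = \alpha \, L_{c\to v}, \qquad 0 < \alpha \le 1 .
\]
For $\alpha = 1$ the two updates coincide, so any remaining performance difference stems from the scheduling (layered or not) and the arithmetic precision rather than from the check node rule itself.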
\begin{figure}[ht]
\begin{figure}[ht]
\centering
\centering
...
@@ -681,8 +679,8 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
...
@@ -681,8 +679,8 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
\legend{{Huawei 2017-06-15}\\
\legend{{Huawei 2017-06-15}\\
{LDPC Opt 5 iter}\\
{LDPC OAI 5 iter}\\
{LDPC Opt 50 iter}\\};
{LDPC OAI 50 iter}\\};
\end{semilogyaxis}
\end{semilogyaxis}
\end{tikzpicture}
\end{tikzpicture}
...
@@ -690,7 +688,57 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
...
@@ -690,7 +688,57 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
\label{fig:bler-bg2-r23}
\label{fig:bler-bg2-r23}
\end{figure}
\end{figure}
Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block size of $B=8448$ and highest code rate $R=8/9$.
In Figure \ref{fig:bler-bg2-15-2} we compare the performance of different algorithms using at most 50 iterations with early stopping if the parity check passes. The MATLAB layered belief propagation (LBP) is used with unquantized input LLRs and performs best since no approximation is made in the processing. NMS and offset min-sum (OMS) use a scaling factor and an offset, respectively, that have been empirically found to perform best in this simulation setting. Their performance is very close to that of LBP, with OMS slightly better than NMS. The performance of \textit{LDPC OAI} is more than 1 dB worse, mainly because of the looser approximation. Moreover, the NMS algorithm with SF=1 performs worst, probably because the SF is not optimized for the input LLRs. From the results in Figure \ref{fig:bler-bg2-15-2} we can conclude that the performance of \textit{LDPC OAI} can be significantly improved by adopting an offset min-sum approximation, bringing the performance to within 0.3 dB of the Huawei reference curve.
\caption{BLER vs. SNR, BG2, Rate=1/5, max iterations = 50, B=1280.}
\label{fig:bler-bg2-15-2}
\end{figure}
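For reference, the offset min-sum check node update discussed for Figure \ref{fig:bler-bg2-15-2} can be written in its commonly used form. The notation is an assumption made here for illustration: $L_{v\to c}$ is the message from variable node $v$ to check node $c$, $N(c)$ is the set of variable nodes connected to $c$, and $\beta \ge 0$ is the offset, whose value in the simulations is not stated in this section.
\[
L^{\mathrm{OMS}}_{c\to v} = \Bigl(\prod_{v'\in N(c)\setminus\{v\}} \mathrm{sign}(L_{v'\to c})\Bigr) \max\Bigl(\min_{v'\in N(c)\setminus\{v\}} |L_{v'\to c}| - \beta,\; 0\Bigr) .
\]
Setting $\beta = 0$ recovers the plain min-sum update, which tends to overestimate the message magnitude. The offset corrects this by subtraction rather than by scaling.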
Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block size of $B=8448$ and highest code rate $R=8/9$. From Figure \ref{fig:bler-bg1-r89} it can be observed that the performance gap is only about 0.3 dB if 50 iterations are used. However, for 5 iterations there is still a significant performance loss of about 2.3 dB at BLER $10^{-2}$.
\begin{figure}[ht]
\begin{figure}[ht]
\centering
\centering
...
@@ -718,8 +766,8 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
...
@@ -718,8 +766,8 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
@@ -727,8 +775,6 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
...
@@ -727,8 +775,6 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
\label{fig:bler-bg1-r89}
\label{fig:bler-bg1-r89}
\end{figure}
\end{figure}
From Figure \ref{fig:bler-bg1-r89} it can be observed that the performance gap is only about 0.3 dB if 50 iterations are used. However, for 5 iterations there is still a significant performance loss of about 2.3 dB at BLER $10^{-2}$.
\newpage
\newpage
\subsection{Decoding Latency}
\subsection{Decoding Latency}
\label{sec:decoding-time}
\label{sec:decoding-time}
...
@@ -766,7 +812,7 @@ The results in Table \ref{tab:lat-bg2-r15} show the impact of the number of iter
...
@@ -766,7 +812,7 @@ The results in Table \ref{tab:lat-bg2-r15} show the impact of the number of iter