More documentation, Matlab BLER results to be updated.

Commit 4012f63b authored Sep 30, 2019 by sebastian
5 changed files
--- a/openair1/PHY/CODING/nrLDPC_decoder/doc/nrLDPC/nrLDPC.pdf
+++ b/openair1/PHY/CODING/nrLDPC_decoder/doc/nrLDPC/nrLDPC.pdf
--- a/openair1/PHY/CODING/nrLDPC_decoder/doc/nrLDPC/nrLDPC.tex
+++ b/openair1/PHY/CODING/nrLDPC_decoder/doc/nrLDPC/nrLDPC.tex
@@ -133,8 +133,7 @@
 \item Bug fixes:
  \begin{itemize}
  \item Fixed bug in function \texttt{llr2CnProcBuf}
-  \item Introduced saturation to $-127$ in \texttt{bnProc}
-  \item Corrected input LLR dynamic range in simulation
+  \item Corrected input LLR dynamic range in BLER simulations
  \end{itemize}
 \item Results:
  \begin{itemize}
@@ -321,7 +320,7 @@ The implementation on a general purpose processor (GPP) has to take advantage of

    \draw (bnProcPc)       edge[connector] (bnProc);
    \draw (bnProc)         edge[connector] (bn2cnProcBuf);
-    \draw (bn2cnProcBuf)   edge[connector] (cnProc);
+    \draw (bn2cnProcBuf)   edge[connector] node[above left] {\texttt{cnProcPc}} (cnProc);
    \draw (cnProc)         edge[connector] (cn2bnProcBuf);
    \draw (cn2bnProcBuf)   edge[connector] (bnProcPc);

@@ -344,7 +343,7 @@ The implementation on a general purpose processor (GPP) has to take advantage of
  \caption{LDPC Decoder processing flow.}
 \end{figure}

-The functions involved are described in more detail in table \ref{tab:sum_func}.
+The functions involved are described in more detail in Table \ref{tab:sum_func}.

 \begin{table}[ht]
  \centering
@@ -554,7 +553,7 @@ In this section, the performance in terms of BLER and decoding latency of the cu
 \subsection{BLER Performance}
 \label{sec:bler-performance}

-In all simulations, we assume AWGN, QPSK modulation and 8-bit input LLRs, i.e. $-127$ until $+127$. The results are averaged over at least $10\,000$ channel realizations.
+In all simulations, we assume AWGN, QPSK modulation and 8-bit input LLRs, i.e. values from $-127$ to $+127$. The DLSCH coding procedure in 3GPP TS 38.212 is used to encode/decode the TB, and an error is declared if the TB CRC check fails. Results are averaged over at least $10\,000$ channel realizations.

 The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the current LDPC decoder implementation to the reference implementation developed by Kien. This reference implementation is called \textit{LDPC Ref} and uses the min-sum algorithm with 2 layers and 16 bit for processing. Our current optimized decoder implementation is referred to as \textit{LDPC Opt}. Moreover, reference results provided by Huawei are also shown.
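
The 8-bit input range stated in this hunk can be illustrated with a small quantizer. This is only a sketch under assumptions: the helper name `llr_saturate_8bit` and the `scale` parameter are hypothetical and do not appear in the decoder sources. It shows rounding to the nearest integer followed by saturation to the symmetric range $-127$ to $+127$.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the nrLDPC sources): quantize a
 * floating-point LLR to the symmetric 8-bit range [-127, +127].
 * 'scale' is an assumed fixed-point scaling factor. */
static inline int8_t llr_saturate_8bit(float llr, float scale)
{
    assert(scale > 0.0f);
    float v = llr * scale;

    /* Saturate before converting so the cast can never overflow. */
    if (v >= 127.0f)
        return 127;
    if (v <= -127.0f)
        return -127;

    /* Round half away from zero, then truncate toward zero. */
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

Excluding $-128$ keeps the range symmetric, so negating a message cannot overflow an `int8_t`.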

@@ -850,7 +849,7 @@ It is often unnecessary to carry out the maximum number of iterations. After eac
 \section{Conclusion}
 \label{sec:conclusion}

-The results in the previous sections show that the current optimized LDPC implementation full-fills the requirements in terms of decoding latency for low to medium number of iterations at the expanse of a significant loss in BLER performance. To improve BLER performance, it is recommended to implement a layered algorithm and a min-sum algorithm with normalization. Further improvements upon the current implementation are detailed in the next section.
+The results in the previous sections show that the current optimized LDPC implementation fulfills the requirements in terms of decoding latency for a low to medium number of iterations at the expense of a loss in BLER performance. To improve BLER performance, it is recommended to implement a layered algorithm and a min-sum algorithm with normalization. Further improvements upon the current implementation are detailed in the next section.

 \newpage
 \section{Future Work}
@@ -889,12 +888,16 @@ The following improvements will reduce the decoding latency:

 \begin{itemize}
 \item Adapt to AVX512
+\item Optimization of CN processing
 \item Implement 2/3-layers for faster convergence
 \end{itemize}

 \paragraph{AVX512:}
 The computations in the CN and BN processing can be further accelerated by using AVX512 instructions. This improvement will speed up the CN and BN processing by approximately a factor of 2.

+\paragraph{Optimization of CN Processing:}
+It can be investigated whether CN processing can be improved by computing two minima regardless of the number of BNs. Subsequently, the (absolute) value fed back to the BN is one of those minima.
+
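The two-minima idea in the paragraph above can be sketched in scalar C. This is an illustration, not the decoder's SIMD implementation: the function name `cn_update_two_min` is hypothetical, plain min-sum without normalization is assumed, and inputs are assumed to lie in $-127$ to $+127$. One pass finds the smallest and second-smallest magnitudes. Each BN then receives the smallest magnitude, except the BN that supplied it, which receives the second smallest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar sketch of a min-sum CN update with two minima.
 * in[i]  : message from BN i to this CN, assumed in [-127, 127]
 * out[i] : message from this CN back to BN i
 * n      : number of BNs connected to this CN (at least 2) */
static void cn_update_two_min(const int8_t *in, int8_t *out, int n)
{
    assert(n >= 2);

    int min1 = 128, min2 = 128; /* larger than any valid magnitude */
    int idx1 = -1;              /* position of the smallest magnitude */
    int neg = 0;                /* parity of the number of negative inputs */

    for (int i = 0; i < n; i++) {
        int mag = abs(in[i]);
        if (in[i] < 0)
            neg ^= 1;
        if (mag < min1) {
            min2 = min1;
            min1 = mag;
            idx1 = i;
        } else if (mag < min2) {
            min2 = mag;
        }
    }

    for (int i = 0; i < n; i++) {
        /* Magnitude: minimum over all other inputs. */
        int mag = (i == idx1) ? min2 : min1;
        /* Sign: product of the signs of all other inputs. */
        int s = neg ^ (in[i] < 0);
        out[i] = (int8_t)(s ? -mag : mag);
    }
}
```

The cost of the search is independent of which output is being formed, so the same loop serves every CN degree.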
 \paragraph{Layered processing:}
 The LDPC code in NR always punctures the first 2 columns of the base graph. Hence, the decoder inserts LLRs with value 0 in their place and needs to retrieve those bits during the decoding process. Instead of computing all the parity equations and then passing the results to the BN processing, it is beneficial to first compute the parity equations where at most one punctured BN is connected to the CN. If two punctured BNs are connected, then according to \eqref{eq:40} the result will again be 0. Thus, in a first sub-iteration those parity equations are computed and the results are sent to BN processing, which calculates the results using only those rows in the PCM. In the second sub-iteration the remaining check equations are used.
 The convergence of this layered approach is much faster since the bits can be retrieved more quickly while the decoding complexity remains the same. Therefore, for a fixed number of iterations the layered algorithm will have a significantly better performance.
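
The row selection described for layered processing can be sketched as a small classification step. The function name `assign_layers` and the row-major `bg` layout are assumptions made for illustration. Only the rule itself comes from the text: rows connected to at most one of the two punctured columns go into the first sub-iteration, the rest into the second.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PUNCTURED_COLS 2 /* the first two base-graph columns are punctured */

/* Hypothetical sketch: assign every base-graph row to a sub-iteration.
 * bg    : row-major rows x cols matrix, nonzero entry = CN connected to BN
 * layer : output, 0 = first sub-iteration (at most one punctured BN),
 *                 1 = second sub-iteration (both punctured BNs connected) */
static void assign_layers(const uint8_t *bg, int rows, int cols, uint8_t *layer)
{
    assert(cols >= NUM_PUNCTURED_COLS);

    for (int r = 0; r < rows; r++) {
        int punctured = 0;
        for (int c = 0; c < NUM_PUNCTURED_COLS; c++)
            punctured += (bg[r * cols + c] != 0);
        layer[r] = (punctured <= 1) ? 0 : 1;
    }
}
```

A decoder using this schedule would run CN and BN processing on the rows with `layer[r] == 0` first and on the remaining rows afterwards.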

--- a/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_decoder.c
+++ b/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_decoder.c
@@ -38,7 +38,7 @@
 #include "nrLDPC_cnProc.h"
 #include "nrLDPC_bnProc.h"

-//#define NR_LDPC_ENABLE_PARITY_CHECK
+#define NR_LDPC_ENABLE_PARITY_CHECK
 //#define NR_LDPC_PROFILER_DETAIL

 #ifdef NR_LDPC_DEBUG_MODE

--- a/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_init.h
+++ b/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_init.h
@@ -22,7 +22,7 @@
 /*!\file nrLDPC_init.h
 * \brief Defines the function to initialize the LDPC decoder and sets correct LUTs.
 * \author Sebastian Wagner (TCL Communications) Email: <mailto:sebastian.wagner@tcl.com>
- * \date 27-03-2018
+ * \date 30-09-2019
 * \version 1.0
 * \note
 * \warning

--- a/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_mPass.h
+++ b/openair1/PHY/CODING/nrLDPC_decoder/nrLDPC_mPass.h
@@ -77,6 +77,20 @@ static inline void *nrLDPC_circ_memcpy(int8_t *str1, const int8_t *str2, uint16_

 /**
   \brief Copies the input LLRs to their corresponding place in the LLR processing buffer.
+   Example: BG2
+             | 0| 0| LLRs -->                                    |
+   BN Groups |22|23|10| 5| 5|14| 7|13| 6| 8| 9|16| 9|12|1|1|...|1|
+              ^---------------------------------------/----     /
+                            _________________________/    |    /
+                           /  ____________________________|___/
+                          /  /                            \
+   LLR Proc Buffer (BNG) | 1| 5| 6| 7| 8| 9|10|12|13|14|16|22|23|
+   Number BN in BNG(R15) |38| 2| 1| 1| 1| 2| 1| 1| 1| 1| 1| 1| 1|
+   Idx:                  0  ^                             ^  ^
+          38*384=14592 _____|   ...                       |  |
+          50*384=19200 -----------------------------------   |
+          51*384=19584 --------------------------------------
+
   \param p_lut Pointer to decoder LUTs
   \param llr Pointer to input LLRs
   \param p_procBuf Pointer the processing buffers
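
The start indices in the diagram above follow from a running sum over the BN group sizes. The sketch below reproduces that arithmetic. The group degrees and counts are copied from the comment, while the helper name `bng_start_idx`, the array names, and the assumption that every BN occupies 384 entries (the value used in the diagram) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define Z_MAX       384 /* entries per BN, as used in the diagram */
#define NUM_BNG_BG2 13  /* number of BN groups listed for BG2 (R15) */

/* Values copied from the comment: BN-group degree and number of BNs in it. */
static const uint8_t bng_degree[NUM_BNG_BG2] =
    {1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 22, 23};
static const uint8_t bng_count[NUM_BNG_BG2] =
    {38, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1};

/* Hypothetical helper: start index of the BN group with the given degree
 * in the LLR processing buffer. The degree must be one of the listed groups. */
static uint32_t bng_start_idx(int degree)
{
    uint32_t bns_before = 0;
    for (int g = 0; g < NUM_BNG_BG2; g++) {
        if (bng_degree[g] == degree)
            return bns_before * Z_MAX;
        bns_before += bng_count[g];
    }
    assert(!"degree is not a BG2 BN group");
    return 0;
}
```

With these tables, `bng_start_idx(5)`, `bng_start_idx(22)` and `bng_start_idx(23)` give the three indices marked in the diagram: 14592, 19200 and 19584.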
@@ -308,6 +322,21 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_

 /**
   \brief Copies the input LLRs to their corresponding place in the CN processing buffer for BG2.
+   Example: BG2
+             | 0| 0| LLRs -->                                    |
+   BN Groups |22|23|10| 5| 5|14| 7|13| 6| 8| 9|16| 9|12|1|1|...|1|
+
+
+   CN Processing Buffer (CNGs) | 3| 4| 5| 6| 8|10|
+   Number of CN per CNG (R15)  | 6|20| 9| 3| 2| 2|
+                               0  ^     ^\  \
+            3*6*384=6912 _________|     ||   \_____________
+            (3*6+4*20+5*9)*384=54912____||                 \
+                                     Bit | 1| 2| 3| 4| 5| 6|
+                                 3*Z CNs>|  |<
+                                            ^
+                         54912 + 3*384______|
+
   \param p_lut Pointer to decoder LUTs
   \param llr Pointer to input LLRs
   \param p_procBuf Pointer to the processing buffers
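
The CN processing buffer offsets in the diagram above can be reproduced in the same way. This is a sketch under assumptions: the helper names `cng_start_idx` and `cng_bit_idx` are hypothetical, and 384 entries per CN is taken from the diagram. Each CN group occupies (BNs per CN) x (CNs in group) x 384 entries. Within a group, the entries of one bit position for all CNs are stored contiguously.

```c
#include <assert.h>
#include <stdint.h>

#define Z_MAX       384 /* entries per CN, as used in the diagram */
#define NUM_CNG_BG2 6   /* number of CN groups listed for BG2 (R15) */

/* Values copied from the comment: BNs per CN and number of CNs per group. */
static const uint8_t cng_size[NUM_CNG_BG2]  = {3, 4, 5, 6, 8, 10};
static const uint8_t cng_count[NUM_CNG_BG2] = {6, 20, 9, 3, 2, 2};

/* Hypothetical helper: start index of CN group g (0-based) in the buffer. */
static uint32_t cng_start_idx(int g)
{
    assert(g >= 0 && g < NUM_CNG_BG2);
    uint32_t entries = 0;
    for (int i = 0; i < g; i++)
        entries += (uint32_t)cng_size[i] * cng_count[i];
    return entries * Z_MAX;
}

/* Hypothetical helper: start index of bit position 'bit' (0-based)
 * inside CN group g; each bit position spans cng_count[g] * Z_MAX entries. */
static uint32_t cng_bit_idx(int g, int bit)
{
    assert(bit >= 0 && bit < cng_size[g]);
    return cng_start_idx(g) + (uint32_t)bit * cng_count[g] * Z_MAX;
}
```

For the group with 6 BNs per CN (index 3), `cng_start_idx(3)` is 54912 and `cng_bit_idx(3, 1)` is 54912 + 3*384, matching the two markers in the diagram.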