Commit 72999a10 authored by sebastian

Clean up and documentation

parent 6f351c3d
......@@ -118,6 +118,34 @@
\end{tabular}
}
\paragraph{Version 1.0:}
\begin{itemize}
\item Initial version
\end{itemize}
\paragraph{Version 2.0:}
\begin{itemize}
\item Enhancements in message passing:
\begin{itemize}
\item LUTs replaced by smaller BG-specific parameters
\item Inefficient load/store replaced by circular memcpy
\end{itemize}
\item Bug fixes:
\begin{itemize}
\item Fixed bug in function \texttt{llr2CnProcBuf}
\item Introduced saturation to $-127$ in \texttt{bnProc}
\item Corrected input LLR dynamic range in simulation
\end{itemize}
\item Results:
\begin{itemize}
\item Size of LUTs reduced significantly (60 MB to 200 KB)
\item Execution time reduced significantly (factor 3.5)
\item Improved BLER performance
\end{itemize}
\end{itemize}
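The saturation mentioned in the bug fixes can be modelled in scalar C. This is only a sketch of one 8-bit lane, assuming the lower limit is applied right after a saturating subtraction, as in the \texttt{\_mm256\_subs\_epi8} and \texttt{\_mm256\_max\_epi8} pair that appears in the code diff. The names \texttt{llr\_sub\_sat}, \texttt{LLR\_MIN} and \texttt{LLR\_MAX} are hypothetical and not part of the decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
#define LLR_MIN (-127)
#define LLR_MAX 127

/* Scalar model of one 8-bit lane: subtract b from a and clamp the
 * result to the symmetric range [-127, 127]. */
static int8_t llr_sub_sat(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;          /* exact result, cannot overflow in int */
    if (diff > LLR_MAX) diff = LLR_MAX;  /* upper saturation */
    if (diff < LLR_MIN) diff = LLR_MIN;  /* lower saturation excludes -128 */
    assert(diff >= LLR_MIN && diff <= LLR_MAX);
    return (int8_t)diff;
}
```

A likely motivation for excluding $-128$ is that the range stays symmetric, so the negated minimum is still representable in 8 bits.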
\newpage
\tableofcontents
\newpage
......@@ -327,6 +355,7 @@ The functions involved are described in more detail in table \ref{tab:sum_func}.
\texttt{llr2llrProcBuf} & Copies input LLRs to LLR processing buffer \\
\texttt{llr2CnProcBuf} & Copies input LLRs to CN processing buffer \\
\texttt{cnProc} & Performs CN signal processing \\
\texttt{cnProcPc} & Performs parity check \\
\texttt{cn2bnProcBuf} & Copies the CN results to the BN processing buffer \\
\texttt{bnProcPc} & Performs BN processing for parity check and/or hard-decision \\
\texttt{bnProc} & Utilizes the results of \texttt{bnProcPc} to compute LLRs for CN processing \\
......@@ -514,24 +543,7 @@ The sum of the LLRs is carried out in 16 bit for accuracy and is then saturated
\subsection{Mapping to the Processing Buffers}
\label{sec:mapp-cn-proc}
For efficient processing with the AVX instructions, the data is required to be aligned in a certain manner. That is the reason why processing buffers have been introduced. The drawback is that the results of the processing need to be copied every time to the processing buffer of the next task. However, the speed-up in computation with AVX more than makes up for the time spent copying data. The copying is implemented using look-up tables (LUTs) which are described in table \ref{tab:sum_lut}.
\begin{table}[ht]
\centering
\begin{tabular}{ll}
\toprule
\textbf{LUT} & \textbf{Description} \\
\midrule
\texttt{lut\_llr2llrProcBuf\_BGX\_ZX\_RX} & Indices for function \texttt{llr2llrProcBuf} \\
\texttt{lut\_llr2CnProcBuf\_BGX\_ZX\_RX} & Indices for function \texttt{llr2CnProcBuf} \\
\texttt{lut\_cn2bnProcBuf\_BGX\_ZX\_RX} & Indices for functions \texttt{cn2bnProcBuf} and \texttt{bn2cnProcBuf} \\
\bottomrule
\end{tabular}
\caption{Summary of the LUTs.}
\label{tab:sum_lut}
\end{table}
These LUTs depend on the BG, the lifting size and the code rate. Assuming 5 rates for BG2 and 7 rates for BG1, the total number of LUTs is 617.
For efficient processing with the AVX instructions, the data is required to be aligned in a certain manner. That is the reason why processing buffers have been introduced. The drawback is that the results of the processing need to be copied every time to the processing buffer of the next task. However, the speed-up in computation with AVX more than makes up for the time spent copying data. The copying is implemented as a circular memcpy because every edge in the BG is a circular shift of a $Z\times Z$ identity matrix. Hence, a circular memcpy consists of two regular memcpys, each copying a part of the $Z$ values depending on the circular shift in the BG definition. The circular shifts are stored in \texttt{nrLDPC\_lut.h} in arrays \texttt{circShift\_BGX\_ZX\_CNGX}. In the specification there are only 8 sets of circular shifts defined. However, the applied circular shift depends on $Z$, i.e., it is the specified value modulo $Z$. To avoid inefficient modulo operations in loops, we store the circular shift values for every $Z$. Moreover, for convenience the arrays are already arranged depending on the CN group (CNG).
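The circular memcpy described above can be sketched as follows. This is a minimal illustration and not the decoder's actual routine: the helper name \texttt{circ\_memcpy}, its signature, and the direction of the rotation are assumptions, and the shift is assumed to be already reduced modulo $Z$, as it would be when read from a precomputed table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy Z values from src to dst, rotated by `shift` positions, so that
 * dst[i] = src[(i + shift) % Z] for i = 0..Z-1.
 * Hypothetical helper; the rotation direction is an assumption. */
static void circ_memcpy(int8_t *dst, const int8_t *src, size_t Z, size_t shift)
{
    assert(shift < Z);                /* shift must be pre-reduced modulo Z */
    size_t tail = Z - shift;          /* number of values from shift to the end */
    memcpy(dst, src + shift, tail);   /* src[shift..Z-1] -> dst[0..tail-1] */
    memcpy(dst + tail, src, shift);   /* src[0..shift-1] -> dst[tail..Z-1] */
}
```

With $Z=5$ and a shift of 2, the source values $(10, 11, 12, 13, 14)$ are written as $(12, 13, 14, 10, 11)$. No modulo operation is needed inside the copy because the split point is computed once per edge.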
\newpage
\section{Performance Results}
......@@ -595,7 +607,10 @@ The first set of simulations in Figure \ref{fig:bler-bg2-15} compares the curren
% 5 iterations
\addplot[red, solid, mark=o] plot coordinates {(-1.250000,0.781300) (-1.000000,0.421000) (-0.750000,0.140400) (-0.500000,0.028900) (-0.250000,0.003300) (0.000000,0.000300) (0.250000,0.000000) (0.500000,0.000000)};
%\addplot[blue, solid, mark=square] plot coordinates {(-0.250000,0.705000) (0.000000,0.406200) (0.250000,0.181300) (0.500000,0.061600) (0.750000,0.015900) (1.000000,0.004900) (1.250000,0.000900) (1.500000,0.000200)};
\addplot[blue, solid, mark=square] plot coordinates {(-1.000000,0.700400) (-0.750000,0.370600) (-0.500000,0.136000) (-0.250000,0.039000) (0.000000,0.008500) (0.250000,0.002400) (0.500000,0.000700) (0.750000,0.000000) (1.000000,0.000000) };
%\addplot[blue, solid, mark=square] plot coordinates {(-1.000000,0.700400) (-0.750000,0.370600) (-0.500000,0.136000) (-0.250000,0.039000) (0.000000,0.008500) (0.250000,0.002400) (0.500000,0.000700) (0.750000,0.000000) (1.000000,0.000000) };
% with saturation
\addplot[blue, solid, mark=square] plot coordinates {(-1.000000,0.693730) (-0.750000,0.370190) (-0.500000,0.137260) (-0.250000,0.038850) (0.000000,0.009740) (0.250000,0.002510) (0.500000,0.000730) (0.750000,0.000180) };
%\addplot[blue, dotted, mark=square] plot coordinates {(-1.000000,0.695700) (-0.750000,0.374750) (-0.500000,0.136140) (-0.250000,0.038240) (0.000000,0.010350) (0.250000,0.002490) (0.500000,0.000610) (0.750000,0.000190) (1.000000,0.000050) };
\addplot[green, solid, mark=triangle] plot coordinates {(-1.000000,0.778900) (-0.500000,0.226400) (0.000000,0.027400) (0.500000,0.002600) (1.000000,0.000300) };
......@@ -646,7 +661,7 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
\pgfplotsset{every axis/.append style={mark options=solid, mark size=2.5pt}}
\begin{semilogyaxis}[title={}, xlabel={$\SNR$ [dB]}, ylabel={BLER},
grid={both}, xmin=3, xmax=5.5, xtick={3,3.5,...,5.5}, ymin=0,
grid={both}, xmin=3, xmax=6.5, xtick={3,3.5,...,6.5}, ymin=0,
ymax=1,ytickten={-5,-4,-3,-2,-1,0},legend columns=1]
% Kien's 2-layer 16bit code
......@@ -656,11 +671,14 @@ Concerning the LDPC decoder provided by MATLAB, the performance appears to be ra
\addplot[black, solid] plot coordinates { (3.28392,0.01) (3.73319,0.0001) };
% LDPC opt with 16bit BN processing
\addplot[blue, solid, mark=square] plot coordinates {(4.000000,0.487500) (4.250000,0.163400) (4.500000,0.029800) (4.750000,0.002700) (5.000000,0.000100)};
%\addplot[blue, solid, mark=square] plot coordinates {(4.000000,0.487500) (4.250000,0.163400) (4.500000,0.029800) (4.750000,0.002700) (5.000000,0.000100)};
\addplot[blue, solid, mark=square] plot coordinates {(5.000000,0.439600) (5.250000,0.185800) (5.500000,0.062100) (5.750000,0.015000) (6.000000,0.003900)};
%\addplot[blue, dashed, mark=triangle] plot coordinates {(4.000000,0.487500) (4.250000,0.163700) (4.500000,0.030000) (4.750000,0.002900) (5.000000,0.000100)};
\addplot[blue, dashed, mark=square] plot coordinates {(3.000000,0.911600) (3.250000,0.614100) (3.500000,0.230100) (3.750000,0.036900) (4.000000,0.001100) (4.250000,0.000000) (4.500000,0.000000)};
%\addplot[blue, dashed, mark=square] plot coordinates {(3.000000,0.911600) (3.250000,0.614100) (3.500000,0.230100) (3.750000,0.036900) (4.000000,0.001100) (4.250000,0.000000) (4.500000,0.000000)};
\addplot[blue, dashed, mark=square] plot coordinates {(3.000000,0.900400) (3.250000,0.600000) (3.500000,0.216400) (3.750000,0.036000) (4.000000,0.002600) (4.250000,0.000000) };
\legend{ {Huawei 2017-06-15}\\
......@@ -686,18 +704,19 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
\pgfplotsset{every axis/.append style={mark options=solid, mark size=2.5pt}}
\begin{semilogyaxis}[title={}, xlabel={$\SNR$ [dB]}, ylabel={BLER},
grid={both}, xmin=6, xmax=11, xtick={6,6.5,...,11}, ymin=0,
grid={both}, xmin=6, xmax=9, xtick={6,6.5,...,9}, ymin=0,
ymax=1,ytickten={-5,-4,-3,-2,-1,0},legend columns=1]
% Huawei
\addplot[black, solid] plot coordinates { (6.118717,0.01) (6.291449,0.0001) };
% LDPC opt 5 iter
\addplot[blue, solid, mark=square] plot coordinates {(8.500000,0.350000) (8.750000,0.155100) (9.000000,0.062400) (9.250000,0.023000) (9.500000,0.008700) (9.750000,0.003500) (10.000000,0.000900) (10.250000,0.000300) };
%\addplot[blue, solid, mark=square] plot coordinates {(8.500000,0.350000) (8.750000,0.155100) (9.000000,0.062400) (9.250000,0.023000) (9.500000,0.008700) (9.750000,0.003500) (10.000000,0.000900) (10.250000,0.000300) };
\addplot[blue, solid, mark=square] plot coordinates {(7.500000,0.858900) (7.750000,0.449500) (8.000000,0.129700) (8.250000,0.025500) (8.500000,0.002300) (8.750000,0.000300) (9.000000,0.000000) };
% LDPC opt 50 iter
\addplot[blue, dashed, mark=square] plot coordinates {(6.000000,0.705333) (6.100000,0.353367) (6.200000,0.102100) (6.300000,0.015133) (6.400000,0.000967) (6.500000,0.000000)};
%\addplot[blue, dashed, mark=square] plot coordinates {(6.000000,0.705333) (6.100000,0.353367) (6.200000,0.102100) (6.300000,0.015133) (6.400000,0.000967) (6.500000,0.000000)};
\addplot[blue, dashed, mark=square] plot coordinates {(6.000000,0.970000) (6.100000,0.830800) (6.200000,0.527300) (6.300000,0.216900) (6.400000,0.045500) (6.500000,0.005600) (6.600000,0.000300) (6.700000,0.000000) (6.800000,0.000000) };
\legend{ {Huawei}\\
{LDPC Opt 5 iter}\\
......@@ -709,7 +728,7 @@ Figure \ref{fig:bler-bg1-r89} shows the performance of BG1 with largest block si
\label{fig:bler-bg1-r89}
\end{figure}
From Figure \ref{fig:bler-bg1-r89} it can be observed that the performance gap is only about 0.2 dB if 50 iterations are used. However, for 5 iterations there is still a significant performance loss of about 3.4 dB at BLER $10^{-2}$.
From Figure \ref{fig:bler-bg1-r89} it can be observed that the performance gap is only about 0.3 dB if 50 iterations are used. However, for 5 iterations there is still a significant performance loss of about 2.3 dB at BLER $10^{-2}$.
\newpage
\subsection{Decoding Latency}
......@@ -824,8 +843,8 @@ Table \ref{tab:lat-bg1-i5} shows the results for BG1, larges block size and diff
From the above results it can be observed that the data transfer between CNs and BNs takes up a significant amount of the run time. However, the performance gain due to AVX instructions in both CN and BN processing is significantly larger than the penalty incurred by the data transfers.
\section{Parity Check and early stopping Criteria}
It is often unnecessary to carry out the maximum number of iterations. After each iteration, a parity check \eqref{eq:29} can be computed, and if a valid code word is found, the decoder can stop. This functionality has been implemented and the additional overhead is reasonable. The PC is carried out in the CN processing buffer and the computational complexity itself is negligible. However, the BN results must first be moved to the CN buffer, which takes time. The overall overhead is at most $10\%$ compared to an algorithm without early stopping criteria and the same number of iterations. The PC has to be activated via the define \texttt{NR\_LDPC\_ENABLE\_PARITY\_CHECK}.
\section{Parity Check and Early Stopping Criteria}
It is often unnecessary to carry out the maximum number of iterations. After each iteration, a parity check (PC) \eqref{eq:29} can be computed, and if a valid code word is found, the decoder can stop. This functionality has been implemented and the additional overhead is reasonable. The PC is carried out in the CN processing buffer and the computational complexity itself is negligible. However, the BN results must first be moved to the CN buffer, which takes time. The overall overhead is at most $10\%$ compared to an algorithm without early stopping criteria and the same number of iterations. The PC has to be activated via the define \texttt{NR\_LDPC\_ENABLE\_PARITY\_CHECK}.
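The stopping test itself can be illustrated with a scalar toy model. The sketch below is not the AVX implementation in the CN processing buffer: it uses a dense parity-check matrix, a hypothetical function name \texttt{parity\_check\_ok}, and assumes the convention that a negative LLR corresponds to bit 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the hard decisions of the LLRs satisfy all M parity checks.
 * H is a dense M x N binary matrix in row-major order (toy representation).
 * Assumed convention: negative LLR -> bit 1, otherwise bit 0. */
static bool parity_check_ok(const uint8_t *H, size_t M, size_t N,
                            const int8_t *llr)
{
    assert(H != NULL && llr != NULL);
    for (size_t m = 0; m < M; m++) {
        unsigned parity = 0;
        for (size_t n = 0; n < N; n++) {
            unsigned bit = (llr[n] < 0) ? 1u : 0u;  /* hard decision */
            parity ^= (H[m * N + n] & bit);         /* XOR over connected bits */
        }
        if (parity != 0)
            return false;                           /* check m is violated */
    }
    return true;                                    /* valid code word */
}
```

A decoder loop would call such a check after each iteration and leave the loop as soon as it returns true, so that the remaining iterations are skipped.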
\section{Conclusion}
......
......@@ -99,8 +99,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -154,8 +152,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -210,8 +206,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -266,8 +260,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -322,15 +314,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
//mexPrintf("ymm0: ");
//nrLDPC_debug_print256i_epi8(&ymm0);
////ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
//mexPrintf("\n");
//mexPrintf("ymmX: ");
//nrLDPC_debug_print256i_epi8(&ymm0);
//mexPrintf("\n");
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -385,8 +368,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -441,8 +422,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -497,8 +476,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -553,8 +530,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -609,8 +584,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -665,8 +638,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -721,8 +692,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -777,8 +746,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -833,8 +800,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -889,8 +854,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -945,8 +908,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1001,8 +962,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1057,8 +1016,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1113,8 +1070,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1169,8 +1124,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1225,8 +1178,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1281,8 +1232,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1337,8 +1286,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1393,8 +1340,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1449,8 +1394,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1505,8 +1448,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1561,8 +1502,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1617,8 +1556,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1673,8 +1610,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1729,8 +1664,6 @@ static inline void nrLDPC_bnProcPc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_proc
// Pack results back to epi8
ymm0 = _mm256_packs_epi16(ymmRes0, ymmRes1);
// Limit to minLLR -127
//ymm0 = _mm256_max_epi8(ymm0, *p_minLLR);
// ymm0 = [ymmRes1[255:128] ymmRes0[255:128] ymmRes1[127:0] ymmRes0[127:0]]
// p_llrRes = [ymmRes1[255:128] ymmRes1[127:0] ymmRes0[255:128] ymmRes0[127:0]]
*p_llrRes = _mm256_permute4x64_epi64(ymm0, 0xD8);
......@@ -1775,8 +1708,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
uint32_t cnOffsetInGroup;
uint8_t idxBnGroup = 0;
const __m256i* p_minLLR = (__m256i*) minLLR256_epi8;
// =====================================================================
// Process group with 1 CN
// Already done in bnProcBufPc
......@@ -1809,8 +1740,7 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
}
......@@ -1845,8 +1775,7 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
}
......@@ -1881,8 +1810,7 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
}
......@@ -1917,17 +1845,7 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
/*
mexPrintf("res: ");
nrLDPC_debug_print256i_epi8(p_res);
mexPrintf(" llrRes: ");
nrLDPC_debug_print256i_epi8(p_llrRes);
mexPrintf(" bnProcBuf: ");
nrLDPC_debug_print256i_epi8(&p_bnProcBuf[k*cnOffsetInGroup + i]);
mexPrintf("\n");
*/
p_res++;
p_llrRes++;
}
......@@ -1962,8 +1880,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -1999,8 +1915,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2036,8 +1950,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2073,8 +1985,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2110,8 +2020,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2147,8 +2055,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2184,8 +2090,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2221,8 +2125,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2258,8 +2160,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2295,8 +2195,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2332,8 +2230,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2369,8 +2265,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2406,8 +2300,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2443,8 +2335,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2480,8 +2370,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2517,8 +2405,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2554,8 +2440,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2591,8 +2475,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2628,8 +2510,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2665,8 +2545,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2702,8 +2580,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2739,8 +2615,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2776,8 +2650,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2813,8 +2685,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......@@ -2850,8 +2720,6 @@ static inline void nrLDPC_bnProc(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBu
for (i=0; i<M; i++)
{
*p_res = _mm256_subs_epi8(*p_llrRes, p_bnProcBuf[k*cnOffsetInGroup + i]);
// Limit to minLLR -127
*p_res = _mm256_max_epi8(*p_res, *p_minLLR);
p_res++;
p_llrRes++;
......
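The two lines that recur in every `nrLDPC_bnProc` hunk above are a saturating 8-bit subtraction (`_mm256_subs_epi8`) followed by a signed maximum against a vector of -127 (`_mm256_max_epi8`). The following is a minimal scalar sketch of what one int8 lane of that pair computes. The helper names `subs_i8`, `bn_extrinsic` and `bn_extrinsic_row` are hypothetical and do not appear in the decoder sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one int8 lane of _mm256_subs_epi8:
 * subtract, then saturate to the int8 range [-128, 127]. */
static int8_t subs_i8(int8_t a, int8_t b)
{
    int16_t d = (int16_t)a - (int16_t)b;
    if (d > INT8_MAX) d = INT8_MAX;
    if (d < INT8_MIN) d = INT8_MIN;
    return (int8_t)d;
}

/* Scalar model of the pair subs_epi8 + max_epi8(x, -127):
 * the result is limited to the symmetric range [-127, 127]. */
static int8_t bn_extrinsic(int8_t llrRes, int8_t cnMsg)
{
    const int8_t minLLR = -127; /* same value as the minLLR256_epi8 lanes */
    int8_t d = subs_i8(llrRes, cnMsg);
    return (d < minLLR) ? minLLR : d;
}

/* Applies the lane model to n values (hypothetical helper for illustration). */
static void bn_extrinsic_row(const int8_t *llrRes, const int8_t *cnMsg,
                             int8_t *res, size_t n)
{
    assert(llrRes != NULL && cnMsg != NULL && res != NULL);
    for (size_t i = 0; i < n; i++)
        res[i] = bn_extrinsic(llrRes[i], cnMsg[i]);
}
```

Limiting to -127 rather than -128 keeps the LLR range symmetric, so negating a value can never overflow an int8.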
......@@ -22,7 +22,7 @@
/*!\file nrLDPC_cnProc.h
* \brief Defines the functions for check node processing
* \author Sebastian Wagner (TCL Communications) Email: <mailto:sebastian.wagner@tcl.com>
* \date 27-03-2018
* \date 30-09-2019
* \version 1.0
* \note
* \warning
......@@ -34,6 +34,7 @@
/**
\brief Performs CN processing for BG2 on the CN processing buffer and stores the results in the CN processing results buffer.
\param p_lut Pointer to decoder LUTs
\param p_procBuf Pointer to processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_cnProc_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......
......@@ -22,8 +22,8 @@
/*!\file nrLDPC_decoder.c
* \brief Defines the LDPC decoder
* \author Sebastian Wagner (TCL Communications) Email: <mailto:sebastian.wagner@tcl.com>
* \date 27-03-2018
* \version 1.0
* \date 30-09-2019
* \version 2.0
* \note
* \warning
*/
......@@ -222,7 +222,6 @@ static inline uint32_t nrLDPC_decoder_core(int8_t* p_llr, int8_t* p_out, t_nrLDP
#endif
#ifdef NR_LDPC_DEBUG_MODE
nrLDPC_debug_initBuffer2File(nrLDPC_buffers_CN_PROC);
nrLDPC_debug_writeBuffer2File(nrLDPC_buffers_CN_PROC, p_procBuf);
#endif
......@@ -451,7 +450,6 @@ static inline uint32_t nrLDPC_decoder_core(int8_t* p_llr, int8_t* p_out, t_nrLDP
stop_meas(&p_profiler->cnProcPc);
#endif
#endif
//mexPrintf("End Last Iter: i=%d, numMaxIter=%d, pcRes = %d\n",i,numMaxIter,pcRes);
}
// If maximum number of iterations reached and PC still fails increase number of iterations
......@@ -468,7 +466,7 @@ static inline uint32_t nrLDPC_decoder_core(int8_t* p_llr, int8_t* p_out, t_nrLDP
#ifdef NR_LDPC_PROFILER_DETAIL
start_meas(&p_profiler->llrRes2llrOut);
#endif
nrLDPC_llrRes2llrOut(p_lut, p_llrOut, p_procBuf, numLLR, Z, BG);
nrLDPC_llrRes2llrOut(p_lut, p_llrOut, p_procBuf, Z, BG);
#ifdef NR_LDPC_PROFILER_DETAIL
stop_meas(&p_profiler->llrRes2llrOut);
#endif
......
......@@ -197,6 +197,5 @@ static const int8_t ones256_epi8[32] __attribute__ ((aligned(32))) = {1,1,1,1,1,
static const int8_t zeros256_epi8[32] __attribute__ ((aligned(32))) = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
/** Vector of 32 '127' in int8 for application with AVX2 */
static const int8_t maxLLR256_epi8[32] __attribute__ ((aligned(32))) = {127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127,127};
static const int8_t minLLR256_epi8[32] __attribute__ ((aligned(32))) = {-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127,-127};
#endif
......@@ -20,10 +20,10 @@
*/
/*!\file nrLDPC_lut.h
* \brief Header file loading all look-up tables
* \brief Header file defining all look-up tables
* \author Sebastian Wagner (TCL Communications) Email: <mailto:sebastian.wagner@tcl.com>
* \date 27-03-2018
* \version 1.0
* \date 30-09-2019
* \version 2.0
* \note
* \warning
*/
......
......@@ -22,8 +22,8 @@
/*!\file nrLDPC_mPass.h
* \brief Defines the functions for message passing
* \author Sebastian Wagner (TCL Communications) Email: <mailto:sebastian.wagner@tcl.com>
* \date 27-03-2018
* \version 1.0
* \date 30-09-2019
* \version 2.0
* \note
* \warning
*/
......@@ -36,35 +36,37 @@
/**
\brief Circular memcpy
(src) str2 = |xxxxxxxxxxxxxxxxxxxx\--------|
|<- rem->|<- circular shift ->|
(src) str2 = |--------xxxxxxxxxxxxxxxxxxxxx|
\_______________
\
(dst) str1 = |--------xxxxxxxxxxxxxxxxxxxxx|
(dst) str1 = |xxxxxxxxxxxxxxxxxxxxx---------|
\param str1 Pointer to the start of the destination buffer
\param str2 Pointer to the source buffer
\param Z Lifting size
\param cshift Cyclic shift
\param cshift Circular shift
*/
static inline void *nrLDPC_circ_memcpy(int8_t *str1, const int8_t *str2, uint16_t Z, uint16_t cshift)
static inline void *nrLDPC_inv_circ_memcpy(int8_t *str1, const int8_t *str2, uint16_t Z, uint16_t cshift)
{
uint16_t rem = Z - cshift;
memcpy(str1+cshift, str2 , rem);
memcpy(str1 , str2+rem, cshift);
//mexPrintf("memcpy(%p,%p,%d) | memcpy(%p,%p,%d) | rem = %d, cshift = %d\n", str1+cshift, str2, rem, str1,str2+rem,cshift,rem,cshift);
return(str1);
}
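The hunk around this point renames the two copy helpers, and the scrape has lost the +/- markers. The sketch below therefore rests on two assumptions. First, the later line of each signature pair is taken as the post-commit name, so the body shown above (`str1+cshift` first) belongs to `nrLDPC_inv_circ_memcpy`. Second, the second `memcpy` of `nrLDPC_circ_memcpy` is elided by the diff and is filled in here as the mirror image of the first. The names `circ_copy`, `inv_circ_copy` and `round_trip_ok` are hypothetical stand-ins, not the decoder's functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* dst[i] = src[(i + cshift) % Z].
 * The first memcpy matches the visible line of the diff; the second is an assumption. */
static void circ_copy(int8_t *dst, const int8_t *src, uint16_t Z, uint16_t cshift)
{
    assert(cshift <= Z);
    uint16_t rem = Z - cshift;
    memcpy(dst,       src + cshift, rem);
    memcpy(dst + rem, src,          cshift);
}

/* dst[(i + cshift) % Z] = src[i]; mirrors the two memcpy calls shown above. */
static void inv_circ_copy(int8_t *dst, const int8_t *src, uint16_t Z, uint16_t cshift)
{
    assert(cshift <= Z);
    uint16_t rem = Z - cshift;
    memcpy(dst + cshift, src,       rem);
    memcpy(dst,          src + rem, cshift);
}

/* Returns 1 if shifting and then inverse shifting restores the input. */
static int round_trip_ok(uint16_t Z, uint16_t cshift)
{
    int8_t src[384], tmp[384], back[384]; /* 384 is the largest NR lifting size */
    assert(Z <= 384);
    for (uint16_t i = 0; i < Z; i++)
        src[i] = (int8_t)(i % 100);
    circ_copy(tmp, src, Z, cshift);
    inv_circ_copy(back, tmp, Z, cshift);
    return memcmp(src, back, Z) == 0;
}
```

With `Z = 5` and `cshift = 2`, `circ_copy` maps `{0,1,2,3,4}` to `{2,3,4,0,1}` and `inv_circ_copy` maps it to `{3,4,0,1,2}`. Applying one after the other with the same shift is the identity.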
/**
\brief Circular memcpy
\brief Inverse circular memcpy
|<- circular shift ->|<- rem->|
(src) str2 = |xxxxxxxxxxxxxxxxxxxx\--------|
\
(dst) str1 = |--------xxxxxxxxxxxxxxxxxxxxx|
\param str1 Pointer to the start of the destination buffer
\param str2 Pointer to the source buffer
\param Z Lifting size
\param cshift Cyclic shift
\param cshift Circular shift
*/
static inline void *nrLDPC_inv_circ_memcpy(int8_t *str1, const int8_t *str2, uint16_t Z, uint16_t cshift)
static inline void *nrLDPC_circ_memcpy(int8_t *str1, const int8_t *str2, uint16_t Z, uint16_t cshift)
{
uint16_t rem = Z - cshift;
memcpy(str1 , str2+cshift, rem);
......@@ -77,6 +79,7 @@ static inline void *nrLDPC_inv_circ_memcpy(int8_t *str1, const int8_t *str2, uin
\brief Copies the input LLRs to their corresponding place in the LLR processing buffer.
\param p_lut Pointer to decoder LUTs
\param llr Pointer to input LLRs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
\param BG Base graph
*/
......@@ -110,12 +113,11 @@ static inline void nrLDPC_llr2llrProcBuf(t_nrLDPC_lut* p_lut, int8_t* llr, t_nrL
}
/**
\brief Copies the input LLRs to their corresponding place in the CN processing buffer.
\brief Copies the input LLRs to their corresponding place in the CN processing buffer for BG1.
\param p_lut Pointer to decoder LUTs
\param llr Pointer to input LLRs
\param numLLR Number of LLR values
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
\param BG Base graph
*/
static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
{
......@@ -161,7 +163,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
idxBn = lut_posBnInCnProcBuf_CNG3[j][0]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG3[j][0]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG3[j][0]);
}
// =====================================================================
......@@ -177,7 +179,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
{
idxBn = lut_posBnInCnProcBuf_CNG4[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG4[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG4[j][i]);
p_cnProcBuf += Z;
}
......@@ -195,7 +197,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG5[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG5[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG5[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -212,7 +214,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG6[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG6[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG6[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -229,7 +231,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG7[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG7[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG7[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -246,7 +248,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG8[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG8[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG8[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -263,7 +265,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[6]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG9[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG9[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG9[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -280,7 +282,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[7]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG10[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG10[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG10[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -297,7 +299,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[8]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG19[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG19[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG19[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -305,8 +307,10 @@ static inline void nrLDPC_llr2CnProcBuf_BG1(t_nrLDPC_lut* p_lut, int8_t* llr, t_
}
/**
\brief Copies the values in the CN processing results buffer to their corresponding place in the BN processing buffer for BG2.
\brief Copies the input LLRs to their corresponding place in the CN processing buffer for BG2.
\param p_lut Pointer to decoder LUTs
\param llr Pointer to input LLRs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......@@ -348,7 +352,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[0]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG3[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG3[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG3[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -365,7 +369,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[1]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG4[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG4[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG4[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -382,7 +386,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG5[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG5[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG5[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -399,7 +403,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG6[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG6[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG6[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -416,7 +420,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG8[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG8[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG8[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -433,7 +437,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_posBnInCnProcBuf_CNG10[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG10[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &llr[idxBn], Z, lut_circShift_CNG10[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -442,6 +446,7 @@ static inline void nrLDPC_llr2CnProcBuf_BG2(t_nrLDPC_lut* p_lut, int8_t* llr, t_
/**
\brief Copies the values in the CN processing results buffer to their corresponding place in the BN processing buffer for BG2.
\param p_lut Pointer to decoder LUTs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......@@ -491,7 +496,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[0]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG3[j][i] + lut_bnPosBnProcBuf_CNG3[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG3[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG3[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -508,7 +513,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[1]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG4[j][i] + lut_bnPosBnProcBuf_CNG4[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG4[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG4[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -525,7 +530,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG5[j][i] + lut_bnPosBnProcBuf_CNG5[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG5[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG5[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -542,7 +547,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG6[j][i] + lut_bnPosBnProcBuf_CNG6[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG6[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG6[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -559,7 +564,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG8[j][i] + lut_bnPosBnProcBuf_CNG8[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG8[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG8[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -576,7 +581,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG10[j][i] + lut_bnPosBnProcBuf_CNG10[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG10[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG10[j][i]);
p_cnProcBufRes += Z;
}
}
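Every loop in the `cn2bnProcBuf` and `bn2cnProcBuf` hunks uses the same addressing: `idxBn = startAddr + bnPos*Z` selects a Z-sized block in the BN buffer, and a circular copy moves one block between the CN and BN buffers. The sketch below reproduces that pattern for a single toy CN group, assuming `cn2bn` uses the inverse copy and `bn2cn` the forward copy. All table contents and sizes are invented for illustration and are not real BG1/BG2 values. The second `memcpy` of `circ_copy` is an assumption, as that line is not visible in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { Z = 4, NUM_CN = 2 }; /* toy sizes, not real NR parameters */

/* Hypothetical LUT rows for one CN group and one edge index j */
static const uint32_t startAddrBnProcBuf[NUM_CN] = {0, 8};
static const uint8_t  bnPosBnProcBuf[NUM_CN]     = {1, 0};
static const uint16_t circShift[NUM_CN]          = {1, 3};

/* dst[i] = src[(i + cshift) % z]; second memcpy is assumed */
static void circ_copy(int8_t *dst, const int8_t *src, uint16_t z, uint16_t cshift)
{
    assert(cshift <= z);
    uint16_t rem = z - cshift;
    memcpy(dst,       src + cshift, rem);
    memcpy(dst + rem, src,          cshift);
}

/* dst[(i + cshift) % z] = src[i] */
static void inv_circ_copy(int8_t *dst, const int8_t *src, uint16_t z, uint16_t cshift)
{
    assert(cshift <= z);
    uint16_t rem = z - cshift;
    memcpy(dst + cshift, src,       rem);
    memcpy(dst,          src + rem, cshift);
}

/* CN results -> BN buffer: inverse shift into the block addressed by idxBn. */
static void cn2bn(int8_t *bnProcBuf, const int8_t *p_cnProcBufRes)
{
    for (uint32_t i = 0; i < NUM_CN; i++) {
        uint32_t idxBn = startAddrBnProcBuf[i] + bnPosBnProcBuf[i] * Z;
        inv_circ_copy(&bnProcBuf[idxBn], p_cnProcBufRes, Z, circShift[i]);
        p_cnProcBufRes += Z;
    }
}

/* BN results -> CN buffer: forward shift out of the same block. */
static void bn2cn(int8_t *p_cnProcBuf, const int8_t *bnProcBufRes)
{
    for (uint32_t i = 0; i < NUM_CN; i++) {
        uint32_t idxBn = startAddrBnProcBuf[i] + bnPosBnProcBuf[i] * Z;
        circ_copy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, circShift[i]);
        p_cnProcBuf += Z;
    }
}
```

Because both directions read the same shift and address tables, a value written by `cn2bn` returns to its original CN position after `bn2cn`. This is consistent with the two helpers being swapped together at every call site in the diff.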
......@@ -585,6 +590,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
/**
\brief Copies the values in the CN processing results buffer to their corresponding place in the BN processing buffer for BG1.
\param p_lut Pointer to decoder LUTs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......@@ -639,7 +645,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
{
p_cnProcBufRes = &cnProcBufRes[lut_startAddrCnGroups[0] + j*bitOffsetInGroup];
nrLDPC_circ_memcpy(&bnProcBuf[lut_startAddrBnProcBuf_CNG3[j][0]],p_cnProcBufRes,Z,lut_circShift_CNG3[j][0]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[lut_startAddrBnProcBuf_CNG3[j][0]],p_cnProcBufRes,Z,lut_circShift_CNG3[j][0]);
}
// =====================================================================
......@@ -654,7 +660,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[1]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG4[j][i] + lut_bnPosBnProcBuf_CNG4[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG4[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG4[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -671,7 +677,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG5[j][i] + lut_bnPosBnProcBuf_CNG5[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG5[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG5[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -688,7 +694,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG6[j][i] + lut_bnPosBnProcBuf_CNG6[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG6[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG6[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -705,7 +711,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG7[j][i] + lut_bnPosBnProcBuf_CNG7[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG7[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG7[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -722,7 +728,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG8[j][i] + lut_bnPosBnProcBuf_CNG8[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG8[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG8[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -739,7 +745,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[6]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG9[j][i] + lut_bnPosBnProcBuf_CNG9[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG9[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG9[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -756,7 +762,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[7]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG10[j][i] + lut_bnPosBnProcBuf_CNG10[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG10[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG10[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -773,7 +779,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[8]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG19[j][i] + lut_bnPosBnProcBuf_CNG19[j][i]*Z;
nrLDPC_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG19[j][i]);
nrLDPC_inv_circ_memcpy(&bnProcBuf[idxBn],p_cnProcBufRes,Z,lut_circShift_CNG19[j][i]);
p_cnProcBufRes += Z;
}
}
......@@ -783,6 +789,7 @@ static inline void nrLDPC_cn2bnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
/**
\brief Copies the values in the BN processing results buffer to their corresponding place in the CN processing buffer for BG2.
\param p_lut Pointer to decoder LUTs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......@@ -835,7 +842,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[0]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG3[j][i] + lut_bnPosBnProcBuf_CNG3[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG3[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG3[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -852,7 +859,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[1]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG4[j][i] + lut_bnPosBnProcBuf_CNG4[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG4[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG4[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -869,7 +876,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG5[j][i] + lut_bnPosBnProcBuf_CNG5[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG5[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG5[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -886,7 +893,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG6[j][i] + lut_bnPosBnProcBuf_CNG6[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG6[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG6[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -903,7 +910,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG8[j][i] + lut_bnPosBnProcBuf_CNG8[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG8[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG8[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -920,7 +927,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG10[j][i] + lut_bnPosBnProcBuf_CNG10[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG10[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG10[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -929,6 +936,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG2(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
/**
\brief Copies the values in the BN processing results buffer to their corresponding place in the CN processing buffer for BG1.
\param p_lut Pointer to decoder LUTs
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
*/
static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z)
......@@ -986,7 +994,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
{
p_cnProcBuf = &cnProcBuf[lut_startAddrCnGroups[0] + j*bitOffsetInGroup];
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[lut_startAddrBnProcBuf_CNG3[j][0]], Z, lut_circShift_CNG3[j][0]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[lut_startAddrBnProcBuf_CNG3[j][0]], Z, lut_circShift_CNG3[j][0]);
}
// =====================================================================
......@@ -1001,7 +1009,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[1]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG4[j][i] + lut_bnPosBnProcBuf_CNG4[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG4[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG4[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1018,7 +1026,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[2]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG5[j][i] + lut_bnPosBnProcBuf_CNG5[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG5[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG5[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1035,7 +1043,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[3]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG6[j][i] + lut_bnPosBnProcBuf_CNG6[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG6[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG6[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1052,7 +1060,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[4]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG7[j][i] + lut_bnPosBnProcBuf_CNG7[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG7[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG7[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1069,7 +1077,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[5]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG8[j][i] + lut_bnPosBnProcBuf_CNG8[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG8[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG8[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1086,7 +1094,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[6]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG9[j][i] + lut_bnPosBnProcBuf_CNG9[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG9[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG9[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1103,7 +1111,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[7]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG10[j][i] + lut_bnPosBnProcBuf_CNG10[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG10[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG10[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1120,7 +1128,7 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
for (i=0; i<lut_numCnInCnGroups[8]; i++)
{
idxBn = lut_startAddrBnProcBuf_CNG19[j][i] + lut_bnPosBnProcBuf_CNG19[j][i]*Z;
nrLDPC_inv_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG19[j][i]);
nrLDPC_circ_memcpy(p_cnProcBuf, &bnProcBufRes[idxBn], Z, lut_circShift_CNG19[j][i]);
p_cnProcBuf += Z;
}
}
......@@ -1131,9 +1139,11 @@ static inline void nrLDPC_bn2cnProcBuf_BG1(t_nrLDPC_lut* p_lut, t_nrLDPC_procBuf
\brief Copies the values in the LLR results buffer to their corresponding place in the output LLR vector.
\param p_lut Pointer to decoder LUTs
\param llrOut Pointer to output LLRs
\param numLLR Number of LLR values
\param p_procBuf Pointer to the processing buffers
\param Z Lifting size
\param BG Base graph
*/
static inline void nrLDPC_llrRes2llrOut(t_nrLDPC_lut* p_lut, int8_t* llrOut, t_nrLDPC_procBuf* p_procBuf, uint16_t numLLR,uint16_t Z, uint8_t BG)
static inline void nrLDPC_llrRes2llrOut(t_nrLDPC_lut* p_lut, int8_t* llrOut, t_nrLDPC_procBuf* p_procBuf, uint16_t Z, uint8_t BG)
{
uint32_t i;
const uint8_t numBn2CnG1 = p_lut->numBnInBnGroups[0];
......@@ -1154,7 +1164,6 @@ static inline void nrLDPC_llrRes2llrOut(t_nrLDPC_lut* p_lut, int8_t* llrOut, t_n
memcpy(&llrOut[colG1], llrRes, numBn2CnG1*Z);
}
for (i=0; i<startColParity; i++)
{
idxBn = lut_llr2llrProcBufAddr[i] + lut_llr2llrProcBufBnPos[i]*Z;
......
......@@ -46,11 +46,11 @@ typedef struct nrLDPC_lut {
const uint8_t* numBnInBnGroups; /**< Number of BNs in every BN group */
const uint32_t* startAddrBnGroups; /**< Start addresses for BN groups in BN processing buffer */
const uint16_t* startAddrBnGroupsLlr; /**< Start addresses for BN groups in LLR processing buffer */
const uint16_t** circShift[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT for circular shift values for all CN groups and Z's */
const uint32_t** startAddrBnProcBuf[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT for circular shift values for all CN groups and Z's */
const uint8_t** bnPosBnProcBuf[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT for circular shift values for all CN groups and Z's */
const uint16_t** circShift[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT for circular shift values for all CN groups and Zs */
const uint32_t** startAddrBnProcBuf[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT of start addresses of CN groups in BN proc buffer */
const uint8_t** bnPosBnProcBuf[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT of BN positions in BG for CN groups */
const uint16_t* llr2llrProcBufAddr; /**< LUT for transferring input LLRs to LLR processing buffer */
const uint8_t* llr2llrProcBufBnPos; /**< LUT for transferring input LLRs to LLR processing buffer */
const uint8_t* llr2llrProcBufBnPos; /**< LUT of BN positions in BG */
const uint8_t** posBnInCnProcBuf[NR_LDPC_NUM_CN_GROUPS_BG1]; /**< LUT for llr2cnProcBuf */
} t_nrLDPC_lut;
......