Commit 7d4f7ebb authored by Robert Schmidt

Merge remote-tracking branch 'origin/rfsim-deadlock-avoidance' into integration_2025_w06 (!3246)

Deadlock avoidance in rfsimulator

This change introduces a countermeasure for a deadlock in rfsimulator. The
deadlock occurs when all entities are waiting for new data to come in; it
shows up with 2+ clients, when a new client connects. I think the issue is
due to the ordering of fullwrite calls, which results in out-of-order
delivery of packets and eventually in trashing of the packets on the
receiving side. Out-of-order delivery warnings are printed just before the
system deadlocks. I have not found a better solution so far, so this
workaround keeps the server from locking up permanently: after more than 10
passes through the wait loop, it stops waiting for clients that failed to
write on time and starts producing samples.
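
The workaround amounts to a bounded wait loop. Below is a minimal sketch of that pattern in C, independent of the rfsimulator code. `wait_for_peers`, `ready_after`, and `MAX_WAIT_LOOPS` are hypothetical names used only for illustration, and the readiness callback stands in for the real per-client check, which is not reproduced here.

```c
#include <stdbool.h>

#define MAX_WAIT_LOOPS 10 /* mirrors the literal 10 used in the patch */

/* Hypothetical readiness probe type: returns true once all peers have
 * delivered enough data. */
typedef bool (*peers_ready_fn)(void *ctx);

/* Hypothetical probe for demonstration: reports "not ready" until the
 * counter behind ctx has been counted down to zero. */
static bool ready_after(void *ctx)
{
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return false;
  }
  return true;
}

/* Spin until the peers are ready. A server stops waiting once the loop
 * counter exceeds MAX_WAIT_LOOPS; a client keeps waiting indefinitely.
 * Returns the number of passes and reports through *gave_up whether the
 * server broke out of the wait. */
static int wait_for_peers(peers_ready_fn ready, void *ctx, bool is_server, bool *gave_up)
{
  bool have_to_wait;
  int loops = 0;
  *gave_up = false;
  do {
    have_to_wait = !ready(ctx);
    /* Same shape as the patch: post-increment, then compare against 10. */
    if (loops++ > MAX_WAIT_LOOPS && is_server && have_to_wait) {
      have_to_wait = false; /* start producing; peers must catch up */
      *gave_up = true;
    }
  } while (have_to_wait);
  return loops;
}
```

Because the counter is post-incremented before the comparison, the give-up branch in this sketch fires on the twelfth pass, not the tenth. A client (`is_server == false`) never gives up, which matches the role check in the patch.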

This was tested locally with both the UE as server and the gNB as server
and works correctly in both cases: when the deadlock is detected, the
added log is printed several times, the deadlock clears, and the system
goes back to normal.

I have some gdb output of the executables during deadlock:

    UE:

    $7 = {conn_sock = 98, lastReceivedTS = 3226163740, headerMode = true, trashingPacket = false, th = {size = 13184, nbAnt = 1, timestamp = 3226150556, option_value = 0, option_flag = 0}, transferPtr = 0x7f6a500018a8 "\200\063", remainToTransfer = 24,
      circularBufEnd = 0x7f6a503b3ac0 "", circularBuf = 0x7f6a501f1ac0, channel_model = 0x0}
    (gdb) p t->buf[5]
    $8 = {conn_sock = 97, lastReceivedTS = 0, headerMode = true, trashingPacket = false, th = {size = 0, nbAnt = 0, timestamp = 0, option_value = 0, option_flag = 0}, transferPtr = 0x7f6a50001900 "", remainToTransfer = 24, circularBufEnd = 0x7f6a50575ad0 "",
      circularBuf = 0x7f6a503b3ad0, channel_model = 0x0}

    nextRxTimestamp 3225937740
    nsamps = 30720

    gNB 1:
    (gdb) p t->buf[0]
    $4 = {conn_sock = 95, lastReceivedTS = 3226026876, headerMode = true, trashingPacket = false, th = {size = 1, nbAnt = 1, timestamp = 3226026875, option_value = 0, option_flag = 0},
      transferPtr = 0x7f8dfc003ab8 "\001", remainToTransfer = 24, circularBufEnd = 0x7f8e1c3ff010 "", circularBuf = 0x7f8e1c23d010, channel_model = 0x0}
    nextRxTimestamp 3225996956

    gNB 2:
    lastReceivedTS = 3226026875
    $2 = {conn_sock = 95, lastReceivedTS = 3226026875, headerMode = true, trashingPacket = false, th = {size = 1, nbAnt = 1, timestamp = 3226026875, option_value = 0, option_flag = 0},
      transferPtr = 0x744898003ab8 "\001", remainToTransfer = 24, circularBufEnd = 0x7448bc2e7010 "", circularBuf = 0x7448bc125010, channel_model = 0x0}

    nextRxTimestamp 3226026875

As you can see, all executables are in the have_to_wait state.
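
To read the gdb values above, it helps to spell out a wait predicate. The sketch below assumes that a connection forces a wait when its `lastReceivedTS` is below `nextRxTimestamp + nsamps`; that condition is an assumption, since the comparison itself is not part of the diff shown here, and `peer_needs_wait` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed predicate (not taken from the diff): a peer blocks the reader
 * while it has not yet delivered samples up to next_rx_ts + nsamps. */
static bool peer_needs_wait(uint64_t last_received_ts, uint64_t next_rx_ts, uint64_t nsamps)
{
  return last_received_ts < next_rx_ts + nsamps;
}
```

Under that assumption, the first UE connection is not the blocker (3226163740 is past 3225937740 + 30720), but `t->buf[5]`, the newly connected client with `lastReceivedTS = 0`, is. On gNB 2, `lastReceivedTS` equals `nextRxTimestamp`, so any positive `nsamps` would make it wait as well.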
parents 976f2101 fd20068f
@@ -973,7 +973,7 @@ static int rfsimulator_read(openair0_device *device, openair0_timestamp *ptimest
}
} else {
bool have_to_wait;
int loops = 0;
do {
have_to_wait=false;
@@ -995,6 +995,12 @@ static int rfsimulator_read(openair0_device *device, openair0_timestamp *ptimest
t->nextRxTstamp + nsamps);
flushInput(t, 3, nsamps);
}
if (loops++ > 10 && t->role == SIMU_ROLE_SERVER) {
// Just start producing samples. The clients will catch up.
have_to_wait = false;
LOG_W(HW,
"No longer waiting for clients to catch up, starting to produce samples\n");
}
} while (have_to_wait);
}