Commit fe9d871a authored by Robert Schmidt

Merge remote-tracking branch 'origin/doc-ci-segv-debug' into integration_2024_w10

parents 127ae812 e615321e
@@ -255,7 +255,44 @@
Some tests are run from source (e.g.
`ci-scripts/xml_files/gnb_phytest_usrp_run.xml`), which directly give the
options they are run with.
## How to retrieve core dumps (for CI team members)
## How to debug CI failures
It is possible to debug CI failures using the generated core dump and the image
used for the run. A script is provided (see developer instructions below) that,
given the core dump file, the container image, and the source tree, runs
`gdb` inside the container; using the core dump information, a developer can
investigate the cause of the failure.
### Developer instructions
The CI team will send you a docker image, a core dump file, and the commit
at which the pipeline failed. Let's assume the core dump is stored at
`/tmp/coredump.tar.xz`, and the image is in `/tmp/oai-nr-ue.tar.gz`. First,
check out the corresponding branch (or directly the commit), let's say
in `~/oai-branch-fail`. Then unpack the core dump, load the image into docker,
and use the script [`docker/debug_core_image.sh`](../docker/debug_core_image.sh)
to open gdb, as follows:
```
cd /tmp
tar -xJf /tmp/coredump.tar.xz
docker load < /tmp/oai-nr-ue.tar.gz
~/oai-branch-fail/docker/debug_core_image.sh <image> /tmp/coredump ~/oai-branch-fail
```
where you replace `<image>` with the name of the image that `docker load`
reports as loaded. The script will start the container and open gdb; you should
see information about where the failure (e.g., segmentation fault) happened. If
you just see `??`, the core dump and container image don't match. Also watch
for the corresponding message from gdb:
```
warning: core file may not match specified executable file.
```
Once you quit `gdb`, the container will be removed automatically (the script
runs it with `docker run --rm`); the loaded image stays available.
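To avoid copying the image name by hand, the output of `docker load` can be parsed. The following is a minimal sketch, not part of the repository: `loaded_image_name` is a hypothetical helper, and it assumes that `docker load` prints a line of the form `Loaded image: <name>:<tag>` for tagged images. Untagged images are reported as `Loaded image ID: ...` and are deliberately not matched.

```shell
#!/bin/bash
# Hypothetical helper (not part of the repository): read the output of
# `docker load` on stdin and print the name of each tagged image it loaded.
loaded_image_name() {
  sed -n 's/^Loaded image: //p'
}

# Real usage (requires docker), assuming the paths from the text above:
#   IMAGE=$(docker load < /tmp/oai-nr-ue.tar.gz | loaded_image_name)
#   ~/oai-branch-fail/docker/debug_core_image.sh "$IMAGE" /tmp/coredump ~/oai-branch-fail

# Dry run on a sample line; the image name is made up for illustration.
printf 'Loaded image: oai-nr-ue:develop-example\n' | loaded_image_name
```

If the archive contains several images, the helper prints one name per line, so pick the one you need before passing it to the script.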
### CI team instructions
The entrypoint scripts of all containers print the core pattern that is used on
the running machine. Search for `core_pattern` at the start of the container
@@ -267,19 +304,14 @@
logs to retrieve the possible location. Possible locations might be:
- abrt: see [documentation](https://abrt.readthedocs.io/en/latest/usage.html)
- apport: see [documentation](https://wiki.ubuntu.com/Apport)
You furthermore have to extract the executable that caused the core dump.
Download the container image, and extract, e.g.:
See below for instructions on how to retrieve the core dump. Further, download
the image and store it to a file using `docker save`. Make sure to pick the
right image (Ubuntu or RHEL)!
```
docker create --name c1 porcepix.sboai.cs.eurecom.fr/oai-gnb:develop-c99db698
docker cp c1:/opt/oai-gnb/bin/nr-softmodem /tmp
docker rm c1
```
### Core dump in a file
#### Core dump in a file
**This is not recommended, as files could pile up and fill the system disk
completely!** Prefer systemd or abrt instead.
completely!** Prefer another method further down.
If the core pattern is a path: it should at least include the time in the
pattern name (suggested pattern: `/tmp/core.%e.%p.%t`) to correlate the time
@@ -287,38 +319,31 @@
the segfault occurred with the CI logs. If you identified the core dump,
copy the core dump from that machine; if identification is difficult, consider
rerunning the pipeline.
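To correlate a core file with the CI logs, the fields encoded in the file name can be split out again. This is a minimal sketch assuming the suggested pattern `/tmp/core.%e.%p.%t` (executable name, PID, time of the dump in seconds since the epoch); `parse_core_name` is a hypothetical helper, and the file name in the example is made up.

```shell
#!/bin/bash
# Hypothetical helper: split a core file name created with the suggested
# pattern /tmp/core.%e.%p.%t into executable name, PID, and epoch time.
# Fields are taken from the right so dots in the executable name are kept.
parse_core_name() {
  local name rest epoch pid exe
  name=$(basename "$1")
  epoch=${name##*.}   # text after the last dot: %t
  rest=${name%.*}     # drop ".%t"
  pid=${rest##*.}     # text after the last remaining dot: %p
  rest=${rest%.*}     # drop ".%p"
  exe=${rest#core.}   # drop the leading "core." to get %e
  echo "$exe $pid $epoch"
}

# Example file name (made up for illustration)
parse_core_name /tmp/core.nr-softmodem.4242.1709800000
```

With GNU `date`, the epoch value can then be turned into a readable timestamp, e.g. `date -d @1709800000`, and compared against the times in the CI logs.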
### Core dump via systemd
#### Core dump via systemd
Run this command to list all core dumps:
Use the first command to list all core dumps. Scroll down to the core dump of
interest (it lists the executables in the last column; use the time to
correlate the segfault and the CI run). Take the PID of the executable (first
column after the time). Use the second command to write the core dump to a
location of your choice.
```
sudo coredumpctl list
```
Scroll to the end and find the core dump of interest (it lists the executables
in the last column; use the time to correlate the segfault and the CI run).
Take the PID of the executable (first column after the time). Dump the core
dump to a location of your choice:
```
sudo coredumpctl dump <PID> > /tmp/coredump
```
### Core dump via abrt (automatic bug reporting tool)
#### Core dump via abrt (automatic bug reporting tool)
TBD: use the documentation page for the moment.
### Core dump via apport
On Ubuntu machines, apport first needs to be enabled to collect core dumps:
```
sudo systemctl enable apport.service
```
and [needs to be enabled](https://wiki.ubuntu.com/Apport#How_to_enable_apport).
Then, show a list of core dumps using
#### Core dump via apport
I did not find an easy way to use apport. Anyway, the systemd approach works
fine. So stop and mask apport, install systemd-coredump, and verify it is the
new coredump handler:
```
sudo apport-cli
sudo systemctl stop apport
sudo systemctl mask --now apport
sudo apt install systemd-coredump
# Verify this changed the core pattern to a pipe to systemd-coredump
sysctl kernel.core_pattern
```
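The check on `kernel.core_pattern` can also be scripted. The sketch below is an illustration only: `classify_core_pattern` is a hypothetical helper, and it assumes that pipe handlers can be recognized by the substrings `systemd-coredump`, `apport`, and `abrt` in the pattern. The exact handler paths and arguments differ between distributions and versions, so the sample values are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: classify a kernel.core_pattern value by its handler.
# A leading "|" means the kernel pipes the core dump to a program; anything
# else is treated as a file path pattern.
classify_core_pattern() {
  case "$1" in
    "|"*systemd-coredump*) echo systemd ;;
    "|"*apport*)           echo apport ;;
    "|"*abrt*)             echo abrt ;;
    "|"*)                  echo "other pipe handler" ;;
    *)                     echo file ;;
  esac
}

# On a real machine: classify_core_pattern "$(sysctl -n kernel.core_pattern)"
# Sample values (illustrative only):
classify_core_pattern '|/lib/systemd/systemd-coredump %P %u %g %s %t %c %h'
classify_core_pattern '/tmp/core.%e.%p.%t'
```

After the steps above, the first form (a pipe to systemd-coredump) is the expected result.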
#!/bin/bash
if [ $# -ne 3 ]; then
  echo "usage: $0 <image> <coredump> <path-to-sources>"
  exit 1
fi
# print an error message to stderr and abort
die() {
  echo "$1" >&2
  exit 1
}
IMAGE=$1
COREDUMP=$2
SOURCES=$3
set -x
# the image/build_oai builds in cmake_targets/ran_build/build, so source
# information is relative to this path. In case the user did not compile on
# their computer, this directory will not exist. Still allow gdb to find it by
# creating it
BUILD_DIR=$SOURCES/cmake_targets/ran_build/build
mkdir -p "$BUILD_DIR" || die "cannot create $BUILD_DIR: is $SOURCES valid?"
# check if coredump is a valid file
[ -f "$COREDUMP" ] || die "no such file: $COREDUMP"
# check if image exists, and determine type (gnb, nr-ue, enb, lte-ue) for
# correct invocation of gdb
docker image inspect "$IMAGE" > /dev/null || exit 1
if grep -q "oai-gnb:" <<< "$IMAGE" || grep -q "oai-gnb-aerial:" <<< "$IMAGE"; then
  EXEC=bin/nr-softmodem
  TYPEPATH=oai-gnb
elif grep -q "oai-nr-ue:" <<< "$IMAGE"; then
  EXEC=bin/nr-uesoftmodem
  TYPEPATH=oai-nr-ue
elif grep -q "oai-enb:" <<< "$IMAGE"; then
  EXEC=bin/lte-softmodem
  TYPEPATH=oai-enb
elif grep -q "oai-lte-ue:" <<< "$IMAGE"; then
  EXEC=bin/lte-uesoftmodem
  TYPEPATH=oai-lte-ue
else
  die "cannot determine image type: must match \"oai-gnb:\", \"oai-gnb-aerial:\", \"oai-nr-ue:\", \"oai-enb:\", or \"oai-lte-ue:\""
fi
# run gdb inside a container. We mount the coredump and the sources inside the
# container, and run gdb with the core dump, the correct executable, and using
# the source directory to show correct line numbers
docker run --rm -it \
  -v "$COREDUMP":/tmp/coredump \
  -v "$SOURCES":/opt/$TYPEPATH/src \
  --entrypoint bash \
  "$IMAGE" \
  -c "gdb --dir=src/cmake_targets/ran_build/build $EXEC /tmp/coredump"