Commit e615321e authored by Robert Schmidt's avatar Robert Schmidt

Rewrite debugging instructions for CI fail

parent 06820d37
......@@ -250,7 +250,44 @@ Some tests are run from source (e.g.
`ci-scripts/xml_files/gnb_phytest_usrp_run.xml`), which directly give the
options they are run with.
## How to retrieve core dumps (for CI team members)
## How to debug CI failures
It is possible to debug CI failures using the generated core dump and the image
used for the run. A script is provided (see developer instructions below) that,
provided the core dump file, container image, and the source tree, executes
`gdb` inside the container; using the core dump information, a developer can
investigate the cause of failure.
### Developer instructions
The CI team will send you a docker image and a core dump file, and the commit
as of which the pipeline failed. Let's assume the coredump is stored at
`/tmp/coredump.tar.xz`, and the image is in `/tmp/oai-nr-ue.tar.gz`. First, you
should check out the corresponding branch (or directly the commit), let's say
in `~/oai-branch-fail`. Now, unpack the core dump, load the image into docker,
and use the script [`docker/debug_core_image.sh`](../docker/debug_core_image.sh)
to open gdb, as follows:
```
cd /tmp
tar -xJf /tmp/coredump.tar.xz
docker load < /tmp/oai-nr-ue.tar.gz
~/oai-branch-fail/docker/debug_core_image.sh <image> /tmp/coredump ~/oai-branch-fail
```
where you replace `<image>` with the image loaded in `docker load`. The script
will start the container and open gdb; you should see information about where
the failure (e.g., segmentation fault) happened. If you just see `??`, the core
dump and container image don't match. Be also on the lookout for the
corresponding message from gdb:
```
warning: core file may not match specified executable file.
```
Once you quit `gdb`, the container image will be removed automatically.
### CI team instructions
The entrypoint scripts of all containers print the core pattern that is used on
the running machine. Search for `core_pattern` at the start of the container
......@@ -262,19 +299,14 @@ logs to retrieve the possible location. Possible locations might be:
- abrt: see [documentation](https://abrt.readthedocs.io/en/latest/usage.html)
- apport: see [documentation](https://wiki.ubuntu.com/Apport)
You furthermore have to extract the executable that caused the core dump.
Download the container image, and extract, e.g.:
See below for instructions on how to retrieve the core dump. Further, download
the image and store it to a file using `docker save`. Make sure to pick the
right image (Ubuntu or RHEL)!
```
docker create --name c1 porcepix.sboai.cs.eurecom.fr/oai-gnb:develop-c99db698
docker cp c1:/opt/oai-gnb/bin/nr-softmodem /tmp
docker rm c1
```
### Core dump in a file
#### Core dump in a file
**This is not recommended, as files could pile up and fill the system disk
completely!** Prefer systemd or abrt instead.
completely!** Prefer another method further down.
If the core pattern is a path: it should at least include the time in the
pattern name (suggested pattern: `/tmp/core.%e.%p.%t`) to correlate the time
......@@ -282,38 +314,31 @@ the segfault occurred with the CI logs. If you identified the core dump,
copy the core dump from that machine; if identification is difficult, consider
rerunning the pipeline.
### Core dump via systemd
#### Core dump via systemd
Run this command to list all core dumps:
Use the first command to list all core dumps. Scroll down to the core dump of
interest (it lists the executables in the last column; use the time to
correlate the segfault and the CI run). Take the PID of the executable (first
column after the time). Dump the core dump to a location of your choice.
```
sudo coredumpctl list
```
Scroll to the end and find the core dump of interest (it lists the executables
in the last column; use the time to correlate the segfault and the CI run).
Take the PID of the executable (first column after the time). Dump the core
dump to a location of your choice:
```
sudo coredumpctl dump <PID> > /tmp/coredump
```
### Core dump via abrt (automatic bug reporting tool)
#### Core dump via abrt (automatic bug reporting tool)
TBD: use the documentation page for the moment.
### Core dump via apport
On Ubuntu machines, apport first needs to be enabled to collect core dumps:
```
sudo systemctl enable apport.service
```
and [needs to be enabled](https://wiki.ubuntu.com/Apport#How_to_enable_apport).
Then, show a list of core dumps using
#### Core dump via apport
I did not find an easy way to use apport. Anyway, the systemd approach works
fine. So remove apport, install systemd-coredump, and verify it is the new
coredump handler:
```
sudo apport-cli
sudo systemctl stop apport
sudo systemctl mask --now apport
sudo apt install systemd-coredump
# Verify this changed the core pattern to a pipe to systemd-coredump
sysctl kernel.core_pattern
```
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment