Ubuntu Manpage: debuginfod - debuginfo-related http file-server daemon

Provided by: debuginfod_0.186-1ubuntu0.1_amd64

NAME

       debuginfod - debuginfo-related http file-server daemon

SYNOPSIS

       debuginfod [OPTION]... [PATH]...

DESCRIPTION

debuginfod serves debuginfo-related artifacts over HTTP. It periodically scans a set of directories for
ELF/DWARF files and their associated source code, as well as archive files containing the above, to build
an index by their buildid. This index is used when remote clients use the HTTP webapi, to fetch these
files by the same buildid.

If a debuginfod cannot service a given buildid artifact request itself, and it is configured with
information about upstream debuginfod servers, it queries them for the same information, just as
debuginfod-find would. If successful, it locally caches then relays the file content to the original
requester.

Indexing the given PATHs proceeds using multiple threads. One thread periodically traverses all the
given PATHs logically or physically (see the -L option). Duplicate PATHs are ignored. You may use a
file name for a PATH, but source code indexing may be incomplete; prefer using a directory that contains
the binaries. The traversal thread enumerates all matching files (see the -I and -X options) into a work
queue. A collection of scanner threads (see the -c option) wait at the work queue to analyze files in
parallel.
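
As a rough illustration of the traversal-plus-scanner arrangement described above, the following Python sketch has one traversal loop feed file names into a work queue that a small pool of scanner threads drains. It is not debuginfod's implementation; scan_tree, the analyze callback, and the throwaway demo directory are hypothetical names chosen for this example.

```python
import os
import queue
import tempfile
import threading


def scan_tree(root, analyze, concurrency=2):
    """Walk `root` and run `analyze(path)` on every file using worker threads."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def scanner():
        # Each scanner thread waits at the work queue until told to stop.
        while True:
            path = work.get()
            if path is None:  # sentinel: no more work
                return
            outcome = analyze(path)
            with lock:
                results[path] = outcome

    threads = [threading.Thread(target=scanner) for _ in range(concurrency)]
    for t in threads:
        t.start()

    # Traversal: os.walk does not descend into symlinked directories by
    # default, loosely comparable to a physical (non -L) traversal.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            work.put(os.path.join(dirpath, name))

    # One sentinel per scanner thread, then wait for all of them to finish.
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return results


# Demo on a throwaway directory holding three 4-byte files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.o", "b.o", "c.o"):
        with open(os.path.join(demo_dir, name), "wb") as f:
            f.write(b"x" * 4)
    demo_sizes = scan_tree(demo_dir, os.path.getsize, concurrency=2)
```

Here the analyze step is just os.path.getsize; in the real server the corresponding step is the ELF/DWARF or archive scan selected by -F, -R, -U, or -Z.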

If the -F option is given, each file is scanned as an ELF/DWARF file. Source files are matched with
DWARF files based on the AT_comp_dir (compilation directory) attributes inside it. Caution: source files
listed in the DWARF may be a path anywhere in the file system, and debuginfod will readily serve their
content on demand. (Imagine a doctored DWARF file that lists /etc/passwd as a source file.) If this is
a concern, audit your binaries with tools such as:

% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^File.name.table/p'
or
% eu-readelf -wline BINARY | sed -n '/^Directory.table/,/^Line.number/p'
or even use debuginfod itself:
% debuginfod -vvv -d :memory: -F BINARY 2>&1 | grep 'recorded.*source'
^C

If any of the -R, -U, or -Z options is given, each file is scanned as an archive file that may contain
ELF/DWARF/source files. Archive files are recognized by extension. If -R is given, ".rpm" files are
scanned; if -U is given, ".deb" and ".ddeb" files are scanned; if -Z is given, the listed extensions are
scanned. Because of complications such as DWZ-compressed debuginfo, debuginfod may require two traversal
passes to identify all source code. Source files for RPMs are only served from other RPMs, so the caution
for -F does not apply. Note that due to Debian/Ubuntu packaging policies & mechanisms, debuginfod cannot
resolve source files for DEB/DDEB at all.

If no PATH is listed, or none of the scanning options is given, then debuginfod will simply serve content
that it accumulated into its index in all previous runs, periodically groom the database, and federate to
any upstream debuginfod servers. In passive mode, debuginfod will only serve content from a read-only
index and federated upstream servers, but will not scan or groom.

OPTIONS

-F Activate ELF/DWARF file scanning. The default is off.

-Z EXT -Z EXT=CMD
Activate an additional pattern in archive scanning. Files with name extension EXT (including the
dot) will be processed. If CMD is given, it is invoked with the file name added to its argument
list, and should produce a common archive on its standard output. Otherwise, the file is read as
if CMD were "cat". Since debuginfod internally uses libarchive to read archive files, it can accept
a wide range of archive formats and compression modes. The default is no additional patterns. This
option may be repeated.

-R Activate RPM patterns in archive scanning. The default is off. Equivalent to -Z .rpm=cat, since
libarchive can natively process RPM archives. If your version of libarchive is much older than
2020, be aware that some distributions have switched to an incompatible zstd compression for their
payload. You may experiment with -Z .rpm='(rpm2cpio|zstdcat)<' instead of -R.

-U Activate DEB/DDEB patterns in archive scanning. The default is off. Equivalent to
-Z .deb='dpkg-deb --fsys-tarfile' -Z .ddeb='dpkg-deb --fsys-tarfile'.
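
To make the EXT=CMD mapping concrete, here is a small Python sketch that picks the command for a file by its extension and appends the file name, using the -R and -U equivalences stated above. How debuginfod builds and runs the command internally is not shown in this page, so the shell quoting and the names RPM_PATTERNS, DEB_PATTERNS, and unpack_command are assumptions made for illustration.

```python
import shlex

# -R and -U written out in -Z EXT=CMD form, as given in the option text.
RPM_PATTERNS = {".rpm": "cat"}
DEB_PATTERNS = {
    ".deb": "dpkg-deb --fsys-tarfile",
    ".ddeb": "dpkg-deb --fsys-tarfile",
}


def unpack_command(filename, patterns):
    """Return a shell command line for `filename`, or None if no EXT matches."""
    for ext, cmd in patterns.items():
        if filename.endswith(ext):
            # The file name is added to CMD's argument list (quoted here
            # so that names containing spaces survive a shell).
            return f"{cmd} {shlex.quote(filename)}"
    return None
```

A file whose extension matches no active pattern yields None, which corresponds to the file not being treated as an archive.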

-d FILE --database=FILE
Set the path of the sqlite database used to store the index. This file is disposable in the sense
that a later rescan will repopulate data. It will contain absolute file path names, so it may not
be portable across machines. It may be frequently read/written, so it should be on a fast
filesystem. It should not be shared across machines or users, to maximize sqlite locking
performance. For quick testing, the magic string ":memory:" can be used to select a one-time
memory-only database. The default database file is $HOME/.debuginfod.sqlite.

--passive
Set the server to passive mode, where it only services webapi requests, including participating in
federation. It performs no scanning and no grooming, and so only opens the sqlite database
read-only. This way a database can be safely shared between an active scanner/groomer server and
multiple passive ones, thereby sharing service load. Archive pattern options must still be given, so
debuginfod can recognize file name extensions for unpacking.

-D SQL --ddl=SQL
Execute the given sqlite statement after the database is opened and initialized, as extra DDL (SQL
data definition language). This may be useful to tune performance-related pragmas or indexes. May be
repeated. The default is nothing extra.
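
The effect of passing a pragma through -D can be tried out in isolation with Python's sqlite3 module. The sketch below opens a ":memory:" database, runs a list of extra statements after opening, and reads the setting back. The helper name open_with_extra_sql and the cache_size value are hypothetical, and this only mimics the idea of "extra SQL after open"; it does not create debuginfod's schema.

```python
import sqlite3


def open_with_extra_sql(db_path, extra_sql):
    """Open an sqlite database and run each extra statement once, in order."""
    conn = sqlite3.connect(db_path)
    for statement in extra_sql:
        conn.execute(statement)
    return conn


# A negative cache_size is interpreted by sqlite as a size in KiB.
conn = open_with_extra_sql(":memory:", ["PRAGMA cache_size = -20000"])
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
conn.close()
```

Trying a statement this way before handing it to -D is a cheap check that sqlite accepts it.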

-p NUM --port=NUM
Set the TCP port number (0 < NUM < 65536) on which debuginfod should listen, to service HTTP
requests. Both IPv4 and IPv6 sockets are opened, if possible. The webapi is documented below. The
default port number is 8002.

-I REGEX --include=REGEX -X REGEX --exclude=REGEX
Govern the inclusion and exclusion of file names under the search paths. The regular expressions
are interpreted as unanchored POSIX extended REs, thus may include alternation. They are evaluated
against the full path of each file, based on its realpath(3) canonicalization. By default, all
files are included and none are excluded. A file that matches both include and exclude REGEX is
excluded. (The contents of archive files are not subject to inclusion or exclusion filtering:
they are all processed.) Only the last of each type of regular expression given is used.
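
The include/exclude precedence described above can be summarized in a few lines of Python. This is a sketch only: Python's re syntax is not identical to POSIX extended REs (simple patterns with alternation behave the same), the function name is_scanned is hypothetical, and the caller is assumed to pass an already canonicalized path.

```python
import re


def is_scanned(path, include=".*", exclude=None):
    """Return True if `path` passes the include regex and is not excluded."""
    # Unanchored match: the pattern may match anywhere in the full path.
    if re.search(include, path) is None:
        return False
    # A path matching both include and exclude is excluded.
    if exclude is not None and re.search(exclude, path) is not None:
        return False
    return True
```

For example, an include of r"\.(debug|rpm)$" combined with an exclude of "/test/" keeps /srv/a/foo.debug but drops /srv/test/foo.debug.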

-t SECONDS --rescan-time=SECONDS
Set the rescan time for the file and archive directories. This is the amount of time the traversal
thread will wait after finishing a scan, before doing it again. A rescan for unchanged files
is fast (because the index also stores the file mtimes). A time of zero is acceptable, and means
that only one initial scan should be performed. The default rescan time is 300 seconds. Receiving a
SIGUSR1 signal triggers a new scan, independent of the rescan time (including if it was zero),
interrupting a groom pass (if any).

-r Apply the -I and -X regexes during groom cycles, so that files excluded by the regexes are removed
from the index. These parameters are in addition to what normally qualifies a file for grooming, not a
replacement.

-g SECONDS --groom-time=SECONDS
Set the groom time for the index database. This is the amount of time the grooming thread will wait
after finishing a grooming pass before doing it again. A groom operation quickly rescans all
previously scanned files, only to see if they are still present and current, so it can deindex
obsolete files. See also the DATA MANAGEMENT section. The default groom time is 86400 seconds
(1 day). A time of zero is acceptable, and means that only one initial groom should be performed.
Receiving a SIGUSR2 signal triggers a new grooming pass, independent of the groom time (including if
it was zero), interrupting a rescan pass (if any).
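
The two signals described for -t and -g can be sent from any process that knows the server's PID. The following Python sketch (POSIX systems only) wraps that in a helper; the names SIGNAL_FOR_ACTION and request_pass are hypothetical, and obtaining the PID is left to the caller.

```python
import os
import signal

# Signal assignments as described for the rescan and groom options.
SIGNAL_FOR_ACTION = {
    "rescan": signal.SIGUSR1,  # trigger a new scan now
    "groom": signal.SIGUSR2,   # trigger a new grooming pass now
}


def request_pass(pid, action):
    """Ask the debuginfod process `pid` to start a rescan or groom pass."""
    os.kill(pid, SIGNAL_FOR_ACTION[action])
```

With a rescan or groom time of zero, this is the only way to get a further pass without restarting the server.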

-G Run an extraordinary maximal-grooming pass at debuginfod startup. This pass can take considerable
time, because it tries to remove any debuginfo-unrelated content from the archive-related parts of
the index. It should not be run if any recent archive-related indexing operations were aborted
early. It can take considerable space, because it finishes up with an sqlite "vacuum" operation,
which repacks the database file by triplicating it temporarily. The default is not to do maximal-
grooming. See also the DATA MANAGEMENT section.

-c NUM --concurrency=NUM
Set the concurrency limit for the scanning queue threads, which work together to process archives
& files located by the traversal thread. This is important for controlling CPU-intensive operations
like parsing an ELF file and especially decompressing archives. The default is the number of
processors on the system; the minimum is 1.

-L Traverse symbolic links encountered during traversal of the PATHs, including across devices - as
in find -L. The default is to traverse the physical directory structure only, stay on the same
device, and ignore symlinks - as in find -P -xdev. Caution: a loop in the symbolic directory
tree might lead to infinite traversal.

--fdcache-fds=NUM --fdcache-mbs=MB --fdcache-prefetch=NUM2
Configure limits on a cache that keeps recently extracted files from archives. Up to NUM requested
files and up to a total of MB megabytes will be kept extracted, in order to avoid having to
decompress their archives over and over again. In addition, up to NUM2 other files from an archive
may be prefetched into the cache before they are even requested. The default NUM, NUM2, and MB
values depend on the concurrency of the system, and on the available disk space on the $TMPDIR or
/tmp filesystem, because that is where the most recently used extracted files are kept.
Grooming cleans this cache.

--fdcache-prefetch-fds=NUM --fdcache-prefetch-mbs=MB
Configure how many file descriptors (fds) and megabytes (mbs) are allocated to the prefetch
fdcache. If unspecified, the values of --fdcache-prefetch-fds and --fdcache-prefetch-mbs depend on
the concurrency of the system and on the available disk space on the $TMPDIR filesystem. Allocating
more to the prefetch cache will improve performance in environments where different parts of several
large archives are being accessed.

--fdcache-mintmp=NUM
Configure a disk space threshold for emergency flushing of the cache. The filesystem holding the
cache is checked periodically. If the available space falls below the given percentage, the cache
is flushed, and the fdcache will stay disabled until the next groom cycle. This mechanism, along
with a few associated /metrics on the webapi, is intended to give an operator notice about storage
scarcity - which can translate to RAM scarcity if the disk happens to be on a RAM virtual disk.
The default threshold is 25%.
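
An operator who wants to anticipate such a flush can run a similar check independently. The Python sketch below compares the free-space percentage of a filesystem against a threshold; exactly how debuginfod measures available space is not specified here, so the arithmetic and the names percent_free, should_flush, and check_cache_filesystem are assumptions.

```python
import shutil


def percent_free(free_bytes, total_bytes):
    """Free space as a percentage of the filesystem size."""
    return 100.0 * free_bytes / total_bytes


def should_flush(free_bytes, total_bytes, min_percent=25):
    """True when free space has fallen below the threshold percentage."""
    return percent_free(free_bytes, total_bytes) < min_percent


def check_cache_filesystem(path, min_percent=25):
    """Apply the threshold test to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_flush(usage.free, usage.total, min_percent)
```

Calling check_cache_filesystem on the $TMPDIR or /tmp location gives an early warning before the cache would be disabled.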

--forwarded-ttl-limit=NUM
Configure a limit on X-Forwarded-For hops. If the X-Forwarded-For header exceeds NUM hops, debuginfod
will not delegate a local lookup miss to upstream debuginfods. The default limit is 8.
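
The hop limit can be pictured as a simple count over the comma-separated X-Forwarded-For value. The Python sketch below assumes that each list entry counts as one hop, which is this example's interpretation and not taken from the debuginfod source; hop_count and may_query_upstream are hypothetical names.

```python
def hop_count(x_forwarded_for):
    """Count the non-empty entries in an X-Forwarded-For header value."""
    if not x_forwarded_for:
        return 0
    return len([hop for hop in x_forwarded_for.split(",") if hop.strip()])


def may_query_upstream(x_forwarded_for, ttl_limit=8):
    """True while the request has not exceeded the configured hop limit."""
    return hop_count(x_forwarded_for) <= ttl_limit
```

The limit keeps a misconfigured federation, in which servers list each other as upstreams, from relaying the same miss indefinitely.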

-v Increase verbosity of logging to the standard error file descriptor. May be repeated to increase
details. The default verbosity is 0.

WEBAPI

debuginfod's webapi resembles ordinary file service, where a GET request with a path containing a known
buildid results in a file. Unknown buildid / request combinations result in HTTP error codes. This file
service resemblance is intentional, so that an installation can take advantage of standard HTTP
management infrastructure.

Upon finding a file in an archive or simply in the database, some custom HTTP headers are added to the
response. For files in the database, X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE are added. X-DEBUGINFOD-FILE
is simply the unescaped filename and X-DEBUGINFOD-SIZE is the size of the file. For files found in
archives, X-DEBUGINFOD-ARCHIVE is added in addition to X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE.
X-DEBUGINFOD-ARCHIVE is the name of the archive the file was found in.

There are three requests. In each case, the buildid is encoded as a lowercase hexadecimal string. For
example, for a program /bin/ls, look at the ELF note GNU_BUILD_ID:

% readelf -n /bin/ls | grep -A4 build.id
Note section [ 4] '.note.gnu.buildid' of 36 bytes at offset 0x340:
  Owner          Data size  Type
  GNU                   20  GNU_BUILD_ID
    Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d

Then the hexadecimal BUILDID is simply:

8713b9c3fb8a720137a4a08b325905c7aaf8429d
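
When scripting against the webapi, the BUILDID can be pulled out of the readelf output shown above with a short regular expression. This Python sketch works on the text of that output only and does not parse ELF files itself; parse_build_id is a hypothetical helper name.

```python
import re

SAMPLE_OUTPUT = """\
Note section [ 4] '.note.gnu.buildid' of 36 bytes at offset 0x340:
  Owner          Data size  Type
  GNU                   20  GNU_BUILD_ID
    Build ID: 8713b9c3fb8a720137a4a08b325905c7aaf8429d
"""


def parse_build_id(readelf_output):
    """Return the lowercase hexadecimal build ID, or None if none is found."""
    match = re.search(r"Build ID:\s*([0-9a-fA-F]+)", readelf_output)
    return match.group(1).lower() if match else None


build_id = parse_build_id(SAMPLE_OUTPUT)
```

Lowercasing the result matches the encoding the webapi expects.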

/buildid/BUILDID/debuginfo
If the given buildid is known to the server, this request will result in a binary object that contains
the customary .*debug_* sections. This may be a split debuginfo file as created by strip, or it may be
an original unstripped executable.

/buildid/BUILDID/executable
If the given buildid is known to the server, this request will result in a binary object that contains
the normal executable segments. This may be an executable stripped by strip, or it may be an original
unstripped executable. ET_DYN shared libraries are considered to be a type of executable.

/buildid/BUILDID/source/SOURCE/FILE
If the given buildid is known to the server, this request will result in a binary object that contains
the source file mentioned. The path should be absolute. Relative path names commonly appear in the
DWARF file's source directory, but these paths are relative to individual compilation unit AT_comp_dir
paths, and yet an executable is made up of multiple CUs. Therefore, to disambiguate, debuginfod expects
source queries to prefix relative path names with the CU compilation-directory, followed by a mandatory
"/".

Note: the caller may or may not elide ../ or /./ or extraneous /// sorts of path components in the
directory names. debuginfod accepts both forms. Specifically, debuginfod canonicalizes path names according
to RFC3986 section 5.2.4 (Remove Dot Segments), plus reducing any // to / in the path.

For example:
#include <stdio.h>                 /buildid/BUILDID/source/usr/include/stdio.h
/path/to/foo.c                     /buildid/BUILDID/source/path/to/foo.c
../bar/foo.c AT_comp_dir=/zoo/     /buildid/BUILDID/source/zoo//../bar/foo.c
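
The path canonicalization mentioned in the note above can be sketched in Python by following the steps of RFC3986 section 5.2.4 and collapsing repeated slashes. Collapsing the slashes before removing dot segments is an assumption of this sketch; the order debuginfod uses internally is not stated in this page, and canonicalize is a hypothetical name.

```python
import re


def canonicalize(path):
    """Collapse repeated slashes, then remove dot segments (RFC3986 5.2.4)."""
    remaining = re.sub(r"/{2,}", "/", path)
    output = []
    while remaining:
        if remaining.startswith("../"):          # step A
            remaining = remaining[3:]
        elif remaining.startswith("./"):         # step A
            remaining = remaining[2:]
        elif remaining.startswith("/./"):        # step B
            remaining = remaining[2:]
        elif remaining == "/.":                  # step B
            remaining = "/"
        elif remaining.startswith("/../"):       # step C
            remaining = remaining[3:]
            if output:
                output.pop()
        elif remaining == "/..":                 # step C
            remaining = "/"
            if output:
                output.pop()
        elif remaining in (".", ".."):           # step D
            remaining = ""
        else:                                    # step E: move one segment
            next_slash = remaining.find("/", 1)
            if next_slash == -1:
                output.append(remaining)
                remaining = ""
            else:
                output.append(remaining[:next_slash])
                remaining = remaining[next_slash:]
    return "".join(output)
```

Under these assumptions, the third example path /zoo//../bar/foo.c reduces to /bar/foo.c.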

Note: the client should %-escape characters in /SOURCE/FILE that are not shown as "unreserved" in section
2.3 of RFC3986. Some characters that will be escaped include "+", "\", "$", "!", the 'space' character,
and ";". RFC3986 includes a more comprehensive list of these characters.
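
Putting the request forms and the escaping rule together, a client could assemble URLs roughly as in the Python sketch below. urllib.parse.quote leaves exactly the RFC3986 unreserved characters (plus the "/" given as safe) unescaped. The server address http://localhost:8002 is an assumption based on the default port, and artifact_url and source_url are hypothetical helper names.

```python
from urllib.parse import quote

BUILD_ID = "8713b9c3fb8a720137a4a08b325905c7aaf8429d"


def artifact_url(server, build_id, kind):
    """URL for the 'debuginfo' or 'executable' request."""
    if kind not in ("debuginfo", "executable"):
        raise ValueError("kind must be 'debuginfo' or 'executable'")
    return f"{server}/buildid/{build_id.lower()}/{kind}"


def source_url(server, build_id, source, comp_dir=None):
    """URL for a source request; relative names need the CU's comp_dir."""
    if not source.startswith("/"):
        if comp_dir is None:
            raise ValueError("relative source path requires comp_dir")
        # Prefix the compilation directory followed by a mandatory "/".
        source = comp_dir + "/" + source
    # Percent-escape everything outside the unreserved set, keeping "/".
    return f"{server}/buildid/{build_id.lower()}/source{quote(source, safe='/')}"
```

For instance, a source file named "my file+1.c" is requested as my%20file%2B1.c.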

/metrics
This endpoint returns a Prometheus-formatted text/plain dump of a variety of statistics about the
operation of the debuginfod server. The exact set of metrics and their meanings may change in future
versions. Caution: configuration information (path names, versions) may be disclosed.
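
Because the dump uses the Prometheus text format, it can be read with a few lines of Python for quick inspection. The sketch below handles only simple "name{labels} value" lines without timestamps, and the metric names in SAMPLE_METRICS are invented for the example; they are not promised to match what a real server reports.

```python
SAMPLE_METRICS = """\
# HELP example_requests_total made-up counter for this sketch
example_requests_total{type="debuginfo"} 3
example_requests_total{type="source"} 12
example_threads 4
"""


def parse_metrics(text):
    """Map each 'name{labels}' key to its numeric value; skip comments."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.rsplit(None, 1)  # split off the trailing value
        if len(parts) != 2:
            continue
        key, value = parts
        metrics[key] = float(value)
    return metrics


metrics = parse_metrics(SAMPLE_METRICS)
```

For production monitoring, a real Prometheus scraper is the more robust choice, since the set of metrics may change between versions.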

DATA MANAGEMENT

debuginfod stores its index in an sqlite database in a densely packed set of interlinked tables. While
the representation is as efficient as we have been able to make it, it still takes a considerable amount
of data to record all debuginfo-related data of potentially a great many files. This section offers some
advice about the implications.

As a general explanation for size, consider that when debuginfod indexes ELF/DWARF files, it stores their
names, their referenced source file names, and their buildids. When indexing archives, it stores
every file name of or in an archive, every buildid, plus every source file name referenced from a DWARF
file. (Indexing archives takes more space because the source files often reside in separate subpackages
that may not be indexed at the same pass, so extra metadata has to be kept.)

Getting down to numbers, in the case of Fedora RPMs (essentially, gzip-compressed cpio files), the sqlite
index database tends to be from 0.5% to 3% of their size. It's larger for binaries that are assembled
out of a great many source files, or packages that carry much debuginfo-unrelated content. It may be
even larger during the indexing phase due to temporary sqlite write-ahead-logging files; these are
checkpointed (cleaned out and removed) at shutdown. It may be helpful to apply tight -I or -X
regular-expression constraints to exclude files from scanning that you know have no debuginfo-relevant
content.
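
As a worked example of the 0.5% to 3% figure, the following Python sketch turns an archive collection size into an estimated index size range. The ratio comes from the Fedora RPM case above and may not carry over to other archive types; estimate_index_bytes and the 200 GB input are hypothetical.

```python
def estimate_index_bytes(archive_bytes):
    """Return (low, high) index size estimates at 0.5% and 3% of the input."""
    low = archive_bytes * 5 // 1000   # 0.5%
    high = archive_bytes * 3 // 100   # 3%
    return low, high


# A 200 GB (decimal) archive collection.
low_estimate, high_estimate = estimate_index_bytes(200_000_000_000)
```

So roughly 1 GB to 6 GB of index would be expected for such a collection, before counting temporary write-ahead-logging files.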

As debuginfod runs in normal active mode, it periodically rescans its target directories, and any new
content found is added to the database. Old content, such as data for files that have disappeared or
that have been replaced with newer versions, is removed at a periodic grooming pass. This means that the
sqlite files grow fast during initial indexing, slowly during index rescans, and periodically shrink
during grooming. An optional one-shot maximal grooming pass is also available. It removes
debuginfo-unrelated data from the archive content index, such as file names found in archives
("archive sdef" records) that are not referred to as source files from any binaries found in archives
("archive sref" records). This can save considerable disk space. However, it is slow and temporarily
requires up to twice the database size as free space. Worse: it may result in missing source-code info
if the archive traversals were interrupted, so that not all source file references were known. Use it
rarely to polish a complete index.

You should ensure that ample disk space remains available. (The flood of error messages on -ENOSPC is
ugly and nagging. But, like for most other errors, debuginfod will resume when resources permit.) If
necessary, debuginfod can be stopped, the database file moved or removed, and debuginfod restarted.

sqlite offers several performance-related options in the form of pragmas. Some may be useful to
fine-tune the defaults plus the debuginfod extras. The -D option may be useful to tell debuginfod to execute
the given bits of SQL after the basic schema creation commands. For example, the "synchronous",
"cache_size", "auto_vacuum", "threads", "journal_mode" pragmas may be fun to tweak via -D, if you're
searching for peak performance. The "optimize", "wal_checkpoint" pragmas may be useful to run
periodically, outside debuginfod. The default settings are performance- rather than reliability-oriented, so a
hardware crash might corrupt the database. In these cases, it may be necessary to manually delete the
sqlite database and start over.

As debuginfod changes in the future, we may have no choice but to change the database schema in an
incompatible manner. If this happens, new versions of debuginfod will issue SQL statements to drop all prior
schema & data, and start over. So, disk space will not be wasted for retaining a no-longer-usable
dataset.

In summary, if your system can bear a 0.5%-3% index-to-archive-dataset size ratio, and slow growth
afterwards, you should not need to worry about disk space. If a system crash corrupts the database, or you
want to force debuginfod to reset and start over, simply erase the sqlite file before restarting
debuginfod.

In contrast, in passive mode, all scanning and grooming is disabled, and the index database remains read-
only. This makes the database more suitable for sharing between servers or sites with simple one-way
replication, and data management considerations are generally moot.

SECURITY

       debuginfod does not include any particular security features. While it is robust with respect to inputs,
       some abuse is possible. It forks a new thread for each incoming HTTP request, which could lead to a
       denial-of-service in terms of RAM, CPU, disk I/O, or network I/O. If this is a problem, users are advised
       to install debuginfod with an HTTPS reverse-proxy front-end that enforces site policies for firewalling,
       authentication, integrity, authorization, and load control. The /metrics webapi endpoint is probably not
       appropriate for disclosure to the public.

       When relaying queries to upstream debuginfods, debuginfod does not include any particular security
       features. It trusts that the binaries returned by the debuginfods are accurate. Therefore, the list of
       servers should include only trustworthy ones. If accessed across HTTP rather than HTTPS, the network
       should be trustworthy. Authentication information through the internal libcurl library is not currently
       enabled.

ADDITIONAL FILES

       $HOME/.debuginfod.sqlite
              Default database file.