Provided by: asncounter_0.5.0_all 

NAME
asncounter — collect hits per ASN and netblock
DESCRIPTION
Count the number of hits (HTTP requests, packets, etc.) per autonomous system number (ASN) and related network blocks. This is useful when a server gets a lot of traffic and you need to figure out which network is responsible, to direct abuse complaints or block whole networks, or on core routers to figure out who your peers are and with whom you might want to seek particular peering agreements.
SYNOPSIS
asncounter [OPTIONS] [ADDRESS ...]
OPTIONS
-h, --help
       show this help message and exit
--cache-directory, -C CACHE_DIRECTORY
       where to store pyasn cache files, default: ~/.cache/pyasn
--no-prefixes
       disable prefix count
--no-asn
       disable ASN count
--no-resolve-asn
       disable ASN to name resolution in output
--top, -t N
       only show top N entries, default: 10
--input, -i INPUT
       input file, default: stdin
--input-format, -I {line,tuple,tcpdump,scapy}
       input format, default: line
--scapy-filter SCAPY_FILTER
       BPF filter to apply to incoming packets, default: ip and not src host 0.0.0.0 and not src net 192.168.0.0/24
--interface [INTERFACE]
       open an interface instead of stdin for packets, implies -I scapy, auto-detects by default
--output, -o OUTPUT
       write stats or final prometheus metrics to the given file, default: stdout
--output-format, -O {tsv,prometheus,null}
       output format, choices: tsv, prometheus, null, default: tsv
--port, -p [PORT]
       start a prometheus server on the given port, default disabled, port 8999 if unspecified
--refresh, -R
       download a recent RIB cache file and exit
--repl
       run a REPL thread in the main loop
--manhole
       set up a REPL socket with manhole
--debug
       more debugging output
ADDRESS
       zero or more IP addresses to parse directly from the command line, before the input stream is read; this disables the default stdin reading, and --input-format cannot be changed
INPUT FORMATS
The --input-format option warrants more discussion.

line
       The line input format treats each line in the stream as an IP address, counting each as one hit. Empty lines are skipped, and comments (whatever follows the pound sign, #) are trimmed. Anything that cannot be parsed as an IP address is logged as a warning and skipped. This, for example, counts as a hit on two different IP addresses, for a total of two hits, and also yields a warning:

           192.0.2.1 # comment
           2001:DB8::1 # comment
           garbage that generates a warning

tuple
       Same as the line input format, except the count is specified in a second, whitespace-separated field. This, for example, counts one hit for the first IP address and two for the second, and generates a warning:

           192.0.2.1 1 # comment
           2001:DB8::1 2 # comment
           garbage that generates a warning

       The "count" field can represent anything: counts, but also sizes or timings; asncounter doesn't care. Counts are actually parsed as floats, as Python understands them. The default output format (tsv) rounds the numbers to integers, rounding halves to the nearest even integer. This, for example, adds up to 5, which might be surprising to some (because Python rounds 2.5 to 2, not 3):

           192.0.2.2 3.4
           192.0.2.2 2.5

       This is known as the "round half to even" rule, the default rounding mode in the IEEE 754 standard. If --output-format is set to prometheus, floats are recorded as accurately as Python allows. In that case, the above correctly sums up to 5.9.

tcpdump
       The tcpdump format is a bit of an oddball: it parses a tcpdump(1) line with a regular expression to extract the source IP address, and counts that. It could be extended to count packet sizes, but currently does not do so. Likewise, it only tracks the left (source) side of packets, not the destination, but could be extended to track both. This approach likely can't deal with a multi-gigabit per second small-packet attack (2 million packets per second or more).
       But in a real production environment, it could easily deal with regular 100-200 megabit per second traffic, where tcpdump and asncounter each took about 2% of one core to handle about 3-5 thousand packets per second.

scapy
       The scapy input format is also special: instead of parsing text lines, it parses packets. With the --interface flag, it opens the default interface unless one is provided (e.g. --interface is generally equivalent to --interface eth0 if eth0 is the primary interface). This requires elevated privileges. It is much slower than the tcpdump parser (close to 100% CPU usage) in a 100-200 mbps scenario like the above, but could eventually be leveraged to implement byte counts, which are harder to extract from tcpdump because of the variability of its output. It only counts packets, regardless of direction, and, like tcpdump, only keeps track of source IP addresses. Like tcpdump, it could also be improved by tracking sizes instead of counts, but does not currently do so.
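The tcpdump line parsing described above can be illustrated with a short sketch. This is a hypothetical approximation, not asncounter's actual regular expression; the LINE_RE pattern and source_ip() helper are invented for illustration:

```python
import re

# Hypothetical sketch of extracting the source address from a
# `tcpdump -n -q` line; this is NOT asncounter's actual regex.
# The source field follows "IP" or "IP6" and precedes " > ", with
# the port appended to the address after a final dot.
LINE_RE = re.compile(r"\bIP6? (\S+) >")

def source_ip(line):
    match = LINE_RE.search(line)
    if not match:
        return None
    host_port = match.group(1)
    # strip the ".port" suffix tcpdump appends to the address
    host, _, _port = host_port.rpartition(".")
    return host or host_port

print(source_ip("12:34:56.789 IP 192.0.2.1.54321 > 198.51.100.7.443: tcp 60"))
# 192.0.2.1
```

Counting is then just a matter of feeding each extracted address into a counter, as the line format does.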
OUTPUT FORMATS
The --output-format argument also warrants a little more discussion.

tsv
       TSV stands for Tab-Separated Values. It's a poorly designed output format that dumps two tables, where rows are separated by newlines and columns by tabs. One table shows per-ASN counts, the other per-prefix counts. As mentioned in the tuple section above, counts are rounded when recorded in tsv mode. This is to simplify the display; in theory, the underlying Counter (https://docs.python.org/3/library/collections.html#collections.Counter) supports floats as well. If more precision, long-term storage, or alerting is needed, the prometheus output format is preferred. This format is useful because it doesn't require any dependency outside of the standard library (other than, obviously, pyasn).

prometheus
       The prometheus output format keeps track of counters inside Prometheus data structures. With the --port flag, it opens a port (defaulting to 8999) where metrics are exposed over HTTP, without any special security, on all interfaces. Otherwise, upon completion, results are written in a textfile collector-compatible format.

null
       The null output format doesn't display anything. It can be used for debugging, but internally uses the same recorder as the tsv format.
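The stdlib-only tallying behind the tsv format can be sketched with collections.Counter. This is an illustrative approximation, not asncounter's actual recorder code; the variable names are invented:

```python
from collections import Counter

# Illustrative sketch (not asncounter's actual recorder): tally float
# counts per ASN, then print tsv-style rows with rounded totals.
asn_counter = Counter()
for asn, count in [(66496, 3.4), (66496, 2.5), (66497, 1.0)]:
    asn_counter[asn] += count  # Counter values may be floats

total = sum(asn_counter.values())
for asn, count in asn_counter.most_common():
    # tab-separated: rounded count, percentage, ASN; round() rounds
    # halves to even, as the tuple section explains
    print(f"{round(count)}\t{100 * count / total:.2f}\t{asn}")
```

Note that the percentages are computed from the unrounded floats; only the displayed count is rounded.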
EXAMPLES
Simple web log counter

This extracts the IP addresses from current access logs and reports ratios:

    > awk '{print $2}' /var/log/apache2/*access*.log | asncounter
    INFO: using datfile ipasn_20250527.1600.dat.gz
    INFO: collecting addresses from <stdin>
    INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
    INFO: finished reading data
    INFO: loading /home/anarcat/.cache/pyasn/asnames.json
    count   percent ASN     AS
    12779   69.33   66496   SAMPLE, CA
    3361    18.23   None    None
    366     1.99    66497   EXAMPLE, FR
    337     1.83    16276   OVH, FR
    321     1.74    8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
    309     1.68    14061   DIGITALOCEAN-ASN, US
    128     0.69    16509   AMAZON-02, US
    77      0.42    48090   DMZHOST, GB
    56      0.3     136907  HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
    53      0.29    17621   CNCGROUP-SH China Unicom Shanghai network, CN
    total: 18433
    count   percent prefix  ASN     AS
    12779   69.33   192.0.2.0/24    66496   SAMPLE, CA
    3361    18.23   None
    298     1.62    178.128.208.0/20        14061   DIGITALOCEAN-ASN, US
    289     1.57    51.222.0.0/16   16276   OVH, FR
    272     1.48    2001:DB8::/48   66497   EXAMPLE, FR
    235     1.27    172.160.0.0/11  8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
    94      0.51    2001:DB8:1::/48 66497   EXAMPLE, FR
    72      0.39    47.128.0.0/14   16509   AMAZON-02, US
    69      0.37    93.123.109.0/24 48090   DMZHOST, GB
    53      0.29    27.115.124.0/24 17621   CNCGROUP-SH China Unicom Shanghai network, CN

This can also be done in real time, of course:

    tail -F /var/log/apache2/*access*.log | awk '{print $2}' | asncounter

The above report will be generated when the process is killed. Send SIGHUP to show a report without interrupting the parser:

    pkill -HUP asncounter

You can count sizes with --input-format=tuple as well. Assuming the size field is in the 10th column, this sums sizes instead of just counting hits:

    tail -F /var/log/apache2/*access*.log | awk '{print $1, $10}' | asncounter --input-format=tuple

If logs hold that information, you can also add up processing times, for example.
tcpdump parser

Extract IP addresses from incoming TCP/UDP packets on eth0 and report the top 5:

    > tcpdump -c 10000 -q -i eth0 -n -Q in "(udp or tcp)" | asncounter --top 5 --input-format tcpdump
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz
    INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...
    INFO: loading /root/.cache/pyasn/asnames.json
    ASN     count   AS
    136907  7811    HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
    8075    254     MICROSOFT-CORP-MSN-AS-BLOCK, US
    62744   164     QUINTEX, US
    24940   114     HETZNER-AS, DE
    14618   82      AMAZON-AES, US
    prefix  count
    166.108.192.0/20        1294
    188.239.32.0/20 1056
    166.108.224.0/20        970
    111.119.192.0/20        951
    124.243.128.0/18        667

A query similar to the HTTP log parser might be:

    tcpdump -q -i eth0 -n -Q in "tcp and (port 80 or port 443)" | grep 'Flags \[S\]' | asncounter --input-format=tcpdump --repl

... otherwise you will get different results from a pure packet count, as various connections yield different numbers of packets. The above counts connection attempts, which is still different from an actual HTTP hit, as the connection could be refused before it reaches the web server, or aborted before it gets logged properly. It's still a good estimate, and is especially useful if you do not log IP addresses, for example on high-traffic caching servers. Note that we use grep above because tcpdump's tcp[tcpflags] & tcp-syn != 0 only works for IPv4 packets, a disappointing (but understandable) limitation.
scapy parser

Extract IP addresses directly from the network interface, bypassing tcpdump entirely:

    asncounter --interface

REPL

With --repl, you will drop into a Python shell where you can interactively get real-time statistics:

    > awk '{print $2}' /var/log/apache2/*access*.log | asncounter --repl --top 2
    INFO: using datfile ipasn_20250527.1600.dat.gz
    INFO: collecting addresses from <stdin>
    INFO: starting interactive console, use recorder.display_results() to show current results
    INFO: recorder.asn_counter and .prefix_counter dictionaries have the full data
    Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
    INFO: finished reading data
    >>> recorder.display_results()
    INFO: loading /home/anarcat/.cache/pyasn/asnames.json
    count   percent ASN     AS
    13008   69.38   66496   SAMPLE, CA
    3422    18.25   None    None
    total: 18748
    count   percent prefix  ASN     AS
    13008   69.38   192.0.2.0/24    66496   SAMPLE, CA
    3422    18.25   None
    total: 18748
    >>> recorder.asn_counter
    Counter({66496: 13008, None: 3422, [...]})
    >>> recorder.prefix_counter
    Counter({'192.0.2.0/24': 13008, None: 3422, [...]})

So you can get the actual number of hits for an AS, even if it's not listed in the --top entries, with:

    >>> recorder.asn_counter.get(66496)
    13008

Blocking whole networks

asncounter does not block anything: it only counts. Another mechanism needs to be used to actually block attackers or act on the collected data. If you want to block networks, you can use the shown netblocks directly in (say) Linux's netfilter firewall, or Nginx's access or geo modules.
For example, this will reject traffic from a network with iptables:

    iptables -I INPUT -s 192.0.2.0/24 -j REJECT

or with nftables:

    nft insert rule inet filter INPUT 'ip saddr 192.0.2.0/24 reject'

This will likely become impractical with large numbers of networks; look into IP sets to scale that up. With Nginx, you can block a network with the deny directive:

    deny 192.0.2.0/24;

This will return a 403 status code. If you want to be fancier, you can return a tailored status code and build a larger list with the geo module:

    geo $geo_map_deny {
        default 0;
        192.0.2.0/24 1;
    }
    if ($geo_map_deny) {
        return 429;
    }

Many networks can be listed in the geo block relatively effectively. pyasn doesn't (unfortunately) provide an easy command line interface to extract the data you need to block an entire AS. For that, you need to resort to some Python. From inside the --repl loop:

    print("\n".join(sorted(recorder.asn_all_prefixes(64496))))

This will give you the list of ALL prefixes associated with AS64496, which is actually empty in this case, as AS64496 is an example AS from RFC5398. Note that the list of prefixes is not aggregated by default. If netaddr is installed, you can pass aggregate=True to reduce the set.

Aggregating results

It might be worth aggregating large numbers of netblocks for performance reasons. Network block announcements can be spread over multiple contiguous blocks for various reasons, and can often be unified into smaller sets.
For IPv4-only, iprange is good (and fast) enough:

    > grep -v :: networks > networks-ipv4
    > iprange < networks-ipv4 > networks-ipv4-filtered
    > wc -l networks*
    588 networks
    495 networks-ipv4
    181 networks-ipv4-filtered

If you have it installed, the netaddr Python package can also do that for you, and it supports IPv6:

    import netaddr
    print("\n".join([str(n) for n in netaddr.cidr_merge(recorder.asndb.get_as_prefixes(64496))]))

Note that asncounter can aggregate those results directly now, for example:

    print(recorder.asn_all_prefixes_str(66496, aggregate=True))

... but, as above, it requires the netaddr package to be available.

Selective blocking

A more delicate approach is to block only the network blocks from a specific ASN that have been found in the result sets, instead of blocking the entire netblock. The recorder.asn_prefixes and recorder.asn_prefixes_str functions can do this for you, merging multiple ASNs and aggregating with netaddr as well:

    print(recorder.asn_prefixes_str(66496, 66497, aggregate=True))

Note that the asn_prefixes selectors are not implemented in Prometheus mode. Remember that you can also extract the list of current ASNs and prefixes just by looking at the dictionary keys:

    print("\n".join(recorder.asn_counter.keys()))
    print("\n".join(recorder.prefix_counter.keys()))
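If netaddr is not available, the standard library's ipaddress module can also merge contiguous networks, one address family at a time. A minimal sketch, with made-up example prefixes:

```python
import ipaddress

# Merge contiguous/overlapping networks with the standard library.
# collapse_addresses() only accepts one address family per call.
nets = [ipaddress.ip_network(n) for n in
        ("192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24")]
merged = [str(n) for n in ipaddress.collapse_addresses(nets)]
print(merged)  # ['192.0.2.0/24', '198.51.100.0/24']
```

Unlike netaddr's cidr_merge, mixing IPv4 and IPv6 in one call raises a TypeError, so the two families must be collapsed separately.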
FILES
~/.cache/pyasn/
       Default storage location for pyasn cache files.

/run/$UID/asncounter-manhole-$PID or ~/.local/state/asncounter-manhole-$PID
       Default location for the debugging manhole socket, if enabled.
LIMITATIONS
• only counts; does not calculate bandwidth, but could be extended to do so
• does not actually do any sort of mitigation or blocking; it is purely an analysis tool. If you want such mitigation, hook up asncounter to Prometheus and Alertmanager with web hooks; this is not a fail2ban rewrite
• test coverage is relatively low, 37% as of this writing; most critical paths are covered, although not the scapy parser or the RIB file download procedures
• requires downloading RIB files; could be improved by talking directly with a BGP router daemon like Bird or FRR
• only a small set of tcpdump outputs have been tested
• the REPL shell does not have proper readline support (keyboard arrows and control characters like "control a" do not work)

Note that this documentation and the test code use sample AS numbers from RFC5398, IPv4 addresses from RFC5737, and IPv6 addresses from RFC3849. Some more well-known entities (e.g. Amazon, Facebook) have not been redacted from the output, for clarity.

Performance considerations

As mentioned above, this is unlikely to tolerate multi-gigabit denial of service attacks. The tcpdump parser, however, is pretty fast and should be able to sustain a saturated gigabit link under normal conditions. The scapy parser is slower. Memory usage seems reasonable: on startup, asncounter uses about 250MB of memory, and a long-running process tracking about 40,000 blocks was using about 400MB. By extrapolation, data on the full routing table (currently 1.2 million entries) could be expected to fit within 12GB of memory, although that would be a rare condition, only occurring on a core router seeing traffic from literally the entire internet.

Security considerations

There's an unknown in the form of the C implementation of a Radix tree in pyasn. asncounter itself should be fairly safe: it does not trust its inputs, and the worst it can do is likely resource exhaustion under high traffic.
It can run completely unprivileged as long as it has access to the input files, although in many scenarios people will not bother to drop privileges before calling it, and it will not, itself, attempt to do so. Privileges can be dropped with systemd-run, for example:

    systemd-run --pipe --property=DynamicUser=yes \
        --property=CacheDirectory=asncounter \
        --setenv=XDG_CACHE_HOME=/var/cache/asncounter \
        -- asncounter

This interacts poorly with the --repl option, as it tries to reopen the tty for stdin. You might have better luck sharing a debug socket with --manhole:

    systemd-run --pipe --property=DynamicUser=yes \
        --property=CacheDirectory=asncounter \
        --setenv=XDG_CACHE_HOME=/var/cache/asncounter \
        -- asncounter --manhole=/var/cache/asncounter/asncounter-manhole

Then you can open a Python debugging shell for further diagnostics with:

    nc -U /var/cache/asncounter/asncounter-manhole
AUTHOR
Antoine Beaupré <anarcat@debian.org>
SEE ALSO
tcpdump(8), fail2ban(1)