Save Ipcad Network Statistic To Clickhouse

Temir Umurzakov · June 16, 2021

What you will find in this article:

  • Description of IPCAD daemon that pretends as a Cisco router and provides aggregated network statistic
  • Example of materialized views in ClickHouse database
  • ipcad2ch utility that will help you to organize accounting of network traffic

How I saved network statistics from IPCAD to ClickHouse

In the era of broadband Internet there is not so much sense to account network traffic. Maybe just a small ISP by some reason not taken by the big players. Infrastructure of this kind of ISP I proudly support. Technological stack is following:

  • FreeBSD server as router that aggregates network statistics in IPCAD daemon
  • FreeBSD server with self made billing on PHP and PostgreSQL database. Aggregates are saved in DB, detail statistic in zipped ipcad text files

This configuration was enough for narrow network channels of 2000th but now in the epoch of gigabytes become very bad and everything starts to shamelessly slow down. It’s time to do something…

If something starts to slow down, what do we need? Right, something that could not be slow. As everybody knows ClickHouse is never slow then we install it.

  ClickHouse  - is ...   ClickHouse is a database, created in the bowels of russian company Yandex. Most interesting thing in this database is that it is column db, i.e. store datum of tables not by rows how it is doing relation databases but by columns. Roughly saying, values of one column stored in one file. This gives advantages in processing data from several columns, not reading everything from disk by rows.

Main features of application are reports about service usage:

  • Who and how much traffic is consumed by customers in the report period and how much money we must ask from them.
  • Take a look at how much service is consumed by customers in particular day, hour and minute with beautiful charts.
  • Output detailed information by customer request where he was on the internet by requested period.

It comes from requirements that we need to store received data from IPCAD format as is and calculate some aggregates by time. Looks not so difficult…

IPCAD format

IPCAD (Cisco IP accounting simulator) - daemon that accounts passing through router traffic and calculating aggregates of packet count and traffic volume in RAM by tuple:

  • source IP
  • source Port
  • destination IP
  • destination Port
  • Protocol
  • Interface

Data could be fetched by requesting it using RSH

#!/bin/sh
RSH=`which rsh`
IP=<router IP address>
FILE=<saving path>

# flush statistic to control point
$RSH -l root $IP clear ip accounting > /dev/null

# show statistic in control point
$RSH -l root $IP show ip accounting checkpoint > $FILE

Output is:

   Source           Destination    Packets        Bytes  SrcPt DstPt Proto   IF
 188.218.189.188  188.138.119.98         1           88     83 28088    18  em1
 188.218.189.188  188.138.119.98         1           88     83 12038    18  em1
 108.232.38.113   188.218.189.198        1           80    883 28818     8  em1

This data we need to store not in the old fashioned way in the zip archive but in a brand new ClickHouse database.

Table structure in ClickHouse

One of the most powerful features of ClickHouse is materialized views. It is tables that could be updated automatically from data saved in other tables i.e we could insert into detail traffic tables and aggregates will be calculated automatically by giving rules.

Detail traffic table

 

CREATE TABLE IF NOT EXISTS details
(
    collected DateTime,
    user_id String,
    dir Enum8('unknown' = 0, 'in' = 1, 'out' = 2),
    class Enum8('unknown' = 0, 'local' = 1, 'peering' = 2, 'internet' = 3, 'multicast' = 4),
    src_ip UInt32,
    src_port UInt16,
    dst_ip UInt32,
    dst_port UInt16,
    packets UInt16,
    bytes UInt32,
    proto UInt8
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(collected)
ORDER BY (collected, user_id, dir, class, src_ip, dst_ip, proto)
SETTINGS index_granularity = 8192

In the table of detailed traffic we store IPCAD data as is also adding timestamp and several classification fields as customer id, traffic direction and traffic class. Let’s make materialized views:

##Табличка агрегатов по дням  

CREATE MATERIALIZED VIEW IF NOT EXISTS daily
(
    date Date,
    user_id String,
    class Enum8('unknown' = 0, 'local' = 1, 'peering' = 2, 'internet' = 3, 'multicast' = 4),
    dir Enum8('unknown' = 0, 'in' = 1, 'out' = 2),
    bytes AggregateFunction(sum, UInt32)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, user_id, class, dir)
SETTINGS index_granularity = 8192 AS
SELECT
    toDate(collected) AS date,
    user_id,
    class,
    dir,
    sumState(bytes) AS bytes
FROM details
GROUP BY
    toDate(collected),
    user_id,
    class,
    dir

##Таблицы агрегатов по часам и минутам  

CREATE MATERIALIZED VIEW IF NOT EXISTS hourly
(
    date DateTime,
    user_id String,
    class Enum8('unknown' = 0, 'local' = 1, 'peering' = 2, 'internet' = 3, 'multicast' = 4),
    dir Enum8('unknown' = 0, 'in' = 1, 'out' = 2),
    bytes AggregateFunction(sum, UInt32)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, user_id, class, dir)
SETTINGS index_granularity = 8192 AS
SELECT
    toStartOfHour(collected) AS date,
    user_id,
    class,
    dir,
    sumState(bytes) AS bytes
FROM details
GROUP BY
    toStartOfHour(collected),
    user_id,
    class,
    dir;

CREATE MATERIALIZED VIEW IF NOT EXISTS minutely
(
    date DateTime,
    user_id String,
    class Enum8('unknown' = 0, 'local' = 1, 'peering' = 2, 'internet' = 3, 'multicast' = 4),
    dir Enum8('unknown' = 0, 'in' = 1, 'out' = 2),
    bytes AggregateFunction(sum, UInt32)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, user_id, class, dir)
SETTINGS index_granularity = 8192 AS
SELECT
    toStartOfMinute(collected) AS date,
    user_id,
    class,
    dir,
    sumState(bytes) AS bytes
FROM details
GROUP BY
    toStartOfMinute(collected),
    user_id,
    class,
    dir;

Now aggregates calculating automatically when data is inserted into the detail table. It’s magic!

Storing data into ClickHouse

Now there is a little left to make a tool for that. Drawing two ovals and then finishing the rest of the owl. I took my favorit Golang, parsed, classified and saved. I wrote a little simple application that is open source on github.

Usage

Remember in the beginning of the article was a script for fetching statistic from IPCAD, let’s change it:

#!/bin/sh
RSH=`which rsh`
IP=<router IP address>
FILE=<saving path>

# flush statistic to control point
$RSH -l root $IP clear ip accounting > /dev/null

# show statistic in control point
$RSH -l root $IP show ip accounting checkpoint | ipcad2ch > $FILE

Insert tool with pipe after data fetch but before writing it to file. Now the program reads ipcad data and saves it into ClickHouse then it prints it untouched to standard output. Thereby we could save data immediately even without changing legacy code. Now we need to collect enough statistics and smoothly transit to new data sources. As I said, old statistics are kept in zip archives in IPCAD format and could be also processed with this tool.

Twitter, Facebook