Anonymization of argus flow data

Kaustubh Gadkari kaustubh.gadkari at gmail.com
Tue Sep 3 09:59:29 EDT 2013


On Tue, Sep 3, 2013 at 7:34 AM, Jesper Skou Jensen
<jesper.skou.jensen at uni-c.dk> wrote:
> I can't say why ranonymize is taking that long, but it might be because
> there are many millions of sessions in your logfile?
>

There are:

kaustubh at neutron:/raid0/kaustubh$ time racount -r 2013-05-02.1900.hJmbh.lander.argus.gz
racount   records     total_pkts     src_pkts       dst_pkts      total_bytes        src_bytes          dst_bytes
    sum   660346226   22569184689    13942009249    8627175440    23277747475402     12592489900324     10685257575078
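To put those totals in perspective, here is a rough back-of-the-envelope
calculation in Python. It is only a sketch: it assumes the racount figures
describe the same ~125GB file, that "a couple of months" means about 60 days,
and that output size grows in proportion to the number of records processed.
None of those assumptions has been measured.

```python
# Totals copied from the racount output.
RECORDS = 660_346_226
TOTAL_PKTS = 22_569_184_689
TOTAL_BYTES = 23_277_747_475_402


def summarize(records, total_pkts, total_bytes, fraction_done, elapsed_days):
    """Estimate per-record averages and the implied ranonymize throughput."""
    records_done = records * fraction_done
    rate = records_done / (elapsed_days * 86_400)  # records per second
    return {
        "pkts_per_record": total_pkts / records,
        "bytes_per_record": total_bytes / records,
        "records_per_second": rate,
        "days_remaining": (records - records_done) / rate / 86_400,
    }


# Assumed inputs: ~30GB written out of ~125GB, after roughly 60 days.
stats = summarize(RECORDS, TOTAL_PKTS, TOTAL_BYTES,
                  fraction_done=30 / 125, elapsed_days=60)

for name, value in stats.items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions the run is handling on the order of 30 records per
second, which would leave about 190 more days of processing for this file.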


>
> You could try using rasplit on the file first and then analyzing the
> resulting split-files one by one.
>

Unfortunately, our use case does not permit splitting the files,
unless there is a way to merge the split files after processing them.
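For merging to be safe, the anonymization has to map a given address to the
same output in every split file. The Python sketch below illustrates that
property with a generic keyed, prefix-preserving mapping (in the style of
Crypto-PAn, with HMAC-SHA256 swapped in as the pseudorandom function). It is
not ranonymize's algorithm, and the key and addresses are made up for
illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving mapping: each output bit is the input bit XOR a
    keyed pseudorandom bit that depends only on the preceding input bits."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the top i bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


KEY = b"hypothetical-shared-key"  # must be identical for every split file

# Two "split files" processed independently, then concatenated.
chunk_a = ["192.168.1.10", "10.0.0.1"]
chunk_b = ["192.168.1.200", "10.0.0.1"]
merged = ([anonymize_ipv4(a, KEY) for a in chunk_a]
          + [anonymize_ipv4(a, KEY) for a in chunk_b])

# Processing the unsplit list in one pass gives the same result.
whole = [anonymize_ipv4(a, KEY) for a in chunk_a + chunk_b]
print(merged == whole)
print(common_prefix_len(merged[0], merged[2]))
```

Because the mapping depends only on the key and the address, the split files
can be processed one by one (or in parallel) and the outputs concatenated.
Whether ranonymize can be run this way is a separate question that this sketch
does not answer.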

> Depending on how long a period that 125GB file covers, you could split it
> into e.g. days or hours. That way it should be much less taxing on
> cpu/memory/io usage.
>

Each file covers 12 hours' worth of data, but as I said before,
splitting the files is not a viable option at this point.

Thanks,
Kaustubh

>
> Regards
> Jesper
>
>
>
> On 02-09-2013 20:20, Kaustubh Gadkari wrote:
>>
>> Hi,
>>
>> I have a set of argus flow data captured at our data capture vantage
>> point, and I want to fully anonymize the IP addresses (both source and
>> destination), i.e. I want to replace both addresses using a
>> prefix-preserving technique. I have tried using ranonymize, but it is taking
>> an extremely long time to anonymize the file (I started the process a couple
>> of months ago, on a ~125GB file, and the output file size today is only ~30GB).
>>
>> Can anyone suggest the right way to go about anonymizing the data set I
>> have? Is ranonymize the right tool for the job?
>>
>> Thanks,
>> Kaustubh
>>
>> --
>> Kaustubh Gadkari
>
>



-- 
Kaustubh Gadkari


