Anonymization of argus flow data
Kaustubh Gadkari
kaustubh.gadkari at gmail.com
Tue Sep 3 09:59:29 EDT 2013
On Tue, Sep 3, 2013 at 7:34 AM, Jesper Skou Jensen
<jesper.skou.jensen at uni-c.dk> wrote:
> I can't say why ranonymize is taking that long, but it might be because
> there are many millions of sessions in your logfile?
>
There are:
kaustubh@neutron:/raid0/kaustubh$ time racount -r 2013-05-02.1900.hJmbh.lander.argus.gz
racount  records    total_pkts   src_pkts     dst_pkts    total_bytes     src_bytes       dst_bytes
    sum  660346226  22569184689  13942009249  8627175440  23277747475402  12592489900324  10685257575078
>
> You could try using rasplit on the file first and then analyzing the
> resulting split-files one by one.
>
Unfortunately, our use case does not permit splitting the files,
unless there is a way to merge the split files after processing them.
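
To make the merge requirement concrete, here is a rough sketch in Python. It
is not argus code: the (start_time, payload) tuples are a hypothetical
stand-in for flow records, and merge_chunks is a made-up helper name. It
assumes each processed chunk is already sorted by start time.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# Hypothetical stand-in for a flow record: (start time, payload).
Record = Tuple[float, str]

def merge_chunks(*chunks: Iterable[Record]) -> Iterator[Record]:
    """Lazily merge time-sorted chunks into one time-sorted stream."""
    # heapq.merge keeps only one pending record per chunk in memory.
    return heapq.merge(*chunks, key=lambda rec: rec[0])

if __name__ == "__main__":
    hour_1 = [(1.0, "flow-a"), (4.0, "flow-d")]
    hour_2 = [(2.0, "flow-b"), (3.0, "flow-c")]
    for record in merge_chunks(hour_1, hour_2):
        print(record)
```

A merge like this only yields a usable result if every chunk was anonymized
with the same key, so that a given address maps to the same value everywhere.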
> Depending on how long a period that 125GB file covers, you could split it
> into, e.g., days or hours. That way it should be much less taxing on
> cpu/memory/io usage.
>
Each file covers 12 hours' worth of data, but as I said above,
splitting the files is not a viable option at this point.
Thanks,
Kaustubh
>
> Regards
> Jesper
>
>
>
> On 02-09-2013 20:20, Kaustubh Gadkari wrote:
>>
>> Hi,
>>
>> I have a set of argus flow data captured at our data capture vantage
>> point, and I want to fully anonymize the IP addresses (both source and
>> destination), i.e., I want to replace both addresses using a
>> prefix-preserving technique. I have tried using ranonymize, but it is taking
>> an extremely long time to anonymize the file (I started the process a couple
>> of months ago on a ~125GB file, and the output file size today is only ~30GB).
>>
>> Can anyone suggest the right way to go about anonymizing the data set I
>> have? Is ranonymize the right tool for the job?
>>
>> Thanks,
>> Kaustubh
>>
>> --
>> Kaustubh Gadkari
>
>
--
Kaustubh Gadkari
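
For reference, the prefix-preserving property asked about in the original
question can be sketched in a few lines of Python. This only illustrates the
idea and is not how ranonymize is implemented: HMAC-SHA256 stands in as the
pseudorandom function, and anonymize_ipv4, common_prefix_len and the key are
hypothetical names chosen for the example.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to another one, preserving shared prefixes."""
    value = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address.
        prefix = value >> (32 - i)
        # Derive one pseudorandom bit from (prefix length, prefix).
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        # Output bit i is input bit i, flipped or not by that bit.
        bit = (value >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

if __name__ == "__main__":
    key = b"example-key"  # assumption: any secret byte string
    a, b = "192.0.2.17", "192.0.2.200"
    anon_a, anon_b = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
    print(anon_a, anon_b)
    # Both pairs share the same number of leading bits.
    print(common_prefix_len(a, b), common_prefix_len(anon_a, anon_b))
```

Because the output depends only on the key and the input address, the same
address always maps to the same anonymized address, regardless of which file
or chunk it appears in.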