multi-threaded clients
Carter Bullard
carter at qosient.com
Mon Jun 25 12:47:53 EDT 2012
Hey CS Lee,
So, splitting the tasks into parallelizable subtasks and then scheduling them
is a great and useful strategy. Most people currently use the existing tools and carve the problem
up themselves, like running 10 rabin() instances against 10 time ranges: you
carve the time up into 10 equal periods, create the tasks, take the outputs, and
then merge the data into a single output file.
This would be the hadoop way of doing it. With multiple cores/machines looking at the
same datastore, you can get great scalability that way, but argus-clients doesn't have
to do anything different to be a part of that type of system.
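
A minimal sketch of that carve-up, in Python, for anyone who wants to try it. It splits a time range into equal periods, runs one task per period, and merges the partial outputs in time order. The names split_range, process_slice and run_parallel are hypothetical, and process_slice is only a stand-in for launching an existing client (such as rabin) against one period; no real argus command line is shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def split_range(start, end, parts):
    """Split [start, end) into `parts` contiguous, equal-length periods."""
    step = (end - start) / parts
    edges = [start + i * step for i in range(parts)] + [end]
    return list(zip(edges[:-1], edges[1:]))


def process_slice(period):
    """Hypothetical stand-in for running one client over one period.

    A real task would launch the client for this period and read back
    its output; here we just return one row describing the period.
    """
    begin, finish = period
    return [(begin, (finish - begin).total_seconds())]


def run_parallel(start, end, parts=10, workers=4):
    """Run one task per period and merge the partial outputs in order."""
    periods = split_range(start, end, parts)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `periods`.
        partials = list(pool.map(process_slice, periods))
    return [row for part in partials for row in part]


if __name__ == "__main__":
    rows = run_parallel(datetime(2012, 6, 25), datetime(2012, 6, 26))
    for begin, seconds in rows:
        print(begin, seconds)
```

Threads are enough for this sketch because the heavy lifting would happen in the external client processes; the Python side only schedules them and concatenates the results.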
If we can come up with some compelling examples, I don't have any problem with
putting in some hooks for doing coarse-grained parallelism.
Carter
On Jun 25, 2012, at 11:30 AM, CS Lee wrote:
> hi Carter,
>
> I mean the instrumental tools that we use to perform analysis, such as rabin, rahisto and rafilteraddr.
>
> One thing I find tough is retrieving primitive data from mysql blob data for a certain time range. Say we are generating 10G of primitive data in that time range: when performing analysis using rasql, we need to load all of it into memory and run through it, which hogs memory.
>
> I can't remember where I read it, but I think it is something like what splunk does: if they have 10G of data, they split it, load 1G at a time and process it piece by piece, so memory consumption always stays at 1G instead of throwing 10G into memory at once.
>
> Cheers!
>
> On Mon, Jun 25, 2012 at 11:09 PM, Carter Bullard <carter at qosient.com> wrote:
> Hey CS Lee,
> All the clients are multi-threaded, now. If you have RA_RELIABLE_CONNECT enabled in your
> .rarc files, your clients will spawn a thread to maintain the connection to the remote data source.
>
> Radium has a thread per input stream, and a thread for the output stream, as well as threads
> for the reliable connections. ratop() has a few threads, one for the screen, one for input, etc.
>
> Multi-threading is done, with lots of mutexes, queues, etc. What do you want to thread?
>
> Carter
>
>
> On Jun 25, 2012, at 11:04 AM, CS Lee wrote:
>
> > hi Carter,
> >
> > It's a trade-off, because I'm not keeping argus primitive data. I'm already trying to minimize the fields I need to keep ;)
> >
> > This is just the starting work of handling 10G with mysql. My plan is to have mysql handle the first-hand data, then sync it with hadoop/hive to perform big data analysis. Nothing is final, and I would like to hear some thoughts from other argus users on the list as well.
> >
> > By the way Carter, do you have a plan to make the argus clients multi-threaded as well, to get better performance in terms of analysis? I know this is not trivial, but I'm just asking.
> >
> > Cheers!
> >
>
>
>
>
> --
> Best Regards,
>
> CS Lee<geek00L[at]gmail.com>
>
> http://geek00l.blogspot.com
> http://defcraft.net