Combining seen DNS data with traffic data: Tracking traffic to domains

Carter Bullard carter at qosient.com
Mon Oct 22 16:35:25 EDT 2012


Hey Markku,
The standard argus way of doing what you are describing is to write a sample ra* client
program that does the DNS parsing (you can use radump() as a guide, as it has
the code to parse the DNS data from the argus records in ./examples/radump/print-domain.c),
and have that client add the domain names to the other records that it receives.
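
A rough sketch of the shape of such a client, written in Python rather than as a real ra* client. It assumes each record has already been decoded into a dict with hypothetical keys (saddr, daddr, dport, dns_answers); that is not how the argus client library presents records. DNS transactions fill a table keyed on (client, answered address), and later flows from that client to that address get a label.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape: real argus records are binary structures read via
# the argus client library; plain dicts with assumed keys stand in for them.
Record = Dict[str, object]


def annotate(records: Iterable[Record]) -> Iterator[Record]:
    """Yield every record, adding a 'label' to flows whose destination address
    appeared earlier in a DNS answer returned to the same client."""
    names: Dict[Tuple[object, object], object] = {}  # (client, address) -> name
    for rec in records:
        answers = rec.get("dns_answers")  # assumed: list of (qname, address)
        if rec.get("dport") == 53 and answers:
            # DNS transaction: saddr is the client that asked the question.
            for qname, addr in answers:
                names[(rec["saddr"], addr)] = qname
        else:
            name = names.get((rec["saddr"], rec["daddr"]))
            if name is not None:
                rec = dict(rec, label=name)  # copy; leave the input untouched
        yield rec
```

Records have to be processed in time order: a flow that is seen before its DNS answer stays unlabeled.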

Once it is working, we would add that functionality to radium().

Carter 



On Oct 16, 2012, at 3:07 AM, Markku Parviainen <maketsi at gmail.com> wrote:

> Dave: thank you for the analysis and for the very valid questions. This
> is not that simple to implement for every network, as you said. But
> that's the point of this: to raise discussion.
> 
> So for simplicity, let's assume that argus (= the host running it) can
> see all DNS traffic for the entire network we are watching (e.g. it
> runs on the only gateway that is also the only recursor for the
> network).
> 
> 2012/10/15 Dave Edelman <dedelman at iname.com>:
>> 2 - A response to a request can be one or more A records; one or more NS
>> records; a CNAME record; and some of this stuff comes with glue records and
>> other bits and pieces. Some of these may already be cached.
> 
> NS records are not needed for the analysis. As long as TTLs are not
> exceeded, CNAMEs are not a problem, as we know what the client asked.
> If it asks for domain.net, which is really a CNAME for foo.net with an
> A record X, we know that X is really domain.net for this client,
> because the DNS response told us so. If the CNAME target was already
> queried much earlier, and the response didn't include glue records,
> then we are out of luck.
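
A minimal sketch of that CNAME reasoning, assuming the answer section of one response has already been decoded into (owner, type, rdata) string tuples (a hypothetical shape). It follows the chain from the queried name and maps every A/AAAA address at the end of the chain back to the name the client actually asked for; when the chain's target has no address in this response (the "out of luck" case), nothing is returned.

```python
from typing import Dict, List, Tuple

Answer = Tuple[str, str, str]  # (owner name, record type, rdata), assumed shape


def _norm(name: str) -> str:
    return name.lower().rstrip(".")


def addresses_for_query(qname: str, answers: List[Answer]) -> Dict[str, str]:
    """Map each A/AAAA address in one response to the name the client asked
    for, following any CNAME chain that starts at qname."""
    cnames = {_norm(owner): _norm(rdata)
              for owner, rtype, rdata in answers if rtype == "CNAME"}
    target, seen = _norm(qname), set()
    while target in cnames and target not in seen:  # `seen` stops CNAME loops
        seen.add(target)
        target = cnames[target]
    return {rdata: qname
            for owner, rtype, rdata in answers
            if rtype in ("A", "AAAA") and _norm(owner) == target}
```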
> 
> The cache could be as simple as a hash table from "saddr-daddr" to
> "fqdn.net". Entries can of course collide with virtual hosts (which
> share the same daddr), so this can't be fully accurate even with some
> decay time. It's still better than no name at all, and it's still
> better than the PTR name. Why? Consider Google, for example:
> www.google.com = 173.194.71.147 = lb-in-f147.1e100.net (PTR). It is
> also an entirely different thing to reach IP 1.1.1.1 via zxcliybwe.cn
> than via dsl-foobar.dhcp.isp.net (PTR). Which one is more suspicious
> and/or informative? The PTR record still has its uses, so you could
> label it separately if needed, although separate DNS resolution is
> always slow and noisy. If there is no cache entry for the daddr, for
> whatever reason, the flow is shown without a name, as it is now.
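
A small sketch of such a cache, assuming a fixed decay time in seconds and passing the current time in explicitly so it can be driven by record timestamps rather than the wall clock. The (saddr, daddr) tuple key plays the role of the "saddr-daddr" string; the class and method names are illustrative only. A later answer for the same pair simply overwrites the earlier name.

```python
from typing import Dict, Optional, Tuple


class NameCache:
    """(client, server address) -> last DNS name seen, forgotten after `decay` s."""

    def __init__(self, decay: float) -> None:
        self.decay = decay
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def learn(self, saddr: str, daddr: str, name: str, now: float) -> None:
        # Last write wins: a newer answer for the same pair replaces the old name.
        self._entries[(saddr, daddr)] = (name, now)

    def lookup(self, saddr: str, daddr: str, now: float) -> Optional[str]:
        entry = self._entries.get((saddr, daddr))
        if entry is None:
            return None  # no entry: the flow stays unnamed, as today
        name, seen_at = entry
        if now - seen_at > self.decay:
            del self._entries[(saddr, daddr)]  # stale entry: drop it
            return None
        return name
```

Using the answer's TTL instead of one global decay value would be a natural variation.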
> 
> The most problematic part here is shared destination IPs. If a client
> first uses name A and then name B for the same IP, which one should we
> use to label the flow? For HTTP the only reliable name is found in the
> application-layer Host header, which would require yet another parser
> for yet another protocol. Because these cases are a minority on a
> large scale, I would not worry about them. Using some kind of sane
> decay time, we could just use the last name that was seen for that IP.
> If it goes wrong, then it does. It's not a layer 7 analysis tool, yet. :)
> Because we are not talking about access control here, the
> implementation doesn't need to be perfect.
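
For comparison, a minimal sketch of pulling the Host header out of captured HTTP/1.x request bytes. This is not argus functionality; it assumes the request headers fall within the captured payload, and it ignores a final header line that may have been cut off by the snaplen rather than returning a partial name.

```python
from typing import Optional


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of an HTTP/1.x request from captured payload
    bytes, or None if it is not within the bytes that were captured."""
    head, sep, _ = payload.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    if not sep:
        lines = lines[:-1]  # headers incomplete: the last line may be cut off
    for line in lines[1:]:  # lines[0] is the request line, e.g. GET / HTTP/1.1
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None
```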
> 
> Doing a separate analysis of only the HTTP Host header would be one
> thing. But that wouldn't detect botnet control channels using
> fast-flux domains. Doing both? With what software? Would argus be it,
> or would that add too many bells and whistles that don't belong there?
> What could be the alternative?
> 
> 
> 2012/10/16 Carter Bullard <carter at qosient.com>:
>> The idea is to do the reverse lookup for the addresses, take the name
>> that is returned, grab the domain part of the name, and use that as the
>> value for the saddr or daddr when filtering, processing, or whatever. This
>> is preferred, as DNS information derived from packets on the wire can easily
>> be manipulated by an adversary, and you may not be able to grab the contents
>> of packets, as the snaplen may be small.
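
A minimal sketch of that reverse-lookup approach, using the system resolver through Python's socket.gethostbyaddr. The "domain part" here is naively the last two labels of the PTR name; that is an assumption for illustration, and suffixes such as co.uk would need the Public Suffix List instead. Note how it turns the Google address above into 1e100.net rather than google.com.

```python
import socket
from typing import Optional


def domain_part(hostname: str, labels: int = 2) -> str:
    """Keep the last `labels` labels of a name (naive: ignores suffixes
    such as co.uk, which would need the Public Suffix List)."""
    parts = hostname.rstrip(".").split(".")
    return ".".join(parts[-labels:])


def reverse_domain(addr: str) -> Optional[str]:
    """PTR lookup via the system resolver; None when the lookup fails."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(addr)
    except (socket.herror, socket.gaierror):
        return None
    return domain_part(hostname)
```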
> 
> Assuming this simple case where there are no other paths to external
> DNS servers, the client still needs to do its DNS lookups via us. If it
> purposely sends faked lookups to cover up the real name used, then I
> would applaud him. :)
> I don't think people or infected bots do that, or that they would
> rely on those faked packets to affect anything. If we wanted to
> confirm that the DNS responses are valid, we could do that too with
> another argus listening on a different interface (facing the DNS
> server), but that could spoil any possibility of using a shared cache.
> For simplicity, let's first just assume that all DNS traffic is
> valid. :)
> 
> Snaplen is another story. For a small network this is not an issue, as
> a snaplen of 600 bytes or so will cover "most" DNS responses and will
> not fill up memory (tmpfs) that much. We can toss the data away on log
> rotation. Of course, a better alternative would be to just monitor the
> responses from the live line without storing any payloads. That might
> require some modifications to the code. There are already passive DNS
> monitors that track ALL DNS responses from ISP-level DNS servers
> without storing all network traffic. So the code is available, but not
> integrated.
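
To illustrate the snaplen concern, a sketch (in Python, not the radump print-domain.c code) of extracting A records from a raw DNS response payload that may have been cut short. It assumes the UDP payload is already isolated, handles name compression, and reports whether the whole message was captured so that a partial result can be recognised as such. Only IPv4 A records are extracted; the function names are illustrative.

```python
import struct
from typing import List, Tuple


class Truncated(Exception):
    """The captured bytes end before the DNS message does (e.g. small snaplen)."""


def _read_name(buf: bytes, off: int) -> Tuple[str, int]:
    """Decode a possibly compressed domain name; return (name, offset after it)."""
    labels: List[str] = []
    end = -1  # offset just past the name at its original position
    for _ in range(256):  # bound the work so pointer loops terminate
        if off >= len(buf):
            raise Truncated()
        length = buf[off]
        if (length & 0xC0) == 0xC0:  # compression pointer: 14-bit offset
            if off + 1 >= len(buf):
                raise Truncated()
            if end < 0:
                end = off + 2
            off = ((length & 0x3F) << 8) | buf[off + 1]
        elif length & 0xC0:
            raise ValueError("unsupported label type")
        elif length == 0:
            return ".".join(labels), (end if end >= 0 else off + 1)
        else:
            if off + 1 + length > len(buf):
                raise Truncated()
            labels.append(buf[off + 1:off + 1 + length].decode("ascii", "replace"))
            off += 1 + length
    raise ValueError("name too long or compression loop")


def a_records(buf: bytes) -> Tuple[str, List[Tuple[str, str]], bool]:
    """Return (first question name, [(owner, IPv4 address)], complete) for a
    DNS response; complete is False if the capture ended early."""
    if len(buf) < 12:
        raise Truncated()  # not even a full header was captured
    _id, flags, qdcount, ancount, _ns, _ar = struct.unpack("!6H", buf[:12])
    if not flags & 0x8000:
        raise ValueError("not a response (QR bit clear)")
    qname = ""
    found: List[Tuple[str, str]] = []
    off = 12
    try:
        for i in range(qdcount):
            name, off = _read_name(buf, off)
            if i == 0:
                qname = name
            off += 4  # QTYPE and QCLASS
            if off > len(buf):
                raise Truncated()
        for _ in range(ancount):
            owner, off = _read_name(buf, off)
            if off + 10 > len(buf):
                raise Truncated()
            rtype, _cls, _ttl, rdlen = struct.unpack("!HHIH", buf[off:off + 10])
            off += 10
            if off + rdlen > len(buf):
                raise Truncated()
            if rtype == 1 and rdlen == 4:  # TYPE A
                found.append((owner, ".".join(str(b) for b in buf[off:off + 4])))
            off += rdlen
    except Truncated:
        return qname, found, False
    return qname, found, True
```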
> 
> 
>> We do have DNS name labeling, and it did support this option, but it
>> appears to be incomplete as well.  So the design allows you to poke the
>> domain name in as a label, but I need to finish it.
>> We have extensive support for label processing, so that they can be filtered,
>> grep'ed, aggregated, etc., so we can do a lot with the labels, but there has not
>> been much traffic on the mailing list about them, so there is not much in the way
>> of descriptions, etc.
> 
> It would be great to have completely free-form (custom) label fields
> that could be used for aggregation. This would probably be one of
> those rare moments where one would need them. This feature could
> coexist with an existing domain label too, if that label can be
> filled in with data other than PTR results.
> 
> As for the "domain" here, it might actually be better to see the full
> name rather than only the domain part. If only one of the two is
> implemented, the full name would be more useful (e.g. www.google.com
> instead of google.com).
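
A sketch of why the full name is the more flexible choice: if the label holds the full name, the domain-level view can still be derived at aggregation time, but not the other way round. It assumes labeled records are dicts with hypothetical "label" and "bytes" keys, and uses the same naive last-two-labels rule for the domain view.

```python
from collections import Counter
from typing import Dict, Iterable


def bytes_by_name(records: Iterable[Dict[str, object]], full_name: bool = True) -> Counter:
    """Sum bytes per DNS label; with full_name=False, group by the last two
    labels instead (naive: ignores suffixes such as co.uk)."""
    totals: Counter = Counter()
    for rec in records:
        name = str(rec.get("label") or "(unnamed)")
        if not full_name and name != "(unnamed)":
            name = ".".join(name.split(".")[-2:])
        totals[name] += int(str(rec.get("bytes", 0)))
    return totals
```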
> 
> Any thoughts? How would you monitor your network traffic to actual
> domains for usage patterns, etc.?
> 



More information about the argus mailing list