argus crash problem

Lei Wei lwei at cs.unc.edu
Thu Jan 31 19:47:28 EST 2008


Hey Carter,

I ran argus again without the -D option and it finally dumped core. The 
argus.core file is about 350 MB. Any suggestions for what we could do with it?
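
If it would help, I can pull a backtrace out of the core with gdb and send it 
along, something like this (assuming the binary still has its debug symbols and 
lives at /usr/local/sbin/argus on this box):

gdb /usr/local/sbin/argus argus.core
(gdb) bt full
(gdb) thread apply all bt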

Thanks.
Lei


Quoting Carter Bullard <carter at qosient.com>:

> Hey Lei,
> You should print both spkts and dpkts, so you can see whether you're getting
> full-duplex flow monitoring.  Running with debug information is probably not
> going to shed any light on the problem you ran into.  Run without the option
> to see how argus performs on your system.  Depending on the type of DAG card,
> you could saturate the bus at around 2 Gbps.  Even with the PCI Express cards
> you still may not get 10 Gbps, depending on the motherboard you are using.
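>
> Something along these lines will print both counts (a sketch -- double-check
> the field names and -s syntax against the ra.1 man page on your system):
>
> ra -r data.out -s stime proto saddr dir daddr spkts dpkts state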
>
> Just run without the "-D" option for a while to see if you hit the bug again.
> Also try modifying the flow model in the /etc/argus.conf file to see if you
> can get the cpu utilization down.
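>
> For example, something along these lines in argus.conf (a sketch -- the
> directive names should match the sample argus.conf that ships with 3.0, but
> double-check them against your copy) trims the per-flow work argus does:
>
> ARGUS_FLOW_TYPE="Bidirectional"
> ARGUS_FLOW_KEY="CLASSIC_5_TUPLE"
> ARGUS_FLOW_STATUS_INTERVAL=60
> ARGUS_GENERATE_PACKET_SIZE=no
> ARGUS_GENERATE_JITTER_DATA=no
> ARGUS_CAPTURE_DATA_LEN=0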
>
> Carter
>
>
> On Jan 30, 2008, at 5:25 PM, Lei Wei wrote:
>
>> You are right. I took a look at the flows. I can see some full tcp
>> connections, i.e. going from REQ and CON through to FIN. But it seems that
>> many connections have unknown directions, like the ones below. Is that normal?
>>
>> 15:58:19.299491  e    tcp   208.107.16.125.2740     <?>   152.23.105.115.6881       2    121  CON
>> 15:58:19.299491  e    udp   128.109.230.21.10002     ->   233.4.200.18.10002        2    136  INT
>> 15:58:19.299511  e    udp   155.101.127.30.10002     ->   233.4.200.18.10002        3    204  INT
>> 15:58:19.299539  e s  tcp   207.251.97.202.54174    <?>   152.2.2.149.smtp         26  22585  CON
>> 15:58:19.299551  e    udp   90.196.179.159.5331     <->   152.23.124.201.9965       7   4216  CON
>> 15:58:19.299616  e    udp   152.2.122.3.1179         ->   118.170.8.216.64371       9    563  INT
>> 15:58:19.299618  e    tcp   81.52.130.186.http      <?>   152.23.110.254.2068      10   9275  CON
>> 15:58:19.299634  e    tcp   152.23.211.119.52335    <?>   59.134.50.151.3830        3   1412  CON
>> 15:58:19.299663  e s  tcp   76.182.114.156.2093     <?>   152.2.210.22.rdp          6    558  CON
>> 15:58:19.299676  e    tcp   66.249.93.109.imaps     <?>   152.23.88.57.1055         3    263  CON
>> 15:58:19.299677  e s  tcp   74.125.1.81.http        <?>   152.2.176.13.3282        63  64848  CON
>> 15:58:19.299690  e    tcp   152.2.62.113.here-l     <?>   204.10.66.76.http        12   7402  CON
>> 15:58:19.299690  e    tcp   90.198.223.64.2684       ?>   152.23.79.176.16328       2   3028  CON
>> 15:58:19.299739  e    tcp   63.146.183.217.http     <?>   204.85.193.203.16244     43  37784  CON
>> 15:58:19.299778  e    tcp   64.233.179.91.http       ?>   152.23.93.163.4708        2   2968  CON
>> 15:58:19.299804  e    tcp   152.2.1.95.smtp         <?>   62.43.82.68.50100         6   2096  CON
>> 15:58:19.299823  e    tcp   81.52.130.152.http       ?>   152.23.69.105.1232        1     60  CON
>> 15:58:19.299862  e    tcp   195.241.129.86.14816    <?>   152.2.32.11.53759        67  25519  CON
>> 15:58:19.299891  e    tcp   81.154.227.107.51157    <?>   152.23.126.198.12215      3   1834  CON
>> 15:58:19.299892  e d  tcp   208.253.91.250.40767    <?>   152.2.63.68.rtsp         17   7464  CON
>> 15:58:19.299893  e    tcp   84.67.187.146.52319     <?>   152.23.63.139.2077        5   3206  CON
>> 15:58:19.299893  e    tcp   124.121.160.139.16800   <?>   152.23.204.249.4173       2   1560  CON
>> 15:58:19.299928  e    udp   152.2.255.242.10000     <->   76.182.85.17.10000      114  94428  CON
>> 15:58:19.299938  e    tcp   152.23.69.88.3038       <?>   206.33.52.123.http       68  68056  CON
>>
>>
>>
>>
>> Quoting Carter Bullard <carter at qosient.com>:
>>
>>> One important question is whether the flows look reasonable:
>>> full TCP connections, bidirectional traffic, etc.
>>> Carter
>>>
>>> On Jan 30, 2008, at 3:57 PM, Lei Wei wrote:
>>>
>>>> Argus didn't stop, since the file is still growing, and the memory usage
>>>> shown in top is about 493 MB. Since you mentioned that argus slows down
>>>> when it's in debug mode, does that mean it might drop some packets while
>>>> running?
>>>>
>>>> It's a little bit weird, because when I let it write the debug info to
>>>> stdout it printed many lines, just like yours did. But when I tried to
>>>> redirect it into a file with "argus -i dag0 -w data.out -D 2 > error 2>&1",
>>>> it only wrote a few lines to the file:
>>>>
>>>> argus[574]: 28 Jan 08 01:14:13.680343 ArgusNewModeler() returning 0x816f000
>>>> argus[574]: 28 Jan 08 01:14:13.686915 ArgusNewSource() returning 0x81b2000
>>>> argus[574]: 28 Jan 08 01:14:13.686963 ArgusNewOutput() returning retn 0x81b1100
>>>> argus[574]: 28 Jan 08 01:14:13.708239 setArgusID(0x816f000, 0x9802899f) done
>>>> argus[574]: 28 Jan 08 01:14:13.708311 setArgusPortNum(561) returning
>>>> argus[574]: 28 Jan 08 01:14:13.708512 setArgusInterfaceStatus(1)
>>>>
>>>> I guess I should run it again, let it write the debug info to the screen,
>>>> and see what happens.
>>>>
>>>> Lei
>>>>
>>>>
>>>> Quoting Peter Van Epp <vanepp at sfu.ca>:
>>>>
>>>>> On Wed, Jan 30, 2008 at 01:45:17PM -0500, Lei Wei wrote:
>>>>>> Hello Peter and Carter,
>>>>>>
>>>>>> I've been trying to reproduce the crash recently. I ran argus with the
>>>>>> debug option enabled from Sunday night until now, and it never crashed.
>>>>>> The only debug information it output was a few lines when argus started,
>>>>>> and nothing more. The CPU usage right now is still about 99%, but the
>>>>>> output data file is growing well and is already about 144 GB. I'm not
>>>>>> sure why I can't reproduce the crash; there shouldn't be much difference
>>>>>> in the amount of traffic between this week and last week. By the way,
>>>>>> the link I'm monitoring is the campus border link to the ISP, and the
>>>>>> traffic load is more than 1 Gb/s. I wonder whether running in debug mode
>>>>>> makes some difference that plays a role here... Anyway, I'll keep
>>>>>> monitoring it and see what happens.
>>>>>>
>>>>>> Lei
>>>>>>
>>>>>
>>>>> 	Hmmm, that's too bad. Debug mode does slow argus down a bit; it's
>>>>> possible that it's slowing things down enough that the problem isn't
>>>>> happening. What does memory usage look like? I'd be suspicious that argus
>>>>> is running out of memory and stopping (although that should show up in
>>>>> syslog); there's a one-liner for watching that at the end of this note.
>>>>> That said, I just noticed I'm not getting any syslog messages from argus
>>>>> (and I think I should be). I just restarted one of my test instances
>>>>> with -D2:
>>>>>
>>>>> # argus -J -P 560 -i eth2 -i eth3 -m -U700 -D2 -- host 142.58.101.100
>>>>> argus[4637]: 30 Jan 08 12:32:05.555408 ArgusNewModeler() returning 0x10181010
>>>>> argus[4637]: 30 Jan 08 12:32:05.555871 ArgusNewSource() returning 0x10201f70
>>>>> argus[4637]: 30 Jan 08 12:32:05.555903 ArgusNewOutput() returning retn 0x102016b0
>>>>> argus[4637]: 30 Jan 08 12:32:05.555985 setArgusPortNum(560) returning
>>>>> argus[4637]: 30 Jan 08 12:32:05.556023 setArgusInterfaceStatus(1)
>>>>> argus[4637]: 30 Jan 08 12:32:05.585055 ArgusInitSource() pcap_open_live(eth3) returned 0x10252d60
>>>>> argus[4637]: 30 Jan 08 12:32:05.585119 ArgusOpenInterface(0x10201f70, 'eth3') returning
>>>>> argus[4637]: 30 Jan 08 12:32:05.615008 ArgusInitSource() pcap_open_live(eth2) returned 0x10253370
>>>>> argus[4637]: 30 Jan 08 12:32:05.615071 ArgusOpenInterface(0x10201f70, 'eth2') returning
>>>>> argus[4637]: 30 Jan 08 12:32:05.615119 ArgusCopyArgv(0xff8811f8) returning 0x10253980
>>>>> argus[4637]: 30 Jan 08 12:32:05.615298 ArgusInitSource() returning
>>>>> argus[4637]: 30 Jan 08 12:32:05.615357 ArgusEstablishListen(560, (null), 0xff880a18)
>>>>> argus[4637]: 30 Jan 08 12:32:05.615526 ArgusEstablishListen(560, 0xff880a18) binding: any:560
>>>>> argus[4637]: 30 Jan 08 12:32:05.615607 ArgusEstablishListen(560, 0xff880a18) returning 5
>>>>> argus[4637]: 30 Jan 08 12:32:05.615633 ArgusInitOutput() done
>>>>> ArgusWarning: argus[4637]: 30 Jan 08 12:32:05.615670 started
>>>>> argus[4637]: 30 Jan 08 12:32:05.615711 ArgusInitModeler() done
>>>>> argus[4637]: 30 Jan 08 12:32:05.615974 setArgusInterfaceStatus(1)
>>>>> ArgusWarning: argus[4637]: 30 Jan 08 12:32:05.616010 ArgusGetInterfaceStatus: interface eth3 is up
>>>>> argus[4637]: 30 Jan 08 12:32:05.616046 setArgusInterfaceStatus(1)
>>>>> ArgusWarning: argus[4637]: 30 Jan 08 12:32:05.616102 ArgusGetInterfaceStatus: interface eth2 is up
>>>>> argus[4637]: 30 Jan 08 12:32:05.616133 setArgusInterfaceStatus(1)
>>>>> argus[4637]: 30 Jan 08 12:32:05.616159 setArgusInterfaceStatus(1)
>>>>> argus[4637]: 30 Jan 08 12:32:59.824738 ArgusScheduleShutDown(2)
>>>>>
>>>>> argus[4637]: 30 Jan 08 12:32:59.824838 ArgusShutDown(Normal Shutdown)
>>>>>
>>>>> argus[4637]: 30 Jan 08 12:32:59.824856 ArgusCloseSource(0x10201f70) starting
>>>>> argus[4637]: 30 Jan 08 12:32:59.824874 ArgusCloseSource(0x10201f70) deleting source
>>>>> argus[4637]: 30 Jan 08 12:32:59.825600 ArgusCloseModeler(0x10181010) pushing close record 0x1028f2b0
>>>>> argus[4637]: 30 Jan 08 12:32:59.825625 ArgusCloseModeler(0x10181010)
>>>>> argus[4637]: 30 Jan 08 12:32:59.825642 ArgusCloseOutput() scheduling closure after writing records
>>>>> argus[4637]: 30 Jan 08 12:32:59.825698 ArgusOutputProcess() received stop record 0 records on the list
>>>>> argus[4637]: 30 Jan 08 12:32:59.825716 ArgusCloseOutput(0x102016b0) done
>>>>> argus[4637]: 30 Jan 08 12:32:59.825759 ArgusShutDown()
>>>>>
>>>>> # tail /var/log/messages
>>>>> ...
>>>>> Jan 30 12:32:05 sniffer1 kernel: RING: succesfully allocated 0 KB [tot_mem=12664896][order=12]
>>>>> Jan 30 12:32:05 sniffer1 kernel: RING: allocated 10851 slots [slot_len=1546][tot_mem=16777216]
>>>>> Jan 30 12:32:05 sniffer1 kernel: RING: succesfully allocated 0 KB [tot_mem=12664896][order=12]
>>>>> Jan 30 12:32:05 sniffer1 kernel: RING: allocated 10851 slots [slot_len=1546][tot_mem=16777216]
>>>>>
>>>>> 	and all that shows up in syslog is the ring buffer starting as the
>>>>> interfaces come up. I think argus used to log the interfaces opening to
>>>>> syslog, which may mean there is a bug in the syslogging.
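>>>>>
>>>>> 	As for the memory question above: something like this (plain shell;
>>>>> the ps -C and "rss=/vsz=" no-header forms are GNU procps, so adjust to
>>>>> taste) will log argus's resident size once a minute, which should show
>>>>> whether it creeps up before a crash:
>>>>>
>>>>> while true; do date; ps -C argus -o pid=,rss=,vsz=; sleep 60; done >> argus-mem.log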
>>>>>
>>>>> Peter Van Epp / Operations and Technical Support
>>>>> Simon Fraser University, Burnaby, B.C. Canada
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>




