Wondering why argus/rasplit jammed up

Carter Bullard carter at qosient.com
Tue Aug 18 19:38:50 EDT 2015


Hey Kevin,
So the queue length error is saying that rasplit stopped processing, and argus decided to stop sending.
The inability to stop argus may be fixed in argus-3.0.8.2, which is out as a release candidate, but a few other conditions that leave argus stuck have come up, so hold off on switching for a few days.
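
To make the queue length error concrete, here is a minimal sketch of the general pattern: a sender keeps a bounded backlog per client and gives up once the reader stops draining it. This is an illustration only, not argus source code; the `OutputQueue` class, its methods, and the cap value are hypothetical.

```python
from collections import deque


class OutputQueue:
    """Hypothetical per-client backlog; not taken from the argus source."""

    def __init__(self, max_len):
        self.max_len = max_len      # assumed cap on queued records
        self.records = deque()
        self.stopped = False        # set once the sender gives up on the client

    def push(self, record):
        """Queue a record; return False once the client has been abandoned."""
        if self.stopped:
            return False
        self.records.append(record)
        if len(self.records) > self.max_len:
            # The reader is not draining the queue: stop sending to it.
            self.stopped = True
            return False
        return True

    def pop(self):
        """Called by the reader side; returns None when nothing is queued."""
        return self.records.popleft() if self.records else None
```

With a cap of 100000, the push that brings the backlog to 100001 is the one that trips the limit, which would be consistent with the number in the log line, though that reading of the message is an assumption.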

The issue with rasplit being tainted by a kernel bug in migrate.c is a new one on me.
With pf_ring linked into the buggy routine, it could be that a bug in pf_ring could affect both rasplit and argus.

Not sure that we can do anything about that. What does Ubuntu say about the kernel bug?
If a module is "linked in", does that suggest that the bug could affect these modules, or that these modules can affect the bug?
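
One way to start on that question is to pull the flagged modules and the taint letters out of the oops text. The kernel prints the "Modules linked in" list with every oops, and it names everything loaded at the time, so on its own it does not implicate any one module. The sketch below is a rough helper, not a diagnostic tool. The function names are hypothetical, and the flag meanings for P, D and O follow the kernel's tainted-kernels documentation. X is left undecoded on purpose, because its meaning depends on the kernel build.

```python
import re

# Meanings per the kernel's tainted-kernels documentation.
# 'X' is intentionally missing: what it means depends on the kernel build.
KNOWN_FLAGS = {
    "P": "proprietary module loaded",
    "D": "kernel has already hit an oops or BUG",
    "O": "out-of-tree module loaded",
}


def flagged_modules(modules_line):
    """Return {module: flags} for modules printed as name(FLAGS)."""
    return dict(re.findall(r"(\w+)\(([A-Z]+)\)", modules_line))


def taint_flags(comm_line):
    """Return the taint letters from a 'Comm: ... Tainted: ...' line."""
    match = re.search(r"Tainted:\s+([A-Z ]+?)\s+\d", comm_line)
    if match is None:
        return []
    return [c for c in match.group(1) if c != " "]


def describe(flags):
    """Map each taint letter to its meaning, or mark it as not decoded."""
    return {f: KNOWN_FLAGS.get(f, "not decoded here") for f in flags}
```

Run against the trace in this thread, `flagged_modules` would pick out pf_ring and the ZFS/SPL modules as the ones carrying flags, which at least says which out-of-tree code was loaded when the BUG fired.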

Carter

> On Aug 18, 2015, at 6:15 PM, Branch Family <branchbunch at gmail.com> wrote:
> 
> Hi Carter,
> 
> My argus and rasplit daemons just jammed up in a way I've never seen before.  My rasplit daemon pulls flows directly from argus and fans them out across hourly files in my file system.  I'm on Ubuntu 12.04 server (Security Onion), using argus and argus-clients version 3.0.8.
> 
> The daemons were both still there but all file writes had stopped.  I was able to stop rasplit normally but had to do a kill -kill on argus to stop it.  A restart of the daemons seems to have everything working fine again.
> 
> I'm wondering how to interpret this.  Anything stand out to you?
> 
> First sign of trouble with argus:
> 
> Aug 18 19:29:12 nsm argus[10340]: 18 Aug 15 19:29:12.850433 ArgusWriteOutSocket(0x7efcb8729010) max queue exceeded 100001
> 
> First sign of trouble with rasplit (I suspect preceding the argus event):
> 
> [1203679.394043] ------------[ cut here ]------------
> [1203679.400415] kernel BUG at /build/linux-lts-trusty-RbzkRH/linux-lts-trusty-3.13.0/mm/migrate.c:589!
> [1203679.413000] invalid opcode: 0000 [#2] SMP
> [1203679.419195] Modules linked in: ipmi_si iptable_mangle xt_mark 8021q mrp garp stp llc bonding mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase dell_rbu xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ip6table_filter ip6_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_devintf gpio_ich crct10dif_pclmul dcdbas crc32_pclmul ghash_clmulni_intel aesni_intel ablk_helper cryptd joydev lrw gf128mul glue_helper aes_x86_64 mei_me mei mac_hid wmi sb_edac edac_core acpi_power_meter lpc_ich shpchp pf_ring(OX) lp parport binfmt_misc zfs(POX) zavl(POX) zcommon(POX) znvpair(POX) spl(OX) zunicode(POX) ses enclosure hid_generic usbhid hid tg3 ptp megaraid_sas pps_core [last unloaded: ipmi_si]
> [1203679.494521] CPU: 23 PID: 10378 Comm: rasplit Tainted: P      D    OX 3.13.0-61-generic #100~precise1-Ubuntu
> [1203679.506886] Hardware name: Dell Inc. PowerEdge R720xd/0X3D66, BIOS 2.2.2 01/16/2014
> [1203679.519140] task: ffff8807c1e08000 ti: ffff8800bfe16000 task.ti: ffff8800bfe16000
> [1203679.531347] RIP: 0010:[<ffffffff811b52e9>]  [<ffffffff811b52e9>] migrate_page+0x49/0x50
> [1203679.543661] RSP: 0018:ffff8800bfe17688  EFLAGS: 00010202
> [1203679.549751] RAX: 06ffff0000002009 RBX: ffffea0021369ac0 RCX: 0000000000000001
> [1203679.561717] RDX: ffffea0021369ac0 RSI: ffffea00278616c0 RDI: ffff8809def3add0
> [1203679.573663] RBP: ffff8800bfe176a8 R08: 0000000000000001 R09: 0000000000016588
> [1203679.585765] R10: ffff88102fff9f00 R11: 0000000000000075 R12: ffffea00278616c0
> [1203679.597968] R13: ffff8809def3add0 R14: 0000000000000001 R15: 0000000000000000
> [1203679.610318] FS:  00007f0c4a5c3700(0000) GS:ffff88100f360000(0000) knlGS:0000000000000000
> [1203679.622667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1203679.628836] CR2: 000000001560bc60 CR3: 00000000be4e1000 CR4: 00000000001407e0
> [1203679.640922] Stack:
> [1203679.646771]  ffff8809def3ae08 ffff8809def3add0 ffffea00278616c0 ffffea0021369ac0
> [1203679.658509]  ffff8800bfe176f8 ffffffff811b542d ffffea00278616c0 ffffea0000000001
> [1203679.670219]  0000000000000000 ffffea0021369ac0 ffffea00278616c0 00000000fffffff5
> [1203679.681968] Call Trace:
> [1203679.687691]  [<ffffffff811b542d>] move_to_new_page+0x13d/0x180
> [1203679.693463]  [<ffffffff811b5881>] __unmap_and_move+0x251/0x2c0
> [1203679.699105]  [<ffffffff811b5966>] unmap_and_move+0x76/0x180
> [1203679.704635]  [<ffffffff811b5c7d>] migrate_pages+0xdd/0x210
> [1203679.710052]  [<ffffffff8117d0a0>] ? isolate_freepages+0x220/0x220
> [1203679.715392]  [<ffffffff8117df2c>] compact_zone+0x18c/0x340
> [1203679.720666]  [<ffffffff810d30ac>] ? ktime_get_ts+0x4c/0xe0
> [1203679.725871]  [<ffffffff8117e3a4>] compact_zone_order+0x94/0xd0
> [1203679.730986]  [<ffffffff810e242e>] ? smp_call_function_many+0x26e/0x2c0
> [1203679.736051]  [<ffffffff8117e4e9>] try_to_compact_pages+0x109/0x190
> [1203679.741026]  [<ffffffff81750a02>] __alloc_pages_direct_compact+0xc3/0x1bf
> [1203679.745945]  [<ffffffff81162a95>] __alloc_pages_nodemask+0x9a5/0xbb0
> [1203679.764964]  [<ffffffff811a3fb2>] alloc_pages_current+0xb2/0x170
> [1203679.769706]  [<ffffffff811ad13d>] allocate_slab+0x13d/0x1a0
> [1203679.774314]  [<ffffffff811ad1d0>] new_slab+0x30/0x1d0
> [1203679.778789]  [<ffffffff81752aa8>] __slab_alloc+0x18a/0x2c2
> [1203679.783215]  [<ffffffff811db030>] ? getname_flags.part.25+0x30/0x140
> [1203679.787634]  [<ffffffff8163e0e0>] ? release_sock+0x80/0x90
> [1203679.791924]  [<ffffffff8169aff0>] ? tcp_recvmsg+0x5c0/0xb50
> [1203679.796229]  [<ffffffff811b0ac3>] kmem_cache_alloc+0x1d3/0x1f0
> [1203679.800467]  [<ffffffff811db030>] ? getname_flags.part.25+0x30/0x140
> [1203679.804638]  [<ffffffff811db030>] getname_flags.part.25+0x30/0x140
> [1203679.808703]  [<ffffffff811db1a6>] getname_flags+0x66/0x80
> [1203679.812612]  [<ffffffff811dbd05>] user_path_at_empty+0x35/0xa0
> [1203679.816425]  [<ffffffff81638b5c>] ? sock_aio_read.part.9+0x3c/0x40
> [1203679.820158]  [<ffffffff811dbd81>] user_path_at+0x11/0x20
> [1203679.823912]  [<ffffffff811d05c1>] vfs_fstatat+0x51/0xb0
> [1203679.827594]  [<ffffffff811d06eb>] vfs_stat+0x1b/0x20
> [1203679.831260]  [<ffffffff811d0705>] SYSC_newstat+0x15/0x30
> [1203679.834927]  [<ffffffff811cbac8>] ? vfs_read+0x108/0x180
> [1203679.838585]  [<ffffffff811cbd10>] ? SyS_read+0x70/0xa0
> [1203679.842176]  [<ffffffff811d085e>] SyS_newstat+0xe/0x10
> [1203679.845744]  [<ffffffff8177021d>] system_call_fastpath+0x1a/0x1f
> [1203679.849348] Code: 20 75 28 45 31 c9 31 c9 e8 95 f8 ff ff 85 c0 75 11 48 89 de 4c 89 e7 89 45 e8 e8 63 fd ff ff 8b 45 e8 48 83 c4 10 5b 41 5c 5d c3 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 40 48
> [1203679.860842] RIP  [<ffffffff811b52e9>] migrate_page+0x49/0x50
> [1203679.864668]  RSP <ffff8800bfe17688>
> [1203679.875097] ---[ end trace 76af004efa71f975 ]---
> 
> Thanks,
> Kevin
