Wondering why argus/rasplit jammed up
Branch Family
branchbunch at gmail.com
Mon Aug 24 09:57:23 EDT 2015
Sorry, I meant this to go to the whole list.
Kevin
On Tue, Aug 18, 2015 at 8:04 PM, Branch Family <branchbunch at gmail.com>
wrote:
> Thanks for the feedback. The headers and kernel should match, and argus
> was installed via apt rather than built on this system.
>
> According to the link below, this looks like a bug that shows up under
> heavy disk-write load (that would be my server...) when the file system
> is ZFS. This is the only NSM box I have that uses ZFS...
>
> https://github.com/zfsonlinux/zfs/issues/3535
>
> It appears the most promising recommendation is to do this:
>
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> which I plan to do if this ever recurs.
>
> Thanks guys!
> Kevin
>
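
For anyone who wants to script the workaround quoted above, here is a minimal sketch; it is not taken from the ZFS issue. It assumes the sysfs path from the quoted command. The helper names thp_mode and thp_disable and the THP_ENABLED override variable are made up for illustration. When the file is read, the kernel marks the active mode with square brackets.

```shell
#!/bin/sh
# Path from the quoted workaround; overridable (hypothetical variable) so the
# helpers can be tried against a scratch file instead of the real sysfs entry.
THP_ENABLED="${THP_ENABLED:-/sys/kernel/mm/transparent_hugepage/enabled}"

# Print the active THP mode, i.e. the bracketed word in a line such as
# "always madvise [never]".
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$THP_ENABLED"
}

# Write "never" unless it is already the active mode.
# Needs root when pointed at the real sysfs file.
thp_disable() {
    if [ "$(thp_mode)" = "never" ]; then
        echo "THP already disabled"
    else
        echo never > "$THP_ENABLED" && echo "THP set to never"
    fi
}
```

Source the file as root, then call thp_mode to see the current setting and thp_disable to change it. The setting does not survive a reboot, so it would have to be reapplied at boot if the problem recurs.
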
>
> On Tue, Aug 18, 2015 at 7:40 PM, Mike Slifcak <slifcan at gmail.com> wrote:
>
>> Hi Kevin,
>> I don't know for sure what your hardware is.
>> There may be a hardware specific driver that hasn't kept up with the
>> linux kernel behaviors for semaphores/mutexes/paging.
>>
>> Just thought to ask whether you have the linux-headers matching your
>> kernel, and whether the argus components were built against those headers?
>>
>> Is there an older kernel + matching headers combination to build and run
>> argus components with?
>>
>>
>>
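
A quick way to answer the headers question on a Debian/Ubuntu box, as a rough sketch. It assumes the usual linux-headers-<release> package naming and that dpkg-query is available; the function names headers_pkg and headers_installed are hypothetical.

```shell
#!/bin/sh
# Build the header package name for a kernel release
# (defaults to the running kernel reported by uname -r).
headers_pkg() {
    echo "linux-headers-${1:-$(uname -r)}"
}

# Succeed if dpkg reports that package as fully installed.
headers_installed() {
    dpkg-query -W -f='${Status}' "$(headers_pkg "$1")" 2>/dev/null \
        | grep -q "install ok installed"
}
```

For the kernel in the trace below, headers_pkg 3.13.0-61-generic gives the package name to look for. This only shows whether the headers are installed; it says nothing about what a given argus binary was built against.
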
>> On 08/18/2015 06:15 PM, Branch Family wrote:
>>
>>> Hi Carter,
>>>
>>> My argus and rasplit daemons just jammed up in a way I've never seen
>>> before. My rasplit daemon pulls flows directly from argus and fans them
>>> out across hourly files in my file system. I'm on Ubuntu 12.04 server
>>> (Security Onion), using argus and argus-clients version 3.0.8.
>>>
>>> The daemons were both still there but all file writes had stopped. I
>>> was able to stop rasplit normally but had to do a kill -KILL on argus
>>> to stop it. A restart of the daemons seems to have everything working
>>> fine again.
>>>
>>> I'm wondering how to interpret this. Anything stand out to you?
>>>
>>> First sign of trouble with argus:
>>>
>>> Aug 18 19:29:12 nsm argus[10340]: 18 Aug 15 19:29:12.850433
>>> ArgusWriteOutSocket(0x7efcb8729010) max queue exceeded 100001
>>>
>>> First sign of trouble with rasplit (I suspect preceding the argus event):
>>>
>>> [1203679.394043] ------------[ cut here ]------------
>>> [1203679.400415] kernel BUG at
>>> /build/linux-lts-trusty-RbzkRH/linux-lts-trusty-3.13.0/mm/migrate.c:589!
>>> [1203679.413000] invalid opcode: 0000 [#2] SMP
>>> [1203679.419195] Modules linked in: ipmi_si iptable_mangle xt_mark 8021q
>>> mrp garp stp llc bonding mpt3sas mpt2sas scsi_transport_sas raid_class
>>> mptctl mptbase dell_rbu xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6
>>> ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype nf_conntrack_ipv4
>>> nf_defrag_ipv4 xt_state ip6table_filter ip6_tables
>>> nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat
>>> nf_conntrack_ftp nf_conntrack iptable_filter ip_tables x_tables
>>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
>>> ipmi_devintf gpio_ich crct10dif_pclmul dcdbas crc32_pclmul
>>> ghash_clmulni_intel aesni_intel ablk_helper cryptd joydev lrw gf128mul
>>> glue_helper aes_x86_64 mei_me mei mac_hid wmi sb_edac edac_core
>>> acpi_power_meter lpc_ich shpchp pf_ring(OX) lp parport binfmt_misc
>>> zfs(POX) zavl(POX) zcommon(POX) znvpair(POX) spl(OX) zunicode(POX) ses
>>> enclosure hid_generic usbhid hid tg3 ptp megaraid_sas pps_core
>>> [last unloaded: ipmi_si]
>>> [1203679.494521] CPU: 23 PID: 10378 Comm: rasplit Tainted: P D OX
>>> 3.13.0-61-generic #100~precise1-Ubuntu
>>> [1203679.506886] Hardware name: Dell Inc. PowerEdge R720xd/0X3D66, BIOS
>>> 2.2.2 01/16/2014
>>> [1203679.519140] task: ffff8807c1e08000 ti: ffff8800bfe16000 task.ti:
>>> ffff8800bfe16000
>>> [1203679.531347] RIP: 0010:[<ffffffff811b52e9>] [<ffffffff811b52e9>]
>>> migrate_page+0x49/0x50
>>> [1203679.543661] RSP: 0018:ffff8800bfe17688 EFLAGS: 00010202
>>> [1203679.549751] RAX: 06ffff0000002009 RBX: ffffea0021369ac0 RCX:
>>> 0000000000000001
>>> [1203679.561717] RDX: ffffea0021369ac0 RSI: ffffea00278616c0 RDI:
>>> ffff8809def3add0
>>> [1203679.573663] RBP: ffff8800bfe176a8 R08: 0000000000000001 R09:
>>> 0000000000016588
>>> [1203679.585765] R10: ffff88102fff9f00 R11: 0000000000000075 R12:
>>> ffffea00278616c0
>>> [1203679.597968] R13: ffff8809def3add0 R14: 0000000000000001 R15:
>>> 0000000000000000
>>> [1203679.610318] FS: 00007f0c4a5c3700(0000) GS:ffff88100f360000(0000)
>>> knlGS:0000000000000000
>>> [1203679.622667] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [1203679.628836] CR2: 000000001560bc60 CR3: 00000000be4e1000 CR4:
>>> 00000000001407e0
>>> [1203679.640922] Stack:
>>> [1203679.646771] ffff8809def3ae08 ffff8809def3add0 ffffea00278616c0
>>> ffffea0021369ac0
>>> [1203679.658509] ffff8800bfe176f8 ffffffff811b542d ffffea00278616c0
>>> ffffea0000000001
>>> [1203679.670219] 0000000000000000 ffffea0021369ac0 ffffea00278616c0
>>> 00000000fffffff5
>>> [1203679.681968] Call Trace:
>>> [1203679.687691] [<ffffffff811b542d>] move_to_new_page+0x13d/0x180
>>> [1203679.693463] [<ffffffff811b5881>] __unmap_and_move+0x251/0x2c0
>>> [1203679.699105] [<ffffffff811b5966>] unmap_and_move+0x76/0x180
>>> [1203679.704635] [<ffffffff811b5c7d>] migrate_pages+0xdd/0x210
>>> [1203679.710052] [<ffffffff8117d0a0>] ? isolate_freepages+0x220/0x220
>>> [1203679.715392] [<ffffffff8117df2c>] compact_zone+0x18c/0x340
>>> [1203679.720666] [<ffffffff810d30ac>] ? ktime_get_ts+0x4c/0xe0
>>> [1203679.725871] [<ffffffff8117e3a4>] compact_zone_order+0x94/0xd0
>>> [1203679.730986] [<ffffffff810e242e>] ?
>>> smp_call_function_many+0x26e/0x2c0
>>> [1203679.736051] [<ffffffff8117e4e9>] try_to_compact_pages+0x109/0x190
>>> [1203679.741026] [<ffffffff81750a02>]
>>> __alloc_pages_direct_compact+0xc3/0x1bf
>>> [1203679.745945] [<ffffffff81162a95>] __alloc_pages_nodemask+0x9a5/0xbb0
>>> [1203679.764964] [<ffffffff811a3fb2>] alloc_pages_current+0xb2/0x170
>>> [1203679.769706] [<ffffffff811ad13d>] allocate_slab+0x13d/0x1a0
>>> [1203679.774314] [<ffffffff811ad1d0>] new_slab+0x30/0x1d0
>>> [1203679.778789] [<ffffffff81752aa8>] __slab_alloc+0x18a/0x2c2
>>> [1203679.783215] [<ffffffff811db030>] ? getname_flags.part.25+0x30/0x140
>>> [1203679.787634] [<ffffffff8163e0e0>] ? release_sock+0x80/0x90
>>> [1203679.791924] [<ffffffff8169aff0>] ? tcp_recvmsg+0x5c0/0xb50
>>> [1203679.796229] [<ffffffff811b0ac3>] kmem_cache_alloc+0x1d3/0x1f0
>>> [1203679.800467] [<ffffffff811db030>] ? getname_flags.part.25+0x30/0x140
>>> [1203679.804638] [<ffffffff811db030>] getname_flags.part.25+0x30/0x140
>>> [1203679.808703] [<ffffffff811db1a6>] getname_flags+0x66/0x80
>>> [1203679.812612] [<ffffffff811dbd05>] user_path_at_empty+0x35/0xa0
>>> [1203679.816425] [<ffffffff81638b5c>] ? sock_aio_read.part.9+0x3c/0x40
>>> [1203679.820158] [<ffffffff811dbd81>] user_path_at+0x11/0x20
>>> [1203679.823912] [<ffffffff811d05c1>] vfs_fstatat+0x51/0xb0
>>> [1203679.827594] [<ffffffff811d06eb>] vfs_stat+0x1b/0x20
>>> [1203679.831260] [<ffffffff811d0705>] SYSC_newstat+0x15/0x30
>>> [1203679.834927] [<ffffffff811cbac8>] ? vfs_read+0x108/0x180
>>> [1203679.838585] [<ffffffff811cbd10>] ? SyS_read+0x70/0xa0
>>> [1203679.842176] [<ffffffff811d085e>] SyS_newstat+0xe/0x10
>>> [1203679.845744] [<ffffffff8177021d>] system_call_fastpath+0x1a/0x1f
>>> [1203679.849348] Code: 20 75 28 45 31 c9 31 c9 e8 95 f8 ff ff 85 c0 75
>>> 11 48 89 de 4c 89
>>> e7 89 45 e8 e8 63 fd ff ff 8b 45 e8 48 83 c4 10 5b 41 5c 5d c3 <0f> 0b
>>> 0f 1f 44 00 00 0f
>>> 1f 44 00 00 55 48 89 e5 48 83 ec 40 48
>>> [1203679.860842] RIP [<ffffffff811b52e9>] migrate_page+0x49/0x50
>>> [1203679.864668] RSP <ffff8800bfe17688>
>>> [1203679.875097] ---[ end trace 76af004efa71f975 ]---
>>>
>>> Thanks,
>>> Kevin
>>>
>>
>