rasqlinsert disconnect issues
Carter Bullard
carter at qosient.com
Mon Oct 6 16:31:41 EDT 2014
Looks like every thread is in the same routine, trying to deal with a queue ... hopefully not the same queue, but that is possible, and it would be a bug. Sorry if this is inconvenient, but could you print out the queue in each thread? Looking at this now.
Carter
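
To illustrate why two threads working on the same queue would be a bug: the faulting line in the quoted trace, `qhdr = qhdr->nxt`, is a linked-list walk that copies nodes into `queue->array`. If another thread unlinks or frees a node during that walk, the next pointer can be stale. The following is a minimal sketch in C of doing such a walk entirely under the queue's mutex. The `struct qnode`, `struct queue`, and `queue_snapshot` names are hypothetical stand-ins for illustration, not the actual Argus definitions, and the list is assumed here to be NULL-terminated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real Argus structures. */
struct qnode {
    struct qnode *nxt;
    int value;
};

struct queue {
    pthread_mutex_t lock;
    struct qnode *start;   /* head of a NULL-terminated singly linked list */
    int count;             /* number of nodes the owner believes are queued */
    struct qnode **array;  /* snapshot filled in by queue_snapshot() */
};

/*
 * Copy the list into q->array while holding q->lock, so that no other
 * thread that also takes the lock can unlink or free a node mid-walk.
 * Returns the number of entries copied, or -1 if allocation fails.
 */
int queue_snapshot(struct queue *q)
{
    int x = 0;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);

    free(q->array);  /* free(NULL) is a no-op */
    q->array = calloc((size_t)q->count + 1, sizeof(*q->array));
    if (q->array == NULL) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }

    /* Bound the walk by count as well as by the NULL terminator. */
    for (struct qnode *n = q->start; n != NULL && x < q->count; n = n->nxt)
        q->array[x++] = n;
    q->array[x] = NULL;

    pthread_mutex_unlock(&q->lock);
    return x;
}
```

In the quoted listing, the unlock in `RaClientSortQueue` is conditional on `type & ARGUS_LOCK`, so whether a given caller is protected depends on the flags it passes; the sketch always locks to keep the example simple.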
> On Oct 6, 2014, at 3:08 PM, David Edelman <dedelman at iname.com> wrote:
>
> (gdb) where
> #0 0x08052827 in RaClientSortQueue (sorter=0x977e8b8, queue=0x977e690, type=2) at ./raclient.c:2523
> #1 0x08059b62 in ArgusDrawWindow (ws=0x98502e0) at ./rasqlinsert.c:3006
> #2 0x080530c1 in ArgusOutputProcess (arg=0x0) at ./rasqlinsert.c:458
> #3 0x0066c51f in start_thread () from /lib/libpthread.so.0
> #4 0x005a204e in clone () from /lib/libc.so.6
>
> (gdb) info threads
> 8 Thread 0xb3f05b90 (LWP 21302) 0x00d73424 in __kernel_vsyscall ()
> 7 Thread 0xb4906b90 (LWP 21301) 0x0052f315 in free () from /lib/libc.so.6
> 6 Thread 0xb5307b90 (LWP 21300) 0x08052827 in RaClientSortQueue (sorter=0x977e8b8, queue=0x977e690, type=2) at ./raclient.c:2523
> 5 Thread 0xb5d08b90 (LWP 21299) 0x00d73424 in __kernel_vsyscall ()
> 4 Thread 0xb6709b90 (LWP 21298) 0x00d73424 in __kernel_vsyscall ()
> 3 Thread 0xb720bb90 (LWP 21297) 0x00d73424 in __kernel_vsyscall ()
> * 1 Thread 0xb7fe68d0 (LWP 21287) 0x00d73424 in __kernel_vsyscall ()
>
> (gdb) thread 8
> [Switching to thread 8 (Thread 0xb3f05b90 (LWP 21302))]#0 0x00d73424 in __kernel_vsyscall ()
> (gdb) list
> 2523 qhdr = qhdr->nxt;
> 2524 }
> 2525
> 2526 queue->array[i] = NULL;
> 2527
> 2528 if (!(type & ARGUS_NOSORT)) {
> 2529 qsort ((char *) queue->array, x, sizeof (struct ArgusQueueHeader *), ArgusSortRoutine);
> 2530
> 2531 for (i = 0; i < x; i++) {
> 2532 struct ArgusRecordStruct *ns = (struct ArgusRecordStruct *) queue->array[i];
>
> (gdb) thread 7
> [Switching to thread 7 (Thread 0xb4906b90 (LWP 21301))]#0 0x0052f315 in free () from /lib/libc.so.6
> (gdb) list
> 2543
> 2544 RaSortItems = x;
> 2545 bzero (&ArgusParser->ArgusStartTimeVal, sizeof(ArgusParser->ArgusStartTimeVal));
> 2546
> 2547 #if defined(ARGUS_THREADS)
> 2548 if (type & ARGUS_LOCK)
> 2549 pthread_mutex_unlock(&queue->lock);
> 2550 #endif
> 2551
> 2552 #ifdef ARGUSDEBUG
>
> (gdb) thread 6
> [Switching to thread 6 (Thread 0xb5307b90 (LWP 21300))]#0 0x08052827 in RaClientSortQueue (sorter=0x977e8b8, queue=0x977e690, type=2) at ./raclient.c:2523
> 2523 qhdr = qhdr->nxt;
> (gdb) list
> 2518 keep = 0;
> 2519 }
> 2520
> 2521 if (keep)
> 2522 queue->array[x++] = qhdr;
> 2523 qhdr = qhdr->nxt;
> 2524 }
> 2525
> 2526 queue->array[i] = NULL;
> 2527
>
> (gdb) thread 5
> [Switching to thread 5 (Thread 0xb5d08b90 (LWP 21299))]#0 0x00d73424 in __kernel_vsyscall ()
> (gdb) list
> 2528 if (!(type & ARGUS_NOSORT)) {
> 2529 qsort ((char *) queue->array, x, sizeof (struct ArgusQueueHeader *), ArgusSortRoutine);
> 2530
> 2531 for (i = 0; i < x; i++) {
> 2532 struct ArgusRecordStruct *ns = (struct ArgusRecordStruct *) queue->array[i];
> 2533 if (ns->rank != (i + 1)) {
> 2534 ns->rank = i + 1;
> 2535 ns->status |= ARGUS_RECORD_MODIFIED;
> 2536 }
> 2537 }
> (gdb)
>
>
> (gdb) thread 4
> [Switching to thread 4 (Thread 0xb6709b90 (LWP 21298))]#0 0x00d73424 in __kernel_vsyscall ()
> (gdb) list
> 2528 if (!(type & ARGUS_NOSORT)) {
> 2529 qsort ((char *) queue->array, x, sizeof (struct ArgusQueueHeader *), ArgusSortRoutine);
> 2530
> 2531 for (i = 0; i < x; i++) {
> 2532 struct ArgusRecordStruct *ns = (struct ArgusRecordStruct *) queue->array[i];
> 2533 if (ns->rank != (i + 1)) {
> 2534 ns->rank = i + 1;
> 2535 ns->status |= ARGUS_RECORD_MODIFIED;
> 2536 }
> 2537 }
>
>
> (gdb) thread 3
> [Switching to thread 3 (Thread 0xb720bb90 (LWP 21297))]#0 0x00d73424 in __kernel_vsyscall ()
> (gdb) list
> 2538 }
> 2539
> 2540 } else
> 2541 ArgusLog (LOG_ERR, "RaClientSortQueue: ArgusMalloc(%d) %s\n", sizeof(struct ArgusRecord *), cnt, strerror(errno));
> 2542 }
> 2543
> 2544 RaSortItems = x;
> 2545 bzero (&ArgusParser->ArgusStartTimeVal, sizeof(ArgusParser->ArgusStartTimeVal));
> 2546
> 2547 #if defined(ARGUS_THREADS)
>
>
>
> (gdb) thread 1
>
> (gdb) list
> 2548 if (type & ARGUS_LOCK)
> 2549 pthread_mutex_unlock(&queue->lock);
> 2550 #endif
> 2551
> 2552 #ifdef ARGUSDEBUG
> 2553 ArgusDebug (5, "RaClientSortQueue(0x%x, 0x%x, %d) returned\n", sorter, queue, type);
> 2554 #endif
> 2555 }
> 2556
> (gdb)
>
>
> From: Carter Bullard [mailto:carter at qosient.com]
> Sent: Monday, October 06, 2014 12:02 PM
> To: David Edelman
> Cc: Argus
> Subject: Re: [ARGUS] rasqlinsert disconnect issues
>
> so thread 21287 complains about the mysql error, and thread 21300 is trying to sort the queue. I suspect that 21287 is exiting or has exited, and 21300 is unaware that the sql 'backend' has raised the done flag.
>
> can you type 'where' so we can see which thread is dumping ??
>
> On Oct 4, 2014, at 8:27 PM, David Edelman <dedelman at iname.com> wrote:
>
> GDB to the rescue. I have the session hanging out in a screen instance; if you need anything specific, just let me know.
>
> —Dave
>
> (gdb) run -M time 1d -M cache -S localhost:561 -w mysql://argus@localhost/argus/test2macAddrs_%Y_%m_%d -m srcid saddr smac -s stime ltime srcid saddr smac -M rmon - ipv4
> Starting program: /layered_products/argus-clients-3.0.8/bin/rasqlinsert -M time 1d -M cache -S localhost:561 -w mysql://argus@localhost/argus/test2macAddrs_%Y_%m_%d -m srcid saddr smac -s stime ltime srcid saddr smac -M rmon - ipv4
> [Thread debugging using libthread_db enabled]
> [New Thread 0xb7fe68d0 (LWP 21287)]
> Detaching after fork from child process 21288.
> [New Thread 0xb720bb90 (LWP 21289)]
> [Thread 0xb720bb90 (LWP 21289) exited]
> [New Thread 0xb720bb90 (LWP 21297)]
> [New Thread 0xb6709b90 (LWP 21298)]
> [New Thread 0xb5d08b90 (LWP 21299)]
> [New Thread 0xb5307b90 (LWP 21300)]
> [New Thread 0xb4906b90 (LWP 21301)]
> [New Thread 0xb3f05b90 (LWP 21302)]
> [New Thread 0xb3504b90 (LWP 21303)]
> [Thread 0xb3504b90 (LWP 21303) exited]
> rasqlinsert[21287]: Sun 2014-10-05 00:00:14.081 mysql_real_query error Query was empty
> rasqlinsert[21287]: Sun 2014-10-05 00:00:50.374 mysql_real_query error Query was empty
> rasqlinsert[21287]: Sun 2014-10-05 00:00:53.109 mysql_real_query error Query was empty
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb5307b90 (LWP 21300)]
> 0x08052827 in RaClientSortQueue (sorter=0x977e8b8, queue=0x977e690, type=2) at ./raclient.c:2523
> 2523 qhdr = qhdr->nxt;
> Missing separate debuginfos, use: debuginfo-install libgcc-4.3.2-7.i386
> (gdb) list
> 2518 keep = 0;
> 2519 }
> 2520
> 2521 if (keep)
> 2522 queue->array[x++] = qhdr;
> 2523 qhdr = qhdr->nxt;
> 2524 }
> 2525
> 2526 queue->array[i] = NULL;
> 2527
> (gdb)
>
>
>
> From: David Edelman <dedelman at iname.com>
> Date: Thursday, October 2, 2014 at 4:50 PM
> To: Carter Bullard <carter at qosient.com>
> Cc: Argus <argus-info at lists.andrew.cmu.edu>
> Subject: Re: [ARGUS] rasqlinsert disconnect issues
>
> Okay, I'll try that.
>
>
>
> On Oct 2, 2014, at 00:34, Carter Bullard <carter at qosient.com> wrote:
>
> D3 should print the sql calls, which may be all that you need to see.
> Carter
>
> On Oct 1, 2014, at 9:46 PM, David Edelman <dedelman at iname.com> wrote:
>
>
> Carter,
>
> I’m running rasqlinsert using -S localhost:561 to process the output of radium, which is doing labeling. I’ll fire up an instance of the release code with the same parameter but a different table name and see if there is anything obvious. I’ve built the release code with both .debug and .devel; do you have a recommendation for a debug value?
>
> —Dave
>
> From: Carter Bullard <carter at qosient.com>
> Date: Monday, September 29, 2014 at 1:51 PM
> To: David Edelman <dedelman at iname.com>
> Cc: "John T. Myers" <myersj0 at gmail.com>, Argus <argus-info at lists.andrew.cmu.edu>
> Subject: Re: [ARGUS] rasqlinsert disconnect issues
>
> Hey Dave,
> Well, that’s not what we’re striving for, so
> if we can capture what that is all about, I’ll
> fix as soon as I can.
>
> Carter
>
> On Sep 28, 2014, at 5:47 PM, David Edelman <dedelman at iname.com> wrote:
>
>
> The "running but seems not to update the database" problem was one that I reported and you fixed in one of the very last release candidates. The current 3.0.8 does not have that problem as best I can tell, but it does stop after 8 to 20 hours with no log messages that I have been able to find.
>
> --Dave
>
> From: Carter Bullard [mailto:carter at qosient.com]
> Sent: Sunday, September 28, 2014 11:34 AM
> To: David Edelman
> Cc: John T. Myers; Argus
> Subject: Re: [ARGUS] rasqlinsert disconnect issues
>
> Well the earlier rasqlinsert.1 code does have a problem that the new one tried
> to fix. I’ll try to see what may be up with mysql_real_query() error messages,
> and see if we missed something there.
>
> But in the OP, the problem is that rasqlinsert.1 is running, but not updating
> the database, in this second report, rasqlinsert.1 is failing ??
>
> Carter
>
>
> On Sep 25, 2014, at 8:11 PM, David Edelman <dedelman at iname.com> wrote:
>
>
>
> Carter,
>
> I am seeing something similar with Netflow data processed by Radium (labels added) and rasqlinsert reading the processed radium data from port 561. In my case the rasqlinsert process dies without any error messages (it is built with .debug and .devel). It isn’t practical for me to enable debugger output since the failure can be many hours into the run.
>
> The one clue that I do have is that the release candidate set (3.0.8-rc1) worked fine.
>
> I haven’t had time to do much more than go back to the working release.
>
> --Dave
>
> From: argus-info-bounces+dedelman=iname.com at lists.andrew.cmu.edu [mailto:argus-info-bounces+dedelman=iname.com at lists.andrew.cmu.edu] On Behalf Of Carter Bullard
> Sent: Thursday, September 25, 2014 1:34 PM
> To: John T. Myers
> Cc: Argus
> Subject: Re: [ARGUS] rasqlinsert disconnect issues
>
> Hey John,
> rasqlinsert() is multi-threaded, and it’s possible that
> the cache concurrency thread, the one that is managing the
> database cache, exited if the database fails.
>
> rasqlinsert() should close, if that thread is done.
>
> So you are seeing rasqlinsert() is still running, but
> not updating the database ?? Is rasqlinsert() getting
> bigger ??? (should be generating INSERT and UPDATE
> requests, but the DB thread is not processing them ???)
>
> Carter
>
>
> On Sep 25, 2014, at 10:35 AM, John T. Myers <myersj0 at gmail.com> wrote:
>
>
>
>
> Hello,
>
> I was wondering if it’s normal behavior for rasqlinsert to cease inserting netflow into the database if the connection becomes interrupted? It seems if the mysql database is restarted or any part of the connection between rasqlinsert and the db is broken, it will not attempt to re-connect and continue flow insertion.
>
> Thanks!
> John
>
>
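
On John's original question about whether rasqlinsert re-connects after the database connection is interrupted: the thread does not show rasqlinsert's own handling, so the following is only a generic sketch in C of a reconnect loop with capped exponential backoff. Everything in it (`reconnect_with_backoff`, the `connect_fn` and `wait_fn` callbacks, the 60-second cap, and the demo stubs) is hypothetical and not taken from the Argus source; a real client would pass a callback that wraps its actual database connect call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback types: one attempts a connection, one sleeps. */
typedef bool (*connect_fn)(void *ctx);
typedef void (*wait_fn)(unsigned seconds);

/*
 * Try to connect up to max_attempts times. Between failed attempts,
 * wait 1, 2, 4, ... seconds, capped at 60. Returns true on the first
 * successful attempt, false if every attempt fails.
 */
bool reconnect_with_backoff(connect_fn try_connect, void *ctx,
                            wait_fn wait, unsigned max_attempts)
{
    unsigned delay = 1;

    assert(try_connect != NULL && wait != NULL);
    for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(ctx))
            return true;
        if (attempt < max_attempts) {
            wait(delay);
            delay = (delay * 2 > 60) ? 60 : delay * 2;
        }
    }
    return false;
}

/* Demo stub: counts calls through ctx and succeeds once the count reaches 3. */
static bool fake_connect(void *ctx)
{
    int *calls = ctx;
    return ++(*calls) >= 3;
}

/* Demo stub: skips the real sleep so the example runs instantly. */
static void no_wait(unsigned seconds)
{
    (void)seconds;
}
```

Passing the wait function as a callback keeps the loop testable; a long-running client would more likely pass a wrapper around `sleep()` and stop retrying once a shutdown flag is raised.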