mentby.com
Blog | Jobs | Help | Signup | Login

MySQL related BE deadlocks - collective wisdom needed



Hi,

I'm calling for collective wisdom of all mysql/Linux gurus here.
I have continuous BE deadlocks (despite my last comments in 9773/9772
last 2 days were nightmare with 4 deadlocks).
Analyzing traces majority of them seems to point to common issue: hang
within mysql libraries during myth access to DB).
Usually hung thread looks following:

Thread 9 (Thread 0x9e4ffb70 (LWP 13256)):
#0  0xb77c6424 in __kernel_vsyscall ()
#1  0xb676581b in read () from /lib/libpthread.so.0
#2  0xb1b6eba8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
#3  0xb1b6ec0b in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
#4  0xb1b6fbb0 in ?? () from /usr/lib/libmysqlclient_r.so.16
#5  0xb1b7071b in my_net_read () from /usr/lib/libmysqlclient_r.so.16
#6  0xb1b69124 in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
#7  0xb1b3c8e9 in cli_read_prepare_result () from
/usr/lib/libmysqlclient_r.so.16
#8  0xb1b3cc7a in mysql_stmt_prepare () from /usr/lib/libmysqlclient_r.so.16
#9  0xb1d08f78 in ?? () from /usr/lib/qt/plugins/sqldrivers/libqsqlmysql.so
#10 0xb64b36aa in QSqlResult::savePrepare(QString const&) () from
/usr/lib/libQtSql.so.4
#11 0xb64a55e2 in QSqlQuery::prepare(QString const&) () from
/usr/lib/libQtSql.so.4
#12 0xb6c87bdc in MSqlQuery::prepare (this=0x9e4feb40, query=...) at
mythdbcon.cpp:757
#13 0xb740de84 in get_chan_id_from_db (sourceid=8, serviceid=13122,
networkid=113, transportid=13200) at eithelper.cpp:805
#14 0xb740d8f2 in EITHelper::GetChanID (this=0x8d49800, serviceid=13122,
networkid=113, tsid=13200) at eithelper.cpp:742
#15 0xb740b3b9 in EITHelper::AddEIT (this=0x8d49800, eit=0x9e4feee0) at
eithelper.cpp:346
#16 0xb719e260 in DVBStreamData::HandleTables (this=0x9f5bb1f0, pid=18,
psip=...) at mpeg/dvbstreamdata.cpp:376
#17 0xb717c4c1 in MPEGStreamData::HandleTSTables (this=0x9f5bb278,
tspacket=0x9da53828) at mpeg/mpegstreamdata.cpp:935
#18 0xb717c914 in MPEGStreamData::ProcessTSPacket (this=0x9f5bb278,
tspacket=...) at mpeg/mpegstreamdata.cpp:1010
#19 0xb717c658 in MPEGStreamData::ProcessData (this=0x9f5bb278,
buffer=0x9da4e008 "G\001a\035\345H\032\221\315tE", <incomplete sequence
\366>,
     len=26508) at mpeg/mpegstreamdata.cpp:960
#20 0xb750415c in DVBStreamHandler::RunTS (this=0x9e48678) at
dvbstreamhandler.cpp:258
#21 0xb7502dcf in DVBStreamHandler::run (this=0x9e48678) at
dvbstreamhandler.cpp:111
#22 0xb5673843 in ?? () from /usr/lib/libQtCore.so.4
#23 0xb675ec77 in start_thread () from /lib/libpthread.so.0
#24 0xb5446c6e in clone () from /lib/libc.so.6

In many traces lines 0...12 looks very similar.
I'm on ArchLinux. Sys is quite fresh (all packages refreshed 13/05).
Myth is latest master (compiled 2 days ago).
Issue was present with Arch stock 5.5.12 mysql packages.

I already:
-do mysql downgrade by fresh rebuild & reinstall mysql 5.1.57 (Also all
mysql dependent packages were rebuilded/reinstall)
-backup/delete/initialize/restore myth DB
-turn-off completely MySQL query cache
-change DB schema from MyISAM to InnoDB
-turn-off all other apps using mysql

Unfortunately nothing from above list helps - I still have deadlocks
(this weekend 4. Very frustrating).

I'll be really grateful for any hint helping me to move forward with
this very frustrating situation.

br


Warpme Sun, 24 Jul 2011 06:12:03 -0700

I do not know the MythTV code at all, but that does look like it is waiting
for (or actually reading) data from the DB. Have you looked at what queries
are running when Myth hangs? Try connecting to the database using the mysql
command line tool and running "show full processlist;" to see what it is up
to. You could also try turning on the MySQL general query log to see what it
is doing prior to the problem occurring: http://dev.mysql.com/doc/refman/5.1/en/query-log.html

David


David Edwards Mon, 25 Jul 2011 02:00:10 -0700

mytop might also be able to provide some useful information?

Cheers,

Anthony


Anthony Giggins Mon, 25 Jul 2011 05:31:04 -0700

And, since you converted to InnoDB (which, BTW, for others reading this
thread, converting to InnoDB is not something we're recommending you
do--when we're ready to do so, MythTV will automatically update the
schema to use the InnoDB engine), please post the output of:

SHOW ENGINE INNODB STATUS\G

when it's deadlocked.

Thanks,
Mike


Michael T. Dean Mon, 25 Jul 2011 07:23:55 -0700

Oh, and probably not important, but because it would be useful in
finding out if it's important, do you run mythshutdown?

Mike


Michael T. Dean Mon, 25 Jul 2011 09:33:24 -0700

Mike,

Pls find attached BE trace &  SHOW ENGINE INNODB STATUS\G from
deadlocked BE (protocol version check deadlock).
Regarding mythshutdown - no I'm not using it at all.
When I'm shutting or rebooting server - SIGTERM is used (I'm using
systemd as replacement to SysV init). But I have to admit - many BE
shutdowns are ended with segfaults.
Your previous post suggests switch to InnoDB wasn't good idea. May You
elaborate why ?
Should I switch back to MyISAM ?
br


Warpme Mon, 25 Jul 2011 11:22:55 -0700

This is SHOW ENGINE INNODB STATUS while it's deadlocked?

I'm saying that we will eventually switch all schemas to InnoDB.  In
general we recommend users never modify any of the schema since parts of
MythTV may be written to assume certain schema configuration (such as
all the places that do SELECT COUNT(), which is very fast on MyISAM but
rather inefficient on InnoDB), or future updates to the MythTV schema
may fail if your schema is not the one we expect (the one we created),
thereby preventing a smooth upgrade.  It's the old, "If you're modifying
our data and things break (now or at any time in the future) as a
result, you get to clean up the mess."

More than anything I wanted to make sure we don't have a bunch of users
who are seeing backend deadlocks start trying to convert to InnoDB,
thinking that would fix the problem.

Whether you switch back or not is up to you--but since you're already on
InnoDB, please try to get the InnoDB status while it's locked before
doing so, if the one you sent wasn't while it was locked.

Thanks,
Mike


Michael T. Dean Mon, 25 Jul 2011 11:38:03 -0700

I read it.  Thanks.

Mike


Michael T. Dean Mon, 25 Jul 2011 11:57:48 -0700

Hi *

Thing which most intrigues me is practically lack of similar reports from other MySQL users.
I do quick search on google and wasn't able to find any reports about hangs in mysql client library.
This almost suggest that my issue is probably not mysql itself derived issue (like bug or missconfig/missusage).
My tests with 3 different Mysql versions (2 major ones) are rather conforming this conclusion.

This suggest me to put more attention on my specifics.
We can think here about 3 areas:
1.Usage
2.HW/SW
3.Content

Ad 1:
Here I'm wonder about resources release (especially in non-standard situations like deadlocked BE or hang FE).

FE: I have master BE + 3xFE used in sleep/resume. Before putting FE into sleep, script sends every 1 sec. main menu exit command. When after 5 times FE still exists - script give killall mythfrontend. I believe this is OK. Sometimes I have to kill -9 FE as it hangs. Is it possible that I miss something here ?

BE: I have (few per day) crashes of mythpreviewgen (mainly issue of encoding on Polish DVB-T MPEG4). Maybe segfaulting previewgen is not releasing MySQL resources thus leading to internal MySQL resources exhaustion ?
I'm not using any other jobs on BE like commflag or transcode.

Ad2:
OS: I'm on Arch Linux. Arch is rolling release. I'm partially minimizing negative aspects of rolling release by do full system upgrade only when it is really needed (usually when number of packages to manually rebuild is really high). Rule is: don't touch if it is working OK.
MySLQ issue was present with various kernels (2.6.35/2.6.36 & 2.6.38). I tested also Qt4.6.3 and 4.7.3.
MySQL was 5.1.51; 5.1.57 & 5.5.12. No change observed.

Ad3: Here I mean mainly DB content.
Is it possible that my DB has so specific content (I mean i.e. table sizes) or something else which can cause so strange MySQL behavior ?  
Currently total DB size is approx 370M. In terms of content age DB was initialized 4-5y ago. Is it relevant ?

Hope above analytical view may help to lead somebody to some ideas where to look for more closely ?
br


Warpme Tue, 26 Jul 2011 05:55:03 -0700

A few thoughts based on those dumps. It is entirely possible that any of
these thoughts are misguided, since I know little about the code:

1. I don't see anything here suggesting this is a deadlock in MythTV; most
of the threads are waiting to get a lock, but I'm guessing that the lock is
being held by the thread trying to run the DB query, which is waiting on the
database rather than waiting on a lock in the Myth code. Maybe this is a
terminology issue, but I think it makes a difference in terms of the
possible causes.

2. The thread trying to query the database is hanging while preparing a
statement rather than executing one. The database appears to be idle, unless
I'm misinterpreting the output or if you have some tables that are not
InnoDB. Thus the database does not appear to be blocking due to locks.

3. The thread trying to run the DB query (0x98eafb70 = 2565536624) does not
appear in the list of threads in the DB status output. Again, I may be
misinterpreting the output, but if I'm correct then the thread is trying to
prepare a query on a connection that no longer open.

No idea if any of that is helpful...

David


David Edwards Tue, 26 Jul 2011 08:37:48 -0700

This was all I could find: http://bugs.mysql.com/bug.php?id=50535

But it is inconclusive.

David


David Edwards Tue, 26 Jul 2011 08:42:39 -0700

Warpme, out of curiosity, what happens if you revert

https://github.com/MythTV/mythtv/commit/528324d8

Mike


Michael T. Dean Tue, 26 Jul 2011 10:10:29 -0700

@David,

Thx You found time to look into this issue.

lock is
on the

I'm far less skilled in reading stack traces but Daniel's K comments
pointed me to get closer looks
on all collected traces. Reading some Internet leads me to exactly this
conclusion. If You have time to read my last post in this topic - I'm
currently fully inline with Your hypothesis and trying identify root
cause in external to myth itself areas.

Reading InnoDB statistics shows me exactly what You say.
(seeing this first time I was little "disappointed" as from DB server
all looks OK so we still don't know where is root cause).

Regarding to be sure that we see right statistics: here is mysql_tuner
report about all schemes present in my DB:

-------- Storage Engine Statistics
-------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in InnoDB tables: 147M (Tables: 148)
[!!] Total fragmented tables: 148
-------- Security Recommendations
-------------------------------------------

does not
trying to

Oh THIS is VERY valid observation.
Reading MySQL forums and mysqlclient bug reports I saw reports having
quite similar stack trace.
Identified root cause was situation where on Linux comm.socket failure
wasn't properly propagated to upper levels of libs stack (mysqlclient
lib as well). MySQL devs solution was based on timeouts. That was at 4.x
MySQL level and was included into mainline sources - so I classified
this as similar but already solved.
Called functions in my stack traces indeed are very often related to
vio_read/vio_read_buffer/my_net_read.
At sometime I was suspecting that this might be related to mysql
client<->server communication, but no any relevant entries in LOGs, the
same PIDs for mysql/BE make me to refuse this hypothesis.

Currently I have big hopes with Mike's post about https://github.com/MythTV/mythtv/commit/528324d8
Currently it matches all checks If we assume that from some reason we
are calling MSqlDatabase::Reconnect().
If I'm reading commit code properly, condition for forced DB reconnect
is failed query AND error code = 2006.
Maybe mysqlclient is reacting with hang when short time earlier some
other thread force DB reconnect ? (this is somehow inline with some
mysql bug reports about mysqlclient hangs triggered by previous
client-server comm failures. Solution for this is based on timeout. Do
we use timeouts in mysql operations in myth ?)

For testing above hypothesis I think it is enough to comment commit's
lines 182/183:

+    m_db.close();
+    m_db.open();

@Mike,
Pls confirm that commenting those 2 lines is enough to functionally
furn-off this commit.
I will do such mod and try how it works.

Gents, thx again for trying help !

-br


Warpme Tue, 26 Jul 2011 12:34:16 -0700

Interesting... is there a reason that custom reconnect code was written? I
think that the MySQL client lib has an auto-reconnect option that can be set
when initially creating the DB connection. We successfully use this at work
(it works seamlessly for us), although this is from Perl code using DBI
rather than C++ with Qt. I see a comment in mythdbcon.cpp about reconnect
behaviour from 2003, but that could be out of date.

I also found this which looks very relevant: https://bugreports.qt.nokia.com//browse/QTBUG-17140

David


David Edwards Wed, 27 Jul 2011 01:00:07 -0700

If for no other reason, because we've never had a released version of
MythTV whose minimum required Qt version was 4.6 or higher--and
MYSQL_OPT_RECONNECT support was added to the Qt-MySQL driver in Qt 4.6.  
We had discussed it back in 2010, but since we required Qt 4.5+ at that
time, it wasn't an option.  I had meant to look at re-enabling it when
Qt 4.6 became a requirement, but hadn't yet remembered.

Note that before MySQL 5, reconnecting connections were the default in
the MySQL C API (so, were always enabled for the Qt-MySQL driver).  
Unfortunately, depending on how the application is written,
reconnections may cause issues (since reconnections have effects on
transactions, table locks, temporary tables, session and user variables,
prepared statements, and such), so MySQL 5.0.3 changed the default to
disable automatic reconnections (and MySQL 5.0.13 added
MYSQL_OPT_RECONNECT to make it an application-controllable option).  
Since the Qt-MySQL driver didn't have support for specifying
MYSQL_OPT_RECONNECT, it was impossible to get reconnecting connections
with MySQL 5.0.3+ in Qt applications.  This actually caused a huge
number of problems for us way back in the day (around March 2005)--even
before we switched to Qt4 and made MySQL 5.0.15+ a requirement for
MythTV.  Therefore, much code was written to work around issues.  
Finally in Qt4.6, Qt devs added support for specifying MYSQL_OPT_RECONNECT.

Since 0.24 was released, we've changed the minimum Qt version to 4.6 (as
of May 22), so I will probably change the code to enable
MYSQL_OPT_RECONNECT.  Thanks for the reminder.

Mike


Michael T. Dean Wed, 27 Jul 2011 02:30:16 -0700

Well,
Having 30min Yesterday evening I recompile BE with above 2 lines commented in mythdbconn.cpp.
Unfortunately today I had deadlock again. This time it was in scheduler. Trace is again well known:

Thread 20 (Thread 0xaab0bb70 (LWP 31981)):
#0  0xb7839424 in __kernel_vsyscall ()
#1  0xb67d781b in read () from /lib/libpthread.so.0
#2  0xb1be1ba8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
#3  0xb1be1c0b in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
#4  0xb1be2bb0 in ?? () from /usr/lib/libmysqlclient_r.so.16
#5  0xb1be371b in my_net_read () from /usr/lib/libmysqlclient_r.so.16
#6  0xb1bdc124 in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
#7  0xb1baf8e9 in cli_read_prepare_result () from /usr/lib/libmysqlclient_r.so.16
#8  0xb1bafc7a in mysql_stmt_prepare () from /usr/lib/libmysqlclient_r.so.16
#9  0xb1d7bf78 in ?? () from /usr/lib/qt/plugins/sqldrivers/libqsqlmysql.so
#10 0xb65256aa in QSqlResult::savePrepare(QString const&) () from /usr/lib/libQtSql.so.4
#11 0xb65175e2 in QSqlQuery::prepare(QString const&) () from /usr/lib/libQtSql.so.4
#12 0xb6cf9b70 in MSqlQuery::prepare (this=0xaab0aa48, query=...) at mythdbcon.cpp:757
#13 0xb6a45c06 in ProgramInfo::QueryMplexID (this=0x98552a10) at programinfo.cpp:2241
#14 0x080d9c81 in Scheduler::FindNextConflict (this=0x9e92fe0, cardlist=..., p=0x98552a10, j=..., openEnd=0)
    at scheduler.cpp:973
#15 0x080da22b in Scheduler::FindConflict (this=0x9e92fe0, reclists=..., p=0x98552a10, openend=0) at scheduler.cpp:1005
#16 0x080dabdb in Scheduler::TryAnotherShowing (this=0x9e92fe0, p=0x985532e0, samePriority=false, preserveLive=false)
    at scheduler.cpp:1162
#17 0x080dc0ad in Scheduler::MoveHigherRecords (this=0x9e92fe0, move_this=true) at scheduler.cpp:1331
#18 0x080dba8b in Scheduler::SchedNewRecords (this=0x9e92fe0) at scheduler.cpp:1279
#19 0x080d48b2 in Scheduler::FillRecordList (this=0x9e92fe0) at scheduler.cpp:372
#20 0x080e028b in Scheduler::HandleReschedule (this=0x9e92fe0) at scheduler.cpp:2046
#21 0x080df269 in Scheduler::run (this=0x9e92fe0) at scheduler.cpp:1826
#22 0xb56e5843 in ?? () from /usr/lib/libQtCore.so.4
#23 0xb67d0c77 in start_thread () from /lib/libpthread.so.0
#24 0xb54b8c6e in clone () from /lib/libc.so.6

Damn - I'm out of ideas....


Warpme Wed, 27 Jul 2011 02:48:42 -0700

Trying another angle on the disconnection idea, could you please post
the contents of your /etc/my.cnf file? I'm mostly interested in the
wait_timeout parameter, if there is one, but the whole thing might help.

Also, have you posted the contents of a "SHOW PROCESSLIST;" when the
database is locked? I don't recall seeing one yet.

--
Dan Wilga                                                        "Ook."


Dan Wilga Wed, 27 Jul 2011 05:48:26 -0700

A better test would be to apply https://github.com/MythTV/mythtv/commit/dd2d7a14d  to your MythTV (or, if
you're running master, just pull an update).  If you're on 0.24-fixes,
it will be much easier to backport this change than it was to revert the
commit I mentioned.  Just make sure you're using Qt 4.6 or higher if
you're on 0.24-fixes.

Thanks,
Mike


Michael T. Dean Wed, 27 Jul 2011 12:01:54 -0700

Is all of this to indicate that this won't land on 0.24-fixes?

I've been following this thread with great interest since I think it's
the cause of #9704 also, which has been plaguing me for much too long.

Cheers,
b.


Brian J. Murrell Wed, 27 Jul 2011 12:09:29 -0700

Since 0.24-fixes requires Qt 4.5+, we can't use this on it.

OK, actually, we kind of could, but the logs would be littered with

QMYSQLDriver::open: Illegal connect option value 'MYSQL_OPT_RECONNECT=1'

or something like that for anyone using Qt4.5.  In theory, it won't
cause any problems for them, but until we have some reasonable proof
that it's actually fixing the issue, we'll  just test the code in
unstable.  If we do find evidence that it actually works to prevent
these deadlocks, I'll see about adding it to 0.24-fixes--so at least
people using Qt 4.6+ will benefit from it.

Input from those of you experiencing the deadlocks would make it much
easier.  So, if you can, please try building your MythTV with the patch
and using it.

I've attached a version of the patch for 0.24-fixes.

Thanks,
Mike

diff --git a/mythtv/libs/libmythdb/mythdbcon.cpp b/mythtv/libs/libmythdb/mythdbcon.cpp
index aa7c231..ebca711 100644
--- a/mythtv/libs/libmythdb/mythdbcon.cpp
+++ b/mythtv/libs/libmythdb/mythdbcon.cpp
@@ -28,6 +28,10 @@ MSqlDatabase::MSqlDatabase(const QString &name)
                "Unable to init db connection.");
         return;
     }
+    QString connectOptions("MYSQL_OPT_RECONNECT=1");
+    VERBOSE(VB_IMPORTANT, QString("Setting connect options: %1")
+                                  .arg(connectOptions));
+    m_db.setConnectOptions(connectOptions);
     m_lastDBKick = QDateTime::currentDateTime().addSecs(-60);
}


Michael T. Dean Wed, 27 Jul 2011 13:24:39 -0700

Can it not be wrapped in:

#if (QT_VERSION >= QT_VERSION_CHECK(4, 6, 0))

#endif

??


Nigel Jewell Wed, 27 Jul 2011 13:57:38 -0700

Still, first, we need to figure out if it even helps...  Feedback is
good.  :)

Mike


Michael T. Dean Wed, 27 Jul 2011 14:03:20 -0700

Mike,
Milion thx for so quick commits in subject.
I compiled BE/FE and launch stress tests on my production system (40+ rec, many concurrent multirec starting on the same time). Les see how it goes.
Despite potential success (I really hope so), I still want to find root cause.
It looks like mainly me & Brian have this issue - this suggests it is probably our case specific.
Have You any hint about extended logging in MySQL/Qt for looking closely on problem ?
br


Warpme Thu, 28 Jul 2011 01:24:35 -0700

I personally think including this can lead to myth functional improvement derived from improvements of underlaying foundations (Qt 4.6) and it's value is independent on impact on my issue.
Assuming it is not trigger to other MySql bugs (and MySQL bug tracker is silent here) it is definitely worth to include in a way You mention.
br


Warpme Thu, 28 Jul 2011 03:25:56 -0700

I thought there might be a bit more to it than meets the eye! Thanks for the
explanation.

David


David Edwards Thu, 28 Jul 2011 09:39:13 -0700

If you remove the patch, you might be able to replicate the problem by
manually killing the open connections on MySQL; using the mysql command line
client, do a SHOW PROCESSLIST to see the open connections, and use "KILL
<connection id>" to make the server drop the connection. This might confirm
that a stale connection can cause the stack traces you are seeing.

If the problem is reconnections, then as you say, identifying the root cause
could be worthwhile - especially if you think the connections are being
dropped prematurely (ie, before MySQL hits its timeout for inactivity). You
could try turning on the MySQL General Query Log - if I remember correctly,
that logs connections and disconnections.

David


David Edwards Thu, 28 Jul 2011 09:45:03 -0700

At this exact moment, my backend is not responding to any frontends and
is reporting in it's logs:

2011-07-28 19:01:15.472 MythSocket(a035b18:11): readStringList: Error,
timed out after 7000 ms.
2011-07-28 19:01:15.473 Protocol version check failure.
            The response to MYTH_PROTO_VERSION was empty.
            This happens when the backend is too busy to respond,
            or has deadlocked in due to bugs or hardware failure.

It doesn't mean much to me, but here's mine:

+-----+--------+-----------+-------------+---------+-------+-------+-----------------------+
| Id  | User   | Host      | db          | Command | Time  | State |
Info                  |
+-----+--------+-----------+-------------+---------+-------+-------+-----------------------+
| 138 | mythtv | localhost | mythconverg | Sleep   |  2410 |       |
NULL                  |
| 519 | mythtv | localhost | mythconverg | Sleep   | 12320 |       |
NULL                  |
| 520 | mythtv | localhost | mythconverg | Sleep   | 12312 |       |
NULL                  |
| 608 | mythtv | localhost | mythconverg | Sleep   |     2 |       |
NULL                  |
| 609 | mythtv | localhost | mythconverg | Sleep   |    12 |       |
NULL                  |
| 610 | mythtv | localhost | mythconverg | Sleep   |     2 |       |
NULL                  |
| 612 | mythtv | localhost | mythconverg | Sleep   |  1711 |       |
NULL                  |
| 615 | mythtv | localhost | mythconverg | Sleep   |   170 |       |
NULL                  |
| 617 | root   | localhost | NULL        | Query   |     0 | NULL  |
SHOW FULL PROCESSLIST |
+-----+--------+-----------+-------------+---------+-------+-------+-----------------------+
9 rows in set (0.00 sec)

Anything useful there?

I'm just about to install the latest fixes with Michael's small
"MYSQL_OPT_RECONNECT=1" patch posted in another message.

b.


Brian J. Murrell Thu, 28 Jul 2011 17:10:45 -0700

Brian,

This is little off-topic, but maybe it can help You to at least little decrease deadlock's frustration.

5months ago I was on 0.24-fixes and I had very similar situation: MYTH_PROTO deadlocks lading to totally unusable 3 standalone FE + FE as app on desktop. This was so frustrating that I started to be really desperate. I decided give a chance for master.
Rationale to switch was quite simple: even if - by it's active development status - master has many more bugs it can't be worse than 0.24-fixes with it's nasty MYTH_PROTO deadlock.
Another argument was that, assuming my case is case specific, I can't expect dev will be devote enough time to resolve case specific issue - what worse - issue which pops up on code being in maintenance mode.
    
First impression after switching: features stability is not worse than 0.24, but I still had deadlocks (less frequent but still present).
First breakthrough was when Daniel partially merged rec2 branch. This was night and day for DVB-S2 recording part of my system. In my case this was revolution as now I can reliably multirecord concurrent shows starting at the same time.

Next revolution was when code responsible for MYTH_PROTO was modularized. After this code change, resiliency goes highly up: failure in recording subsystem has much less impact on playback availability. Basically since that time majority of deadlocks which I observe are always related to mysqlclient library and according to current hypothesis are related to DB reconnects.

Taking into account current state of Your's WAF I think You can consider switch to master. Probability of improvement is really high, risks that it will be worse is probably near to zero (there is no worse thing than hung, right ;-p).
This IMHO will be also beneficial for devs as MYTH_PROTO deadlock efforts can be consolidated and directed on single myth version.

I'm using master daily on production system and my current overall impression is that it works better (much better) than 0.24.
Comparing both: I didn't notice any areas where master has noticeable regressions to 0.24 (oh, PIP is one from them) but I see many many improvements.
Speaking by emotions: master is much much more pleasure to use.

Hope it helps


Warpme Fri, 29 Jul 2011 01:59:54 -0700

Hi *

Having this thread as kind of "mysql and myth" I'm seizing an opportunity to clarify following mysql-myth related issue:
I have occasionally following error in myth log:  

2011-07-29 03:00:02.713306 I  EITScanner (5): Started passive scan.
2011-07-29 03:00:02.725219 I  TVRec(5): rec->GetPathname(): '/myth/tv/15315_20110729030000.mpg'
2011-07-29 03:00:21.058767 E  DB Error (change_program):
Query was:
UPDATE program SET starttime = ?,     endtime   = ? WHERE chanid    = ? AND       starttime = ?
Bindings were:
:CHANID=21023, :NEWEND 11-07-29T06:00:00, :NEWSTART 11-07-29T05:00:00,
:OLDSTART 11-07-29T04:00:00
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '21023-2011-07-29 05:00:00-0' for key 'PRIMARY'
2011-07-29 03:01:57.645007 I  EITScanner (5): Added 869 EIT Events

It is not destructive nor I can notice any negative side effects but I want be 100% sure that above issue isn't connected with mysql deadlocks.

br


Warpme Fri, 29 Jul 2011 02:42:24 -0700

I've seen similar in the past:

2011-07-24 22:46:31.945 DB Error (Frame rate insert):
Query was:
INSERT INTO recordedmarkup    (chanid, starttime, mark, type, data)  
VALUES ( ?, ?, ?, ?, ?);
Bindings were:
:CHANID=1035, :DATA=29970, :MARK=83569, :STARTTIME 11-07-24T22:00:00,
:TYPE=32
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '1035-2011-07-24 22:00:00-32-83569' for key 'PRIMARY'

and these ones:

2011-07-27 16:27:43.255 Error preparing query: SELECT
r.title,            r.subtitle,     r.description,          
r.category,         r.chanid,       c.channum,              
c.callsign,         c.name,         c.outputfilters,        
r.recgroup,         r.playgroup,    r.storagegroup,          
r.basename,         r.hostname,     r.recpriority,          
r.seriesid,         r.programid,    r.filesize,              
r.progstart,        r.progend,      r.stars,                
r.starttime,        r.endtime,      p.airdate 0,            
r.originalairdate,  r.lastmodified, r.recordid,              
c.commmethod,       r.commflagged,  r.previouslyshown,      
r.transcoder,       r.transcoded,   r.deletepending,        
r.preserve,         r.cutlist,      r.autoexpire,            
r.editing,          r.bookmark,     r.watched,              
p.audioprop 0,      p.videoprop 0,  p.subtitletypes 0,      
r.findid,           rec.dupin,      rec.dupmethod      FROM recorded AS
r LEFT JOIN channel AS c ON (r.chanid    = c.chanid) LEFT JOIN
recordedprogram AS p ON (r.chanid    = p.chanid AND     r.progstartp.starttime) LEFT JOIN record AS rec ON (r.recordid = rec.recordid)
WHERE r.chanid    = :CHANID AND       r.starttime = :RECSTARTTS
2011-07-27 16:27:43.259 Driver error was [2/2013]:
QMYSQL3: Unable to prepare statement
Database error was:
Lost connection to MySQL server during query

2011-07-27 16:27:43.261 MySQL reconnected successfully
2011-07-27 16:27:43.285 DB Error (LoadProgramFromRecorded):
Query was:
SELECT r.title,            r.subtitle,     r.description,          
r.category,         r.chanid,       c.channum,              
c.callsign,         c.name,         c.outputfilters,        
r.recgroup,         r.playgroup,    r.storagegroup,          
r.basename,         r.hostname,     r.recpriority,          
r.seriesid,         r.programid,    r.filesize,              
r.progstart,        r.progend,      r.stars,                
r.starttime,        r.endtime,      p.airdate 0,            
r.originalairdate,  r.lastmodified, r.recordid,              
c.commmethod,       r.commflagged,  r.previouslyshown,      
r.transcoder,       r.transcoded,   r.deletepending,        
r.preserve,         r.cutlist,      r.autoexpire,            
r.editing,          r.bookmark,     r.watched,              
p.audioprop 0,      p.videoprop 0,  p.subtitletypes 0,      
r.findid,           rec.dupin,      rec.dupmethod      FROM recorded AS
r LEFT JOIN channel AS c ON (r.chanid    = c.chanid) LEFT JOIN
recordedprogram AS p ON (r.chanid    = p.chanid AND     r.progstartp.starttime) LEFT JOIN record AS rec ON (r.recordid = rec.recordid)
WHERE r.chanid    = :CHANID AND       r.starttime = :RECSTARTTS
Bindings were:
:CHANID=2917, :RECSTARTTS 11-06-18T20:00:00
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near ':CHANID
AND       r.starttime = :RECSTARTTS' at line 1

and:

2011-07-28 04:14:46.275 Error preparing query: SELECT
r.title,            r.subtitle,     r.description,          
r.category,         r.chanid,       c.channum,        
      c.callsign,         c.name,         c.outputfilters,        
r.recgroup,         r.playgroup,    r.storagegroup,          
r.basename,         r.hostname,     r.recp
riority,            r.seriesid,         r.programid,  
r.filesize,               r.progstart,        r.progend,    
r.stars,                  r.starttime,        r.endtim
e,      p.airdate 0,              r.originalairdate,  r.lastmodified,
r.recordid,               c.commmethod,       r.commflagged,
r.previouslyshown,        r.transcoder,
      r.transcoded,   r.deletepending,          r.preserve,        
r.cutlist,      r.autoexpire,             r.editing,        
r.bookmark,     r.watched,              
p.audioprop 0,      p.videoprop 0,  p.subtitletypes 0,      
r.findid,           rec.dupin,      rec.dupmethod      FROM recorded AS
r LEFT JOIN channel AS c ON (r.chanid
   = c.chanid) LEFT JOIN recordedprogram AS p ON (r.chanid    = p.chanid
AND     r.progstart = p.starttime) LEFT JOIN record AS rec ON
(r.recordid = rec.recordid) WHERE r.c
hanid    = :CHANID AND       r.starttime = :RECSTARTTS
2011-07-28 04:14:46.276 Driver error was [2/2013]:
QMYSQL3: Unable to prepare statement
Database error was:
Lost connection to MySQL server during query

2011-07-28 04:14:46.278 MySQL reconnected successfully
2011-07-28 04:14:46.313 DB Error (LoadProgramFromRecorded):
Query was:
SELECT r.title,            r.subtitle,     r.description,          
r.category,         r.chanid,       c.channum,              
c.callsign,         c.name,         c.outputfilters,        
r.recgroup,         r.playgroup,    r.storagegroup,          
r.basename,         r.hostname,     r.recpriority,          
r.seriesid,         r.programid,    r.filesize,              
r.progstart,        r.progend,      r.stars,                
r.starttime,        r.endtime,      p.airdate 0,            
r.originalairdate,  r.lastmodified, r.recordid,              
c.commmethod,       r.commflagged,  r.previouslyshown,      
r.transcoder,       r.transcoded,   r.deletepending,        
r.preserve,         r.cutlist,      r.autoexpire,            
r.editing,          r.bookmark,     r.watched,              
p.audioprop 0,      p.videoprop 0,  p.subtitletypes 0,      
r.findid,           rec.dupin,      rec.dupmethod      FROM recorded AS
r LEFT JOIN channel AS c ON (r.chanid    = c.chanid) LEFT JOIN
recordedprogram AS p ON (r.chanid    = p.chanid AND     r.progstartp.starttime) LEFT JOIN record AS rec ON (r.recordid = rec.recordid)
WHERE r.chanid    = :CHANID AND       r.starttime = :RECSTARTTS
Bindings were:
:CHANID=1005, :RECSTARTTS 10-11-04T20:00:00
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near ':CHANID
AND       r.starttime = :RECSTARTTS' at line 1

and from earlier logs:

2011-07-06 06:30:04.730 DB Error (SimpleDBStorage::Load()):
Query was:
SELECT value  FROM codecparams WHERE profile = ? AND name = ?
Bindings were:
:WHERENAME=mpeg4minquality, :WHEREPROFILE=5
Driver error was [2/2013]:
QMYSQL3: Unable to execute statement
Database error was:
Lost connection to MySQL server during query

2011-07-06 06:30:04.744 MySQL reconnected successfully
QMYSQLResult::cleanup: unable to free statement handle
QMYSQLResult::cleanup: unable to free statement handle
2011-07-06 06:30:04.845 TVRec(15): rec->GetPathname():
'/video/mythtv/1019_20110705223000.mpg'
2011-07-06 06:30:04.881 MySQL reconnected successfully
2011-07-06 06:30:04.887 MySQL reconnected successfully
2011-07-06 06:30:04.892 MythSocket(ffffffffb4ee15e8:-1):
writeStringList: Error, writeBlock failed. (NetworkFailure)
2011-07-06 06:30:04.892 MythSocket(8f38250:-1): writeStringList: Error,
writeBlock failed. (NetworkFailure)
2011-07-06 06:30:04.893 MainServer, Warning: Unknown socket closing
MythSocket(0xffffffffa777a9d0)
2011-07-06 06:30:04.894 MainServer, Warning: Unknown socket closing
MythSocket(0xffffffffa0ec3050)
2011-07-06 06:30:04.915 MythSocket(ffffffffa777a9d0:-1):
writeStringList: Error, socket went unconnected.
2011-07-06 06:30:04.915 MythSocket(ffffffffa0ec3050:-1):
writeStringList: Error, socket went unconnected.
2011-07-06 06:30:04.968 MythSocket(ffffffffa7402a90:-1):
writeStringList: Error, writeBlock failed. (NetworkFailure)
2011-07-06 06:30:04.971 MythSocket(ffffffffa77f3a58:-1):
writeStringList: Error, writeBlock failed. (NetworkFailure)

The NetworkFailure there is somewhat of a red herring since the database
on the same machine as the backend and at 6:30am, there is no viewing
going on from a remote frontend.

and:

2011-07-08 06:20:26.992 Error preparing query: SELECT inputname,
sourceid, cardid FROM cardinput WHERE cardinputid = :INPUTID
2011-07-08 06:20:26.997 Driver error was [2/2013]:
QMYSQL3: Unable to prepare statement
Database error was:
Lost connection to MySQL server during query

2011-07-08 06:20:26.999 MySQL reconnected successfully
2011-07-08 06:20:27.001 DB Error (CardUtil::GetInputInfo()):
Query was:
SELECT inputname, sourceid, cardid FROM cardinput WHERE cardinputid:INPUTID
Bindings were:
:INPUTID=18
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds
to your MySQL server version for the right syntax to use near ':INPUTID'
at line 1

QMYSQLResult::cleanup: unable to free statement handle
2011-07-08 06:20:27.056 MythSocket(9d53258:-1): writeStringList: Error,
writeBlock failed. (NetworkFailure)

and again, no viewing by any remote FE at 6:20am.

Now that I look at it more closely, it seems i get mysql errors more
often that i thought I did.  But I'm not convinced that they are the
cause of lockups and recording failures since they just do not correlate.

For example, I did apply the "MYSQL_OPT_RECONNECT=1" patch last night,
had no mysql errors but I still had a failed recording with the only
anomalous looking log entries being:

2011-07-28 21:00:56.705 MythSocket(96f35a0:12): readStringList: Error,
timed out after 30000 ms.
2011-07-28 21:00:56.706 Connection to backend server lost
2011-07-28 21:00:56.707 MythCoreContext: Connecting to backend server:
10.75.22.2:6543 (try 1 of 5)
2011-07-28 21:00:56.713 Using protocol version 63
2011-07-28 21:00:56.717 MainServer::ANN Playback
2011-07-28 21:00:56.718 adding: pvr as a client (events: 0)
2011-07-28 21:00:56.721 MainServer::ANN Monitor
2011-07-28 21:00:56.722 adding: pvr as a client (events: 1)
2011-07-28 21:01:26.730 MythSocket(96f3b40:9): readStringList: Error,
timed out after 30000 ms.
2011-07-28 21:01:26.731 Reconnection to backend server failed
2011-07-28 21:01:26.732 Unable to find active recorder for this
recording, realtime flagging will not be enabled.
2011-07-28 21:01:26.817 MythCoreContext: Connecting to backend server:
10.75.22.2:6543 (try 1 of 5)
2011-07-28 21:01:26.830 Using protocol version 63
2011-07-28 21:01:26.831 MainServer::ANN Playback

And subsequent to that clients were again unable to connect to the
backend which required a restart to be usable with any FE, including
mythweb.

The stack trace when it was locked up did have the tell-tale mysql
thread in it:

Thread 19 (Thread 0xb3aa3b70 (LWP 15517)):
#0  0x00ee9422 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00c95f5b in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x03480ab8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#3  0x03480b16 in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#4  0x03481d13 in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#5  0x034820a7 in my_net_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#6  0x0347ba3f in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#7  0x03448f3e in cli_read_prepare_result () from
/usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#8  0x0344640e in mysql_stmt_prepare () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#9  0x03400592 in QMYSQLResult::prepare (this=0x8c5a600, query=...) at
../../../sql/drivers/mysql/qsql_mysql.cpp:883
        r = <value optimized out>
        encQuery = {static shared_null = {ref = {_q_value = 1}, alloc0, size = 0, data = 0x1e013a8 "", array = ""}, static shared_empty{ref = {_q_value = 13}, alloc = 0, size = 0, data = 0x1e013bc "", array
= ""}, d = 0x8ef3be0}
#10 0x007dc061 in QSqlResult::savePrepare (this=0x90a8ef0, query=...) at
kernel/qsqlresult.cpp:597
No locals.
#11 0x007ccf80 in QSqlQuery::prepare (this=0xb3aa2aec, query=...) at
kernel/qsqlquery.cpp:920
No locals.
#12 0x0058bbae in MSqlQuery::prepare (this=0xb3aa2aec, query=...) at
mythdbcon.cpp:627
        ok = true
#13 0x005c340f in SimpleDBStorage::Load (this=0x93e646c) at
mythstorage.cpp:14
        query = {<QSqlQuery> = {d = 0x92225e0}, m_db = 0x8c55268,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3398}, alloc = 0, size = 0, data = 0x1e0147a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 495}, alloc = 0, size0, data = 0x1e0148e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x92f3230,
static codecForCStrings = 0x0}}
        bindings = {{d = 0x8ef43c8, e = 0x8ef43c8}}
#14 0x03862e0b in ConfigurationGroup::Load (this=0x94a92d8) at
mythconfiggroups.cpp:91
        it = {_M_current = 0x8d2bee0}
#15 0x03862e0b in ConfigurationGroup::Load (this=0x9001b00) at
mythconfiggroups.cpp:91
        it = {_M_current = 0x8f9abfc}
#16 0x01215440 in ConfigurationDialog::Load (this=0xb3aa2e64) at
../libmyth/mythconfigdialogs.h:110
No locals.
#17 0x011d525d in RecordingProfile::loadByID (this=0xb3aa2e5c,
profileId=17) at recordingprofile.cpp:1299
        result = {<QSqlQuery> = {d = 0x8f5fd78}, m_db = 0xa992cb60,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3398}, alloc = 0, size = 0, data = 0x1e0147a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 495}, alloc = 0, size0, data = 0x1e0148e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x9040428,
static codecForCStrings = 0x0}}
        type = {static null = {<No data fields>}, static shared_null{ref = {_q_value = 3398}, alloc = 0, size = 0, data = 0x1e0147a, clean0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value495}, alloc = 0, size = 0, data = 0x1e0148e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0x9342c88, static codecForCStrings = 0x0}
#18 0x011d588c in RecordingProfile::loadByType (this=0xb3aa2e5c,
name=..., cardtype=...) at recordingprofile.cpp:1356
        hostname = {static null = {<No data fields>}, static shared_null
= {ref = {_q_value = 3398}, alloc = 0, size = 0, data = 0x1e0147a, clean
= 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value495}, alloc = 0, size = 0, data = 0x1e0148e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0x93db408, static codecForCStrings = 0x0}
        recid = 17
        result = {<QSqlQuery> = {d = 0x9078e40}, m_db = 0xa9937d40,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3398}, alloc = 0, size = 0, data = 0x1e0147a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 495}, alloc = 0, size0, data = 0x1e0148e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x9019bb0,
static codecForCStrings = 0x0}}
#19 0x0169e71f in load_profile (cardtype=..., tvchain=0x0,
rec=0x92d6cf0, profile=...) at tv_rec.cpp:4015
        profileName = {static null = {<No data fields>}, static
shared_null = {ref = {_q_value = 3398}, alloc = 0, size = 0, data0x1e0147a, clean = 0, simpletext = 0, righttoleft = 0, asciiCache = 0,
capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref{_q_value = 495}, alloc = 0, size = 0, data = 0x1e0148e, clean = 0,
simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved
= 0, array = {0}}, d = 0xb3aa2e18, static codecForCStrings = 0x0}
        profileName = DWARF-2 expression error: DW_OP_reg operations
must be used either alone or in conjuction with DW_OP_piece.

b.


Brian J. Murrell Fri, 29 Jul 2011 05:05:09 -0700

It would be nice if I could, but ultimately "real life" just has so much
going on currently that I can't take on a/the project of maintaining my
own myth package{s,ing} across as many machines as I have here.  I
currently rely on mythbuntu's 0.24-fixes for all of that.

And ultimately, currently, my worst case with 0.24-fixes is that I get a
lockup which I can resolve simply by restarting the backend.  Annoying,
yes, but quick to resolve -- even if I do have to do it pretty much
every time we sit down to watch.

With master I risk a regression creeping in that takes more than a
trivial amount of time to fix and in the meanwhile WAF is going even
lower because we are unable to watch for days.  I wouldn't even have an
option to fall-back to 0.24-fixes since my database schema would likely
be incompatible.

I fully appreciate and respect the "work in progress" status of master
and as such, stick to code meant for production (and WAF :-) ).

I do appreciate your input on all of that though.

Cheers,
b.


Brian J. Murrell Fri, 29 Jul 2011 05:15:23 -0700

Hi,

Unfortunately option MYSQL_OPT_RECONNECT also not helping me. Just 1h ago I had deadlock again.
Trace looks famous. Don't have time Yet to find is it exactly scenario identified by David.
Trace, BE log and DB stats/proc lst. attached in trac #9792.


Warpme Fri, 29 Jul 2011 06:24:52 -0700

+1 here.

I would encourage the project leadership to call for a feature freeze on
v0.25pre today, and then stabilize, resolve issues, and ship it, while
simultaneously opening a v0.26pre branch for larger existing or ongoing
work.

Current master is just that solid.  I tried both master and 0.24-fixes
recently for my new clean install from scratch.  The comparison wasn't even
fair, in terms of usability, quality, and stability.

$0.02,
Eric


E. Westbrook Fri, 29 Jul 2011 12:06:25 -0700

Recently I got idea to switch to different code paths in MySQL clientlib.
This is result of one from my current hypothesis assuming that myth is
putting "unfortunate" (in sense not yet catched mysql bug) load on MySQL
and issue I have is simple libsqlclient bug.

Assuming this, and fact that bug seems to be in communication code, I
decided to possibly change mysqlclient code execution path.
Simplest move is switch client-server comm. from Unix socket TCP. btw: I
also enable MySQL general logging (huh log has 350M).

System is under test since Yesterday.
Having already +80 rec. I noticed in LOG some MySQL related errors not
seen in past when I had Unix socket comm.

So it is possible that by changing comm. I change code execution path in
way that now mysqlclient is not hanging, and BE has chance to log error.

Indeed, in past hang on traces was usually during mplexid query. As BE
was already hung - this error wasn't logged in BE log. Now it is.

I'm attaching part of BE log with those errors.
I was trying to correlate 8:07:14 & 15:05:18 entries with mysql query
log. Any non-standard things I'm able to catch are something like this:

   659 Prepare    SELECT mplexid FROM channel WHERE chanid = ?
   659 Reset stmt    
   659 Execute    SELECT mplexid FROM channel WHERE chanid = 12805
   659 Close stmt    
  1052 Connect    mythtv@mythtv on mythconverg
  1052 Query     SET NAMES utf8    
  1052 Query    SELECT mplexid FROM channel WHERE chanid = :CHANID
   368 Prepare    SELECT mplexid FROM channel WHERE chanid = ?
   368 Reset stmt    
   368 Execute    SELECT mplexid FROM channel WHERE chanid = 13101
   368 Close stmt    
     5 Prepare    SELECT mplexid FROM channel WHERE chanid = ?
     5 Reset stmt    
     5 Execute    SELECT mplexid FROM channel WHERE chanid = 12805
     5 Close stmt    
     7 Prepare    SELECT mplexid FROM channel WHERE chanid = ?
     7 Reset stmt    
     7 Execute    SELECT mplexid FROM channel WHERE chanid = 13101
     7 Close stmt    
     6 Prepare    SELECT mplexid FROM channel WHERE chanid = ?
     6 Reset stmt    
     6 Execute    SELECT mplexid FROM channel WHERE chanid = 12805
     6 Close stmt

It looks like at 1052 there was DB reconnect.
Fortunately clientlib not hang here.

BE log reported DB error is 1064 which means SQL syntax error. I don't
know is it falsely reported by MySQL as SQL syntax issue - but in
reality is related to reconnect, or it is rather sign of myth bug.
I think it is reconnect result but somebody having expierience with
mysql logs can say more here.
Anyway, why here we have ":CHANID" instead of concrete number ?

-br

---------------------------------------------------

2011-07-30 15:15:06.624277 I [4660/4677] TVRecEvent tv_rec.cpp:3960 (TuningNewRecorder) - TVRec(6): rec->GetPathname(): '/myth/tv/21109_20110730151500.mpg'
2011-07-30 15:15:07.368863 I [4660/4687] Scheduler scheduler.cpp:2008 (HandleReschedule) - Reschedule requested for id -1.
2011-07-30 15:15:37.743362 I [4660/4688] HouseKeeping housekeeper.cpp:225 (RunHouseKeeping) - Running housekeeping thread
2011-07-30 15:16:00.224076 E [4660/4687] Scheduler mythdbcon.cpp:771 (prepare) - Error preparing query: SELECT mplexid FROM channel WHERE chanid = :CHANID
2011-07-30 15:16:00.224110 E [4660/4687] Scheduler mythdbcon.cpp:773 (prepare) - Driver error was [2/1064]:
QMYSQL3: Unable to prepare statement
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELE-' at line 1

2011-07-30 15:16:00.229599 E [4660/4687] Scheduler mythdb.cpp:178 (DBError) - DB Error (QueryMplexID):
Query was:
SELECT mplexid FROM channel WHERE chanid = :CHANID
Bindings were:
:CHANID=12802
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':CHANID' at line 1

2011-07-30 15:16:03.205256 I [4660/4687] Scheduler scheduler.cpp:2066 (HandleReschedule) - Scheduled 516 items in 55.4 = 12.28 match + 43.10 place
2011-07-30 15:16:59.404465 I [4660/4682] EIT eitscanner.cpp:124 (RunEventLoop) - EITScanner (9): Added 89 EIT Events

---------------------------------------------------

2011-07-31 08:00:01.406633 I [4660/4675] TVRecEvent eitscanner.cpp:210 (StartPassiveScan) - EITScanner (5): Started passive scan.
2011-07-31 08:00:01.422118 I [4660/4675] TVRecEvent tv_rec.cpp:3960 (TuningNewRecorder) - TVRec(5): rec->GetPathname(): '/myth/tv/15315_20110731080000.mpg'
2011-07-31 08:00:02.819950 I [4660/4696] ProcessRequest mainserver.cpp:1305 (HandleAnnounce) - MainServer::ANN Monitor
2011-07-31 08:00:02.819965 I [4660/4696] ProcessRequest mainserver.cpp:1307 (HandleAnnounce) - adding: mythtv as a client (events: 0)
2011-07-31 08:02:07.280352 I [4660/4676] EIT eitscanner.cpp:124 (RunEventLoop) - EITScanner (5): Added 788 EIT Events
2011-07-31 08:02:07.282358 I [4660/4687] Scheduler scheduler.cpp:2008 (HandleReschedule) - Reschedule requested for id -1.
2011-07-31 08:02:41.604296 I [4660/4687] Scheduler scheduler.cpp:2066 (HandleReschedule) - Scheduled 529 items in 33.9 = 7.19 match + 26.75 place
2011-07-31 08:03:53.067936 I [4660/4688] HouseKeeping housekeeper.cpp:225 (RunHouseKeeping) - Running housekeeping thread
2011-07-31 08:06:22.440556 I [4660/4676] EIT eitscanner.cpp:124 (RunEventLoop) - EITScanner (5): Added 23 EIT Events
2011-07-31 08:06:22.441243 I [4660/4687] Scheduler scheduler.cpp:2008 (HandleReschedule) - Reschedule requested for id -1.
2011-07-31 08:06:37.589034 N [4660/4689] Expire autoexpire.cpp:263 (CalcParams) - AutoExpire: CalcParams(): Max required Free Space: 12.0 GB w/freq: 14 min
2011-07-31 08:06:38.389587 N [4660/4689] Expire autoexpire.cpp:624 (SendDeleteMessages) - Expiring 421 MB for 12805 at 2011-07-29T11:05:00 => "33 (Ale Kino!)":"Fri Jul 29 11:05:00 2011"
2011-07-31 08:06:59.046034 I [4660/4687] Scheduler scheduler.cpp:2066 (HandleReschedule) - Scheduled 529 items in 36.3 = 7.25 match + 29.01 place
2011-07-31 08:06:59.048620 I [4660/4687] Scheduler scheduler.cpp:2008 (HandleReschedule) - Reschedule requested for id 0.
2011-07-31 08:07:14.695176 E [4660/4687] Scheduler mythdbcon.cpp:771 (prepare) - Error preparing query: SELECT mplexid FROM channel WHERE chanid = :CHANID
2011-07-31 08:07:14.695206 E [4660/4687] Scheduler mythdbcon.cpp:773 (prepare) - Driver error was [2/1064]:
QMYSQL3: Unable to prepare statement
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELE-' at line 1

2011-07-31 08:07:14.712440 E [4660/4687] Scheduler mythdb.cpp:178 (DBError) - DB Error (QueryMplexID):
Query was:
SELECT mplexid FROM channel WHERE chanid = :CHANID
Bindings were:
:CHANID=8012
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':CHANID' at line 1

2011-07-31 08:07:27.292364 I [4660/4687] Scheduler scheduler.cpp:2066 (HandleReschedule) - Scheduled 529 items in 27.9 = 0.00 match + 27.89 place
2011-07-31 08:08:25.256751 I [4660/4676] EIT eitscanner.cpp:124 (RunEventLoop) - EITScanner (5): Added 13 EIT Events
2011-07-31 08:08:54.075082 I [4660/4688] HouseKeeping housekeeper.cpp:225 (RunHouseKeeping) - Running housekeeping thread
2011-07-31 08:11:12.972574 I [4660/4676] EIT eitscanner.cpp:124 (RunEventLoop) - EITScanner (5): Added 2 EIT Events
2011-07-31 08:11:12.973386 I [4660/4687] Scheduler scheduler.cpp:2008 (HandleReschedule) - Reschedule requested for id -1.
2011-07-31 08:11:48.213467 I [4660/4687] Scheduler scheduler.cpp:2066 (HandleReschedule) - Scheduled 529 items in 34.9 = 7.37 match + 27.50 place

---------------------------------------------------

2011-07-31 15:05:02.348898 I  adding: mythtv as a client (events: 0)
2011-07-31 15:05:02.363475 I  MainServer::ANN Monitor
2011-07-31 15:05:02.363488 I  adding: mythtv as a client (events: 1)
2011-07-31 15:05:18.072358 E  Error preparing query: SELECT mplexid FROM channel WHERE chanid = :CHANID
2011-07-31 15:05:18.072394 E  Driver error was [2/1064]:
QMYSQL3: Unable to prepare statement
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELE-' at line 1
2011-07-31 15:05:18.073951 E  DB Error (QueryMplexID):
Query was:
SELECT mplexid FROM channel WHERE chanid = :CHANID
Bindings were:
:CHANID=12805
Driver error was [2/1064]:
QMYSQL: Unable to execute query
Database error was:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':CHANID' at line 1
2011-07-31 15:05:30.464480 I  Reschedule interrupted, will retry
2011-07-31 15:05:30.465030 I  Reschedule requested for id 0.
2011-07-31 15:05:30.465041 I  Reschedule requested for id 0.
2011-07-31 15:05:55.317334 I  Scheduled 505 items in 24.5 = 0.00 match + 24.51 place

---------------------------------------------------


Warpme Sun, 31 Jul 2011 08:09:13 -0700

There are a couple of points worth noting in regard to my lockups...

My lockups typically exhibit:

2011-07-31 10:31:31.179 Protocol version check failure.
            The response to MYTH_PROTO_VERSION was empty.
            This happens when the backend is too busy to respond,
            or has deadlocked in due to bugs or hardware failure.
2011-07-31 10:31:31.180 Unable to find active recorder for this
recording, realtime flagging will not be enabled.

in the BE logs and no FEs can connect at that time.

This invariably happens when recording so it's somehow related to what
goes on during recordings (of course which is heavy on mysql access with
the associated commflagging happening).

I discovered yesterday that if I wait long enough (not sure how long or
what events correlate, yet) the deadlock will resolve itself and FEs can
connect again.

My BE is at this moment locked up and FEs cannot connect.  I am
recording a single program from my HVR-950Q on clearqam.

The stack trace interestingly enough has evidence of EIT scanning:

Thread 28 (Thread 0xb461cb70 (LWP 1802)):
#0  0x008ac422 in __kernel_vsyscall ()
No symbol table info available.
#1  0x0028b342 in pthread_cond_timedwait@@GLIBC_2.3.2 () at
../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_timedwait.S:179
No locals.
#2  0x08c0a20f in QWaitConditionPrivate::wait (this=0x9f30f98,
mutex=0x9f30f78, time=400) at thread/qwaitcondition_unix.cpp:85
        tv = {tv_sec = 1312123834, tv_usec = 503275}
        ti = {tv_sec = 1312123834, tv_nsec = 903275000}
        code = <value optimized out>
#3  QWaitCondition::wait (this=0x9f30f98, mutex=0x9f30f78, time=400) at
thread/qwaitcondition_unix.cpp:159
        returnValue = <value optimized out>
#4  0x0100dc44 in EITScanner::RunEventLoop (this=0x9f30f78) at
eitscanner.cpp:159
        list_size = 0
        rate = 1
        sz = {2000, 1800, 1600, 1400, 1200}
        rt = {0, 0.200000003, 0.400000006, 0.600000024, 0.800000012}
        t = {m_timer = {mds = -1}, m_running = false}
        eitCount = 0
#5  0x0100ca78 in EITThread::run (this=0x9f30f88) at eitscanner.cpp:30
No locals.
#6  0x08c0932e in QThreadPrivate::start (arg=0x9f30f88) at
thread/qthread_unix.cpp:248
        data = 0x9f3aba0
#7  0x0028696e in start_thread (arg=0xb461cb70) at pthread_create.c:300
        __res = <value optimized out>
        __ignore1 = <value optimized out>
        __ignore2 = <value optimized out>
        pd = 0xb461cb70
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {2715636, 0, 4001536,
-1268661208, 1607656650, -863626846}, mask_was_saved = 0}}, priv = {pad
= {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        robust = <value optimized out>
        freesize = <value optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#8  0x020caa4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
No locals.

but I have disabled EIT scanning per another theory about these lockups.

Cheers,
b.


Brian J. Murrell Sun, 31 Jul 2011 08:24:54 -0700

Likely because we issued a prepare call, then got the DB reconnect, so
we lost the prepared statement and the Qt MySQL drivers issued the
prepared-statement SQL (with placeholders) as if it were normal SQL.

Mike


Michael T. Dean Sun, 31 Jul 2011 11:43:12 -0700

Nice. It looks like we are narrowing problem. So now Q is why I'm
loosing connection to DB.
More and more it looks like issue of clientlib.
One thing intriguing me: why there is no other reports about such issue.
I saw some reports about hangs is clientlib - but all they were related
to noticeable events in comm.infrastructure.
It will be good to somehow log activity of this lib.
Do anybody has idea how to do that ?
We need longstanding on-line logging, not debugging as issue is sporadic
and usually I need to hunt day or two for it.
br


Warpme Sun, 31 Jul 2011 13:31:57 -0700

Hmm. It looks like my earlier post about possible workaround by
switching client-server comm. from socket to TCP is not working.
Today I had another deadlock with famous stack trace. DB log hasn't any
non-standard entries.

So what will be most reasonable as next steep:
1.try upgrade mysql to 5.5.14
2.pray for miracle that somebody finds[hints] solution
3.prepare to change OS (OMG)

br


Warpme Mon, 01 Aug 2011 08:34:20 -0700

I=92ve experienced a few deadlocks recently and opened ticket #9951.
There are 2 backtraces attached, would someone have a second to look
at those to see if my deadlocks are related to mysql as well?  I'm
happy to assist with testing/providing info as I can.

I have a general idea of how to interpret the backtrace but don=92t know
how to identify which thread(s) actually caused the deadlock.

Thanks
Dave


Dave Badia Tue, 02 Aug 2011 17:43:51 -0700

Well, by definition, a deadlock has to involve more than one thread.

being discussed in this thread, in particular with mysql database deadlocks.

Cheers,
b.


Brian J. Murrell Wed, 03 Aug 2011 04:02:04 -0700

Would any of the linked FOSS tools help in tracking this down?

http://www.infoworld.com/d/data-management/10-essential-mysq[..]

Doug

--

Ben Franklin quote:

"Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety."


Doug Lytle Wed, 03 Aug 2011 06:45:49 -0700

I believe commit [4dfcdb8dd] plus the last patch on this ticket:
    http://code.mythtv.org/trac/ticket/5501
should resolve the problem completely.

Once [4dfcdb8dd] has gotten testing on a number of people's systems we
can backport it to 0.24-fixes.

The patch in the ticket makes the SQL connections per thread connections
as required by the Qt4 API while the commit updates our SQL reconnection
logic to work properly with Qt4, which looks like the primary problem.

These fix would not have been possible without the error reports, plus
the discussion here, plus Michael T. Dean's encyclopedic memory and
unmatched google-fu. Thanks All.

-- Daniel


Daniel Kristjansson Thu, 04 Aug 2011 13:30:26 -0700

Thought I'd mention that Daniel K just pushed a change that reworks
reconnections.  It removes the MySQL-library provided auto-reconnect
that I just put in (for the exact same reasons MySQL changed their API
to disable auto-reconnect--because it can't work with certain
application designs, and MythTV has changed since 0.21 to a design where
auto-reconnect cannot work).  So there's no need for further testing
that patch.

We have high hopes that this will work around the frequent MySQL
connection drop-out issues some users were seeing.  And I'd still be
very interested to hear details if anyone can identify the specific
MySQL server/client library configuration that's causing these frequent
drop outs.

If you're running master, please pull an update (to 4dfcdb8dd0c80 or
later), and please report back if it helps.  For those of you on
0.24-fixes, I will try to create a backport patch and post it here for
testing later today.  We plan to let this sit in unstable for a bit of
testing and to prove it actually works before pushing it to -fixes, though.

Mike


Michael T. Dean Thu, 04 Aug 2011 13:39:47 -0700

Indeed.  I saw it already and am drooling over it.  :-)

Not as high as mine, let me assure you.  :-)

You post it and I will build and run it.

Totally understandable and fair.  I don't mind being a guinea pig for
this.  It can't really be worse than the current situation (OK.  It
*could* be, but is not likely to be).

Cheers and many thanks!

b.


Brian J. Murrell Thu, 04 Aug 2011 13:44:07 -0700

I posted a lightly-tested backport on http://code.mythtv.org/trac/ticket/9704  .  You can download it at:

http://code.mythtv.org/trac/raw-attachment/ticket/9704/mytht[..]

It should be applied in the top-level of the mythtv checkout (the
directory that contains mythtv and mythplugins directories--since it
changes both mythtv and mythplugins):

patch -Np1 -i ~/mythtv-0.24-backport_reconnect_fixes.patch

Note that because of changes to mythplugins, you will need to recompile
plugins.  Also, the patch includes a binary/API version change--so it
will enforce the requirement to recompile plugins.  (There are no
changes to DB schema version or protocol version, though.)

Thanks to all those who test it.

Mike


Michael T. Dean Thu, 04 Aug 2011 21:15:31 -0700

Daniel, Mike,

Milion thx You are working on this issue !
At some point of time I was almost sure issue is my specific and I can't
expect You will loose time on this particular one.

mine. Taking into account mysql popularity - it seems like mine problem
root cause isn't within mysqlclient lib itself but rather in
environment/app which uses this lib.

Last days I upgrade switch back to MyISAM+upgrade to 5.5.15. With this
change - I still have at least deadlock per day (with famous stack trace).
So by this, 2 days ago I decided to do in one steep full
reinstall+upgrade ALL system packages. Now I'm on tests phase.
As I'm trying to find root cause - I want to test one change at time, so
I will apply Daniel's commits as soon as I will have first deadlock.

BTW: What I observe: this lib is reentrant and I see, during hang, other
treads successfully accessing DB. It looks like lib hangs are only in
context of given thread.
Looking for hangs in vio_net I found interesting thread on mysql forums: http://bugs.mysql.com/bug.php?id=33384
It is related to lib crash not hang but with quite similar stack trace -
so maybe it is somehow correlated ?

br


Warpme Fri, 05 Aug 2011 00:45:21 -0700

Unfortunately those commits completely hose LiveTV for me.  See http://code.mythtv.org/trac/ticket/9964

Maybe my setup may be problematic - FE & BE on one box but mysql
(5.1.41) on a separate Ubuntu 10.04 LTS server.

-- Lawrence


Lawrence Rust Fri, 05 Aug 2011 02:20:56 -0700

I can still immediately reproduce the same deadlock I've been seeing 95% of
the time when I start mythfrontend of late, even with the back-ported patch.

2011-08-05 09:13:49.857 MythCoreContext: Connecting to backend server: xxx.xxx.xxx.xxx:6543 (try 1 of 1)
2011-08-05 09:13:56.861 MythSocket(21f1ea0:50): readStringList: Error, timed out after 7000 ms.
2011-08-05 09:13:56.861 Protocol version check failure.
                        The response to MYTH_PROTO_VERSION was empty.
                        This happens when the backend is too busy to respond,
                        or has deadlocked in due to bugs or hardware failure.
2011-08-05 09:13:56.861 Unable to determine master backend time zone settings.  If those settings differ from local settings, some functionality will fail.

--
Jarod Wilson
jarod*******


Jarod Wilson Fri, 05 Aug 2011 06:16:36 -0700

Hi,

Probably this not so helpful - but it think it is worth to mention.
Being on 0.24 I had regularly this type of error.
I believe however only part of them was related to DB.
Other part I believe was related to mythproto server implementation.
At some point in time mythproto server was totally rewritten in master
and since that time I not experienced single deadlock with
"MYTH_PROTO_VERSION was empty" - beside those which were related to DB.
May You look on stack trace during such deadlock and check is it related
to DB access hang ?
Call stack usually looks like in my OP.
It think will be helpful as I believe devs prefer to address various
deadlocks in structured way.
This thread is devoted to DB area and recent Daniel's commits are
targeted only to this area.
Worst thing may happen here when potentially misleading feedback lead to
changes in proper code....
Thx


Warpme Fri, 05 Aug 2011 09:24:38 -0700

I seem to get both, or they are related and/or one and the same.

That is, I get the:

2011-08-05 09:13:56.861 Protocol version check failure.
                         The response to MYTH_PROTO_VERSION was empty.
                         This happens when the backend is too busy to
respond,
                         or has deadlocked in due to bugs or hardware
failure.

errors on my backend when a frontend tries to connect after the backend
has begun recording(s).

But I also see those DB threads in my stack traces.

b.


Brian J. Murrell Fri, 05 Aug 2011 09:42:11 -0700

Lawrence, there appears to be an issue with the demo recorder you are
using or the configuration thereof. I've committed something that
should restore the old behaviour in that case, but still print out
an error message so you can still trace the source of the issue.
Basically, the DB schema does not allow for the recorder input to be
NULL, but somehow it is in that insert.

-- Daniel


Daniel Kristjansson Fri, 05 Aug 2011 12:01:19 -0700

So I applied this and just finished with a couple of recordings.  A new
artifact since applying this patch (on top of v0.24.1-61-gcccad82) is
that a recording failed due to running out of disk on the storage directory.

This has _never_ happened before as the free space manager always
maintains enough free space to allow even multiple recordings to a
single storage directory occur.

This happened more than 30 minutes ago leaving only 4KB in the storage
directory:

$ df -k /storage1
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb             480721640 456302308         4 100% /storage1

and even now, more than 30 minutes later, nothing has been removed to
free up space.

AutoExpire has been running since I applied that patch, but not since
that storage directory filled up at ~16:30:

2011-08-05 11:59:13.835 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 12:14:14.114 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 12:29:14.746 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 12:44:15.204 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 12:59:15.658 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 13:14:16.116 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 13:29:16.575 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 13:44:17.031 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 13:59:17.490 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 14:14:17.952 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 15:27:40.096 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 15:58:13.429 AutoExpire: CalcParams(): Max required Free
Space: 1.0 GB w/freq: 15 min
2011-08-05 16:00:09.970 AutoExpire: CalcParams(): Max required Free
Space: 3.0 GB w/freq: 14 min
2011-08-05 16:00:10.932 AutoExpire: CalcParams(): Max required Free
Space: 3.0 GB w/freq: 9 min
2011-08-05 16:13:14.288 AutoExpire: CalcParams(): Max required Free
Space: 3.0 GB w/freq: 9 min

Thoughts?

b.


Brian J. Murrell Fri, 05 Aug 2011 14:08:20 -0700

And it's still not been run, now 1h15m since it was last run.  It's
clearly stuck.

Imagine my surprise at this thread's stack trace:

Thread 11 (Thread 0xaebe0b70 (LWP 13581)):
#0  0x00eea422 in __kernel_vsyscall ()
No symbol table info available.
#1  0x00acaf5b in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x03f4aab8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#3  0x03f4ab16 in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#4  0x03f4bd13 in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#5  0x03f4c0a7 in my_net_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#6  0x03f45a3f in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#7  0x03f46185 in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#8  0x03f152a6 in cli_stmt_execute () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#9  0x03f12a9f in mysql_stmt_execute () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#10 0x0280dba2 in QMYSQLResult::exec (this=0xa682ce58) at
../../../sql/drivers/mysql/qsql_mysql.cpp:1014
        timeVector = {{d = 0xa683c2b8, p = 0xa683c2b8}}
        stringVector = warning: can't find linker symbol for virtual
table for `QVector<QByteArray>' value
warning:   found `QString::shared_empty' instead
{{d = 0x3e8a490, p = 0x3e8a490}}
        nullVector = {{d = 0xa68251c0, p = 0xa68251c0}}
        values = {{d = 0xa6827d68, p = 0xa6827d68}}
        r = <value optimized out>
        currBind = 0xa6850e70
#11 0x00dd442c in QSqlQuery::exec (this=0xaebdf9b8) at
kernel/qsqlquery.cpp:939
No locals.
#12 0x00c5f1f3 in MSqlQuery::exec (this=0xaebdf9b8) at mythdbcon.cpp:504
        result = 3
#13 0x088c4096 in ProgramInfo::LoadProgramFromRecorded (this=0xa6828038,
_chanid=1019, _recstartts=...) at programinfo.cpp:1574
        query = {<QSqlQuery> = {d = 0xa68533c0}, m_db = 0x9c366e8,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 632}, alloc = 0, size0, data = 0x3e8a48e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xa6844a50, static codecForCStrings = 0x0}}
        is_reload = 8
        new_basename = {static null = {<No data fields>}, static
shared_null = {ref = {_q_value = 3812}, alloc = 0, size = 0, data0x3e8a47a, clean = 0, simpletext = 0, righttoleft = 0, asciiCache = 0,
capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref{_q_value = 632}, alloc = 0, size = 0, data = 0x3e8a48e, clean = 0,
simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved
= 0, array = {0}}, d = 0xa6800010, static codecForCStrings = 0x0}
#14 0x088abe28 in ProgramInfo (this=0xa6828038, _chanid=1019,
_recstartts=...) at programinfo.cpp:234
No locals.
#15 0x08078ff6 in AutoExpire::FillDBOrdered (this=0x993ada8,
expireList=..., expMethod=2) at autoexpire.cpp:980
        pginfo = 0xa683ada0
        chanid = 1019
        recstartts = {d = {d = 0xa6824850}}
        querystr = {static null = {<No data fields>}, static shared_null
= {ref = {_q_value = 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean
= 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value632}, alloc = 0, size = 0, data = 0x3e8a48e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa68513b8, static codecForCStrings = 0x0}
        where = {static null = {<No data fields>}, static shared_null{ref = {_q_value = 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value632}, alloc = 0, size = 0, data = 0x3e8a48e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa681c680, static codecForCStrings = 0x0}
        orderby = {static null = {<No data fields>}, static shared_null
= {ref = {_q_value = 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean
= 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value632}, alloc = 0, size = 0, data = 0x3e8a48e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa684b0f0, static codecForCStrings = 0x0}
        msg = {static null = {<No data fields>}, static shared_null{ref = {_q_value = 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value632}, alloc = 0, size = 0, data = 0x3e8a48e, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa684bd60, static codecForCStrings = 0x0}
        maxAge = 65576948
        query = {<QSqlQuery> = {d = 0xa681cc40}, m_db = 0xa68012f0,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 632}, alloc = 0, size0, data = 0x3e8a48e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xa68513b8, static codecForCStrings = 0x0}}
#16 0x08076b45 in AutoExpire::FillExpireList (this=0x993ada8,
expireList=...) at autoexpire.cpp:760
        expMethod = 2
#17 0x0806f5b4 in AutoExpire::ExpireRecordings (this=0x993ada8) at
autoexpire.cpp:387
        fsit = {_M_current = 0x0}
        fsMap = {{d = 0xa681caa0, e = 0xa681caa0}}
        expireList = {<std::_Vector_base<ProgramInfo*,
std::allocator<ProgramInfo*> >> = {_M_impl{<std::allocator<ProgramInfo*>>{<__gnu_cxx::new_allocator<ProgramInfo*>> = {<No data fields>}, <No data
fields>}, _M_start = 0xa6846f00, _M_finish = 0xa6847158,
_M_end_of_storage = 0xa6847300}}, <No data fields>}
        deleteList = {<std::_Vector_base<ProgramInfo*,
std::allocator<ProgramInfo*> >> = {_M_impl{<std::allocator<ProgramInfo*>>{<__gnu_cxx::new_allocator<ProgramInfo*>> = {<No data fields>}, <No data
fields>}, _M_start = 0x0, _M_finish = 0x0, _M_end_of_storage = 0x0}},
<No data fields>}
        fsInfos = {<std::_Vector_base<FileSystemInfo,
std::allocator<FileSystemInfo> >> = {_M_impl{<std::allocator<FileSystemInfo>>{<__gnu_cxx::new_allocator<FileSystemInfo>> = {<No data fields>}, <No
data fields>}, _M_start = 0xa681c180, _M_finish = 0xa681c1e8,
_M_end_of_storage = 0xa681c1e8}}, <No data fields>}
        truncateMap = {{d = 0x1dcdff4, e = 0x1dcdff4}}
        query = {<QSqlQuery> = {d = 0xaebdffa8}, m_db = 0x81b6ff4,
m_isConnected = 156, m_returnConnection = 96, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3812}, alloc = 0, size = 0, data = 0x3e8a47a, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 632}, alloc = 0, size0, data = 0x3e8a48e, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x3d0f00,
static codecForCStrings = 0x0}}
#18 0x0806e835 in AutoExpire::RunExpirer (this=0x993ada8) at
autoexpire.cpp:310
        timer = {mds = 58394290}
        curTime = {d = {d = 0xa6822788}}
        next_expire = {d = {d = 0xa68fe500}}
#19 0x0807452b in AutoExpire::ExpirerThread (param=0x993ada8) at
autoexpire.cpp:617
        expirer = 0x993ada8
#20 0x00ac396e in start_thread (arg=0xaebe0b70) at pthread_create.c:300
        __res = <value optimized out>
        __ignore1 = <value optimized out>
        __ignore2 = <value optimized out>
        pd = 0xaebe0b70
        now = <value optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {11354100, 0,
4001536, -1363278808, 703299828, 228556200}, mask_was_saved = 0}}, priv
= {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
canceltype = 0}}}
        not_first_call = <value optimized out>
        robust = <value optimized out>
        freesize = <value optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#21 0x01d45a4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

So, looks like this latest mysql patch is not fixing the problem.

Cheers,
b.


Brian J. Murrell Fri, 05 Aug 2011 14:42:00 -0700

Thanks for making this change - it restores LiveTV again from my import
recorder and enables me to continue testing.  However, the BE produces
this error messge:

2011-08-06 10:04:58.926519 E  Original query failed, but resend with
empty strings in place of NULL strings worked.
DB Error (MSqlQuery):

The full log output from both the FE & BE can be seen here: http://www.softsystem.co.uk/download/mythtv/mythdb-nullstrin[..]

I checked my BE setup and found nothing odd so I deleted all capture
cards, video sources, connections and channels and re-created the import
and demo recorders but got the same message.

--
Lawrence


Lawrence Rust Sat, 06 Aug 2011 01:43:23 -0700

Yeah, this is definitely not helping.  Even with this patch applied I am
still getting deadlocks (which prevent clients connecting) along with
the dreaded:

2011-08-06 11:00:47.635 MythCoreContext: Connecting to backend server:
10.75.22.2:6543 (try 1 of 5)
2011-08-06 11:00:47.674 MythCoreContext: Connecting to backend server:
10.75.22.2:6543 (try 1 of 5)
2011-08-06 11:00:54.641 MythSocket(9c9f428:11): readStringList: Error,
timed out after 7000 ms.
2011-08-06 11:00:54.643 Protocol version check failure.
            The response to MYTH_PROTO_VERSION was empty.
            This happens when the backend is too busy to respond,
            or has deadlocked in due to bugs or hardware failure.

With the familiar stacktrace:

Thread 18 (Thread 0xb05fab70 (LWP 2051)):
#0  0x00c20416 in __kernel_vsyscall ()
No symbol table info available.
#1  0x0050552b in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x0548fbf8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#3  0x0548fc56 in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#4  0x05490cfb in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#5  0x0549186a in my_net_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#6  0x05489d3f in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#7  0x0545ab4e in cli_read_prepare_result () from
/usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#8  0x0545af01 in mysql_stmt_prepare () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#9  0x034ccd0f in QMYSQLResult::prepare (this=0xa30645c0, query=...) at
../../../sql/drivers/mysql/qsql_mysql.cpp:900
        r = <value optimized out>
        encQuery = {static shared_null = {ref = {_q_value = 41}, alloc0, size = 0, data = 0x25c73c4 "", array = ""}, static shared_empty{ref = {_q_value = 4}, alloc = 0, size = 0, data = 0x25c73d8 "", array""}, d = 0xa31c8338}
#10 0x06cc506a in QSqlResult::savePrepare (this=0xa30645c0, query=...)
at kernel/qsqlresult.cpp:600
No locals.
#11 0x06cb75c8 in QSqlQuery::prepare (this=0xb05f99b8, query=...) at
kernel/qsqlquery.cpp:924
No locals.
#12 0x00601029 in MSqlQuery::prepare (this=0xb05f99b8, query=...) at
mythdbcon.cpp:667
        ok = 8
#13 0x0889cf37 in ProgramInfo::LoadProgramFromRecorded (this=0xa31bd9b8,
_chanid=1009, _recstartts=...) at programinfo.cpp:1570
        query = {<QSqlQuery> = {d = 0xa8aa42f0}, m_db = 0x8d0c9b0,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3678}, alloc = 0, size = 0, data = 0x25c749e, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 608}, alloc = 0, size0, data = 0x25c74b2, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xa319c738, static codecForCStrings = 0x0}}
        is_reload = 8
        new_basename = {static null = {<No data fields>}, static
shared_null = {ref = {_q_value = 3678}, alloc = 0, size = 0, data0x25c749e, clean = 0, simpletext = 0, righttoleft = 0, asciiCache = 0,
capacity = 0, reserved = 0, array = {0}}, static shared_empty = {ref{_q_value = 608}, alloc = 0, size = 0, data = 0x25c74b2, clean = 0,
simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved
= 0, array = {0}}, d = 0xb2e00010, static codecForCStrings = 0x0}
#14 0x08884e28 in ProgramInfo::ProgramInfo (this=0xa31bd9b8,
_chanid=1009, _recstartts=...) at programinfo.cpp:234
No locals.
#15 0x08078ff6 in AutoExpire::FillDBOrdered (this=0x8ccd4b8,
expireList=..., expMethod=10003) at autoexpire.cpp:980
        pginfo = 0xa304e9f8
        chanid = 1009
        recstartts = {d = {d = 0xa2f40a28}}
        querystr = {static null = {<No data fields>}, static shared_null
= {ref = {_q_value = 3678}, alloc = 0, size = 0, data = 0x25c749e, clean
= 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value608}, alloc = 0, size = 0, data = 0x25c74b2, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xb2ed2c70, static codecForCStrings = 0x0}
        where = {static null = {<No data fields>}, static shared_null{ref = {_q_value = 3678}, alloc = 0, size = 0, data = 0x25c749e, clean0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value608}, alloc = 0, size = 0, data = 0x25c74b2, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa5060228, static codecForCStrings = 0x0}
        orderby = {static null = {<No data fields>}, static shared_null
= {ref = {_q_value = 3678}, alloc = 0, size = 0, data = 0x25c749e, clean
= 0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value608}, alloc = 0, size = 0, data = 0x25c74b2, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xb2eb8658, static codecForCStrings = 0x0}
        msg = {static null = {<No data fields>}, static shared_null{ref = {_q_value = 3678}, alloc = 0, size = 0, data = 0x25c749e, clean0, simpletext = 0, righttoleft = 0, asciiCache = 0, capacity = 0,
reserved = 0, array = {0}}, static shared_empty = {ref = {_q_value608}, alloc = 0, size = 0, data = 0x25c74b2, clean = 0, simpletext = 0,
righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, d = 0xa31a8be8, static codecForCStrings = 0x0}
        maxAge = 39612404
        query = {<QSqlQuery> = {d = 0xa31b1ed8}, m_db = 0xb2e0b4d0,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3678}, alloc = 0, size = 0, data = 0x25c749e, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 608}, alloc = 0, size0, data = 0x25c74b2, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xb2ed2c70, static codecForCStrings = 0x0}}
#16 0x08076b04 in AutoExpire::FillExpireList (this=0x8ccd4b8,
expireList=...) at autoexpire.cpp:753
        expMethod = 2
#17 0x0806f5b4 in AutoExpire::ExpireRecordings (this=0x8ccd4b8) at
autoexpire.cpp:387
        fsit = {_M_current = 0x0}
        fsMap = {{d = 0xa31c3400, e = 0xa31c3400}}
        expireList = {<std::_Vector_base<ProgramInfo*,
std::allocator<ProgramInfo*> >> = {_M_impl{<std::allocator<ProgramInfo*>>{<__gnu_cxx::new_allocator<ProgramInfo*>> = {<No data fields>}, <No data
fields>}, _M_start = 0xa8ac2f10, _M_finish = 0xa8ac2ff4,
_M_end_of_storage = 0xa8ac3010}}, <No data fields>}
        deleteList = {<std::_Vector_base<ProgramInfo*,
std::allocator<ProgramInfo*> >> = {_M_impl{<std::allocator<ProgramInfo*>>{<__gnu_cxx::new_allocator<ProgramInfo*>> = {<No data fields>}, <No data
fields>}, _M_start = 0x0, _M_finish = 0x0, _M_end_of_storage = 0x0}},
<No data fields>}
        fsInfos = {<std::_Vector_base<FileSystemInfo,
std::allocator<FileSystemInfo> >> = {_M_impl{<std::allocator<FileSystemInfo>>{<__gnu_cxx::new_allocator<FileSystemInfo>> = {<No data fields>}, <No
data fields>}, _M_start = 0xa2fca728, _M_finish = 0xa2fca790,
_M_end_of_storage = 0xa2fca790}}, <No data fields>}
        truncateMap = {{d = 0x280fff4, e = 0x280fff4}}
        query = {<QSqlQuery> = {d = 0xb05f9fa8}, m_db = 0x81b6ff4,
m_isConnected = 132, m_returnConnection = 15, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 3678}, alloc = 0, size = 0, data = 0x25c749e, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 608}, alloc = 0, size0, data = 0x25c74b2, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d = 0x3d0f00,
static codecForCStrings = 0x0}}
#18 0x0806e835 in AutoExpire::RunExpirer (this=0x8ccd4b8) at
autoexpire.cpp:310
        timer = {mds = 35019186}
        curTime = {d = {d = 0xb2e058b8}}
        next_expire = {d = {d = 0xa2fb3d28}}
#19 0x0807452b in AutoExpire::ExpirerThread (param=0x8ccd4b8) at
autoexpire.cpp:617
        expirer = 0x8ccd4b8
#20 0x004fde99 in start_thread (arg=0xb05fab70) at pthread_create.c:304
        pd = 0xb05fab70
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {5300212, 0, 4001536,
-1335909336, -2138265490, -1602815218}, mask_was_saved = 0}}, priv{pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0,
canceltype = 0}}}
        not_first_call = <value optimized out>
        robust = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#21 0x0278373e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

I even upgrade to Natty on this machine in hopes that a newer mysql
would help.  Clearly it has not.  :-(

b.


Brian J. Murrell Sat, 06 Aug 2011 08:37:06 -0700

If you looked at the change that Daniel put in to make this work, you will
see that this was an intentional message so that the developers will go and
fix the various queries, but have it still "working" in the mean time.

I checked my BE setup and found nothing odd so I deleted all capture

Yes, and you will continue to get it until the queries themselves are fixed.
Relax :)

To quote the commit message:

Note: If you see the DB error message, it really does indicate code that
needs fixing.

It will get fixed, it's only been 21h since that commit went in :)


Gavin Hurlbut Sat, 06 Aug 2011 09:10:19 -0700

expecting.

and that it looked like the cause was a bug in the import recorder, not
a setup issue.

Maybe my bug report #9964 should be reopened as an import recorder
problem.

--
Lawrence


Lawrence Rust Sat, 06 Aug 2011 09:41:42 -0700

It be better as a new ticket that just pasted in the warning you get.
One of the things that commit did was print NULL for null strings
in the debugging and not just a blank, to differentiate it from
blank strings which are allowed in NOT NULL columns.

I think the import recorder problem is just a plain old bug in
light in terms of error detection and avoidance and features have

-- Daniel


Daniel Kristjansson Sat, 06 Aug 2011 17:48:37 -0700

Drat!

I haven't completely given up hope though. With the changes in the
MThread patch we should be fully obeying the Qt restrictions on
QSqlQuery usage. I don't think that could ever be backported to
0.24-fixes. But if it does solve the problem it will give us a
good reason to start stabilizing for a 0.25 release..

-- Daniel


Daniel Kristjansson Sat, 06 Aug 2011 17:52:43 -0700

OK, done here:  http://code.mythtv.org/trac/ticket/9968

--
Lawrence


Lawrence Rust Sun, 07 Aug 2011 00:00:17 -0700

Yeah, only a day after upgrading to master I have a deadlocked (i.e.
doesn't respond to client connection requests) BE again.  The infamous
mysqlclient_r library is implicated, again:

Thread 22 (Thread 0xb2fbdb70 (LWP 2283)):
#0  0x008fa416 in __kernel_vsyscall ()
No symbol table info available.
#1  0x0065652b in read () at ../sysdeps/unix/syscall-template.S:82
No locals.
#2  0x0284dbf8 in vio_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#3  0x0284dc56 in vio_read_buff () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#4  0x0284ecfb in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#5  0x0284f86a in my_net_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#6  0x02847d3f in cli_safe_read () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#7  0x0284bdf2 in cli_advanced_command () from
/usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#8  0x028186bb in ?? () from /usr/lib/libmysqlclient_r.so.16
No symbol table info available.
#9  0x09399141 in QMYSQLResult::exec (this=0xa22dc228) at
../../../sql/drivers/mysql/qsql_mysql.cpp:937
        timeVector = {{d = 0x66ec4b4, p = 0x66ec4b4}}
        stringVector = {{d = 0x66ec4b4, p = 0x66ec4b4}}
        nullVector = {{d = 0x66ec4b4, p = 0x66ec4b4}}
        values = {{d = 0xa214c480, p = 0xa214c480}}
        r = 0
#10 0x00789752 in QSqlQuery::exec (this=0xb2fbcf08) at
kernel/qsqlquery.cpp:943
No locals.
#11 0x00374bf5 in MSqlQuery::exec (this=0xb2fbcf08) at mythdbcon.cpp:621
        __FUNCTION__ = "exec"
        result = 6
#12 0x00ff5c3c in CardUtil::GetInputInfo (input=..., groupids=0x0) at
cardutil.cpp:1047
        query = {<QSqlQuery> = {d = 0xa1e11948}, m_db = 0x9bf6b28,
m_isConnected = true, m_returnConnection = true, m_last_prepared_query{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 5721}, alloc = 0, size = 0, data = 0x66ec49e, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 11962}, alloc = 0, size0, data = 0x66ec4b2, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xa20435c0, static codecForCStrings = 0x0}}
#13 0x013bf48b in TVRec::IsBusy (this=0x990b510, busy_input=0xb2fbcf9c,
time_buffer=5) at tv_rec.cpp:2421
        list = {<QList<QString>> = {{p = {static shared_null = {ref{_q_value = 4268}, alloc = 0, begin = 0, end = 0, sharable = 1, array{0x0}}, d = 0xa1e29730}, d = 0xa1e29730}}, <No data fields>}
        dummy = {<InputInfo> = {_vptr.InputInfo = 0x16fc828, name{static null = {<No data fields>}, static shared_null = {ref = {_q_value
= 5721}, alloc = 0, size = 0, data = 0x66ec49e, clean = 0, simpletext0, righttoleft = 0, asciiCache = 0, capacity = 0, reserved = 0, array{0}}, static shared_empty = {ref = {_q_value = 11962}, alloc = 0, size0, data = 0x66ec4b2, clean = 0, simpletext = 0, righttoleft = 0,
asciiCache = 0, capacity = 0, reserved = 0, array = {0}}, d0xa2121478, static codecForCStrings = 0x0}, sourceid = 0, inputid = 19,
cardid = 0, mplexid = 0}, chanid = 0}
        chanid = 4111
        pendinfo = {info = 0x0, recordingStart = {d = {d = 0xa2050328}},
hasLaterShowing = false, canceled = false, ask = false, doNotAskfalse, possibleConflicts = {<std::_Vector_base<unsigned int,
std::allocator<unsigned int> >> = {_M_impl = {<std::allocator<unsigned
int>> = {<__gnu_cxx::new_allocator<unsigned int>> = {<No data fields>},
<No data fields>}, _M_start = 0x0, _M_finish = 0x0, _M_end_of_storage0x0}}, <No data fields>}}
        has_pending = false
#14 0x080812b0 in EncoderLink::IsBusy (this=0x98cdce0, busy_input=0x0,
time_buffer=5) at encoderlink.cpp:132
No locals.
#15 0x080f2263 in Scheduler::HandleIdleShutdown (this=0x98e6fd0,
blockShutdown=@0xb2fbd2ed, idleSince=..., prerollseconds=0,
idleTimeoutSecs=300, idleWaitForRecordingTime=5,
statuschanged=*******:2543
        curtime = {d = {d = 0xa91151c8}}
        recording = false
        it = {i = 0x9934940}
        __FUNCTION__ = "HandleIdleShutdown"
#16 0x080ed768 in Scheduler::run (this=0x98e6fd0) at scheduler.cpp:1927
        curtime = {d = {d = 0xa22842f8}}
        statuschanged = false
        secs_to_next = 15
        done = true
        lockit = {val = 160329741}
        prerollseconds = 0
        firstRun = false
        startIter = {_M_cur = 0xa2019b00, _M_first = 0xa2019a08, _M_last
= 0xa2019c08, _M_node = 0x9ad9828}
        maxSleep = 60000
        __FUNCTION__ = "run"
        idleWaitForRecordingTime = 5
        lastSleepCheck = {d = {d = 0xa1e5f4b8}}
        idleSince = {d = {d = 0xa9174228}}
        wakeThreshold = 300
        idleTimeoutSecs = 300
        blockShutdown = false
        schedRunTime = 66
#17 0x064bbda2 in QThreadPrivate::start (arg=0x98e6fd0) at
thread/qthread_unix.cpp:320
        thr = 0x98e6fd0
        data = 0x98e94e0
#18 0x0064ee99 in start_thread (arg=0xb2fbdb70) at pthread_create.c:304
        pd = 0xb2fbdb70
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {6680564, 0, 4001536,
-1292118984, -783331229, -281730298}, mask_was_saved = 0}}, priv = {pad
= {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <value optimized out>
        robust = <value optimized out>
        pagesize_m1 = <value optimized out>
        sp = <value optimized out>
        freesize = <value optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#19 0x02bb573e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130

Doesn't seem to have helped here.

Master seems even worse in fact.  The process is spinning in a tight
loop trying to get a futex and getting an ETIMEDOUT.  It's not even
killable with a SIGTERM or SIGHUP.  It did end up dying with a SIGINT.

So no further ahead, really.

b.


Brian J. Murrell Mon, 08 Aug 2011 15:04:55 -0700

Since you are already using master, I would strongly suggest a git pull, and
then make distclean...  Daniel has applied the ticket from #5501 into master
now, which includes a fairly comprehensive rework of how database
connections are handled with respect to thread-safety, etc.  I would expect
that if the issues were DB-related, the current code in master may well fix
the problem, or at least make it a lot better, and more predictable.

Of course, it *is* the development branch, but you are already on it...
so... :)


Gavin Hurlbut Mon, 08 Aug 2011 15:27:42 -0700

FWIW, I'm using mythbuntu's packages.  I am at da10d33.

How recent?  I see some recent patches on that ticket, but no commit
notices/hashes on it.  I do see in github that he's been quite active in
the last couple of days though.  I'm updating to 8f37c6e in any case.

That would be great.

Yeah.

b.


Brian J. Murrell Mon, 08 Aug 2011 17:40:51 -0700

Ideally you would update to [8e783b75] or later. Binaries with [8f37c6e]
can fail to start depending on what your linker does with some static
initializations, but it shouldn't experience those deadlocks in mysql
if my diagnosis of the problem was correct.

-- Daniel


Daniel Kristjansson Wed, 10 Aug 2011 13:58:38 -0700

Ahhhh.  That's what mythbuntu's binaries must be doing.  Is there a
known fix for it?

b.


Brian J. Murrell Wed, 10 Aug 2011 14:45:41 -0700

Yeah just update to [8e783b75] or later. The ubuntu autobuilds should
pick it up soon if they haven't already.

-- Daniel


Daniel Kristjansson Thu, 11 Aug 2011 12:41:36 -0700

Yes, indeed on both cases.  I am now running 915da4f.  We'll see how
much better this behaves.

Cheers,
b.


Brian J. Murrell Thu, 11 Aug 2011 20:36:26 -0700

Daniel, Mike,

Sorry for long silence - I was on holidays :-)

3 weeks ago I do full system upgrade.
I have configured 33 testing rec.rules   avg 10-15 user rec.rules. This
gives 45-50 rec.per day.

Running 20110803-ge41e314 (upgraded OS but no recent Daniel's mysql
enhancements) for 5 days I observed following:
-so far no 9773 (myth_proto) type deadlocks
-I had one 9792-like (scheduller) deadlock (but with little different
symptoms)
-trace of above deadlock don't have any references to
/usr/lib/libmysqlclient_r.so.16

Next I upgrade 20110807-gda10d33 (with 4dfcdb8: Fix SQL reconnection
logic. Refs #9704. Refs #9773. Refs #9792;).
Running it for 5 day tests:
-no 9773 & 9792 deadlocks
-in those 5 days period I had reported 1 DB successful reconnect.

Next I upgrade to 20110812-g50606cd. This build is running continuously
since 08/12.
-no any deadlocks
-no any DB reconnects in BE log.

My conclusions:

1. It looks like deadlocks with entries referencing to
/usr/lib/libmysqlclient_r.so.16 are more likely results of specific OS
components combination, as after change OS components I wasn't able to
catch them in 10days. Before upgrade I had them avg. 1 per day.
If I read traces correctly - directly involved components on stack trace
are: kernel,libpthread,mysql,Qt & myth.
OS upgrade from 08/03 changed: kernel,libpthread(ad part of glibc) & myth.
I think best potential candidate as root cause is glibc. Upgrade was
from 2.13 to 2.14.
Second candidate is kernel. Upgrade was from 2.6.38.7 to 2.6.39.3.
I plan eventually to do tests with reverted kernel - but I'm not sure
will I found right time to do this as I'm experimenting on production
system.

2. It looks like 4dfcdb8: (Fix SQL reconnection logic) solves deadlocks
like in #9773 & #9792, but reports about successful DB reconnection
tells that there is still place for improvement.

3. 20110812-g50606cd so far works great for me (10d uptime, 500  rec. -
so far no deadlocks nor DB reconnects).
I have plan extend code freeze on this git pull to another 10d and see
how it goes.
Currently only DB related issues I have with 20110812-g50606cd are
reflected in following log entries (avg. 1 per day):

2011-08-22 11:07:27.198329 E DB Error (change_program):
Query was:
UPDATE program SET starttime = ?, endtime = ? WHERE chanid = ? AND
starttime = ?
Bindings were:
:CHANID=12808, :NEWEND 11-08-24T03:40:00, :NEWSTART 11-08-23T17:25:00,
:OLDSTART 11-08-23T16:35:00
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '12808-2011-08-23 17:25:00-0' for key 'PRIMARY'
2011-08-22 11:07:27.239906 E DB Error (change_program):
Query was:
UPDATE program SET starttime = ?, endtime = ? WHERE chanid = ? AND
starttime = ?
Bindings were:
:CHANID“05, :NEWEND 11-08-24T03:40:00, :NEWSTART 11-08-23T17:25:00,
:OLDSTART 11-08-23T16:35:00
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '9305-2011-08-23 17:25:00-0' for key 'PRIMARY'

I want to say recent Daniel's mysql related commits are GREAT for making
myth production/server grade software.

Daniel, really BIG thank You for impressive rightness in problem
diagnosis and v.quick & 100% effective solution.
This is REALLY impressive, especially taking into account that system in
question is remote install without direct access and it has many 3rd
party components.
This is really incredible !.

br


Warpme Wed, 24 Aug 2011 04:47:13 -0700

My long silence was to be sure I was seeing success without trying to
"jnix" it.  I have been considering over the last couple of days to send
an update.

I have been running pretty solid for a week and a half on 20110811.915da4f

I didn't do any other upgrades of system software at the time so I can
quite confidently say that the fixes in the Myth code are what has
returned stability.

How are you accounting for that?  i.e. what are you looking for in the
log so that I can see how many I am experiencing.

I've gotten the following DB errors in the last day or so:

2011-08-23 15:37:22.862807 E [1588/22518] RecThread mythdb.cpp:192
(DBError) - DB Error (Resolution insert):
Query was:
INSERT INTO recordedmarkup    (chanid, starttime, mark, type, data)
VALUES ( ?, ?, ?, ?, ?);
Bindings were:
:CHANID=1045, :DATA=720, :MARK=13215, :STARTTIME 11-08-23T15:30:00,
:TYPE=30
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '1045-2011-08-23 15:30:00-30-13215' for key 'PRIMARY'

2011-08-23 15:37:22.868539 E [1588/22518] RecThread mythdb.cpp:192
(DBError) - DB Error (Resolution insert):
Query was:
INSERT INTO recordedmarkup    (chanid, starttime, mark, type, data)
VALUES ( ?, ?, ?, ?, ?);
Bindings were:
:CHANID=1045, :DATA=480, :MARK=13215, :STARTTIME 11-08-23T15:30:00,
:TYPE=31
Driver error was [2/1062]:
QMYSQL3: Unable to execute statement
Database error was:
Duplicate entry '1045-2011-08-23 15:30:00-31-13215' for key 'PRIMARY'

Absolutely!

Yes, really, really BIG thanks from here too.  My wife was actually
watching a Myth recorded show yesterday, with no assistance from me.

Many many thanks!

b.


Brian J. Murrell Wed, 24 Aug 2011 05:43:49 -0700

Pls look for "MySQL reconnected successfully" string in be logs.
In last 12 days, 600 rec I had 2 of them. Both exactly at the same:

1st:
2011-08-15 06:00:01.626423 I  Tuning recording: "Jak to jest zrobione?":"Jak to jest zrobione? (How It's Made), odc. 2": channel 15315 on cardid 5, sourceid 8
2011-08-15 06:00:01.627545 I  MySQL reconnected successfully

2nd:
2011-08-22 06:00:01.561354 I  Tuning recording: "Jak to jest zrobione?: OÅ‚Ã
³wki/Recykling metalu/Kawa": channel 15315 on cardid 5, sourceid 8
2011-08-22 06:00:01.562367 I  MySQL reconnected successfully

This is inline with symptoms of majority scheduler deadlocks I observed before redesign MySQL reconnect logic: I had bunch of "gueued" recordings and holding queue was opened exactly at 6:00AM.
It looks like there is clear correlation between 6:00AM, tunning to channel 15315 on cardid 5, sourceid 8 and DB reconnect.
I think this might be sign of another very subtle bug.
Channel 15315 is very frequently used by me so thing is even more interesting that reconnect is always at the same time.
I checked sys logs & cron - no any relevant entries or jobs at 6:00AM. Interesting.
-br


Warpme Fri, 26 Aug 2011 01:39:47 -0700

Interesting.  I have no such pattern:

2011-08-09 16:00:06.923299 I [20242/20259] Scheduler mythdbcon.cpp:234
(Reconnect) - MySQL reconnected successfully
2011-08-11 08:59:59.367130 I [1559/2010] Scheduler mythdbcon.cpp:234
(Reconnect) - MySQL reconnected successfully
2011-08-05 18:08:22.311 MySQL reconnected successfully
2011-08-05 18:08:31.159 MySQL reconnected successfully
2011-08-05 19:59:01.659 MySQL reconnected successfully
2011-08-06 17:43:39.218 MySQL reconnected successfully
2011-08-06 17:43:39.612 MySQL reconnected successfully
2011-08-08 16:29:41.669276 I [1505/2283] Scheduler mythdbcon.cpp:234
(Reconnect) - MySQL reconnected successfully

b.


Brian J. Murrell Fri, 26 Aug 2011 04:32:16 -0700

warpme, maybe you have a database- or I/O-intensive cron job occurring
once a week at 6:00am on Monday that causes problems?

Mike


Michael T. Dean Fri, 26 Aug 2011 07:20:12 -0700

Mike,

Indeed, at first glance it looks like cron-like behaviour (every 7-days,
6:00AM)
That was also my suspect so I carefully checked cron configs & logs.
Nothing relevant & worth to mention.
I can't find any aspect of my server configured to do something at 6:00AM.
I have to look on this in longer test window - maybe this will show
something meaningful...
-br


Warpme Fri, 26 Aug 2011 14:46:23 -0700

I see You are greping log output.
Unfortunately this filters reconnect context information.
May You post every reconnect entry with few leading lines ?
-br


Warpme Fri, 26 Aug 2011 14:51:03 -0700

Oh - I forgot to add this in first replay.
Frequency of reconnects in Your case compared to mine suggest me that
Your environment (OS) is now (compared to my OS after upgrade) much more
"causing" reconnects as I see noticeable difference in reconnects
distribution/density in time.
What glibc/kernel are You using ?
Is it acceptable for You to do full OS upgrade and see results ?

-br


Warpme Fri, 26 Aug 2011 15:01:28 -0700

If anacron is installed on your machine it can be a bit tricky to
determine exactly when a job is going to be run. There is a delay
parameter; see man anacron.


Jan Ceuleers Sat, 27 Aug 2011 02:30:57 -0700

Hi,

Have you checked the weekly jobs of periodic (if Linux have them, I'm
on FreeBSD) too?
--
Regards,
Torfinn Ingolfsen


Torfinn Ingolfsen Sat, 27 Aug 2011 08:38:56 -0700

Out of curiosity--especially for the 0.24-fixes users, but also for
those using master--how do those of you who are experiencing DB issues
have your database configured?

1) Is your mysqld server running on the same host as the master
mythbackend server?
2) If so, what did you specify for the DBHostName on your master
mythbackend server's config.xml/mysql.txt file(s)?

I'm starting to wonder if those of you who are seeing problems may have
configured DBHostName to use a host name or routable IP address--which
forces Qt-MySQL drivers to use a TCP/IP connection to the MySQL server
using the network stack.  If, at some point, certain tools on certain
distros detect a network issue, they will unconfigure the entire network
(even removing the IP address from the NIC)--basically, they rip the
entire network out from under all running applications.  For an
application that requires a reliable connection to a database, this
means Bad Things happen.

Because of this, I recommend a) running the mysqld server on the master
backend host and b) using localhost for the DBHostName in the
config.xml/mysql.txt for the master backend host, only.  Please see http://www.gossamer-threads.com/lists/mythtv/users/490328#49[..]  for
information--and note that you can use localhost for /only/ the
DBHostName in a multi-host setup (you cannot use localhost/127.0.0.1/::1
for the this backend/master backend "IP Address" settings).

If you've been having issues with MySQL disappearing on you, I'd
appreciate your replying with answers to the 2 questions, above.  If you
do have a system with mysqld running on the master mythbackend host, and
it's not already, please try changing DBHostName to localhost to see if
it improves stability (or fixes the issue for you).

Thanks.

Mike


Michael T. Dean Tue, 30 Aug 2011 07:00:22 -0700

FWIW, I'm now a(n initially reluctant, but now satisfied with) master
user using a build with the fixes applied.

Yes.

OK.  So given that my backend runs as the user "mythtv":

mythtv    1588  1586  2 Aug15 ?        08:03:22 /usr/bin/mythbackend
--logfile /var/log/mythtv/mythbackend.log --user mythtv

I'm going to assume you are interested in
~mythtv/.mythtv/{config.xml,mysql.txt}.

My config.xml is empty and mysql.txt has:

$ grep DBHostName ~mythtv/.mythtv/mysql.txt
DBHostName=localhost

I would hope that would be using the most performant and reliable
communication mechanism available.

Already doing so (and have been forever).

b.


Brian J. Murrell Tue, 30 Aug 2011 07:09:59 -0700

Mike,
It is great You are trying trace mysql reconnect issue.
Sometime ago I had hypothesis related server<->lib comm.aspect, so I
move Qt-mysgl lib.comm from unix socket to TCP/IP.
After few days of tests I didn't any change in reconnects
density/distribution in time.
In fact my experience is that os upgrade had clearly noticeable impact
on reconnects density/distribution: avg.1 per day vs avg.1 per 5 days.
This was on short observation window (6 days) so it might be pure hazard
thing.

Answering Your questions:
a\yes. My DB is on the master be host
b\my DB server talks with client lib via unix socket

Now I switch from Dillon's cron (as it is dead proj.from 2y) to dcron,
make sure all my cron jobs are under my full control and begin next
long-term testing window.
I will report result as soon as something interesting pops-up or after 3
weeks if all will be OK (well, I strongly hope for latter).

-br


Warpme Tue, 30 Aug 2011 13:23:33 -0700

Small update:
2 days ago I update BE to v0.25pre-3260-ga15740a-dirty-20110829 and
enable active EIT scan.
Unfortunately today I had deadlocked BE. Details are in: http://code.mythtv.org/trac/ticket/9704#comment:48
This time however there is no any symptoms of client mysqllib hang in
traces nor any mysql reconnect reports in be logs.
I hope it means mysql issue is resolved and this deadlock is results of
easy to correct thread locking issue.
-br


Warpme Tue, 30 Aug 2011 22:34:05 -0700

I think we can close this tread. Last 3 weeks I was testing latest
master. What I see:
-code seems to be really stable if EIT active scan is no used (1000 rec.
no single deadlock, no single DB reconnect)
-with active EIT scan, code seems to be stable as long as LiveTV is not used
-if I use LiveTV on tuner other than tuner with enabled active scan - no
issues, deadlocks, etc.
-if I use LiveTV and active scan on the same tuner - quite frequently
(once per 5-10 LiveTV starts) I have issue described in http://code.mythtv.org/trac/ticket/10016.  Once per few days also I have
deadlock with MYTH_PROTO_RESPONSE empty (trace attached in #10016).
Once again - thx for recent great improvements in code. Now it works
great as PVR.
Cleaning issue from 10016 will make it really good system.
-br


Warpme Wed, 21 Sep 2011 00:11:18 -0700



Related Topics

Post a Comment