Thursday, November 27, 2008

Transaction Recovery in JBossAS

It started out so innocently - running my J2EE app under high load, I would notice this message repeat many times in the JBossAS log file:

[com.arjuna.ats.internal.jta.resources.arjunacore.norecoveryxa]
Could not find new XAResource to use for recovering
non-serializable XAResource ...


The desire to hunt down the meaning of this error message set me off on a long voyage that involved learning about the innards of the JBoss Transaction Manager (JBossTS), stepping through its code, writing custom Transaction Manager objects, and reading through many pages of documentation, forum threads and JIRA issues.

What I came away with from this huge endeavor is that the JBossTS integration with JBossAS is not very well documented and is not easy to use. What I hope to achieve with this blog is to provide what I feel is lacking in the current JBossAS documentation: a single page where you can find all the information you need to ensure your application running in JBossAS is fully recoverable in the event of an XA transaction failure. (Note: I am using JBossAS 4.2.1; hopefully, later JBossAS versions will make what I am going to discuss easier.)

First, let me say this - if you think your application deployed in JBossAS can recover from transaction failures, I would recommend you read this blog and double check your configuration. Because unless you took very specific steps to configure your JBossAS deployment to support XA transaction recovery, you will not be able to recover from failed transactions - even if you are using JBossAS with JBossTS integrated and even if you are using XA datasources (i.e. your data sources are defined in your -ds.xml files via <xa-datasource>). The only outward indication that you will get that tells you your app is not able to perform transaction recovery is when you actually get a failure, which causes that above message ("Could not find new XAResource to use for recovering non-serializable XAResource") to appear in your log file.

Why is this? Because out-of-box, JBossAS is not fully configured to perform recovery. You have to manually configure JBossAS to fully enable this feature - and it's not as simple as just setting some configuration settings or deploying some additional canned services. It also involves writing/deploying your own custom Transaction Manager code and integrating it into the JBossAS server, because there are problems with some of the current code shipped with JBossAS. This is why I think many people think they can recover from transaction failures, but really cannot - because I'm not convinced many people know enough about this issue to write this custom Transaction Manager code and deploy it properly in JBossAS. This blog will hopefully help put all of this information in a single place and get people "on the road to recovery".

I will now walk you through everything that happened in my lengthy investigation of this issue. You can follow my train of thought in the JBossTS forum thread I started that shows you the progression of this investigation.



Before we begin, I must point out that for any of this to work at all, you must deploy your data sources to use XA, i.e. your -ds.xml files need to use <xa-datasource>. To learn how to switch your data sources over to use XA, refer to the JBossAS wiki, specifically the part that talks about "Parameters specific for javax.sql.XADataSource usage"

Special Note To Oracle Users: XA recovery will also not work unless you grant special privileges to your XA datasource's user (i.e. the user whose credentials you define in the xa-datasource definition). If you do not do this, "XAException.XAER_RMERR" errors will occur. The privileges you need to grant include:
GRANT SELECT ON sys.dba_pending_transactions TO db_user;
GRANT SELECT ON sys.pending_trans$ TO db_user;
GRANT SELECT ON sys.dba_2pc_pending TO db_user;
GRANT EXECUTE ON sys.dbms_system TO db_user;
GRANT SELECT ON v$xatrans$ TO db_user;

The first four you definitely need. The last one is Oracle version dependent. As documented in http://www.orafaq.com/wiki/XA_FAQ, "for Oracle 7.3 databases one needs to run the XAVIEW.SQL SQL script as user SYS. This script will create the V$XATRANS$ view. Grant select access on it to PUBLIC. This script is located in the $ORACLE_HOME/rdbms/admin directory. Please note, XAVIEW.SQL is not required for XA applications running on Oracle8 and above." Talk to your DBA to see which are required for your specific database installation. I've only tested XA on Oracle10g so I can't speak for other versions.




First, of course, was the fact that this all started with me getting those XA recovery failure messages in the JBossAS server log. (And for the curious, I believe the entire reason I was getting those was that, under heavy load, my application was maxing out its connection pool, which actually went over my processes/sessions limit in Oracle - Oracle promptly rejected the extra connection attempts, causing my transactions to fail. Bumping up my Oracle processes/sessions configuration seems to have fixed the cause of most, if not all, of my failures.) Anyway, back to the XA recovery error message - at the time, it wasn't very intuitive what that log message was saying, but it sounded bad enough for me to search the 'net for this error message and see what others were saying about it. A lot of people reported seeing this, but not much was said as to how you go about correcting it, and more specifically, how to correct it within the context of the JBossAS application server. I found wiki pages that talk about this error message (such as this one), but only from the perspective of the JBossTS standalone product. This goes back to my assertion that JBossAS needs more documentation on its integration with JBossTS, because the JBossTS documentation only gets you so far (which is why I submitted JIRA JBAS-6244). The JBossTS documentation itself seemed comprehensive; my concern was the lack of JBossAS documentation on its integration with JBossTS. For example, the JBossTS wiki page tells me this:

You need to provide an instance of a XAResourceRecovery
implementation and tie it into the recovery process


I'm sure this makes perfect sense to the developer familiar with the JBossTS API and to the core JBossAS developers themselves that are integrating JBossTS into the server. But to a J2EE developer - the guy who is simply deploying his J2EE/EJB3 app in JBossAS and who should be free from having to worry about the internals of the app server's transaction manager - this is very confusing and leads to more questions than answers. For example, what is "XAResourceRecovery"? How do I provide one? And how do I tie it into the recovery process? I found no easy answers to those questions in the JBossAS documentation.

Searching the JBossTS documentation further, I found information on the XAResourceRecovery class. It turns out this is a JBossTS API that provides the hook necessary to recover from a transaction failure for a particular resource, like a JDBC data source (that answers the question, "what is XAResourceRecovery?"). JBossTS provides a few of its own implementations out-of-box and because JBossAS ships with the JBossTS product, JBossAS itself comes with these XAResourceRecovery implementations out-of-box as well (and this answers the "how do I provide one?" question - but as you will see shortly, that is not the end of this story). From the JBossTS documentation, I see:

Recovery of XA datasources can sometimes
be implementation dependant, requiring developers to
provide their own XAResourceRecovery instances. However,
JBossTS ships with several out-of-the-box implementations
that may be useful.


This wiki page lists two implementations. One is specific to Oracle, but since my app needs to support both Oracle and PostgreSQL (and hopefully more in the future), I want to use the second one: com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery. As the JBossTS documentation states, "this recovery implementation should work on any datasource that is exposed via JNDI." (emphasis is mine, because as I will show you shortly, this class is completely unusable when deployed within JBossAS).

OK, so I should be golden now. All I need to do in order to enable XA recovery for any data source deployed in my JBossAS is to set those JDBCXARecovery configuration properties specified in that wiki page and tie that JDBCXARecovery implementation into the recovery process. Now, how do I do this? (this is that third question I had asked myself earlier). Unfortunately, there is no JBossAS documentation that I found that describes how to do this. But I did see another JBossTS wiki page that describes how to do this for the standalone JBossTS product, which states:

To inform the recovery system about each
of the XAResourceRecovery instances, it is necessary to
specify their class names through property variables.
Any property variable found in the properties file, or
registered at runtime, which starts with the name
com.arjuna.ats.jta.recovery.XAResourceRecovery will be
assumed to represent one of these instances, and its
value should be the class name.
...
Additional information that will be passed to the
instance when it is created may be specified after
a semicolon.
...
Note: These properties need to go into the JTA section
of the property file.
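
The naming convention described in that quote can be sketched in a few lines of Java. This is only an illustration of the documented rule (any property whose name starts with com.arjuna.ats.jta.recovery.XAResourceRecovery names a class, with optional extra text after a semicolon); it is not the JBossTS loader itself. The RecoveryPropertyScan class and the "extra-info" parameter are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Illustrative only: mimics the documented naming convention, not JBossTS's real loader. */
final class RecoveryPropertyScan {
    static final String PREFIX = "com.arjuna.ats.jta.recovery.XAResourceRecovery";

    /** One parsed entry: the class name and the optional text after the semicolon. */
    static final class Entry {
        final String className;
        final String parameter; // null when the value has no semicolon

        Entry(String className, String parameter) {
            this.className = className;
            this.parameter = parameter;
        }
    }

    /** Collects every property whose name starts with PREFIX and splits its value. */
    static List<Entry> scan(Properties props) {
        List<Entry> entries = new ArrayList<>();
        // Sort the names so the result order is deterministic.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            String value = props.getProperty(name).trim();
            int semi = value.indexOf(';');
            if (semi < 0) {
                entries.add(new Entry(value, null));
            } else {
                entries.add(new Entry(value.substring(0, semi).trim(),
                                      value.substring(semi + 1).trim()));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        // "extra-info" is a made-up parameter, used only to show the split.
        p.setProperty(PREFIX + "JDBC",
                "com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery;extra-info");
        p.setProperty("com.arjuna.ats.jta.xaRecoveryNode", "1"); // ignored: wrong prefix
        for (Entry e : scan(p)) {
            System.out.println(e.className + " (parameter: " + e.parameter + ")");
        }
    }
}
```

The suffix after the prefix ("JDBC" here) only serves to make the property name unique; the class name and its parameter both live in the value.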


OK, now I know how to do this for the standalone JBossTS. But how/where do I do this for the JBossAS integration of JBossTS? I found the file "jbossjta-properties.xml" located in the "<jboss-install-dir>/server/default/conf" directory - this, as it turns out, defines the properties that configure the internals of the JBossTS integrated in the JBossAS server. Based on the instructions on how to configure the JBossTS product that was previously discussed in the JBossTS documentation, I added this in that jbossjta-properties.xml file:

<properties depends="arjuna" name="jta">
<!-- add this to tie in the recovery object to my JBossTS -->
<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery"/>
<property name="DatabaseJNDIName" value="java:/MyDS"/>
<property name="UserName" value="my-db-username"/>
<property name="Password" value="my-db-password"/>
...


That is done exactly how it is documented. However, it doesn't work. I get this at runtime:

java.lang.NullPointerException
at javax.naming.InitialContext.getURLScheme(InitialContext.java:269)
at javax.naming.InitialContext.getURLOrDefaultInitCtx(InitialContext.java:318)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery.createDataSource(JDBCXARecovery.java:174)
...


I actually had to grab the JBossTS source code and step through it in a debugger to see what's really happening and why this NPE is thrown. I give the full details as to why this NPE occurs in one of my forum thread posts - go here for the technical details - but suffice it to say, in order for those three properties (DatabaseJNDIName, UserName, Password) to be read in by the recovery implementation, I had to provide a parameter to the first property (the parameter value could be anything - I could specify "foo" if I wanted - but this parameter is meant to be the URL to a property file, so I specified the name of the property file itself):

<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery;jbossjta-properties.xml"/>


OK, that hurdle has been jumped. Start up again and... whoops:

java.lang.ClassCastException: org.jboss.resource.adapter.jdbc.WrapperDataSource
at com.arjuna.ats.internal.jdbc.recovery.JDBCXARecovery.createDataSource(JDBCXARecovery.java:174)


Back to the debugger, and I found that JDBCXARecovery is trying to cast the object found from the JNDI lookup to an XADataSource, but JBossAS does not bind that type of object to JNDI - it binds this WrapperDataSource, which is not an XADataSource. Therefore, this JDBCXARecovery object can never work when deployed in JBossAS. This is one reason why relying on documentation for the standalone JBossTS product is insufficient and why JBossAS integration docs are really needed. JDBCXARecovery probably works in some cases, but it most certainly does not work (and will never work) when integrated with JBossAS.
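
The failure mode is easy to reproduce outside the app server. The sketch below is not JBossTS or JBossAS code: it uses a dynamic proxy as a stand-in for the container's wrapper, on the assumption that the wrapper implements javax.sql.DataSource but not javax.sql.XADataSource. XaCastCheck and its methods are hypothetical names.

```java
import java.lang.reflect.Proxy;
import javax.sql.DataSource;
import javax.sql.XADataSource;

/** Illustrative only: shows why a blind cast of a JNDI-bound wrapper fails. */
final class XaCastCheck {

    /** Returns the object as an XADataSource, or null if it is some other type. */
    static XADataSource asXaDataSource(Object jndiObject) {
        return (jndiObject instanceof XADataSource) ? (XADataSource) jndiObject : null;
    }

    /** Stand-in for a container wrapper: implements DataSource only, never XADataSource. */
    static DataSource plainDataSourceStub() {
        return (DataSource) Proxy.newProxyInstance(
                XaCastCheck.class.getClassLoader(),
                new Class<?>[] { DataSource.class },
                (proxy, method, methodArgs) -> {
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] args) {
        Object bound = plainDataSourceStub(); // pretend this came from a JNDI lookup
        try {
            XADataSource xa = (XADataSource) bound; // the kind of blind cast that fails
            System.out.println("unexpected success: " + xa.getClass());
        } catch (ClassCastException e) {
            System.out.println("blind cast failed with ClassCastException");
        }
        System.out.println("guarded lookup returned null: " + (asXaDataSource(bound) == null));
    }
}
```

A guarded check like this only turns the exception into a clean "not usable" result; it does not make the wrapper recoverable, which is why a different XAResourceRecovery implementation is needed.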

Now, it turns out this class-cast problem has already been discussed on a prior forum thread and reported in a JIRA - JBTM-319. I wish I knew that before I started all of this (did I mention we need JBossAS docs on this? :)

Reading that JIRA, it looks like there is a XAResourceRecovery implementation written specifically for deployment inside of JBossAS 4.2 (AppServerJDBCXARecovery) and it was introduced in version 4.2.3.SP8 (I'll assume it made it into that version's distribution). However, I'm using an earlier version of JBossAS, so I had to take the source code for AppServerJDBCXARecovery.java, compile and bundle its binary in a jar file, and deploy that jar into my JBossAS's "server/default/lib" directory.

I found that the Javadocs for that class describe how to configure this. In addition, it looks like Jonathan Halliday very recently added some documentation on the JBossTS wiki that discusses this as well - it refers to the Javadoc for the technical details. He does, however, confirm my findings that this class is not in earlier versions of JBossAS - "Note that AppServerJDBCXARecovery is not present in JBossAS (you need to download and build it from source) or early EAP releases"

The Javadocs say, in part:

To use this class, add an XAResourceRecovery
entry in the jta section of jbossjta-properties.xml for
each datasource for which you need recovery, ensuring the
value ends with ;<datasource-name> i.e. the same value
as is in the -ds.xml jndi-name element. You also need the
XARecoveryModule enabled and appropriate values for
nodeIdentifier and xaRecoveryNode set. See the JBossTS
recovery guide if you are unclear on how the recovery
system works.


*sigh* - back to reading about the internals of JBossTS to learn what "appropriate values for nodeIdentifier and xaRecoveryNode" means. I'm sure, again, this makes perfect sense to someone familiar with the JBossTS product, but I really find it annoying that a J2EE deployer needs to know all of this just to enable XA recovery. But OK, hopefully this will get easier in the future. Marching forward...

The Javadoc instructions tell me to refer to the JBossTS Recovery Guide, and it is in there that I read:

A value of * will force JBossTS to recover
(and possibly rollback) all transactions irrespective of
their node identifier and should be used with caution.
The contents of com.arjuna.ats.jta.xaRecoveryNode
should be alphanumeric and match the values of
com.arjuna.ats.arjuna.xa.nodeIdentifier.


This leads me back to "jbossjta-properties.xml" and lo-and-behold I do see a "com.arjuna.ats.arjuna.xa.nodeIdentifier" property set here - its value is "1". I didn't look deeply into what this actually identifies, but I assume it identifies this JBossTS instance (but I could be wrong on this).
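
Read literally, the rule quoted from the Recovery Guide boils down to a small predicate. The sketch below is just that reading written out, under the assumption that matching is a plain string comparison; it is not taken from the JBossTS source, and RecoveryNodeRule is a hypothetical name.

```java
/** Illustrative only: a literal reading of the quoted xaRecoveryNode rule. */
final class RecoveryNodeRule {

    /**
     * True if a recovery manager configured with the given xaRecoveryNode value
     * would consider a transaction stamped with txNodeIdentifier its own.
     */
    static boolean wouldRecover(String xaRecoveryNode, String txNodeIdentifier) {
        if ("*".equals(xaRecoveryNode)) {
            return true; // wildcard: recover everything, so use with caution
        }
        return xaRecoveryNode != null && xaRecoveryNode.equals(txNodeIdentifier);
    }

    public static void main(String[] args) {
        System.out.println(wouldRecover("1", "1")); // matching identifiers
        System.out.println(wouldRecover("*", "7")); // wildcard
        System.out.println(wouldRecover("2", "1")); // mismatch
    }
}
```

The practical consequence is that xaRecoveryNode has to be kept in step with nodeIdentifier whenever the latter is changed.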

So, following the instructions, I went back to my "jbossjta-properties.xml" and configured it to use this new recoverer instead of the unusable JDBCXARecovery implementation and to use the appropriate value for xaRecoveryNode:

<property
name="com.arjuna.ats.jta.recovery.XAResourceRecoveryJDBC"
value="com.arjuna.ats.internal.jdbc.recovery.AppServerJDBCXARecovery;MyDS"/>
<!-- xaRecoveryNode should match value in nodeIdentifier or be * -->
<property name="com.arjuna.ats.jta.xaRecoveryNode" value="1"/>


I'm really getting close now! There is one slight problem, however. When I run my application server for the very first time, my data source is not deployed yet! My application requires the user to run through a "post-installation" UI in order to do things like tell me what database vendor the user is using (Postgres or Oracle), the JDBC URL, database username, password, etc. My application then writes out some deployment information and hot deploys the ds.xml at runtime (a great feature provided to me by JBossAS - hot deployment of data sources is very cool).

Anyway, this causes problems because before my user runs this "post-install" step, I have no data source deployed and this recovery object will dump an ugly stack trace to the log because it can't find the data source. The exception is an MBeanException with a root cause of "javax.management.InstanceNotFoundException: jboss.jca:name=MyDS,service=ManagedConnectionFactory is not registered." This is to be expected, looking at the code of AppServerJDBCXARecovery.

So what I had to do was modify the AppServerJDBCXARecovery code so it can tolerate the times when the data source is not deployed. (I'll post a follow up with a URL to tell you where you can find this modified code; it's not checked into svn yet, but will be soon. It is a pretty simple change [update: the source can now be viewed here].)
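
One general way to tolerate a not-yet-deployed data source is to ask the MBean server whether the data source's MBean is registered before trying to use it. The sketch below shows that idea with standard JMX calls only. It is not the actual change made to AppServerJDBCXARecovery: DataSourceMBeanGuard is a hypothetical name, the ObjectName pattern is copied from the exception message above, and the platform MBean server is used here only so the example runs outside JBossAS.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Illustrative only: check for the data source's MBean before using it. */
final class DataSourceMBeanGuard {

    /** Builds the MBean name seen in the InstanceNotFoundException message. */
    static ObjectName connectionFactoryName(String dataSourceName)
            throws MalformedObjectNameException {
        return new ObjectName(
                "jboss.jca:name=" + dataSourceName + ",service=ManagedConnectionFactory");
    }

    /** True only when the data source's MBean is currently registered on the server. */
    static boolean isDeployed(MBeanServer server, String dataSourceName) {
        try {
            return server.isRegistered(connectionFactoryName(dataSourceName));
        } catch (MalformedObjectNameException e) {
            return false; // treat an unusable name as "not deployed"
        }
    }

    public static void main(String[] args) {
        // Assumption: outside JBossAS the platform server stands in for the real one.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        if (!isDeployed(server, "MyDS")) {
            System.out.println("MyDS is not deployed yet; skipping this recovery pass");
        }
    }
}
```

With a guard like this, a recovery pass that runs before the post-install step can log one quiet line and try again later instead of dumping a stack trace.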

I then recompiled my custom version of AppServerJDBCXARecovery, bundled it in a .jar, placed that jar in my JBossAS's server/default/lib directory and restarted the server. Now no errors occur at startup, and after deploying my data source, I confirmed that the recovery object is able to obtain my XADataSource!

And that's it, it is that simple. :-) [update: not so fast - after I wrote this blog, I hit another problem that is documented in JIRA JBTM-441. This is bad because a very common recovery use case (the database or network crashes) causes recovery to fail until you restart your app server, and this is true for all currently released JBossAS versions (4.3 and under as of today, 12/6/2008). You must build a patched version of AppServerJDBCXARecovery, attached to that JIRA, and deploy it yourself to work around the problem.]

At this point, I have an application with XA data sources deployed and a transaction manager configured to recover any transactions that fail. I plan on writing some test code in which I can force transaction failures to occur, so I can actually test that the recovery features are fully enabled, but at this point, I have very little doubt that things would work. Once I see that JBossTS is able to get my XADataSource, it's just a matter of JBossTS doing what it does best - which includes performing this transaction recovery.
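
For that kind of test, a useful building block is the standard XA recovery scan, which asks a resource manager for the transaction branches it still holds in doubt. The sketch below uses only the javax.transaction.xa interfaces from the JDK. InDoubtScan is a hypothetical name, and the proxy-based stub stands in for a real driver so the example runs without a database; in a real test the XAResource would come from something like XADataSource.getXAConnection(user, password).getXAResource().

```java
import java.lang.reflect.Proxy;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Illustrative only: list in-doubt branches with a single XA recovery scan. */
final class InDoubtScan {

    /** Asks the resource manager for its in-doubt transaction branches in one scan. */
    static Xid[] listInDoubt(XAResource resource) {
        try {
            Xid[] xids = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
            return xids == null ? new Xid[0] : xids;
        } catch (XAException e) {
            throw new IllegalStateException(
                    "recover scan failed, XA error code " + e.errorCode, e);
        }
    }

    /** Hypothetical stand-in for a real driver's XAResource: reports nothing in doubt. */
    static XAResource emptyResourceStub() {
        return (XAResource) Proxy.newProxyInstance(
                InDoubtScan.class.getClassLoader(),
                new Class<?>[] { XAResource.class },
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("recover")) {
                        return new Xid[0];
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] args) {
        int count = listInDoubt(emptyResourceStub()).length;
        System.out.println("in-doubt branches: " + count);
    }
}
```

A test could force a failure between prepare and commit, confirm that the scan reports a branch, and then check that the list drains once the recovery manager has had time to run.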

PHEW! All of this investigation took a lot of time and energy, way too much time for my liking. Hopefully, I can save a few hours (or days :) of someone else's time with this information. It could have turned my several days into about 30 minutes. :}

12 comments:

  1. Cool article, John. Thanks for posting. I'm curious...do you work for RedHat?

  2. Thanks for compliment. Yes, I do work for Red Hat.

  3. I guess I could have figured that out by looking at your profile. :)

  4. This has been a great help. The issue I have now is my data source uses the security-domain.

    So the AppServerJDBCXARecovery connection is looking for the userid/password from the XADataSourceProperties instead of the security-domain.

    Is this a configuration error on my part? Or is the security-domain option supported by AppServerJDBCXARecovery?

    Thanks!

  5. da1shark - i do not think that is supported. See the source and look at createDataSource() and getXADataSource() - looks like it gets all properties from XADataSourceProperties and that's it. You might want to ask this question on the JBossTM forum at jboss.org. Possibly requires an enhancement JIRA to be written against the JBossTM integration with JBossAS.

  6. Thanks for the helpful write-up.

    Just one question, after your steps, did you ever run into:

    DEBUG [com.arjuna.ats.jta.logging.loggerI18N] (Thread-6)
    [com.arjuna.ats.internal.jta.recovery.info.secondpass] Local XARecoveryModule - second pass

    DEBUG [com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery] (Thread-6)
    AppServerJDBCXARecovery datasource classname = oracle.jdbc.xa.client.OracleXADataSource

    ERROR [com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery] (Thread-6)
    AppServerJDBCXARecovery.createConnection got exception java.sql.SQLException: Invalid argument(s) in call
    java.sql.SQLException: Invalid argument(s) in call
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
    at oracle.jdbc.xa.client.OracleXADataSource.getXAConnection(OracleXADataSource.java:99)
    at com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery.createConnection(AppServerJDBCXARecovery.java:258)
    at com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery.getXAResource(AppServerJDBCXARecovery.java:115)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.resourceInitiatedRecovery(XARecoveryModule.java:683)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:179)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWork(PeriodicRecovery.java:237)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:163)

  7. I have not seen that one. Looks like a problem in the Arjuna code? I would check with JBossTS guys on that.

  8. If you do any further XA recovery testing I suggest you prepare a test case (heavy load) that is designed to leave a transaction in a data source 'in-doubt' and then be 100% certain that the TranManager can cleanup (commit)the in-doubt tran during tran recovery. This is one of the hardest test cases to pass.

  9. Thanks for your effort on this issue!
    I’m getting the following error, is that the same as da1shark mentioned?
    I’m trying to use the com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery shipped with JBOSS 4.3.0.GA_CP06

    16:35:52,904 FATAL [LdapLoginModule] Somebody tried to authenticate user with username null!
    16:35:52,909 ERROR [AppServerJDBCXARecovery] AppServerJDBCXARecovery.createDataSource got exception java.lang.SecurityException: Failed to authenticate principal=null, securityDomain=jmx-console
    java.lang.SecurityException: Failed to authenticate principal=null, securityDomain=jmx-console
    at org.jboss.jmx.connector.invoker.AuthenticationInterceptor.invoke(AuthenticationInterceptor.java:97)
    at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
    at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
    at org.jboss.invocation.jrmp.server.JRMPProxyFactory.invoke(JRMPProxyFactory.java:180)
    at sun.reflect.GeneratedMethodAccessor375.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
    at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
    at org.jboss.mx.server.Invocation.invoke(Invocation.java:86)
    at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
    at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
    at org.jboss.invocation.local.LocalInvoker$MBeanServerAction.invoke(LocalInvoker.java:169)
    at org.jboss.invocation.local.LocalInvoker.invoke(LocalInvoker.java:118)
    at org.jboss.invocation.InvokerInterceptor.invokeLocal(InvokerInterceptor.java:209)
    at org.jboss.invocation.InvokerInterceptor.invoke(InvokerInterceptor.java:195)
    at org.jboss.jmx.connector.invoker.client.InvokerAdaptorClientInterceptor.invoke(InvokerAdaptorClientInterceptor.java:66)
    at org.jboss.proxy.SecurityInterceptor.invoke(SecurityInterceptor.java:70)
    at org.jboss.proxy.ClientMethodInterceptor.invoke(ClientMethodInterceptor.java:74)
    at org.jboss.proxy.ClientContainer.invoke(ClientContainer.java:100)
    at $Proxy46.invoke(Unknown Source)
    at com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery.createDataSource(AppServerJDBCXARecovery.java:167)
    at com.arjuna.ats.internal.jbossatx.jta.AppServerJDBCXARecovery.hasMoreResources(AppServerJDBCXARecovery.java:129)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.resourceInitiatedRecovery(XARecoveryModule.java:679)
    at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:179)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWork(PeriodicRecovery.java:237)
    at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:163)

  10. yes, schuster-c, it looks like that's the same thing.

  11. John,

    Thanks for your hard work. I've followed your directions but it doesn't seem to actually recover the resource. I posted the details here:

    http://www.jboss.org/index.html?module=bb&op=viewtopic&p=4267135

  12. We are almost 2 years later ...
    Has anybody configured this properly with effective XA transaction recovery across reboots?
