502 Proxy Error solved

I was getting mysterious, intermittent proxy errors using Apache mod_proxy to talk to an older version of Glassfish (2.x). It turned out the root cause was the size of the HTTP headers: as cookies accumulated, the request headers grew until the underlying app server closed the socket without sending a response.

A similar phenomenon was documented here:
http://kenai.com/jira/browse/KENAI-2727

The fix was trivial: increase the "receive-buffer-in-bytes" setting in domain.xml to 8192 bytes.

Finding it was a pain, because the older version of Glassfish had a bug such that the underlying IOException that aborted the request never got logged anywhere!
http://java.net/jira/browse/GLASSFISH-5181

JSF + JPA without EJB or Spring

I've had success using JSF 2.0 with JPA on Glassfish 3.0.1, without using Spring or EJB for managing middle-tier services or DAOs. (I actually do like Spring - the goal in this case was to minimize dependencies, moving parts and learning curve, rather than to avoid any particular technology.)

A similar approach was detailed here:
http://www.swview.org/blog/best-way-use-jpa-web-tier

 Using only the JSF @ManagedBean and @ManagedProperty annotations, along with the JPA @PersistenceContext, you can refactor business services or DAOs into other JSF managed beans that are injected into the one exposed directly to the UI. What this looks like in practice:

 orders.xhtml:
 

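OrderBean.java:
import java.util.List;
import javax.annotation.PostConstruct;
import javax.faces.bean.ManagedBean;
import javax.faces.bean.ManagedProperty;

@ManagedBean
public class OrderBean {
    // Injected by JSF; @ManagedProperty needs the public setter below.
    @ManagedProperty(value="#{orderDao}")
    private OrderDao orderDao;
    private List<Order> orders;

    @PostConstruct
    public void init() {
        this.orders = orderDao.getOrders();
    }
    public List<Order> getOrders() {
        return orders;
    }
    public void setOrderDao(OrderDao orderDao) {
        this.orderDao = orderDao;
    }
}

OrderDao.java:
import java.util.List;
import javax.faces.bean.ApplicationScoped;
import javax.faces.bean.ManagedBean;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@ManagedBean
@ApplicationScoped
public class OrderDao {
    @PersistenceContext(...)
    private EntityManager em;

    public List<Order> getOrders() {
        // EntityManager has no findAll(); use a JPQL query instead.
        return em.createQuery("select o from Order o", Order.class).getResultList();
    }
}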
In this example, the DAO (OrderDao) is just another JSF managed bean, but not actually referenced directly from a page, only from another managed bean. This approach lets you isolate the logic tied to EntityManagers from the rest of your "normal" JSF managed beans, and makes it (marginally) easier to unit-test your managed beans because you can mock the DAO at a higher level instead of mocking the entity manager.

 This does NOT buy you declarative transactions or any of the other good stuff you get with Spring or EJB. (You can inject a UserTransaction with @Resource, and call it explicitly, though.) So it works best for simple apps with mostly read operations and basic single-object CRUD transactions.

 Also, with Java EE 6 CDI, all this may become moot because @Named and @Inject annotations effectively blur the line between all the different managed bean flavors (EJB, JSF, Spring/POJO), although I haven't found a good way to replace JSF @ViewScoped without buying into Seam.

WebSphere "invalid Oracle URL specified" error


Another unhelpful WebSphere error, this time with an assist from Oracle.

This happened to me when I configured my JDBC data source with the default wpdbJDBC_oracle JDBC provider, using the XA datasource (OracleXADataSource), and used the "container managed" J2C authentication alias instead of "component managed".  The WebSphere admin console will successfully test the connection, but when you use it in a web application, it will fail with this "Invalid Oracle URL specified" error.  It was so hard to track down because it made me focus on the JDBC URL, which wasn't ever the problem.  It never occurred to me that the admin console and the web applications would somehow be getting connections and signing into Oracle differently, which tricked me into thinking that my configuration was really ok when it wasn't.

For the record, the web application was just doing a straight JNDI datasource lookup without any resource-ref mapping in web.xml, using the same JNDI name as bound in the server.

Also, changing to a different non-XA JDBC provider using plain OracleConnectionPoolDataSource resulted in "invalid username/password".

When I changed the datasource to use the component-managed alias instead, and restarted, everything worked.

WebSphere Portal error - "Puma requested entity type Group from VMM but received Entity"

Got this error when trying to upgrade from WebSphere Portal 6.0 to 7.0.

com.ibm.portal.puma.SchemaViolationException: EJPSG0053E: Puma requested entity type Group from VMM but received Entity.

This sets a record for one of the most unhelpful errors ever. It can be caused by a bad entry in LDAP - in my particular case, it was a few groups whose objectClasses (groupOfUniqueNames and top) had somehow been saved with base-64 encoding instead of as plain text, although the values themselves when decoded were correct. The same exact LDAP directory was also working fine with WebSphere 6.0.

One of IBM's pages implies that the configuration itself is broken - when in my case, the configuration was perfectly fine and the problem was a bad entry in LDAP. The error gives no help tracking down the broken entry.

https://www-304.ibm.com/support/docview.wss?uid=swg21419580

Other IBM resources imply that the specific configuration issue is making sure that the group and user LDAP object classes are unique - i.e., don't use "top". In my case, I was doing pretty standard stuff - groupOfUniqueNames and inetOrgPerson.

http://publib.boulder.ibm.com/infocenter/wpzosdoc/v6r1/index.jsp?topic=/com.ibm.wp.msg.doc/messages/com.ibm.wps.puma.resources.Messages.html

https://www-304.ibm.com/support/docview.wss?uid=swg1PK80507

Javascript, getYear and cookie expiration

Chrome and Firefox both return an offset from 1900 for date.getYear() instead of a four-digit year (pre-Y2K, a two-digit year). So new Date().getYear() returns 111 instead of 2011.

Old news, right? Well, I've been getting away with this for a while and not realizing it, in logic to expire cookies, and didn't notice until switching to Chrome.

The cookie expiration function was something like this:

var d = new Date();

d.setYear(d.getYear() - 1); // last year = 110
document.cookie = "COOKIE_NAME=; expires=" + d.toGMTString();

Firefox was happily setting the cookie's expiration to the year 0110, which was wrong but still accomplished the end goal of expiring the cookie.

Chrome, on the other hand, mangled toGMTString to only print the three digits "110" and was not interpreting the cookie as expired.
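
A minimal sketch of a fix, assuming nothing beyond the standard Date methods: use the four-digit getUTCFullYear()/setUTCFullYear() pair (getFullYear()/setFullYear() would work too) and toUTCString(). The helper name expiredCookie and its parameters are made up for illustration; it takes the current date as an argument only so it is easy to try outside a browser.

```javascript
// Hypothetical helper: returns a cookie string whose expiration date
// is one year before "now", so the browser discards the cookie.
function expiredCookie(name, now) {
  var d = new Date(now.getTime());           // copy so the caller's date isn't modified
  d.setUTCFullYear(d.getUTCFullYear() - 1);  // four-digit year, e.g. 2011 -> 2010
  return name + "=; expires=" + d.toUTCString();
}

// In a browser you would then write:
//   document.cookie = expiredCookie("COOKIE_NAME", new Date());
```

Because the year is always four digits here, the expiration string no longer depends on how a particular browser formats a year like 110.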

Of course, the better question is why write this code yourself anyway, when jQuery, Dojo, etc., all have their own cookie APIs.

Javascript associative array and iteration order

I just noticed that some browsers, like IE9 and Chrome, don't preserve insertion order of keys when iterating with a for-in loop ["for (key in array)"]. Although technically Javascript never guaranteed preserving order in a for-in loop, so many popular browsers did anyway that I was taken by surprise.

The Chrome behavior is especially odd - it preserves order of insertion for non-numeric keys, but iterates numeric keys in numeric order first. It also coerces numeric strings to numbers.

This code example illustrates:

var foo = new Object();

foo["111_"] = 1;
foo["222_"] = 2;
foo["333_"] = 3;
foo["444"] = 4;
foo["555"] = 5;
foo["666"] = 6;

var str = "";
for (var key in foo) {
    str = str + foo[key] + " ";
}
document.write(str);

In Chrome this prints "4 5 6 1 2 3". In IE 7, IE 8, and Firefox, it prints "1 2 3 4 5 6".

This behavior (particularly the coercion of numeric strings) has been tracked as a bug in Chrome, but it's unclear whether it will ever be fixed:
http://code.google.com/p/chromium/issues/detail?id=37404

ECMAScript spec says "order of enumerating... is not defined"
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
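
If your code depends on the order, one workaround is to track it yourself rather than rely on for-in. Here is a minimal sketch, assuming you control how the keys get added; OrderedMap, put, forEach, and joinValues are names invented for this example, not a library API.

```javascript
// Hypothetical wrapper that remembers insertion order in a plain array.
function OrderedMap() {
  this.keys = [];    // keys in the order they were first added
  this.values = {};  // key -> value lookup
}

OrderedMap.prototype.put = function (key, value) {
  // Record the key only the first time it is seen.
  if (!Object.prototype.hasOwnProperty.call(this.values, key)) {
    this.keys.push(key);
  }
  this.values[key] = value;
};

OrderedMap.prototype.forEach = function (callback) {
  // Walk the array, not the object, so the browser's for-in order is irrelevant.
  for (var i = 0; i < this.keys.length; i++) {
    callback(this.keys[i], this.values[this.keys[i]]);
  }
};

// Collects the values in insertion order, separated by spaces.
function joinValues(map) {
  var parts = [];
  map.forEach(function (key, value) {
    parts.push(value);
  });
  return parts.join(" ");
}

var foo = new OrderedMap();
foo.put("111_", 1);
foo.put("222_", 2);
foo.put("333_", 3);
foo.put("444", 4);
foo.put("555", 5);
foo.put("666", 6);

var str = joinValues(foo); // "1 2 3 4 5 6"
```

The cost is a little extra bookkeeping, but the iteration order is then the same in every browser.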


commons-logging classloader pain

I recently learned about commons-logging's classloader behavior the hard way when moving applications that use log4j from Tomcat to WebLogic. I had naively put commons-logging.jar in my WebLogic system classpath to emulate Tomcat's behavior, and things subsequently broke. This web site provided a very detailed and erudite explanation:

http://www.qos.ch/logging/classloader.jsp

Basically, it boils down to a weird subtlety in commons-logging's behavior. If commons-logging.jar is in the system classpath in a J2EE environment, the "current" or "thread context" classloader for each individual webapp contains WEB-INF/lib, but commons-logging itself is loaded by the system (parent) classloader. commons-logging looks for log4j using the thread-context classloader and therefore will "see" log4j in WEB-INF/lib, but then tries to actually load it through the system classloader, which can't see WEB-INF/lib, and breaks with a NoClassDefFoundError.

Various solutions:

  • Only put commons-logging-api.jar in the system classpath, as Tomcat does. (I didn't realize there were two different variations of the JAR at first. I know, RTFM.) This stripped-down commons-logging will pretend that log4j doesn't exist. Downside: logging output from third-party APIs like Hibernate, Digester, etc. will go who-knows-where, but not the same place as your application's log4j output.

  • Put log4j in the system classpath too. This works, but has the bad side effect that all applications will share a single log4j configuration and this makes logging unmanageable.

  • Put commons-logging.jar in your WEB-INF/lib directory. I'd heard bad things about this, especially if you want to deploy the same WAR in Tomcat. But for whatever reason it worked fine with Tomcat 4.1.31 and WebLogic 8.1. So this may be my preferred solution.



Anyhow, if this was confusing to someone who has been doing J2EE professionally for 5-6 years, I can imagine how confusing it must be to someone just trying to do this for the first time. The BileBlog puts it a lot less charitably.

cost of supporting old browsers

I've begun to question the wisdom of standards for websites that require support for older browsers like Netscape 4. Backwards compatibility is all well and good, but I feel like often times clients--government ones in particular--formulate these requirements without any real cost-benefit analysis.

Keeping support for Netscape 4, IE 3, etc. has a very real cost, with respect to CSS and DHTML/Javascript one-offs, or having to pass on new patterns like AJAX. But many statistics show that fewer than 1% of all browsers are Netscape 4--even going back a year or two. (See here for an example.) Most people are using IE 6.

On the other hand, a commercial client I visited had done a very careful cost-benefit analysis on operating system support--they knew exactly how many of their users were still using Windows 98 and how much it would tick them off if they were forced to upgrade. So they might be stuck supporting it, but at least they could justify the decision.

So, before making requirements to support Netscape 4, clients should do their homework, and ask themselves what the benefit is--is anyone really still using it and can't upgrade? And how much extra time and money are the developers going to spend on workarounds to support these old browsers, compared to the value delivered?

Wireless adventure

I must have bad luck with my home wireless network. I had a D-Link DI-514 wireless router, which worked great until I got a new IBM Thinkpad X40 with the built-in Centrino. After various firmware upgrades, configuration changes, etc., I could not get the two to work together--I was getting roughly 30% packet loss and 500 ms ping times, with the notebook right next to the router antenna.

So I got a new DI-524 which had a big Centrino-verified logo on the front. But, after unpacking it and hooking it up, it won't talk to my cable modem (also a D-Link product, a DCM-200) and the "WAN" link light doesn't even come on.

I take the thing back to the store, bring home another DI-524 *and* a Linksys BEFW11S4 just to be safe. Sure enough, they *all* have the same problem with the WAN link light. What are the odds?

After some googling and a call to D-Link tech support (who was way more knowledgeable than I expected) it turns out that this mass-market consumer equipment doesn't do a good job of auto-sensing 10- vs. 100-mbit ethernet, and my old cable modem is only 10mb. After a firmware upgrade it all worked.

The design lesson here is the "principle of least astonishment"--I'm somewhat knowledgeable about network stuff, but so far no piece of 100mbit ethernet equipment I've used has simply broken when the other end is only 10mbit. So when that link light didn't come on, I immediately assumed the unit was defective, especially since there was no documentation of this issue. I could have saved myself (and Best Buy's returns counter) a whole lot of trouble if the problem had been more clear.



displaytag sorting/paging performance vs. native SQL

displaytag is an easy-to-use JSP tag library for printing out HTML data tables with sorting and paging. But there's a catch. In an enterprise application, collections are the result of some database query, so implementing sorting and paging entirely within the presentation layer is not going to be as efficient as doing both natively in the database with ORDER BY, LIMIT and OFFSET (assume PostgreSQL syntax). Instead, displaytag expects the entire collection to be in memory; it sorts the Java collection itself and displays only the rows for the current page.

I ran some tests to see exactly how much this affects performance. On a non-scientific test (YMMV) with a 250 row result set, sorting with displaytag added a 34% overhead over a native ORDER BY. There was a much more dramatic difference for paging, though. On a test where I loaded 250 rows from the DB and displayed only 25 (pagesize=25), displaytag took 141% longer on average than a "LIMIT 25" which only returns the 25 rows to actually be displayed. The overhead factor will scale with the total number of rows in the original, unlimited query.

So, displaytag clearly is not as efficient as sorting and paging in native SQL. But is it enough of a difference to matter? Well, it depends on your application--it's a classic tradeoff between performance and elegant design. Separation of concerns says paging results conceptually belongs in the presentation layer; in a layered J2EE application, you can push sorting/paging down through the business and persistence layers, but it isn't pretty.

My personal opinion, though, is that performance normally shouldn't be enough of a factor to dissuade you from using displaytag for sorting/paging results. If your query returns so many rows that you can't afford the overhead from the undisplayed rows, then you may not be thinking about paging the right way--paging should be thought of as purely a UI layer construct to save the user from scrolling or downloading a large HTML document, not to save the database from working too hard. How meaningful is it to jump from page 1 to page 47 out of 62 anyway? If your query returns more than, say, 300 rows (15 pages with 20 rows each), you probably should think about making the user provide additional search criteria first rather than blindly paging the output.

What about the other option--coupling the presentation layer directly to the database? It's easy to imagine an extension to displaytag that takes a SQL query rather than a Java collection, and handles sorting/paging natively by dynamically appending ORDER BY and LIMIT/OFFSET clauses. (Indeed, the MS toolset seems to actively encourage this pattern.)

This is fine for prototyping or simple apps that don't have much business logic beyond the basic CRUD transactions. But this can rapidly become unmaintainable--not only are there many more places to touch if the DB schema changes, but you are also at risk of introducing bugs by accessing tables directly and potentially bypassing domain logic (business rules) tied to particular fields. Still, it may be worth considering this as a strategy for hand-optimizing queries with special performance requirements that outweigh maintainability, especially if the query is more relational than object-oriented in nature.

In summary, displaytag is a good, simple choice for many applications that need HTML tables with sorting and paging. There may be a performance hit, but a clean, maintainable design often is more important.

USB flash drives and Linux

I've had to set up a USB stick with Linux recently. It's fairly easy if you know the right magic.

Here are some links to useful resources:

Post on ExtremeTech.com
Flash memory HOWTO on ibiblio

Essentially it boils down to adding a line to /etc/fstab and then mounting /dev/sda1 or /dev/sdb1 on some mount point (directory). You don't strictly need to edit /etc/fstab, but if you don't you will need to be root in order to write anything on it. Here's the magic for /etc/fstab to allow non-root to get read/write access:
/dev/sda1 /mnt/usbstick vfat        rw,user,noauto 0 0

(assuming your mount point is /mnt/usbstick.)

If that doesn't work, check /var/log/messages and see if your stick is on /dev/sdb1 instead.

I'm looking forward to it being as easy to use USB sticks under a default Linux distro as it is under Windows, though. I hate to say it, but under Windows, they "just work."

Avoiding Anemic Domain Models with Hibernate

One of Hibernate's most under-appreciated features is its ability to persist private fields. This feature is useful for avoiding what Martin Fowler calls the Anemic Domain Model anti-pattern, where domain objects (entities) are reduced to "dumb" record structures with no business logic. In an Anemic Domain Model, you lose all the benefits of OOP: polymorphism, data hiding, encapsulation, etc.

The Anemic Domain Model may have originally evolved from EJB CMP, which requires any persistent field to be accessible directly with a public getter/setter. Developers using POJO frameworks like Hibernate often duplicate the same pattern, though, simply replacing the entity beans with POJOs.

This is not just an academic discussion; this has real consequences for the quality of a codebase. (Academically, this is part of the OOP-RDBMS "impedance mismatch"--in particular, that there is no distinction between a setter/constructor call that actually mutates/constructs an object and one that is merely incidental to materializing an existing object's state from persistent storage.) Let's say you're developing a system for issue tracking with a business rule like "anyone can create a ticket or change its status, but only managers can raise it to 'critical.'" A fragment of an Issue object might look like this (some detail omitted to focus on encapsulation/data hiding issues):
public class Issue {

    private String m_status;

    public String getStatus() {
        return m_status;
    }

    public void setStatus(String newStatus) {
        // equals(), not ==: compare string values rather than references
        if (STATUS_CRITICAL.equals(newStatus) && !getCurrentUser().isManager()) {
            throw new SecurityException("critical.requires.manager");
        }
        m_status = newStatus;
    }
}
This looks great until you realize that setStatus(STATUS_CRITICAL) is also going to be called from the persistence layer in materializing an existing Issue that is already critical, not just when making an explicit change through the UI workflow. Since anyone can view any issue, SecurityException will be thrown when a non-manager tries to view an issue that is already critical. We immediately recognize that the persistence layer needs a way to get "privileged" access to set the underlying field directly, bypassing business logic.

The typical workaround is to give up encapsulation and move the business logic into the corresponding service layer object (e.g., stateless session bean) for issue transactions:
public class IssueManager {

    public Issue findIssueById(Long id) ;

    public Issue newIssue(... fields ...) {
        // begin TX
        // ... setup new issue
        // equals(), not ==: compare string values rather than references
        if (STATUS_CRITICAL.equals(status) && !getCurrentUser().isManager()) {
            throw new SecurityException("critical.requires.manager");
        }
        issue.setStatus(status);
        // ...
        // commit TX
        return issue;
    }

    public void changeStatus(Long id, String status) {
        // begin TX, load issue
        // same rule, duplicated from newIssue()
        if (STATUS_CRITICAL.equals(status) && !getCurrentUser().isManager()) {
            throw new SecurityException("critical.requires.manager");
        }
        issue.setStatus(status);
        // commit TX
    }
}
Now, two real consequences are apparent. First, giving up encapsulation leads to cut-and-paste programming, violating the "don't repeat yourself" principle; this increases the risk that the business rule won't be pasted in again somewhere it's needed. Second, you lose polymorphism; it is now very difficult to have a subclass of Issue with slightly different business rules. (For example, maybe the main Issue has no restriction on setting status, but a specific type of issue has the critical-requires-manager rule.)

It's true that you could have two separate sets of getters/setters in the Issue itself, one that applies business logic and one that allows direct access and is only used by persistence. This would address the polymorphism issue. But if those direct accessors are also public (as EJB CMP requires) then you still lose data hiding; nothing prevents your service layer/transaction scripts from calling these methods directly.

If you're using Hibernate, though, there is a very elegant solution. Hibernate is effectively "privileged": it uses reflection that bypasses Java's access checks, so it can touch private fields and methods directly. Hibernate gives you two options in the above scenario:
  • You can have two separate bean-style properties linked to the same underlying field, one with private getters/setters and the other with public. The private methods access the underlying field directly, and the public ones apply business rules. This is the preferred approach, but has the downside of verbosity, plus you have to use different property names in HQL (private) and everywhere else (public).
  • Hibernate can also persist fields directly by using the "access" attribute on @hibernate.property and so on. The upside is that this is more concise with only a single public bean-style property, but using access="field" requires the field name to exactly match the private instance variable name; this won't work if you have some kind of Hungarian naming convention like "m_foo". You can do something like access="MyFieldAccessor" where MyFieldAccessor is a custom class implementing net.sf.hibernate.property.PropertyAccessor, implementing your naming convention (mapping bean property names to member var names) but that requires extra effort.
There are other uses for this feature in Hibernate:
  • Primary keys are generally supposed to be immutable by normal business logic, set only within the persistence layer. So, "setId" methods can almost always be private or protected.
  • Collections getters and setters can also be kept private, to preserve data hiding (prevent rep exposure). Otherwise, when business logic can manipulate a collection directly, it's difficult to enforce business rules on the collection elements, or even to ensure the elements are of the correct type. (The latter may partially be addressed by generics in Java 5 and/or Hibernate 3.)
I believe JDO gets similar privileged access to persistent fields by enhancing the classes' bytecode.

more PostgreSQL performance junk

Someone at work was running a big delete (100k rows) and it was taking forever, as if it were hung. We couldn't figure out what was going on, and there was clearly a non-linear effect: smaller deletes on 10k rows were completing very fast, less than 5 sec, while 100k rows was still running after 20 mins.

We think the big delete was taking so long because PostgreSQL may try to keep rollback information for the whole delete operation in memory, or something like that, causing thrashing. I haven't tried this on Oracle, but I'm guessing that it and other databases may be smarter about managing their physical storage directly (RAM vs disk) rather than relying on the underlying OS.

But then again, if you're touching 100k rows at once, it's probably not a bad idea to commit every so often anyway, so as to avoid a long-running transaction that could potentially hose other users.

PostgreSQL on cygwin: "Bad system call"

I was getting this "Bad system call" message from PostgreSQL 7.4.x on cygwin. It was working earlier, and I thought I had done everything right--cygserver was up and running, so I didn't know what was up. ipcs gave the same error.

Turns out I had forgotten the magic word: "CYGWIN=server". The first time I installed PostgreSQL (and read the docs), I had just set the var on the command line (CYGWIN=server pg_ctl start ...) and never put it in my profile. Easy enough to fix.

PostgreSQL performance of "where exists"

Today I was looking into the performance of a PostgreSQL query with a "... where exists (select 1 from ...)" subquery:
select foo_id from foo
where exists (select 1 from bar where bar.foo_id = foo.foo_id)
I was surprised to find out that this query actually ran faster when I restructured it with a SELECT DISTINCT and a JOIN:
select distinct foo.foo_id from bar
join foo on bar.foo_id = foo.foo_id
Some references on the web I've found suggest that EXISTS is the preferred way to write the above query in general. Because it's a boolean condition, in theory the database needs to scroll fewer rows because it can stop as soon as the first match is found; and the DISTINCT can be expensive if the results from the join version would not have been unique.

An ancient PostgreSQL mailing list post indicates that rewriting the query as a JOIN may be faster than EXISTS in PostgreSQL, because the join can take advantage of indexes while EXISTS does a nested loop. But, then again, I'm still using PostgreSQL 7.3.x, and EXISTS handling may well have been improved in 7.4.