Character Encodings
Jim Winstead has posted a couple of entries on character encodings (1, 2). Some good info in there. My three big tips for dealing with character encodings are these:
1. Know your character encoding and make sure you’re using that encoding everywhere. Now look again, because you probably missed a place where you didn’t think about encoding. In particular, don’t forget that printing something to System.err or System.out in Java uses the platform default encoding, so characters that can’t be represented in that encoding become question marks.
2. When practical, use US-ASCII and escape any characters outside its range. Most formats that support different encodings also provide a way to represent a character that isn’t in the current encoding, through something like HTML’s entities or Java’s escape codes (\u8222 etc.). Most encodings are compatible with US-ASCII (EBCDIC being a notable exception), so even if people forget to use the right encoding they can generally get away with it.
3. Remember that character encodings, despite their name, do not apply to characters; they apply to byte sequences which represent characters. If you have a char variable in Java, it has no character encoding as far as you are concerned; it’s just that character. The JVM can choose any representation it likes for that character in physical memory and you shouldn’t care (the language spec defines a char as a 16-bit UTF-16 code unit, but you still shouldn’t care). You *do*, however, have to worry about character encodings when you convert from characters (or Strings, which are really just a fancy array of chars) to byte streams. This happens when you call String.getBytes() or print the characters to any kind of output stream. You also have to worry about the reverse process: new String(byte[]) and reading from an input stream.
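A minimal Java sketch of rules 1 and 3, assuming UTF-8 is the encoding you’ve settled on. The class name, the ENCODING constant and the encode/decode helpers are made up for illustration. Every char-to-byte and byte-to-char boundary names the same Charset explicitly, including the console, which would otherwise use whatever encoding the platform picks (the Charset overload of PrintStream needs Java 10 or later).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsets {
    // The one encoding used at every char <-> byte boundary (rule 1).
    static final Charset ENCODING = StandardCharsets.UTF_8;

    // chars -> bytes: name the encoding rather than relying on the platform default.
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, ENCODING)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // bytes -> chars: decode with the same encoding that produced the bytes.
    static String decode(byte[] data) {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), ENCODING)) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String original = "\u00A9 Adrian Sutton";
        byte[] data = encode(original);
        // System.out encodes with a platform-dependent charset; wrap it so console bytes are UTF-8 too.
        PrintStream out = new PrintStream(System.out, true, ENCODING);
        out.println(data.length + " bytes");         // 16 bytes: the copyright sign needs two in UTF-8
        out.println(decode(data).equals(original));  // true
    }
}
```

The String itself never "has" an encoding here; UTF-8 only appears at the two points where chars turn into bytes or back.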
The first two items should be pretty clear to you if you’ve done any work with character encodings. The third may seem unimportant, but it will stop you from expecting code like the following to work:
String str = new String("my string");
byte[] utf8Bytes = str.getBytes("UTF-8");
String strISO88591 = new String(utf8Bytes, "ISO-8859-1");
Naturally this won’t work because of rules 1 and 3. Rule 1 is broken in that you used a different encoding when working with the same data, and rule 3 is broken because you expected strISO88591 to be a String using the ISO-8859-1 character encoding, but it isn’t, because String objects don’t have a character encoding (as far as you should be concerned). The big exception to rule 3 is when you’re using a language which doesn’t guarantee it will support whatever characters you throw at it, in which case you basically have to work only with byte arrays and never let the language’s string functions near them. In general, though, I’d suggest you find a better language or a better library.
If I were to add a fourth suggestion, it would be: remember that just because your character is valid doesn’t mean the font you’re using can display it. Many fonts can’t display much beyond the characters in ISO-8859-1 and a few select others, so if you’re working with mathematical symbols or characters from other languages you’ll need to find a font that supports them.
BTW, yes, I have spent far too much time working with character encodings and tracking down where people stuffed up with character encodings.
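Here’s a small Java sketch of what the three-line snippet above actually does. It uses "café" instead of "my string", because "my string" is pure ASCII and would survive the mix-up by accident, exactly as rule 2 predicts. The class and method names are hypothetical; StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class WrongDecode {
    // What the snippet effectively does: encode as UTF-8, then decode the same
    // bytes as ISO-8859-1. Each byte just becomes one char; nothing is converted.
    static String mismatched(String str) {
        byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Rule 1: decode with the same encoding that produced the bytes.
    static String matched(String str) {
        byte[] utf8Bytes = str.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // If ISO-8859-1 *bytes* are what you need, encode the String directly.
    // Characters ISO-8859-1 can't represent come out as '?'.
    static byte[] toLatin1(String str) {
        return str.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00E9";  // "cafe" with an acute e
        System.out.println(mismatched(cafe).equals("caf\u00C3\u00A9"));   // true: one char became two
        System.out.println(matched(cafe).equals(cafe));                   // true
        System.out.println(toLatin1(cafe).length);                        // 4: the e-acute is one byte
        System.out.println(mismatched("my string").equals("my string")); // true: pure ASCII survives
    }
}
```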
He Wants An Apple
Leo Simmons wants an apple. He suggests that someone buy a nice shiny new 17″ PowerBook and send him their old ratty 15″ PowerBook. I happen to have an old ratty 15″ PowerBook and would love a nice shiny new 17″ PowerBook; I can even justify it. I can’t, however, afford it. So let me offer some Mac hater a chance to show just how much they hate Macs in three easy steps. 1. Buy a nice shiny new 17″ PowerBook. 2. Give the nice shiny new 17″ PowerBook to me. 3. There is no step three. I will then quite happily give my old ratty 15″ PowerBook to Leo and everyone will be happy. The Mac hater can brag to all his friends about how he hates Macs so much that he gave away a perfectly good 17″ PowerBook to some total stranger because it’s totally worthless to him. I can enjoy my nice shiny new 17″ PowerBook, and Leo can enjoy his nice ratty old 15″ PowerBook.
Windows L&F
Glen Stampoultzis complains (rightfully) about the Windows XP look and feel. In our usage it fits in a lot better than his example makes out; I’m not sure if that’s just because we never use JFileChooser (use java.awt.FileDialog instead; JFileChooser is awful in every L&F) or if it’s some particular setting somewhere. Anyway, if you’d like to pick up a whole bunch of fixes for the Windows L&F (both XP and non-XP), take a look at WinLaf. It provides some really nice fixes for the L&F on Windows and is very easy to integrate. Be warned, though: it incurred a very heavy performance penalty in our application, and I still can’t say exactly why. We wound up removing it again because we just didn’t have time to fix it before release. If a few more people get behind it, and particularly if someone hits it hard with a profiler, it could be a very useful project. OS X users should check out Quaqua, which is a similar kind of thing for OS X. I haven’t used Quaqua myself, though.
Yes!
I’ve reclaimed the top spot in the Google rankings! Ha! Take that, Adrian Sutton! (If you have no idea what I’m talking about, see the front page and this entry.)
WhereIs Redesign
It seems WhereIs Australia is getting a facelift and some much-needed usability improvements. Gone are the six clicks and cryptic instructions it took to get your directions; now you simply enter your start and end points (on the one page, no less, AND you don’t have to pick the street type from a drop-down anymore) and bang, you’ve got your results. No more “Did you really mean exactly what you just typed in or were you only joking” page. Better yet, the Wacky Mario Ramp feature has been turned off. For those of you not familiar with WhereIs.com.au and Brisbane’s roads: Brisbane has a lot of on and off ramps that go winding around at all kinds of weird angles, and quite often one ramp leads onto another, which leads to another. If you were unfortunate enough to have to navigate through such a section (like the rather central Riverside Expressway) using WhereIs instructions, you’d be trying to follow something like:
(Straight) South East FreeWay
(Straight) Riverside Express Way
(Right) Ramp
(Left) Ramp
(Straight) Ramp
(Left) Ramp
(End) Destination
Needless to say, you had no chance whatsoever. Now, however, you’d get something more like:
Turn left at CORONATION DR, BRISBANE
Turn right at CORONATION DR [RAMP], BRISBANE
Continue along BOOMERANG ST, BRISBANE
Turn left at MILTON RD, BRISBANE
Much nicer. Sadly, on this trip it leads me straight into a dentist’s chair. Perhaps getting lost would have been more pleasant.
Dental Naturopathy
I just had a phone call that went something like:
Receptionist: Hello, Paddington medical centre.
Me: Hi, I’d like to make a dental appointment with John McKenny.
Receptionist: A dental appointment?
Me: Yes.
Receptionist: I’m sorry, but John McKenny is a naturopath who used to work here. We don’t have any dentists.
Me: Hmmm, I think Medibank Private have stuffed up their records somehow then. I think it’s probably best if I don’t let Mr McKenny near my teeth.
Receptionist: That’s probably a good idea.
New ASF Machines
Apparently, the ASF took delivery of a few new machines today. I just can’t shake the image of Sam Ruby sitting around ASF head office when suddenly there’s a knock on the door and he finds a pile of orphaned servers wrapped in a blanket. Then again, I always was weird…
Google Wars
It appears the war of the Adrian Suttons is hotting up on Google. (Hint for those who just read the RSS feeds: take a look at the main page.) The once unstoppable Professor Adrian Sutton, who ruled supreme as the number one search result for “Adrian Sutton”, has dropped significantly to third place, though he now has two entries in the top five thanks to a surprise appearance in some meeting minutes. The new kid on the block, Adrian Sutton, has roared up the charts to take the number one spot just ahead of my own Randomness, which held the top spot less than two days ago. I also hold the fifth spot with my appearance in a CVS commit message for FreeCard. Stay tuned as this pathetically geeky race continues to unfold!
URL Escaping is Evil
I have come to the conclusion that URL escaping is evil and must be banished from the face of the earth. I’ve got no idea how it manages to work at all: every implementation seems to be different, and support for different character sets is very much hit and miss. Take, for instance, the string: © Adrian Sutton
It looks like a pretty simple string. It should be encoded as:
%C2%A9%20Adrian%20Sutton
assuming UTF-8 character encoding (and I literally mean assuming, since there’s no possible way to know for sure). If, however, you were to use the JavaScript escape() function, you could get any one of:
%u00A9+Adrian+Sutton
%C2%A9%20Adrian+Sutton
%u00A9%20Adrian%20Sutton
It’s impossible to tell whether the + sign in the first two is an encoded space or an actual plus sign (there’s no requirement for + to be escaped in URIs, so many implementations leave it as is). Then you have to deal with the rather odd %u00A9 syntax, which seems to be half URI escaping, half HTML entity, and finally you get to worry about which character set was in use.
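The examples above come from JavaScript’s escape(). As a Java-side illustration only (not what any particular browser does), here is a minimal RFC 3986-style percent-encoder; the percentEncode helper is hypothetical and treats only A-Z, a-z, 0-9 and - . _ ~ as safe. It shows the same string escaping differently depending on the charset, java.net.URLEncoder (which does HTML form encoding) turning spaces into +, and a wrong charset guess on the decoding side. The Charset overloads of URLEncoder and URLDecoder need Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlEscaping {
    // Percent-encode RFC 3986 style: unreserved ASCII characters pass through,
    // every other byte of the chosen charset becomes %XX. The charset is a parameter
    // because nothing in the escaped output records which one was used.
    static String percentEncode(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;  // treat the byte as unsigned
            boolean unreserved = (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "\u00A9 Adrian Sutton";  // copyright sign, space, name
        System.out.println(percentEncode(text, StandardCharsets.UTF_8));       // %C2%A9%20Adrian%20Sutton
        System.out.println(percentEncode(text, StandardCharsets.ISO_8859_1));  // %A9%20Adrian%20Sutton
        // URLEncoder does HTML form encoding, so spaces come out as '+'.
        System.out.println(URLEncoder.encode(text, StandardCharsets.UTF_8));   // %C2%A9+Adrian+Sutton
        // URLDecoder turns every '+' into a space, even one that was meant literally.
        System.out.println(URLDecoder.decode("1+1", StandardCharsets.UTF_8));  // 1 1
        // Guess the wrong charset when decoding and the UTF-8 bytes C2 A9 become two chars.
        String guessed = URLDecoder.decode("%C2%A9%20Adrian%20Sutton", StandardCharsets.ISO_8859_1);
        System.out.println(guessed.equals("\u00C2\u00A9 Adrian Sutton"));     // true
    }
}
```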
MarchFest Wrap-up
Wow, what a fantastic day. MarchFest was yesterday, and those who didn’t make it missed a sensational day. While there are always a few things that go wrong when you put on a big production like MarchFest, things went exceptionally smoothly and all the reports coming back have been really positive. It was particularly good to see how many people offered to help out, and did so with such talent and energy. We even had a few people email us completely out of the blue offering to help. Anyway, I’m going to bed, as I haven’t had much sleep this weekend and I spent most of my waking time either lugging around heavy staging, PA systems and lighting, or running madly between the two venues to make sure the next band at each venue got set up quickly with all the sound stuff they needed. The life of a combined stage manager and sound engineer is never easy. Good fun, though. Hopefully now I’ll have some more time to do some recording with Soul Purpose and finish writing my musical.
JavaScript Fun
Nick Chalko talks about setting onsubmit dynamically. The solution he received from Alan Gutierrez is good, but overly complicated. Since I work for a company that just so happens to do some amazingly funky stuff with JavaScript, here’s some fun you can have with it. First, let’s take the original solution:
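<form name='CDiceComponent_0' id='findApplication'
action='/cdiceWebApp/custom/component/bridge.jsp' method='post'>
<script>
// look the form up by id; document.findApplication only matches a form's name attribute
document.getElementById('findApplication').onsubmit = function() {return checkForm();}
</script>
</form>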
and simplify it to:
<form name='CDiceComponent_0' id='findApplication'
action='/cdiceWebApp/custom/component/bridge.jsp' method='post'>
<script>
 document.getElementById('findApplication').onsubmit = checkForm;
</script>
</form>
JavaScript functions are first-class objects, so you can pass them around like any other value just by dropping the () at the end. Here’s how you could store the existing onsubmit function:
HttpClient – Moving On Up
The vote to start moving HttpClient out of jakarta-commons to become a fully fledged Jakarta sub-project has passed. I’ve just done up an initial draft of the proposal that will need to be put to the Jakarta PMC to approve the move (at their most recent meeting they noted that it was coming and that it was most likely to pass). This is the first bit of Apache “politics” I’ve been involved in, so I’ll be interested in the feedback. I’m just not sure it counts as politics when everyone agrees, as they so often seem to on the HttpClient list. It’s good to have a team that works so well together.