Solr Search Index Backups?
If you have a massive set of documents that you're using Solr to search (let's say, a few million HTML pages), how much should you worry about losing the search index?
It is of course always possible to reindex the original documents, but that would take a fair while, so should you keep a backup of the search index? If you restored the backup, how would you identify which documents needed updating?
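One way to answer the second question is to compare each source document's last-modified time against the time the backup was taken. The following is a minimal sketch. It assumes you can list your source documents along with a modification timestamp. The `SourceDoc` class and the `staleSince` method are hypothetical and are not part of Solr. It deliberately says nothing about how the Solr backup itself is taken.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds source documents changed after a Solr index backup. */
public class ReindexPlanner {

    /** Hypothetical view of a source document: its ID and last-modified time (epoch millis). */
    static final class SourceDoc {
        final String id;
        final long lastModified;

        SourceDoc(String id, long lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    /**
     * Returns the IDs of documents modified at or after the backup time.
     * Using ">=" errs on the side of reindexing anything edited while the backup ran.
     */
    static List<String> staleSince(List<SourceDoc> docs, long backupTime) {
        List<String> stale = new ArrayList<>();
        for (SourceDoc doc : docs) {
            if (doc.lastModified >= backupTime) {
                stale.add(doc.id);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        long backupTime = 1_000L;
        List<SourceDoc> docs = List.of(
                new SourceDoc("a.html", 500L),    // unchanged since the backup
                new SourceDoc("b.html", 1_500L),  // edited after the backup
                new SourceDoc("c.html", 1_000L)); // edited as the backup was taken
        System.out.println(staleSince(docs, backupTime)); // [b.html, c.html]
    }
}
```

Deleted documents won't show up in this list. You would also want to compare the set of IDs in the restored index against the set of IDs in the source.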
Amazon EC2 As A Webhost
We need to move our company wiki and JIRA instance to a server with more RAM and CPU to spare, as they're pretty slow on the current overloaded virtual server, so we've been looking at a few different options. One that came up was using Amazon's EC2 and S3 services. We knew straight off that we didn't need the scalability they offered. Still, getting some experience with them could be beneficial, and we really didn't know anything about what they actually offered, so it was worth a quick look.
Where I’ve Been….
You might have noticed a distinct lack of posts recently. That's because I've finally followed through on the whole engagement thing that happened a couple of years ago. I took the opportunity to have three weeks off work and detechify (detox for techies), so I didn't wind up blogging while I was away. I must admit it was quite nice to spend the week of our honeymoon on a tropical island without even mobile phone reception. Next time we'll try not to do it in the middle of winter.
Converting A Partition Between Bootcamp and OS X
I play with OSs a bit, so when I got my new MacBook, I obviously installed Windows via Bootcamp. Later, Apple released something under NDA that also required a dedicated disk partition (for argument's sake, let's call it Leopard), so I installed it over the top of my Bootcamp partition.
Of course, once you reformat a Bootcamp partition, Boot Camp Assistant refuses to do anything with your system, which makes it impossible to reinstall Windows. However, you can fix this quite easily with the diskutil command-line utility, without needing to repartition. Just reformat the partition as MS-DOS FAT32.
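The reformat itself is a single `diskutil eraseVolume <format> <name> <device>` call. To keep these examples in Java, the following is a minimal sketch that only builds and prints that command; it does not run it. The device identifier `disk0s3` and the volume name `WINDOWS` are assumptions. Check yours with `diskutil list` before running anything, because eraseVolume wipes the partition. The sketch upper-cases the name because FAT labels are conventionally upper case, and it rejects names longer than the 11 characters a FAT label allows.

```java
import java.util.List;
import java.util.Locale;

/** Sketch: builds the diskutil command that reformats a partition as FAT32. */
public class Fat32Reformat {

    /** FAT volume labels are limited to 11 characters. */
    static final int MAX_FAT_LABEL = 11;

    static List<String> eraseAsFat32(String volumeName, String device) {
        String label = volumeName.toUpperCase(Locale.ROOT);
        if (label.isEmpty() || label.length() > MAX_FAT_LABEL) {
            throw new IllegalArgumentException("FAT32 label must be 1 to 11 characters: " + volumeName);
        }
        // diskutil eraseVolume <format> <name> <device>
        return List.of("diskutil", "eraseVolume", "MS-DOS FAT32", label, device);
    }

    public static void main(String[] args) {
        // Assumed identifiers: confirm yours with `diskutil list` first.
        List<String> cmd = eraseAsFat32("Windows", "disk0s3");
        StringBuilder shown = new StringBuilder();
        for (String arg : cmd) {
            if (shown.length() > 0) shown.append(' ');
            // Quote arguments containing spaces so the printed line can be pasted into a shell.
            shown.append(arg.contains(" ") ? "\"" + arg + "\"" : arg);
        }
        System.out.println(shown); // diskutil eraseVolume "MS-DOS FAT32" WINDOWS disk0s3
        // To actually run it (destroys the partition's contents):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```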
You Know Your Server Install Is Minimal When…
$ unzip ~ephox/vmware-debian-etch-r0-mini.zip
bash: unzip: command not found
Might have been just a little bit picky when I first installed that box….
JCR Woes
So we've got a new internal system that we've built on top of JCR. Currently we're using Jackrabbit as the repository, but eventually it will be ported over to something like IBM Portal. Unfortunately, right now we're deploying the app to a pretty limited server – both in terms of CPU and RAM.
It turns out that using Jackrabbit with the Derby persistence manager in that kind of situation is a horrible, horrible idea. Everything works great on systems with modest amounts of CPU and RAM, but once we deploy to that poor little virtual server in the sky, page load times skyrocket and the whole thing becomes unusable.
UI Design and Preferences
Ken Coar complains about some of the changes in Firefox 2.0 and mentions:
My basic plaint is as usual: when changing the user interface, don’t violate the Principle of Least Astonishment and force the change on the user. Make it the default, perhaps, but always provide a preference option that lets the user keep the old UI behaviour. The user should be in charge of changing his work habits, not the software.
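In code, that principle mostly comes down to treating "no stored preference" as the new default and letting an explicit user choice override it. The following is a minimal sketch. The close-button setting, the enum and the preference key are all hypothetical. It uses a plain in-memory map in place of a real preference store.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a UI change that is on by default but can be switched back by the user. */
public class TabCloseSettings {

    /** Hypothetical example: where the tab close button is drawn. */
    enum CloseButtonStyle { ON_EVERY_TAB, SINGLE_BUTTON_AT_END }

    // Hypothetical preference key; when it is absent, the new default applies.
    static final String KEY = "ui.tabs.closeButtonStyle";
    static final CloseButtonStyle NEW_DEFAULT = CloseButtonStyle.ON_EVERY_TAB;

    // Stand-in for a persistent preference store.
    private final Map<String, String> userPrefs = new HashMap<>();

    CloseButtonStyle closeButtonStyle() {
        String stored = userPrefs.get(KEY);
        return stored == null ? NEW_DEFAULT : CloseButtonStyle.valueOf(stored);
    }

    /** The user, not the software, decides whether to keep the old behaviour. */
    void setCloseButtonStyle(CloseButtonStyle style) {
        userPrefs.put(KEY, style.name());
    }

    public static void main(String[] args) {
        TabCloseSettings settings = new TabCloseSettings();
        System.out.println(settings.closeButtonStyle()); // ON_EVERY_TAB (the new default)
        settings.setCloseButtonStyle(CloseButtonStyle.SINGLE_BUTTON_AT_END);
        System.out.println(settings.closeButtonStyle()); // SINGLE_BUTTON_AT_END (the user's choice)
    }
}
```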
Dependency Management
If ever there was a problem that just wouldn't die, it has to be inter-project dependency management. If your code depends on external libraries, it's pretty simple to pick whichever solution you prefer – either grabbing a version from a repository or checking JARs into your source control. However, if you depend on a project that you control, it gets much messier.
If the projects are small, it's probably a good idea to just set up the build system to build them all together – effectively making them a single project, even if for development purposes you can build just this bit or that bit and use precompiled versions of the dependencies.
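Building everything together mostly comes down to the build knowing which project has to be built first. The following is a minimal sketch of that ordering step on its own, written as a depth-first topological sort. The project names are hypothetical, and this is not how any particular build tool implements it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: orders in-house projects so each is built after the projects it depends on. */
public class BuildOrder {

    /** Depth-first topological sort; throws on a dependency cycle, since no order exists then. */
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String project : dependsOn.keySet()) {
            visit(project, dependsOn, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String project, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress, List<String> result) {
        if (done.contains(project)) return;
        if (!inProgress.add(project)) {
            throw new IllegalStateException("Dependency cycle involving " + project);
        }
        // Projects missing from the map are treated as having no dependencies.
        for (String dep : dependsOn.getOrDefault(project, List.of())) {
            visit(dep, dependsOn, done, inProgress, result);
        }
        inProgress.remove(project);
        done.add(project);
        result.add(project); // every dependency is already earlier in the list
    }

    public static void main(String[] args) {
        // Hypothetical projects: a webapp using an editor component, both built on a core library.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("editor", List.of("core"));
        deps.put("webapp", List.of("core", "editor"));
        deps.put("core", List.of());
        System.out.println(order(deps)); // [core, editor, webapp]
    }
}
```

Detecting cycles matters here, because two in-house projects that depend on each other can't be built in any order. That is usually a sign they should be merged or split differently.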
PermGen Nightmares
That permanent generation had better provide some pretty damn amazing optimizations, or I'm going to add its inventor to my list of people to track down and torture at the next JavaOne. It turns out you can only reload a Tomcat webapp so many times before the PermGen space fills up and everything dies. That's annoying enough in development, but it also means I can't regularly upload a new WAR file to our test server without bouncing Tomcat. Right now our continuous integration server (Bob the Builder) uploads a new WAR after every commit, so the server isn't staying up all that long.
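Before reaching for a bigger `-XX:MaxPermSize`, which only buys a few more reloads if the old classes are never unloaded, it can help to watch the pool grow. The following is a minimal sketch using the standard `java.lang.management` API. On HotSpot the pool is named "Perm Gen", "PS Perm Gen" or "CMS Perm Gen" up to Java 7. From Java 8 onwards, PermGen was replaced by "Metaspace". Calling `report()` after each redeploy (for example, from a JSP or a JMX console) is left as an assumption about your setup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Sketch: reports class-metadata memory pools so you can watch them across redeploys. */
public class PermGenWatch {

    /** Matches HotSpot's PermGen pools (Java 7 and earlier) and Metaspace (Java 8+). */
    static boolean isClassMetadataPool(String poolName) {
        return poolName.contains("Perm Gen") || poolName.equals("Metaspace");
    }

    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!isClassMetadataPool(pool.getName())) continue;
            MemoryUsage usage = pool.getUsage();
            long max = usage.getMax(); // -1 when the pool has no defined limit
            lines.add(pool.getName() + ": " + (usage.getUsed() / 1024) + " KiB used"
                    + (max < 0 ? " (no limit)" : " of " + (max / 1024) + " KiB"));
        }
        return lines;
    }

    public static void main(String[] args) {
        report().forEach(System.out::println);
    }
}
```

If the used figure climbs with every reload and never drops, that usually suggests something is still holding a reference to the old webapp's classloader, so its classes can't be unloaded.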
Caching in Tomcat – SOLVED!
It took forever, but I've finally figured out how to stop Tomcat adding a Pragma: no-cache header to resources in a secure context. You need to set disableProxyCaching to false, so if you're using basic authentication, you need a valve like:
<Valve className="org.apache.catalina.authenticator.BasicAuthenticator"
disableProxyCaching="false" />
That needs to go within the <Context> element of your context.xml:
<Context>
<Valve className="org.apache.catalina.authenticator.BasicAuthenticator"
disableProxyCaching="false" />
</Context>
I found deploying as a WAR didn’t work (it seemed to ignore the context.xml), but deploying the exploded files did work and that’s fine by me.
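To check whether the change has taken effect, you can look at the response headers of a protected resource. The following is a minimal sketch. The default URL and the user/password arguments are hypothetical. Tomcat's authenticator may set Cache-Control as well as Pragma, so the sketch flags "no-cache" in either header.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch: checks whether a response still carries anti-proxy-caching headers. */
public class CacheHeaderCheck {

    /** True if any Pragma or Cache-Control value contains "no-cache" (case-insensitive). */
    static boolean blocksCaching(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            String name = e.getKey();
            if (name == null) continue; // HttpURLConnection maps the status line to a null key
            String lower = name.toLowerCase(Locale.ROOT);
            if (!lower.equals("pragma") && !lower.equals("cache-control")) continue;
            for (String value : e.getValue()) {
                if (value.toLowerCase(Locale.ROOT).contains("no-cache")) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default: pass your own URL, user and password as arguments.
        String target = args.length > 0 ? args[0] : "http://localhost:8080/myapp/index.html";
        HttpURLConnection conn = (HttpURLConnection) URI.create(target).toURL().openConnection();
        if (args.length >= 3) {
            String credentials = args[1] + ":" + args[2];
            conn.setRequestProperty("Authorization", "Basic "
                    + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
        }
        System.out.println(conn.getResponseCode() + " blocks proxy caching: "
                + blocksCaching(conn.getHeaderFields()));
        conn.disconnect();
    }
}
```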
Wowsers! Sun’s Updating The Swing Text APIs
Apparently Sun has finally decided to give some love to the Swing text APIs with the addition of a removeElement method. What the linked article fails to mention is that in the particular example given, you can remove the list item with a simple document.remove(e.getStartOffset(), e.getEndOffset() - e.getStartOffset());
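The linked example uses an HTML list. As a simpler, self-contained stand-in, the following sketch applies the same offset-based removal to a `PlainDocument`, where each line is a child element of the root. The `removeElement` helper is a hypothetical name, not the new Sun method.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.PlainDocument;

/** Sketch: removing an element's text with Document.remove and the element's offsets. */
public class RemoveElementDemo {

    /** Removes everything the element covers, including its trailing newline. */
    static void removeElement(PlainDocument doc, Element e) throws BadLocationException {
        doc.remove(e.getStartOffset(), e.getEndOffset() - e.getStartOffset());
    }

    /** Builds a three-line document, removes the middle line and returns what is left. */
    static String demo() {
        try {
            PlainDocument doc = new PlainDocument();
            doc.insertString(0, "one\ntwo\nthree", null);
            // In a PlainDocument, each line is a child of the root element.
            Element second = doc.getDefaultRootElement().getElement(1); // covers "two\n"
            removeElement(doc, second);
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "one" and "three" on separate lines
    }
}
```

One caveat: the last element's end offset includes the document's implied trailing newline, so it runs one past `getLength()`. Removing the final line with this exact call should therefore fail, and the length needs clamping.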
Where you get into trouble with the Swing text APIs is when two elements start and end at the same point, e.g. a list as the only child of a table cell:
Improve Your Code By Writing About It
I've spent a fair bit of time writing articles for LiveWorks! lately – posting weekly isn't as easy as it seems, particularly when you have three weeks' leave coming up. Most of the articles include some kind of code, be it JavaScript or Java, and an explanation of what the code does, why it does it and how you might use those techniques in other ways. As I write the article and try to explain my code, it really highlights how nonsensical my original design decisions were. Even when I've gone back over code and carefully refactored it to be clear, I almost always rearrange things again when I come to write the article.