
Off to Spain

For the next few weeks I’m going to get some sunlight and heat in Spain. I’m first going to stay for a few days at Christian del Rosso’s place in Madrid. Christian is my neighbour here in Helsinki, and he is currently in Madrid with his wife and little son Emilio, doing an MBA. Last time I was in Madrid I didn’t have much time for sightseeing, so this time I will make time for that.

From Saturday, I’m renting a car and will be doing my usual neurotic routine of driving too much and seeing too little. The record still stands at around 4000 km in 3 weeks. That was on a trip to Spain where I visited Castilla y Leon and Extremadura. The goal of this trip is to go there again and this time visit all the stuff I somehow missed the last time, which is quite a lot.

I’ve spent the morning packing and (mostly) doing a bit of reconnaissance on Google. A few weeks ago I was considering buying a travel guide when it occurred to me that the Lonely Planet I have for Spain is mostly fluffy text and really thin on things like good places to see and stay. Sure, it will cover the big attractions, but the interesting stuff where I’m going is small villages and towns. I actually prefer staying in medium-sized towns where parking is doable. My requirements are very simple: a decent place to sleep, stuff to see, and lots of places to eat & drink. The average Spanish town has a plaza mayor with a few bars and hotels around it, and usually a parking garage nearby.

So that’s easy. The algorithm is roughly: select a town, drive to it, park on or near the Plaza Mayor, and try a few hostels/hotels. It never takes more than two or three attempts to find a decent place to sleep. Anyway, Google is good at that stuff, so I spent the morning copy-pasting together my own travel guide. I have a top five list of towns I’m probably going to stay in: Zamora, Segovia, Trujillo, Badajoz, Salamanca, and maybe Leon again (depending on the weather). I also have a few backup options and a long list of stuff to see and do. In the unlikely event I get bored with the area, I have Portugal and Andalucia within a few hours’ drive.


Stitched Photos

I have over 50 panorama photos taken over the last few years. In my regular photo album, photos are downsized to ensure I don’t run out of disk space with my provider. However, for panorama photos this means the size becomes too small to admire all the detail inside. So I have created a separate album where each panorama photo can have a maximum height or width of 3000 pixels. In quite a few cases that is still much smaller than the original resolution (in some cases more than 10000 pixels width) but it is better than nothing.
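The size cap described above comes down to a little arithmetic. As a rough sketch only (the class and method names are made up for illustration, and the 10000×2500 input is a hypothetical panorama, not one from the album), scaling so that the longer side is at most 3000 pixels could look like this:

```java
/** Illustrative helper; not part of any photo album software. */
final class PanoramaScaler {
    /** Returns {width, height} scaled so that the longer side is at most maxSide. */
    static int[] fit(int width, int height, int maxSide) {
        int longest = Math.max(width, height);
        if (longest <= maxSide) {
            return new int[] {width, height}; // already small enough, keep as is
        }
        double factor = (double) maxSide / longest;
        return new int[] {
            Math.max(1, (int) Math.round(width * factor)),
            Math.max(1, (int) Math.round(height * factor))
        };
    }

    public static void main(String[] args) {
        int[] size = fit(10000, 2500, 3000); // hypothetical original dimensions
        System.out.println(size[0] + "x" + size[1]);
    }
}
```

The aspect ratio is preserved, which is why a very wide panorama ends up with a small height once the width is capped.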

Your browser will probably zoom the photos to fit your screen width, so you will have to click on them to zoom in and use the scrollbars to pan left to right. Some of the photos are not traditional panorama photos but high resolution views stitched together from several photos organized in e.g. a 4×4 grid. This creates a nice effect where you get a very wide angle view of something that would otherwise have required an extremely large telephoto lens plus a ridiculous distance to the subject.

For stitching, I have used various tools over the years. For the past two years the tool of choice for me has been Hugin which is an excellent but difficult to use tool for this purpose.

I also took the opportunity to add a bit of text to the front page of my photosite.


Photos

While updating my photosite I noticed I had accidentally broken it with some .htaccess rule that was redirecting index.html to www.jillesvangurp/index.php. Anyway, problem solved.

I also took the opportunity to upload photos from:

  • A trip to Madrid in April for the EU Sensei project I’m in.
  • Another Sensei trip to Guildford (UK) last week.
  • A visit from my friend Mark last weekend. We rented a car and did some serious driving to Nuuksio, Hanko (with a stop at Raseborg’s castle) and Naantali. We also celebrated midsummer together with a lot of wet Finns in Seurasaari, just behind my house.

Given that I’m traveling to Madrid again next Wednesday (vacation), it was about time too.

Me on Plaza Mayor in Madrid.

My friend Mark admiring the boats in Hanko.


Security

Knowing slightly more than average about computers, I am regularly assumed to be an expert by family and friends. A recurring question is which virus scanner to use. This one always causes me some trouble because I know the right answer is to spend some money on e.g. Norton or McAfee, or to experiment with one of the free solutions.

However, I don’t really know anything about the subject because I don’t use virus scanners at all. I don’t need them. But this is hardly something to recommend to somebody who has no clue what to do here. At work we have Norton Antivirus wasting a lot of my time on my slow laptop disk. Each time it boots, it insists on re-examining the same crap it hasn’t found a virus in before, ever. It so happens that this is a particularly bad time to access the disk since a dozen processes are trying to start. So, I usually kill the process to make my laptop boot in a reasonable time frame (< 10 minutes). I think over the past few years, Norton easily wasted several working days of my time by making me wait. Arguably the economic damage of such products is worse than the problems they supposedly prevent.

Installing virus scanners is not a security measure but merely a form of deniability for system administrators. If the shit hits the fan, they can point to Norton and Norton can point to the fine print in their license (which says good luck if the shit hits the fan). Norton’s business model is to provide deniability to companies. The price is in dollars and productivity lost. Norton will easily transform any laptop into a dead slow machine, especially if configured for maximum deniability (scan everything, always, every time, all the time).

I know I’m OK because I know how not to get infected, which is why I don’t run any security products at home. A little bit of hygiene goes a long way. Most virus infections come from ignorant users clicking on stuff they shouldn’t be clicking on. Drive-by infections are also common when risky things such as ActiveX and Internet Explorer are exposed and not updated. That’s no concern for me because I A) use Firefox, B) don’t visit suspicious sites and only use up to date, mainstream plugins, C) have adaware filter out a lot of crap, and D) keep my shit up to date. Sure, that still leaves some room for something to slip by, but I’ve never been infected by anything since I stopped accepting floppies from strangers (a long time ago).

A few days ago, some download included Norton Security Scan, which is a free scanning tool designed to make you buy the full version. Since this computer has been exposed to the nasty internet for a few years now, I thought: let’s see what it comes up with.

Well:

  • A tracking cookie from some ad site in my browser. Tedious but not really a risk. It also shows how crappy this tool is, because I have way more advertisement related cookies that I probably should remove. However, I’m too lazy to keep track of all my cookies. Once in a while I clean them up by deleting cookies for any domain I don’t know or care about.
  • Two infected mails in Thunderbird’s trash (w32.netsky.p@mm!enc worm). That’s risky if you open the attachment. Using Thunderbird prevents that from happening automatically. Besides, all my mail is handled by Google these days, which uses server-side malware and virus filters. So no reason for me to install a virus scanner. These two mails probably predate me starting to use Gmail. This was in 2005.

So: two old and obvious malware mails that I deleted (or Thunderbird filtered), and a tracking cookie. No worms, no rootkits, no spyware, no adware. Just a failed attempt to make me open some shitty attachment, and a cookie. Thanks for confirming what I already knew, Norton. I uninstalled it.

This doesn’t prove anything, of course. There’s no perfect security. But so far so good. I don’t have a firewall, since I have a NAT router that stops any incoming request except the ones I told it to let through. I know everything outgoing is OK because I know what software I install. My router doesn’t do UPnP configuration, so I control everything manually. I don’t have a virus scanner active since all my mail goes to Google, which already scans it. All my downloads are of course a risk, so I take care to only download from respectable sources. I actually have ClamWin antivirus installed to manually scan files if I don’t trust the source, but I rarely have a need for it and it has never found anything. Firefox 3 should warn me against malware sites, insofar as they are able to keep their filters up to date. Arguably, if they screw up, Norton probably isn’t doing much better.

So in short, I’m keeping my money and will take my chances. If something goes wrong, I’ll only have myself to blame and will be back online in no time because I do have backups of all my important files. The last time this happened was when a particularly nasty piece of malware struck me: Windows activation failed on a fully legal version of Windows XP Pro after installing a new USB hard disk.


X-Plane 9 review

Last weekend I ordered X-plane version 9. I bought version 8 in early 2006 and haven’t looked back since. Sure, MS Flight Simulator looks great, but the flying sucks. Laminar consistently delivers new features and bug fixes. Version 8 got its last major update (8.64) about half a year ago, and since then they have been beta testing version 9. While I could have bought it earlier, I waited until they released it.

A few days later the package with 6 double layer DVDs was delivered. Installation was not so smooth, as I complained about here. But I managed to sort it out and have a working X-plane 9 now. I installed the European and US scenery. The 6 DVDs of worldwide scenery are nice and detailed but consist of automatically computed landscapes from various databases. Europe now also includes the part I live in (Finland), which was too far north for version 8. However, I prefer to fly in southern Europe, where the landscape is a bit more varied.

So you have cities, forests, roads, airports, coastlines, etc. where they should be (and in a surprising amount of detail), but they lack custom content like the kind that comes with Microsoft Flight Simulator. To fix that, I installed the excellent Corsica scenery, which is one of the many third party scenery packages available and one of the first to be upgraded for version 9. This adds a nice level of realism. Flying in from Nice (another scenery package, warning: horrible HTML layout) with the new Cirrus jet was pretty cool and surprisingly easy, given that the Cirrus was new to me. According to the product announcement, this plane is actually used by Cirrus themselves as well (and presumably tuned to their specifications and needs). Also, the 3D cockpit is pretty cool and much more user friendly on a PC than the average, very complicated panel that comes with an X-plane jet.

Technically, version 9 includes lots of improvements to the scenery rendering and simulation. The changes are outlined in great detail on the product announcement page. I have little to add except to say that it mostly works and delivers as advertised. Don’t expect to max out any of the rendering settings; they have been designed such that this is not possible with any hardware available now. In fact, they just raised the bar for future hardware. If you can get your hands on an NVIDIA card with a few GB of video RAM, X-plane will probably find a use for every byte of it. The good news is that it still looks pretty good with object detail not set to “TOTALLY INSANE” (Austin Meyer loves his capitals).

Part of the attraction of X-plane is that it is a niche product built by some dedicated people who know what they are doing and are totally focused on doing it. Considering that they have a very small programmer team and not many other people working for them, it is pretty amazing what they manage to deliver. They have to be smart and efficient about a lot of things. So their UI is totally custom and a bit wacky. But it works. The included planes are so-so, but there are plenty of free ones available to fix that (and some better ones for a small fee). With all these nice freeware planes out there (e.g. on x-plane.org), you have to wonder why the selection bundled with X-plane is so weak. Most of the planes don’t have 3D cockpits and quite a few even lack textures.

At the core of X-plane is an excellent and extremely detailed simulation of just about anything that flies and everything that makes it fly. I mean, they are worrying about the accuracy of the voltage in electrical systems here and how that behaves under different failure scenarios.


Lucene Custom Analyzer

A second neat trick I did with Lucene this week was to wrap the StandardAnalyzer with my own analyzer (see here for the other post on Lucene I did a few days ago).

The problem I was trying to address is very simple. I have a nice web service API for my search engine. The incoming query is handled by Lucene using the bundled QueryParser which has a quite nice and elaborate query language that covers most of my needs. However, a problem is that it uses the StandardAnalyzer on everything which means that all the terms in the query are being tokenized. For text this is a good thing. However, I also have fields in my index that are not text.

The Lucene solution to this is to use Untokenized fields in the index. Only problem, using untokenized fields in combination with the QueryParser is not recommended and tends to not work well since everything in the query is being tokenized. So, you should not use the QueryParser but programmatically construct your own Query. Nice but not what I want since it complicates my search API and I need to make complicated queries on the other end of it.

What I wanted is to match a url field against either the whole or part of the url (using wildcards). On top of that, I want to do that as part of a normal QueryParser query, e.g. keyword:foo AND link:"http\://example.com/foo". I’ve been doing this the wrong way for a while and let Lucene tokenize the url. So http://example.com/foo becomes [http] [example.com] [foo] for Lucene. The StandardAnalyzer is actually quite smart about hostnames, as you can see, since otherwise it would treat the . as a token separator as well.

This was working reasonably well for me. However, this week I ran into a nice borderline case where my url ended in …./s.a. Tokenization happens on characters like . and /. On top of that, the StandardAnalyzer that I use with the QueryParser also filters out stopwords like a, the, etc. Normally this is good (with text at least). But in my case it meant the last a was dropped and my query was getting full matches against entries with a similar link ending in e.g. s.b. Not good.

Of course, what I really wanted is to be able to use untokenized fields with the QueryParser. Instead, what I did this week was create an analyzer that skips tokenization for selected fields and treats the entire field content as a single token. I won’t put the code for that here but it is quite easy:

  • extend Analyzer
  • override tokenStream(String field, Reader r)
  • if field matches any of your special fields, return a custom TokenStream that returns the entire content of the Reader as a single Token, else just delegate to a StandardAnalyzer instance.
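The dispatch described in the steps above can be sketched without any Lucene classes. This is plain Java for illustration only: FieldAwareTokenizer, its stopword list and its crude split-on-punctuation rule are all hypothetical stand-ins, and the real StandardAnalyzer is smarter (it keeps hostnames together, for example). In a real analyzer the same if/else would live inside tokenStream(String field, Reader r). Lucene also ships PerFieldAnalyzerWrapper and KeywordAnalyzer, which may cover the same need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration only; none of these names are Lucene classes. */
final class FieldAwareTokenizer {
    // Hypothetical stopword list standing in for the analyzer's real one.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "and", "of"));

    private final Set<String> keywordFields;

    FieldAwareTokenizer(String... keywordFields) {
        this.keywordFields = new HashSet<>(Arrays.asList(keywordFields));
    }

    /** Special fields yield a single token; everything else is split and filtered. */
    List<String> tokenize(String field, String content) {
        List<String> tokens = new ArrayList<>();
        if (keywordFields.contains(field)) {
            tokens.add(content); // whole value, untouched
            return tokens;
        }
        // Crude stand-in for normal text analysis: lowercase, split on non-alphanumerics.
        for (String part : content.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty() && !STOPWORDS.contains(part)) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        FieldAwareTokenizer t = new FieldAwareTokenizer("link");
        // Treated as one token, so the trailing "a" survives.
        System.out.println(t.tokenize("link", "http://example.com/foo/s.a"));
        // Treated as text, so the trailing "a" is dropped as a stopword.
        System.out.println(t.tokenize("text", "http://example.com/foo/s.a"));
    }
}
```

With the field registered as special, a url ending in s.a and one ending in s.b stay distinct, which is exactly the borderline case that the tokenized version got wrong.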

This is a great way to influence the tokenization and also enables a few more interesting hacks that I might explore later on.


WP-OpenID

I’ve been enthusiastic about openid for a while but have so far not managed to openid enable my site. WP-OpenID, which is the main openid plugin for wordpress, is under quite active development. Unfortunately, until recently, every version of it I tried had some issue that prevented me from using it.

The author, Will Norris, got hired by Vidoop the other day to continue working on wp-openid in the context of the diso project. Diso is another thing I’m pretty enthusiastic about. So, things are improving on the openid front.

Tonight, I managed to get version 2.1.9 of wp-openid to install without any issues on my wordpress 2.5.1 blog. I’ve been testing and it seems to at least accept my openid www.jillesvangurp.com (delegate to myopenid) without issues.

So finally, my blog is openid enabled.

The delegation bit is BTW courtesy of another wordpress plugin: openid delegation. I’ve been using the 0.1 version for more than a year and it just works. Delegation is an openid concept where any website can delegate openid authentication to an external openid provider. This allows you to use a URL you own as your identity and also to switch provider without losing control of your openid url.


Boosting Lucene search results using timestamps

Since I spent quite a bit of time looking into how to do this properly, here’s a solution to a little problem that has been nagging me today: how to make Lucene take timestamps into account when returning search results. I don’t want to sort the results (that’s easy); instead, when two results match a query and get the same score from Lucene, I want to see the newest first.

Basically, in Lucene this means influencing how it ‘scores’ entries against a query. So far I have been relying on the Lucene QueryParser, which implements a nice little query language with some cool features. However, the above requirement cannot be expressed as a query in that language. At best you might work with date ranges, but that is not quite what I need.

So I had to dive into lucene architecture a bit more and after lots of digging came up with the following code:

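import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.search.function.*;

String query = "foo";
QueryParser parser = new QueryParser("name", new StandardAnalyzer());
Query q = parser.parse(query);
Sort updatedSort = new Sort();
// Read the "timestampscore" field as a float and use it as a score.
FieldScoreQuery dateBooster = new FieldScoreQuery("timestampscore", FieldScoreQuery.Type.FLOAT);
// Combine the score of the text query with the timestamp-derived score.
CustomScoreQuery customQuery = new CustomScoreQuery(q, dateBooster);
Hits results = getSearcher().search(customQuery, updatedSort);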

The FieldScoreQuery is a recent addition to Lucene. I had to upgrade from 2.1 to 2.3 to get it. Essentially, what it does is interpret a field as a float and derive a score from it. Then the CustomScoreQuery combines that score with the score from my original query.

So far it is working beautifully. I added a float field to my index whose value is “0.” + timestamp, where timestamp is formatted as a yyyyMMddhhmm string (Lucene only has string fields). Consequently, later timestamps have a slightly higher score. I might have to tune the query a bit further by either using a weight or by manipulating the float a bit further.
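One thing worth checking when tuning this is how much of the timestamp survives the conversion to a float. The sketch below is plain Java, not Lucene code: TimestampScoreDemo and its score method are hypothetical names, and it assumes the field value ends up being parsed the way Float.parseFloat parses a string. A 32-bit float carries roughly seven significant decimal digits, so the hour and minute digits of a twelve digit timestamp are likely to be rounded away. Also note that in Java’s SimpleDateFormat pattern letters, hh is the 12-hour clock and HH is the 24-hour clock.

```java
/** Plain-Java illustration of the "0." + timestamp encoding; not Lucene code. */
final class TimestampScoreDemo {
    /** Prefixes the timestamp with "0." and parses it as a float (assumed parsing step). */
    static float score(String yyyyMMddhhmm) {
        return Float.parseFloat("0." + yyyyMMddhhmm);
    }

    public static void main(String[] args) {
        float base = score("200806221530");        // sample value, made up for the demo
        float minuteLater = score("200806221531");  // one minute later
        float monthLater = score("200807221530");   // one month later

        // One minute apart: both strings round to the same float.
        System.out.println("minute apart, same score: " + (base == minuteLater));
        // One month apart: the later timestamp gets the higher score.
        System.out.println("month apart, later is higher: " + (monthLater > base));
    }
}
```

Both lines print true for these sample values. If finer resolution matters, one option (again an assumption, not something from the setup described above) is to encode fewer digits, or a normalized value such as days since a fixed reference date, so that the distinctions you care about fit within float precision.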

If any Lucene gurus stumble upon this and have some useful advice, please use the comments.


Cartoons

I like to read cartoons. I’m a regular reader of userfriendly.org, dilbert, the wizard of id, fokke en sukke and a few others. I can’t say that I’m a regular reader of Gregorius Nekschot’s cartoons, which cover multiculturalism, Islam, and other rather controversial topics. Good satire can hurt, and his cartoons definitely hit a nerve with some deeply religious individuals. His website is entitled “Gregorius Nekschot - Sick Jokes”. Let’s say Nekschot is very blunt and to the point. Anyway, check here for an example.

Two weeks ago, Nekschot put a rather visionary (in retrospect) post on his web site in which he jokingly suggests that soon Ernst Hirsch Ballin (Dutch minister of justice) & his uniformed party members would be arresting free spirited people and deporting them for reeducation (in Dutch). The reason was Hirsch Ballin’s apparent plans to broaden the scope of existing legislation against blasphemy, and the suggested analogy with Guantanamo was very apt in my view. The reference to 1940s party members of Hirsch Ballin who cooperated with the Nazi occupation or looked the other way is also very much to the point, since Hirsch Ballin’s motivation seems to display similar spinelessness and an apparent desire to follow up on rather intimidating threats/complaints coming from e.g. Iran and various islamist groups living in the Netherlands. Sort of the same groups of people that cheered Theo van Gogh’s murder a few years ago, who was BTW a friend of Nekschot, apparently.

It seems Nekschot’s analysis was more accurate than he must have realized. Gregorius Nekschot was arrested last week on the orders of a maverick Dutch attorney who seems to be more or less operating under direct orders from Hirsch Ballin. Nekschot was locked up for two days, with no trial, based on vague accusations regarding the generally shocking, insulting and discriminating nature of some of his cartoons. This is sort of a new low. Having a cartoonist’s house searched by a ten-strong police force and the victim subsequently deported to a prison is not something you’d expect in a modern, democratic country. It happened last week in the Netherlands. Full thumbs up from the responsible minister, apparently.

Nekschot is a pseudonym, of course, which refers to the “shot to the neck” execution style that very much characterizes his approach to humour. His real name has so far not been revealed. A court case will change that and expose him to very real threats to his life. After all, one of his friends was already murdered for speaking freely.

I usually don’t do politics on my blog but this is a good reason to make an exception. It seems Hirsch Ballin lacks a sense of humor and an appreciation of free speech and the Dutch constitution.


captcha

It seems the captcha plugin (capcc) I was using with wordpress has been broken for some time. This probably happened when I installed wp 2.5 a few weeks ago. My friend Christian del Rosso pointed this out. I have now installed a different plugin (yacaptcha), which looks nicer and hopefully works better too.

So if you couldn’t comment because of this, try again.
