Thursday, March 26, 2009

News reporting n such

Loved the Jon Stewart new-A ripping on Jim Cramer. (As I recall, the last time Stewart really took someone to task, a week later they didn't have a show any more.)

So there's this local nitwit, Richard Cohen, who writes for the Wash Post, who complains about Stewart's having done this...and goes on to prove he didn't actually watch the episode. And it all feels like whiny-boy stuff, and "how come my newspaper's not doing so well?"

Which of course is because newspapers are in the process of ceasing to exist. ALL of them.

Which is kinda sad, but probably inevitable.

More about the Jag

Spring is here, and I'm getting itchy to get the Jag fixed up and drive it.

Finally got a replacement (not new) alternator to try out (hope that's all that is wrong).

Just ordered new inner tubes for the wheels, so I can get the tires all replaced too. Then I can get out and really wind it up!

Having it out of commission the past few months has been painful...going past it in the garage, knowing I can't do anything but look.

When I last fiddled with it, a few weekends back when there was some actual warmth outside, I took off the non-functional A/C and the old ALT. With luck, this new(er) one...

Monday, March 23, 2009

O/S basics

I don't know why this is, the solution seems absurdly obvious...Only recently did calendar and address book functionality become part of the operating system, but it's so basic you have to wonder why it took so long.

It's not like that kind of info is difficult to store and make available...

I was back on this issue because of getting the new MacBook (17") at the beginning of the month (3/09). I wanted to get it set to read the calendars on my G5 and wife's Mini. But I had lost how to publish those other ones, in the Leopard upgrade a year ago. So we hadn't been sharing calendars for months. And that extended to our PDAs.

The problem has several aspects: #1--I don't want my calendars out on the web. I put things in it that are only for her to know about. #2--I'm not buying Leopard Server for another umpty-hundred $ (apparently "LS" has built-in assistance for managing this). #3--I wasn't remembering the correct name for what I wanted to do; I kept thinking it was CalDAV, which it would be for "LS".

What I needed was WebDAV for iCal. Regrettably there doesn't seem to be any helper tool around to get you through the awkward need to use Terminal (cmd-line). Not that I can't, been a unix user for nearly 20 years...but there are more than a few tiny details.

Turned out that I still had the old setup properly in place, I just needed to do the Apache parts, which are about creating a userid/password and a httpd config block. Of course, this means YAPTR (yet another password to remember), which really means it has to get written down somewhere...

I'm still migrating from the old Powerbook onto the new one...there's built-in help for something that complex...why not for WebDAV? Mostly done, except for things like my old address book, the keychain, mail archive, and probably something else I don't remember. I did the manual drag/drop so far, haven't run the migration tool. (Why not? Because I had to completely reconfigure my home network again. Seems I have to start over every single time I add a new wireless device, because I don't remember how I did it before. Even after writing it down. Too many passwords.)

Spelling check should have been part of the O/S years ago, too. Why wasn't it? It's not like that is hard either...granted, a big dictionary is a good-sized file, which would have been problematic >20 years ago, but now? Should be a standard function so that any app can use it.

Built-in general-purpose database, too. That, too, would have been a problem in the 80s...but it ain't now. Granted, there is no shortage of free databases around, but they take a lot of work to do anything, even something simple.

Which is why Excel became the de facto database for an awful lot of information. I mostly use FileMaker for that sort of thing, for my own personal data. It's pretty friendly.

But a built-in database would be the right kind of place to store all kinds of stuff...you could argue that the filesystem IS a database, and in a very loose sense, that's true, inasmuch as you can store anything. But it doesn't really have any built-in organizing capabilities; limited sorting; usually doesn't handle large quantities of files in a single folder very well...

What other things should be O/S built-in capabilities?

Saturday, March 14, 2009

Text Processing etc, part 2

Been continuing on with another text processing tool. This one will be able to read in a story, and spit back the top several topics in that story.

Actually this is 3 tools. First is the training input creator. Second is the model creator. Third is the runtime document processor.

Training creator shows you an input doc (web-p, text file, etc; simple things), allows you to mark it with topics, and save the result. Hmm...just occurred to me: should I allow PDF as input? that's not actually too hard to accomplish, with a PDF ripper front-end.

Model Creator takes the training input and creates a recognition model. It's not a statistical model. I was thinking about using SVM (Support Vector Machine) in this, but that kinda wants actual percent probabilities, which I don't have. I probably could if I think of a way to normalize values.

Runtime processor receives the story you want to know about, and returns some topics.

I also use an English word dictionary in this (although I don't think there's any requirement to do that). You'd think that finding a good one wouldn't be that hard...I thought that. But we are wrong! Finding dictionary files is not that hard, I have several. The biggest one I could find online had over 200K words, but you'd be amazed at the basic words that were missing... "cat", for example. And "horse"? You'd be likewise amazed at the really unusual words it DOES have: "catachrestically" -- what the heck is that? And why is "catawampously" in there? Have you ever even seen those two before?

This is a bizarre dictionary. And it's WAY bigger than the others I found...although it seems likely that the others have a lot more of the basic/common stuff and not so much the exotic words. Maybe I just need to merge them all...

But this weirdness has forced me to track unknown words, since a lot of them are fairly common.
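Tracking the unknowns doesn't take much. A minimal sketch, assuming the word list is already loaded into a lowercase Set; the class and method names here (UnknownWordTracker, check, timesUnknown) are made up for illustration, not taken from the actual tool:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts words that are missing from a word list.
final class UnknownWordTracker {
    private final Set<String> knownWords;                       // assumed to hold lowercase words
    private final Map<String, Integer> unknownCounts = new HashMap<>();

    UnknownWordTracker(Set<String> knownWords) {
        this.knownWords = knownWords;
    }

    // Returns true if the word is in the list; otherwise records one more sighting of it.
    boolean check(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (knownWords.contains(w)) {
            return true;
        }
        unknownCounts.merge(w, 1, Integer::sum);
        return false;
    }

    // How many times this word was seen without being in the list.
    int timesUnknown(String word) {
        return unknownCounts.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        Set<String> list = new HashSet<String>(Arrays.asList("catachrestically", "catawampously"));
        UnknownWordTracker tracker = new UnknownWordTracker(list);
        for (String w : new String[] {"cat", "horse", "cat", "catawampously"}) {
            tracker.check(w);
        }
        System.out.println("cat unknown " + tracker.timesUnknown("cat") + " times");
    }
}
```

Words that pile up a high count are the obvious candidates for adding to a merged list.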

Why can't we have a good, pretty complete, free dictionary word list? i.e., one that is better than the ones I've found recently...

Related to that: ever looked at WordNet? An interesting project. If you look around, you can find a number of browser-based WN viewers: enter a word, get a view of the words or phrases that are nearby in terms of some flavor of semantics. You also get use-type (noun, verb, etc). And some more exotic aspects that I don't quite understand.

What you don't get is also interesting, because I went looking for this. You don't get the root word for your word. E.g., if your word is "catawampously", the root word for that is "catawampous". So who cares about root words? Well, the topic-ID software would have a better model if I could convert training words and runtime-doc words into their root/stemmed form.

So I do of course know about the Porter Stemmer; I grabbed the java version, and have integrated that...problem is that it overstems, in my opinion. (I have read some of the more formal study work that compares stemmers; Porter is really good--for English--and really bad--for other languages.) Porter is entirely suffix-based, and only knows English suffixes. (You could do the same thing for other languages, I'm sure.) Porter will make an error like stemming "heading" into "head"--where "heading" most likely means "direction" and "head" most likely means "part of your body where your brain is", although I suspect that both have less common usages that are exactly reversed. The formal comparisons suggest this is a small problem. So I don't know. It'd be easy enough to insert the Porter stemmer into the pipeline and try it out--except that I don't know how I'd tell if it was better...

Leads you to wonder why there's no serious dictionary-based stemmer...I've read about them, too, and what you seem to get is a hybrid that does a little of the Porter style, and more table-lookup.

So why isn't there a pure dictionary-based table-lookup stemmer? You'd base it off a really large dictionary (you see where this has been going now). That would not be perfect, you'd get some errors where the stem is different depending on noun/verb usage. You'd only need a hash-table to implement this. If you needed to be fancier, you could deal with the noun-verb-etc aspect, but figuring that out in the first place is probably more expensive than the error (and is itself an imperfect process, so you'd be introducing a different flavor of error into the answer).

This doesn't make sense to me...a pure dictionary-based stemmer would be time-consuming to create, but trivial to use. And it would work for all languages where root words exist (i.e., not chinese/japanese/etc). It'd be a little large, a complete english dictionary is a few megabytes, whereas the Porter stemmer code is a few kilobytes.
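To show how little code the "trivial to use" part really is, here is a minimal sketch of a pure table-lookup stemmer. It assumes somebody has already done the time-consuming part and produced word/root pairs; the class name LookupStemmer and its methods are made up for illustration:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical pure table-lookup stemmer: no suffix rules, just a hash table.
final class LookupStemmer {
    private final Map<String, String> roots = new HashMap<>();

    // Registers one word and its root, e.g. ("catawampously", "catawampous").
    void add(String word, String root) {
        roots.put(word.toLowerCase(Locale.ROOT), root.toLowerCase(Locale.ROOT));
    }

    // Returns the root if the word is in the table; unknown words pass through unchanged.
    String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return roots.getOrDefault(w, w);
    }

    public static void main(String[] args) {
        LookupStemmer stemmer = new LookupStemmer();
        stemmer.add("catawampously", "catawampous");
        stemmer.add("horses", "horse");
        System.out.println(stemmer.stem("Catawampously"));
        System.out.println(stemmer.stem("heading")); // not in the table, so left alone
    }
}
```

Because nothing is stemmed unless the table says so, a word like "heading" stays "heading" unless someone deliberately maps it. The noun/verb ambiguity is ignored here, as discussed above.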

---

Further notes: this weird dictionary, "unabr.dict", appears to be associated with password-cracking...which might explain the missing common words. Might. Assoc with crossword puzzles, too?

This URL:

http://www.puzzlers.org/dokuwiki/doku.php?id=solving:wordlists:about:start

seems to say a lot more about word lists, and "unabr.dict" especially. I downloaded all the word lists mentioned there. Should produce a better set than "unabr.dict".

Should also point out that when I have written "dictionary" here, I really mean "word list", not a dictionary-with-definitions-n-stuff. That probably explains why I found "unabr.dict" early on.

Also to be noted: these lists aren't going to contain names, excepting when names are other words. Vaguely annoying, if you're doing what I'm doing with word lists, because "Bernanke" and "Greenspan" are going to correspond to several money/gov't topics, and probably nothing else.

Friday, February 27, 2009

New computes

Had a rare opportunity yesterday...buy a brand-new Macintosh at a seriously discounted price. MicroCenter sent me a sale flyer last week, special pricing starting yesterday. MacBook Pro was $1799: 17", 2.5 GHz, 2GB RAM, 250 GB disk. Intel dual-core processor. Nice machine.

Deal too good to pass up, like when I got my refurb G5 in 2004. Only problem is that NONE of my existing Mac software will run on this machine...going to have to download entirely new versions of the freebies, and pay for a couple of others. At least I can do it piecemeal, unlike if I replaced my G5. Dreading that day...

So I got it home just in time to find out that my internet access is dead for the next 18 hours...sob!

Stephanie Plum...

my favorite funniest book character...new book "Plum Spooky" came out in Jan...why not before Xmas I don't know...that seems poor timing by the publisher.

Anyway...this is one of those "between the numbers" books. Previously they were a little different, there was more character development in them, and less going on. This one is actually a Number book sans the boyfriends, and with this "Diesel" guy instead.

Which is just fine, it is just as screamingly funny as the numbered ones, which the other "Betweens" were not.

Looking at the length of it: 300pp, just like the numbers books, and twice the thickness of the other "betweens" titles. I got it for 30% off at Borders Express at the mall, and read it over the next 48 hours...great fun.

You really gotta wonder why Stephanie hasn't been turned into a movie. (yes, Grafton hasn't either, and V.I. Warshawski was more like VI wash-out-ski, so maybe that's it--except that Plum would be a lot funnier than those others)

found a free game...

called NEXUIZ

it's a pretty serious download, 380MB for the zip file...it's basically a deathmatch FPS game. runs well on my XP-64 box. (whereas UT04 has gone bad for some reason)

get it here

I've only played it a tiny bit...it's based on DarkPlaces, a Q1-source-derived engine.

and I am stuck at a seemingly simple low-grav level. It's instagib, which is not my fave, and I'm doing badly...and this is an early level. Have not figured out the rest of the weapons, so previously I've mostly been lucky.

some online animation

http://www.blender.org/features-gallery/movies/

in particular you want to look at Big Buck Bunny (8min) and Murnau the Vampire (27 min). BBB was an easy download. Murnau was not; apparently there are torrents, but that seems to not work for me any more...despite a brand-new BT. Which is too bad, this is an exceptional bit of animation; not available in hi-def, tho.

fwiw, F W Murnau was the director of Nosferatu, the first, silent, vampire movie. Apparently you can see it online HERE.

Monday, February 23, 2009

XML software

Can someone explain to me why it is that org.w3c.dom.Document (java api) does NOT have a method something like "writeToStream(OutputStream os)" which will dump the entire document to some stream (i.e., a file)?

Why? If the corresponding class "DocumentBuilder" can read a file (or URL), why can we not have a method to write?

I wrote one about 8 years ago, I know I can go find it, but sheesh...why?
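For the record, the usual workaround is an identity transform from javax.xml.transform, which ships with the JDK. A minimal sketch: the class name DomWriter and the method name writeToStream are stand-ins I made up (neither is JDK API), and the encoding and indent settings are just one reasonable choice:

```java
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// DomWriter and writeToStream are made-up names for this sketch, not JDK API.
final class DomWriter {

    // Serializes the whole document to the stream via an identity transform.
    static void writeToStream(Document doc, OutputStream os) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        identity.transform(new DOMSource(doc), new StreamResult(os));
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny document in memory and dump it to standard output.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("blog"));
        writeToStream(doc, System.out);
    }
}
```

To write to a file instead, pass a FileOutputStream. It works, but it is still a handful of lines that arguably belong on Document itself.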

(continuing here with man's favorite activity)

Wednesday, February 18, 2009

My computers...

Being unorthodox in most things...

I have a day job at an interesting company, which can remain nameless...I get to do some exotic stuff. And some pedestrian stuff, that's often a need too.

I mostly get to do things I like, which is great.

One of the things I get to do (because I choose to): I have my own computer. Well, really, all the engineers do. The diff is that I build my own from pieces. Everyone else: std config whatever the company is buying (well, with some variations, but not much).

Mine: completely different. This way I get what I want (which is NOT a generic Dell, which has gone from best to pretty sucky in recent years), when I want, and can upgrade when/how I want.

So I have this:



It takes an AMD Athlon 64 X2, 2GB RAM. On-board video plus a good PCI-Ex card means three monitors, which is pretty dang cool. And it is *quiet*. It's now 3 yrs old, so what was hot at the time is less so now; this doesn't really bother me a lot, except that now I am doing some work where more power would be helpful. (At the least I want a quad core, and 4GB RAM.)

I'm also thinking about a touch-screen, having played with those new HP units a little. Touch-position precision is weaker than I'd prefer there...but a touch-screen would be a nifty thing to do some UI work with, and there seem to be some < $1000 in the 25-inch range.

The process at work for getting a computer is kinda broken, inasmuch as I could get the Dell, but not something as slick as what I have. So I bought/built it myself. Replaced a drive when the original boot-drive started doing that bad clicking thing (see blog late 08 on this), got a video card aimed at decent game perf, just the right RAM (2GB dual-channel). My preferred kbd and mouse combos. My preferred monitors. Etc. I buy my own software, too, so I get what I want when I want.

New machine coming probably a year from now. If I can get something in roughly the same form-factor. With a 4-core or better CPU, at least 50% faster clock-rate.

This is the machine I do all my programming on. And nearly all my game-playing. And not much else.

My Macintosh G5 is where I do all the other stuff, like music, photos, video, database, taxes, personal info--the non-game fun stuff...that's had a chunk of upgrade, too, but less. RAM (5GB) and disk (1TB).

another Jaguar XKE note

having bought the car, I find myself noticing how many others ripped it off back in the 60s and 70s.

The original (this is mine):



It looks like nearly every sports car in the 60s was a rip-off of the style...without managing to look as good.

Felt like a blog on this, having seen the trailer for The Graduate go by last night on TCM. Didn't quite recognize Dustin Hoffman's car...it's an Alfa Romeo Duetto:

a little less curvy than the Jag, but clearly derivative.



As was the Corvette tear-drop:

which was clearly the previous body style updated to look like the Jag.

and the Datsun 240Z:

which did at least have a price advantage over the Jag (about half, actually).



Even the classic 1964 Aston Martin (the Bond car of all time):



which shows up briefly in the recent Bond film Casino Royale.

and of course the late 70s Mazda RX-7:

Computer gadgets

I had the Logitech wireless kbd/mouse combo...I really liked the kbd feel. The mouse was ok; the good part about the pair was that they worked together off the same wireless, and the mouse had a recharging dock...but the mouse had not been working too well in the dock for a while now, so when wife and I went to Circuit City, I got the last Logitech diNovo Edge they had, for Windows.



This is a nifty kbd, has its own dock for recharging (which is going to cause a problem when it can't hold a charge any longer). I'd like it a little better if the arrow-keys combo was shifted to the right about an inch. It's not, and the home/end/del/insert combo is now vertical instead of horiz, which means my game custom keys are out of whack...

It's a bluetooth kbd, which means it comes with a BT/USB adapter, and therefore I could get a BT mouse now...but I have this nice Logitech Nano wireless mouse I like...the one with the micro-transmitter unit.

So my wife now wants the same kbd for her Mac...I probably do, too, for that matter...except that I don't have bt on my mac; I think she does, but turned off.

Sunday, February 08, 2009

Software errors

Nothing worse than having a bizarre error creep into someone else's software that you have to use, and it's too complex for you to fix...

take this example: google for "eclipse ioconsole updater error" and see what you get. I just moved to eclipse 3.4 yesterday, because it looks like I'm going to have to do some C code soon (feh!).

tweaked an older program today, to try out an enhancement (from a friend), and suddenly I've got the below error. This is something that has gone wrong in eclipse, has to do with your doing too much System.out typeout. Most of the complaints (man's favorite activity) you find in google on this subject suggest it's something about line length, but it's not. It's just total typeout.

Here's the actual error:

!ENTRY org.eclipse.ui 4 0 2009-02-08 17:43:10.270
!MESSAGE Unhandled event loop exception
!STACK 0
org.eclipse.swt.SWTException: Failed to execute runnable (java.lang.ArrayIndexOutOfBoundsException)
at org.eclipse.swt.SWT.error(SWT.java:3777)
at org.eclipse.swt.SWT.error(SWT.java:3695)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:136)
at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3800)
at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3425)
at org.eclipse.ui.internal.Workbench.runEventLoop(Workbench.java:2382)
at org.eclipse.ui.internal.Workbench.runUI(Workbench.java:2346)
at org.eclipse.ui.internal.Workbench.access$4(Workbench.java:2198)
at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:493)
at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:288)
at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:488)
at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:149)
at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:113)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:193)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:110)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:79)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:386)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:179)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:549)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:504)
at org.eclipse.equinox.launcher.Main.run(Main.java:1236)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at org.eclipse.swt.custom.StyledTextRenderer.textChanging(StyledTextRenderer.java:1295)
at org.eclipse.swt.custom.StyledText.handleTextChanging(StyledText.java:5467)
at org.eclipse.swt.custom.StyledText$6.textChanging(StyledText.java:4850)
at org.eclipse.ui.internal.console.ConsoleDocumentAdapter.documentAboutToBeChanged(ConsoleDocumentAdapter.java:302)
at org.eclipse.jface.text.AbstractDocument.fireDocumentAboutToBeChanged(AbstractDocument.java:645)
at org.eclipse.jface.text.AbstractDocument.replace(AbstractDocument.java:1148)
at org.eclipse.jface.text.AbstractDocument.replace(AbstractDocument.java:1176)
at org.eclipse.ui.internal.console.ConsoleDocument.replace(ConsoleDocument.java:82)
at org.eclipse.ui.internal.console.IOConsolePartitioner$QueueProcessingJob.runInUIThread(IOConsolePartitioner.java:533)
at org.eclipse.ui.progress.UIJob$1.run(UIJob.java:94)
at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:133)
... 22 more



So it's just an array OOB problem. It arrived in 3.3, and is still present in 3.4, which I just started using yesterday.

Looking at the reported msgs and responses, it seems to be a problem in more than one place in the eclipse source code.

Gad.

Looks like I'm going back to 3.2, I can't live with this. Will only do the C dev with 3.4. Not worth my time to try to fix it for them.

Man's favorite activity

not sex...just as well, we'd have overpopulated ourselves to death.

not tv, although we do spend a lot of time on that...

no, it's complaining.

That's right...complaining. Came to this conclusion a couple of years ago...

look at my other post about How We Learn, that was the genesis. We complain about things that have made us unhappy. Why? Because when we are babies, when we complain (i.e., cry), we get made happy pretty quick--fed, diaper changed, whatever. I suspect that folks who complain a lot probably cried a lot as babies.

Tuesday, February 03, 2009

Oblivion, redux

Started a replay of Oblivion. Yes, I already put in over 1000 (yes, thousand) hours on this before...thought I would try out some alternate strategies.

1) How far can you level up in the training area? Well, sneak can be at 100. Block can be over 25. Other things can be in the 20s. How to get sneak that high: when you first encounter the "sneak" goblin, he never turns around, so you can level-up on that behind him. Thing to do is go to 25, so you get the damage bonus, then whack it and take the loot, and go into the next room, where there are the 4 goblins, incl the one on patrol. Wait until the patrol goblin passes by, then follow to find a stalagmite pair in the dark that you can get stuck behind. Now, put a weight on the forward key, and you can walk away for a while (this is going to take several hours). Come back when you need to approve the 50,75,100 level dialogs, and zowie! You are the Expert of Sneak. Unfortunately, you can't get Athletics up at the same time, you actually have to change position. For Block, you stand and let the rats attack you one at a time, and hold out your shield. This isn't fast, but it's effective. Heal yourself along the way, and kill the rats when you can't recover. Move on to the next one, perhaps with new armor. You'll want to repeat this routine again later, when you need to level-up on armor.

Bonus: pick a character type where Sneak is a major skill. This way when you leave the training area, you can go sleep somewhere and level-up 10-12 times immediately. (Course that might be dangerous, as the opposition is suddenly a chunk better than you, and you only have a punky weapon). If I could, I'd want to design a character type where all the things you can level-up on fastest/soonest are primary skills: Sneak, Block, Alteration,

2) Go straight for the Mages Guild, and do their tasks. Practice on some low-level spells that are not attack, while standing around. Your aim is to get to Alteration Level 50, because that is when you can use Chameleon--and you can enchant your armor with it. Along the way you have to either capture some souls, or find some loaded soul gems. IIRC, with Common souls, you can enchant Chameleon at 14%, Greater Souls at 17%, and Grand souls are 20%. You'll have to sleep-level-up along the way so you have enough Magicka to do the enchanting, but once you can, with Chameleon > 100%, you are undetectable by anyone, which pretty much makes you invincible. I found a couple of Grand Soul gems with grand souls in them, which is great, because you aren't going to encounter any for a while.

3) Good loot and interesting opponents don't really show up until about Level 10. Recall all the squawk when the game originally came out about how the opponents level'd up with you? i.e., the game does not ever get easier in terms of fighting...well, true enough, until you learn the trick about Chameleon. But the real problem is that you do need some money for buying things, but that's really only at the beginning, when you need to buy spells; you're going to find adequate armor on opponents. And excepting when you attack a clannfear which reflects damage, you don't even need armor when you have 100% Chameleon (other than needing to wear enough charmed items to reach that 100%).

4) Fast travel does nothing for your skill increases...but it lets you join the Mages Guild sooner, and get Chameleon sooner.

5) Do the task about the missing brother in Chorrol/Cheydinhal, and then the follow-on about the missing sword, but DO NOT turn in the sword--you want to keep it and use it. This is the best sword you can get for a long time.

6) Clear out any relevant caves/etc BEFORE taking on any kind of escort assignment. NPCs all operate via Artificial Stupidity, so they are going to run into fights they can't win. Granted, official escorts can't usually be killed, but there's that one "take weapons to cave X and clear the goblins out" where they CAN be, and if one of them IS, later rumors mention your failure. Whats-er-name the orc and the Black Bow Bandits task, she can be killed too, but she does at least wait for you to say it's ok to tag along...and when you have Sneak 100 and Chameleon 100, you can kill anything anytime anywhere without taking any damage, so you don't want anyone else getting in your way.

7) Avoid starting the main story/quest line until you have Chameleon = 100. Fast travel around Kvatch to make certain...because once you start it, the Oblivion gates start appearing. They're fairly dangerous, but if you have Chameleon 100, you don't care.

8) Let summoned chars do your fighting. Until you have chameleon 100, this is important. Otoh, it does mean that your weapon attack skills atrophy.

9) When you go to Leyawiin, go to Rosentia Gallenus' house, and whack Everscamps to your heart's content. They won't attack you until you attack them, and even then only one at a time. I shot arrows into them for 10 Marksman levels. I whacked a bunch for ten Blunt levels. I punched a bunch for 20 Hand-to-hand levels.

10) Begin the game with the "Bag of Holding" plug-in. This lets you carry an enormous amount of stuff, as opposed to going back and forth hauling loot to the merchants (which would otherwise be infuriating, and ultimately something you stop doing).

11) Do the paid training once you start having money. Granted, this is only 5 skill levels per major level, but you want to pay-train the major levels.
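About the Chameleon numbers in item 2: the arithmetic for how many enchanted items it takes to hit 100% is simple enough to sketch. This assumes the per-item percentages quoted there (14/17/20, which are from memory) and that the effects of separate items simply add up; the class name ChameleonMath is made up:

```java
// Hypothetical helper for the enchanting arithmetic; the percentages are the
// from-memory figures in item 2, and additive stacking is an assumption.
final class ChameleonMath {

    // Smallest number of enchanted items whose combined effect reaches the target.
    static int itemsNeeded(int percentPerItem, int targetPercent) {
        return (targetPercent + percentPerItem - 1) / percentPerItem; // integer ceiling
    }

    public static void main(String[] args) {
        System.out.println("Common (14%):  " + itemsNeeded(14, 100)); // 8 items, 112%
        System.out.println("Greater (17%): " + itemsNeeded(17, 100)); // 6 items, 102%
        System.out.println("Grand (20%):   " + itemsNeeded(20, 100)); // 5 items, 100%
    }
}
```

So under those assumptions Grand souls get you there with five pieces, which is why finding filled Grand Soul gems early matters so much.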

-----

Largely good strategies. I've played 120 hours, am level 12, Master of the Fighters Guild (didn't do that first time), near the top of Mages Guild, skill level ~50 on most (100 on sneak), Chameleon 100, have NOT begun the main story line so no Oblivion Gates yet, and I have not traveled too far or done too much yet. And I can continue to major-level-up for a while yet, probably at least 5 more, maybe 10. MANY places to visit and explore.

Text processing concepts and tools

In the past 15 years, I have worked on text-processing software tools more than once, and I'm doing it again here of late.

While it doesn't take an Advanced Degree (tm) to understand *most* of it, some aspects do get pretty exotic.

How I started, way back when: I've used several of the mentioned tools. Most don't really meet my needs or wants.

I participated in some of the MUC episodes (6 and 7, I think), and have known about the MET and ACE episodes.

There are others, of course.

There are other tools around that do the named-entity job. I wrote one myself, because the one I had used most had some flaws I didn't care for (one of which was occasionally a show-stopper), and for some experimental purposes.

What I would consider an interesting set of text-processing capabilities:

Tokenizer (separate words from each other and non-words)
Reconstitutor (re-assemble words or other things from separate tokens)
Stemmer (separate root words from their suffixes; Porter Stemmer is the standard)
Pattern matcher (match word sequences)
Name lists (annotated/typed names of whatever)
Dictionaries
Topic Finding
WordNet
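To make the first item on that list concrete, here is a minimal tokenizer sketch. The rule it uses (a run of letters/digits is one token, any other non-space character is its own token) is my assumption for illustration, not a description of any particular tool; the class name SimpleTokenizer is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: separates words from each other and from non-words.
final class SimpleTokenizer {
    // Either a run of letters/digits, or a single character that is neither
    // letter, digit, nor whitespace (punctuation, symbols). Whitespace is dropped.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Spring is here, finally."));
        System.out.println(tokenize("don't")); // splits into three tokens
    }
}
```

Note that "don't" comes out as three tokens (don, ', t), which is exactly the kind of thing the Reconstitutor step exists to put back together.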

----

Other related tools whose value I'm not convinced of:

POS tagging
Sentence splitting/parsing

-----

Why are these tools of interest or value?

There is a lot of text/words content on the Web, and in databases. No possible way to read it all, and nearly no way to even find out what you might *want* to read. How do you find all the stuff you *should* read? Or stories that mention things of interest? How do you find stories that are on topics of interest but didn't happen to use the words you expected (i.e., defeating google)? What if it was in a foreign language--which REALLY defeats Google..?

You need some help.

Which leads to the two tools of interest.

1) Named-entity recognition. Find various reasonably-unique-meaning words/phrases
2) Topic recognition. Stories on any given topic are likely to use a lot of the same words.

A third tool of interest would do this: recognize relationships between words in the stories; this could include the simple concept of pronoun-references, but could also be more complex relationships. For example, "Barack Obama is the President of the United States" has a person name, a location name, a job title, and the relationship between all of them. In the MUC bake-offs this was known as Template Entity Recog. It's dramatically harder than the others.

Name recog is important because you can use it to mark up stories as being about that particular name, without necessarily having seen that name before. Topic recog is valuable because you can then find stories "about" something-or-other, without having to know any of the right keywords. Of course the list of topics isn't going to be tiny, so choosing the right topic is not necessarily trivial.
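The "stories on a topic use a lot of the same words" idea can be sketched as plain word-overlap counting. To be clear, this is only the bare idea under my own assumptions (one hand-made word set per topic, one point per matching token); it is not a real recognition model, and the names TopicScorer, addTopic, score, and bestTopic are made up:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical overlap scorer: a topic wins by sharing the most words with the story.
final class TopicScorer {
    // Insertion-ordered so that ties resolve to the topic added first.
    private final Map<String, Set<String>> topicWords = new LinkedHashMap<>();

    // Registers a topic with its (lowercase) characteristic words.
    void addTopic(String topic, Set<String> words) {
        topicWords.put(topic, words);
    }

    // Counts how many story tokens appear in the topic's word set.
    int score(String topic, List<String> storyTokens) {
        Set<String> words = topicWords.get(topic);
        if (words == null) {
            return 0;
        }
        int hits = 0;
        for (String token : storyTokens) {
            if (words.contains(token.toLowerCase(Locale.ROOT))) {
                hits++;
            }
        }
        return hits;
    }

    // Returns the highest-scoring topic, or null if no topic matched at all.
    String bestTopic(List<String> storyTokens) {
        String best = null;
        int bestScore = 0;
        for (String topic : topicWords.keySet()) {
            int s = score(topic, storyTokens);
            if (s > bestScore) {
                bestScore = s;
                best = topic;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TopicScorer scorer = new TopicScorer();
        scorer.addTopic("money", new HashSet<String>(Arrays.asList("fed", "rate", "bank", "bernanke")));
        scorer.addTopic("cars", new HashSet<String>(Arrays.asList("engine", "alternator", "tires")));
        List<String> story = Arrays.asList("Bernanke", "said", "the", "Fed", "will", "cut", "the", "rate");
        System.out.println(scorer.bestTopic(story));
    }
}
```

Even this crude version shows why names matter: "Bernanke" alone is a strong hint for a money topic, which is where name recog and topic recog start feeding each other.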

-----

Peculiarities:

There are a few, but not many, major human language families. One group are the "Romance" languages, which are derived from Latin. Many European languages are this type. There are the pictogram languages like Chinese and Japanese (Korean usually gets lumped in, but its Hangul script is actually an alphabet). There are a few oddments, like Thai, which has few widely known relatives (Lao is one, IIRC). Arabic belongs to another family, the Semitic languages.

Writing direction: left to right, right to left, top to bottom...likewise varies, but probably corresponds closely to origin.

Use of an alphabet, and whitespace. The pictogram languages don't have an alphabet in the same way that Latin languages do, nor do they use white space as word separators. I'd argue that these are fundamental flaws in the languages' written form.

All these things complicate computing, because they lead to language-specific solutions.

-----

Words are important. Without them you cannot express concepts, and you can't really invent new concepts. Language has to be mutable. But let's have the computer do some of the work.

Thursday, January 15, 2009

Jaguar XKE notes

Have you seen this?

http://jaguar-xke.blogspot.com/2009/01/jaguar-xke-sports-car-have-you-ever.html

"You will pay less amount of money for second hand cars than for new cars. The difference in cost or price can going up to over ten thousand dollars. If you are thinking about buying a second hand car make sure to check the cost of a new car of the same type and see how much you can save."

(looks like ESL)

While that first sentence is generally true, it really isn't for an XKE. Not that you can buy a new XKE. You could certainly buy a different used Jaguar model, for a lot less. But a used XKE is *not* cheaper than a new car...cheaper than *some* new cars, but not many. OK, a cheap XKE might be cheaper still, but then it may well not be one worth getting, because you're going to have to do a chunk of work on it.

The rest of the blog entry is comparably off-target.

Wednesday, January 14, 2009

On Dying

Turned 50 last year. The end of my life is closer than the beginning. I have two grandparents who lived to be 90+, but I am certainly past the halfway point.

Feels like my health started downhill 5 years ago, beginning with the kidney stone. Actually maybe it began a couple years earlier with some kind of bronchitis attack. Throat hasn't been the same since. May be having the occasional heart murmur trouble the past few years.

Leads you to wonder...but that's not what this blog post is about.

I'd prefer to die pretty quick, rather than gradually deteriorate. My mother-in-law has Parkinson's, that's a slow degeneration. Alzheimer's is too; dementia, etc...and you're not allowed to decide to just die, and by the time you really need to be able to, you don't even have the ability to make the decision.

My dad died of pancreatic cancer. Gradual deterioration for six+ months, and then fairly quick the last 30 days. (I have the feeling he had some Agent Orange exposure in Vietnam.)

Is this life all there is? Is there an after? What happens after? Should I want whatever it is? What would we be "reborn" as? Am I reborn as a human-shaped being? At what age? (i.e., am I reborn at age 25, or the same age as when I died?) Am I a more/less-likable person? Better looking? Am I still "me" physically and mentally? What about everyone else? Will I have to worry about job/income/housing/food etc? Just like now? Will I have to deal with the same amount of obnoxious other people? The same ones?

Read the comics enough, and the impression you get (at least from Family Circus) is that once dead and living in heaven, that is all about standing around on the clouds, talking. OK, that's not the daily struggle for food and shelter. But what happens when the obnoxious person decides he wants the cloud you're on?

If the afterlife is just like this one, I'm not too interested. I'm not going to be interested in fighting the same kinds of battles for all eternity. But I don't want to just stand around and talk. Reading on some religious-based websites you can readily find that heaven is not going to be much like this, but in ways that we cannot imagine, and that we shall all be made perfect. Which really means that a lot of us are not going to be the same person. Wants won't be the same, either, so one expects fewer inter-personal conflicts. Or maybe they are just different ones?

All unknowable. But you gotta wonder...

Artificial Stupidity

I started through Spellforce 1 again, as mentioned last month. Played as a fighter, did maximum FPS before doing any RTS on any level--it's like a different game. Started on the first expansion. It does look good on a 24" monitor.

And then I got hammered by the NPC's Artificial Stupidity again.

I started this expansion as a wizard this time. Never played as that char before, because it seems too hard--primarily because you run out of mana, whereas a fighter never runs out of sword. Useful to have such in the group, but not to BE such. I don't remember how I played this one level before; it has some scripted behaviors I can't control properly.

I'm at this point where I have to escort a group of refugees. They don't move very fast, but they will fight; and they're weak. So they are probably going to get killed. A couple of levels back I had a team of Dark Elves, one class of which can summon things, and another can revive dead as skeletons. So you really have a lot of extra fighters.

It should always be the case that when you have an escort mission, you should be able to tell the escortee(s) "Wait here!" because you are going to go clear a path. And then you should be able to go back and say "Follow me!" and have that happen.

I start a new map with these refugees, it's an ice/snow location. As soon as I move, they start heading towards this locked gate. The key to the gate is on some giant wolves nearby. So I have to kill the wolves. I have some ice elf archers, but the group of us is not strong enough to swarm the wolves. I have to do the whole rope-a-dope routine in order to stay alive. This means that the wolves will get close enough to get the refugees to attack, meaning I lose half the refugees.

If I could have told the refugees to stay put a ways back, the wolf problem could be solved. (OK, alternative: only send my archers forward; me and the refugees stay behind until the wolves are dead.)

So on we go through the gate. The refugees slog onward, we encounter additional opposition, but I can deal with that, until we pass the ice-elemental-spawner. They start to attack, and while they are up ahead, the refugees halt and wait for me to kill the elemental spawner. Well, I'm too weak for that, and the refugees will NOT follow me to the next gate. Which is going to mean we are all going to die.

This is because of unrealistic behavior. Artificial Stupidity. "Let's attack the giant wolves with our bare hands! They're only five levels stronger than we are!"

Sunday, January 04, 2009

Is our President read?

Just a week ago, Richard Cohen, writing in the Washington Post, reacts to an op-ed piece by Karl Rove some days earlier that I didn't see, asserting that apparently George W Bush reads A LOT. Apparently something like 100 books per year.

That's two per week. Really? Shouldn't take an Advanced Degree to figure this out...

For comparison:

I have been a heavy reader since about 1972 or so. Prior to that I just didn't have enough access. About 1970 or so I began to have enough of my own books that I was re-reading them a lot, in addition to new ones. Mom took us to the library fairly often, as she was a heavy reader too (from having been stuck in bed for a year as a child, apparently with TB, although apparently decades later that was debunked).

Since 1972 I've read about 3000 books. I still have most of them. That's almost two per week. The shortest ones are probably 125 pages. The longest, over a thousand pages.

I read fast. Damn fast.

In 5th grade I had a nearly unique experience in school--all the 5th-graders took speed-reading. This was in Montgomery, Alabama, in 1968/69, not a location you connect with advanced thinking like this...I'm not aware of this happening anywhere else, really. I was only there for a year, but this was almost unimaginably valuable. (I was back there in 10th grade, and took typing, comparably valuable.)

The scoop: we learn to read by reading aloud. (Think back on your earliest years in school, and before.) So we read at the same speed we talk. On average, this is about 150 words per minute. Everyone in my class started speed-reading training at 150 words a minute. The trick to going faster, in your brain, is that you have to decouple reading and speaking. This is doable by most folks, and pretty much everyone can progress to about 300 words a minute. This is roughly one page in a novel every minute (a page in a printed book usually has about 300 words, I've counted this a number of times in the past; it varies with font-size, but 300 is a good approx). I think I remember everyone in my class being at least at 300 by the end of the school year.

(Lacking speed-reading training, you don't read faster than you talk, and I don't mean the "Evelyn Wood" noise, that isn't really speed reading, but it too requires you to decouple from your speech.)

How was this done? They had a film-strip-like machine that would move a sliding box across a line of text, L to R, then repeat with a new line of text. (There's a computer program that more or less duplicates this, called "Ace Reader"; the sliding box motion is jerky rather than smooth, I found it jarring to try to read that way.) You have to move your eyes to follow the box, so you begin to separate eye movement and subconscious vocalizing. The complete text was a story. You'd take a quiz at the end. High enough score on the quiz, and you moved ahead 25 words/min the next week. The machine's speed incremented in 25 words/minute quanta. So we only did the jumps once a week. At the beginning of the training everyone is at 150. Next week, some still are, some have moved on. By the end of the school year, the spread has increased, and there are kids at most speeds. Nearly everyone has moved beyond 150. I am in the fastest group, at 625, along with 2 or 3 others. Yes--I read 4X faster than I did before.

Damn fast. I still read pretty fast now, but it's variable, depending on the content. A technical manual is a slow read; Janet Evanovich is fast, maybe even faster than 600...

Which means that most novels unfold for me at the pace of a theatrical movie. AND, it means that reading 100 in a year isn't that hard or unlikely. Altho these days I'm busy enough with other things that I don't read that many. Suppose you read one page/min. Suppose the average book is 300 pages, so Bush reads 600 pp/week. Roughly 100 per day, or 100 minutes per day. Does he actually have that kind of time?

But apparently in this Karl Rove article they've been keeping a list of them (which sounds a bit artificial to begin with). I haven't seen the list, apparently Cohen has. Apparently the list content has its own interesting features, but that's his discussion. My blog entry argues against his even having done the reading, regardless of what it was.

I don't even have a list of what books I've *bought* in the last year, much less read.

But wait...what did Rove mean by "read"? Did Bush read every word? Or just the first paragraph in the chapters? Skim the chapters? If we assume he reads at 150, then he didn't read 600 pp/week. The President just would not have that kind of time, that's about 3 hours per day. Any more, *I* don't manage to have 3 hours/day for it (although 100 pp takes me < 1 hour).
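For anyone who wants to check the arithmetic, here it is as a small Python sketch. The inputs are the round numbers used in this post (300 words per page, 300 pages per book, 100 books per year); spreading the reading evenly over 365 days is my own simplifying assumption, and the function name is just for illustration.

```python
WORDS_PER_PAGE = 300   # rough count for a printed novel page
PAGES_PER_BOOK = 300   # assumed average book length
BOOKS_PER_YEAR = 100   # the figure attributed to Bush


def minutes_per_day(words_per_minute,
                    books_per_year=BOOKS_PER_YEAR,
                    pages_per_book=PAGES_PER_BOOK,
                    words_per_page=WORDS_PER_PAGE):
    """Daily reading time needed, spread evenly over 365 days."""
    words_per_year = books_per_year * pages_per_book * words_per_page
    return words_per_year / words_per_minute / 365


for wpm in (150, 300, 625):
    print(f"{wpm} wpm: {minutes_per_day(wpm):.0f} minutes/day")
# 150 wpm: 164 minutes/day
# 300 wpm: 82 minutes/day
# 625 wpm: 39 minutes/day
```

The exact figures come out a little under the rounded ones above (which used 600 pp/week, i.e. a bit more than 100 books/year), but the conclusion is the same: at talking speed it is close to 3 hours every single day.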

Cohen's article is about the books themselves. Apparently they are biographies, and their thematic content is such that they would be reinforcing Bush's self-image, and offering some personal vindication for his actions as President. Are there in fact 500+ books like that so that one *could* read that many? That too strikes me as unlikely--but I can imagine it, and if there's really a list...

My conclusion: the mechanics of it indicate that Bush does not, and has not, read 100 books per year. (Of course, if he's had that same speed-reading training I have, well, maybe he did.)


(Aside: why I think this reading machine does this well: our eyes/brains are attracted to motion. Why? I think it's probably ancient racial memory--things that are moving could be predators, so we need to focus on them. There's an interesting bit of imagery/video I'm thinking of here; it begins with a still photo, mostly of non-uniform vertical lines, but when you see part of it move you are able to resolve that it is a tiger (vertical stripes) obscured by nearly-vertical vegetation leaves; no motion = no danger, motion means the tiger (danger) needs to be watched.)