
MySQL & Latin1 woes … E4X … Pirahã … Alcabala 22nd of June, 2007 POST·MERIDIEM 10:53

Okay, for that eager plethora (heh) of my readers who have heavily used non-ASCII UTF-8, who stored that UTF-8 with metadata saying that it was latin1 in a MySQL 4.1.11 database, and who need to move to a MySQL 5.0.41, where such thoughtless trust in the program not to corrupt your data no longer works: I have enlightenment on how to do that migration.

  1. Dump your existing MySQL database, and specify latin1 as the default character set. This last is important, since otherwise MySQL assumes that Windows 1252 is the default character set, and that Windows 1252 cannot encode U+009D, for example. Tell me, MySQL folk, what exactly should the octet with value #x9D mean as a character if not U+009D? Hmm, what’s that? You’ve no good answer? Interesting. Command:
    mysqldump -u user-name --password=password --default-character-set=latin1 database-name > dump-file-name
  2. Edit the database dump file, replacing latin1 with utf8 everywhere that’s not in your data.
  3. Copy it over to the new machine and load this dump file into the new server:
    mysql -u user -p database-name-not-password,-sorry-to-confuse-you < modified-dump-file-name
    and give the password for the database server (which you have configured, right? Right?).
After that, you should have your UTF-8 passed to you as UTF-8 once more.
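
Step 2 is tedious to do by hand on a big dump, so here is a minimal Python sketch of it, under some loudly stated assumptions: that the character set declarations in the dump look like SET NAMES latin1, DEFAULT CHARSET=latin1 or CHARACTER SET latin1; that each row of data sits on a single line starting with INSERT INTO; and that retag_lines is a name I made up for illustration, not anything MySQL ships. It works on bytes rather than decoded text, so the UTF-8 in your data passes through untouched. Collation names such as latin1_swedish_ci are deliberately left alone and would need checking by hand.

```python
import re

# Assumed forms of charset declaration in the dump: "SET NAMES latin1",
# "DEFAULT CHARSET=latin1", "CHARACTER SET latin1" (any letter case).
# The trailing \b keeps collation names like latin1_swedish_ci untouched.
CHARSET_DECL = re.compile(
    rb"((?:SET NAMES|CHARSET=|CHARACTER SET)\s*)latin1\b", re.IGNORECASE
)


def retag_lines(lines):
    """Yield dump lines with latin1 declarations rewritten to utf8.

    `lines` is an iterable of bytes objects, one per line of the dump.
    Lines that start with INSERT INTO are assumed to hold row data and
    are passed through byte for byte.
    """
    for line in lines:
        if line.lstrip().upper().startswith(b"INSERT INTO"):
            yield line
        else:
            yield CHARSET_DECL.sub(rb"\g<1>utf8", line)
```

Fed the lines of the dump file opened in binary mode, with the results written to a second file also opened in binary mode, this should give you the modified dump that step 3 loads. Look over the output before trusting it; a dump that declares its character set in some other form will slip straight past the pattern.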

ECMAScript for XML is the unwieldy name of a recent standard for processing XML data with JavaScript, and after a couple of days working with it, I’m totally impressed. For me, the initial stumbling blocks were the namespaces and xmlns; but a

var jdfns = new Namespace("http://www.CIP4.org/JDFSchema_1");
and a subsequent addressing of all elements (attributes, etc.) with jdfns::element-or-attribute-name resolved that quickly enough, yay. But after that, for example, I commented one evening that were I particularly perverse, I could parse a configuration file to find an integer ordering of some set of attributes; the next day, I found this translated to ten lines of code. Thoroughly recommended if you use XML and JavaScript on a regular basis, though perhaps irrelevant if your code needs to function in Internet Explorer.

And thirdly, in energetic contrast to Emma and Simon’s take on it, I found this New Yorker article on Daniel Everett’s work with the Pirahã really, really good. It’s a magazine article, and as such it doesn’t try to treat the linguistics in detail, any more than the recent article on Григорий Перельман treated the mathematics of the Poincaré conjecture, but it deals well with communicating the sociology of the disagreements the Pirahã provoke; it quotes Michael Tomasello in a critical but diplomatic tone, and gives a vivid picture of the occasional hellishness of tropical field work.

The impression I get from it (and it is to my discredit that I hadn’t read the relevant papers already, but in my defense they were on lingbuzz, which to anyone not interested in generativism is as interesting as the theological debates of the Seventh-day Adventists) is of the Pirahã as the apogee of anti-intellectualism; when other language communities have had number systems that lacked the fine differentiation of most western languages, they were happy to pick such distinctions up, but for the Pirahã the difference seems to have been a social pressure not to.

Word of the day: قبالة qabālat (v.n. of قبل), in Persian qabāla, qubāla: surety, contract (especially of bargain and sale); in Spanish as la alcabala, an historical sales tax; in Hebrew as kabala קבלה, meaning invoice/receipt.


FotC … Luckily, I’m not attending a college to drop out of … Çin 1st of June, 2007 POST·MERIDIEM 09:09

Sunday I spent mostly listening to Flight of the Conchords, a fine, fine New Zealand band who have, for example, this piece on YouTube: Business Time. Besides that, not doing anything constructive.

Saturday, I came across a German mattress shop with a large sticker ‚Preiѕhіt‘ on its front window.

Since then, meh.

I’ve been pasting interesting links (interesting to linguistics nerds, that is) to #linguistics on irc.freenode.net on the invitation of Francis Tyers for the last few weeks now, and an agreeable corner of IRC it is too. Haven’t been conscientiously logging, so can’t really refer to any of them here that are not already posted to http://del.icio.us/aidan/

Word of the day: Чин is Tajik and Turkic for ‘China’; not unrelatedly, this was the term Marco Polo used for the country, the word ‘China’ itself being introduced to Europe by other authors.

Last comment from Ibrahim on the 16th of July at 13:33
Tajik people also use Хитой for "China"
