Thursday, January 31, 2008

Growing Pains at BP3

I've been watching Bloggers for Peer-Reviewed Research Reporting and their Research Blogging aggregator with interest since learning about it from Alun Salt last week. I was taken aback by today's announcement from Dave Munger that they are turning down applications "because the blogs aren't written in English." My brain mumbled, "show-stopper." But I read on ...

I understand this problem: "The organization as it stands now simply doesn't possess the language skills to verify that blogs written in other languages are living up to our guidelines." Munger outlines some steps aimed at building community capability for handling non-English content. Since the BP3 model is predicated upon some human evaluation of a blog's "living up to [BP3] guidelines," they'll have to rise to the challenge or go out of business.

Another problem is poorly expressed: "readers might be turned off by a site that includes many posts written in a language they don't understand." How about (and this is clearly what Munger means if you read the whole post): users will need the ability to customize language and script settings. If I may, this will need to apply both to the interface, and to the filtering of content. And please don't bind these choices together! For example, I'd want an English-language interface but content in (at least) English, French, German, Greek, Italian, Portuguese, Romanian and Spanish. This is one of the reasons we chose Plone as the platform for Pleiades: it comes localization-ready out of the box, and with a minimum of work you can manage multilingual content.
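To make the "don't bind these choices together" point concrete, here is a minimal sketch in Python. It assumes each aggregated post carries a single language code. The names Post, ReaderPrefs and visible_posts are hypothetical and are not part of BP3, Research Blogging or Plone.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    lang: str  # language code of the post content, e.g. "en" or "fr"


@dataclass
class ReaderPrefs:
    # Language of menus and labels, kept separate from the content filter.
    ui_lang: str = "en"
    # Languages the reader is willing to see posts in.
    content_langs: set = field(default_factory=lambda: {"en"})


def visible_posts(posts, prefs):
    """Filter by content language only; ui_lang plays no part here."""
    return [p for p in posts if p.lang in prefs.content_langs]


posts = [
    Post("Roman roads", "en"),
    Post("Voies romaines", "fr"),
    Post("Drumuri romane", "ro"),
]
prefs = ReaderPrefs(ui_lang="en", content_langs={"en", "fr"})
titles = [p.title for p in visible_posts(posts, prefs)]
print(titles)
```

Because the filter never looks at ui_lang, a reader can keep an English interface while widening or narrowing the set of content languages independently.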

If the site hopes to mature beyond Anglophone scientific content, it's going to have to go multilingual. There's a whole world of humanistic scholarship out there just waiting to go digital; and much of it is interestingly more than English.

This reminds me, I need to write a rant about Blogger's recently announced pseudo-support for bidirectional text editing. Hint: I can't embed right-to-left Arabic, Hebrew or Persian in this post, but if I were to reset my blog's language settings to one of those languages then I could mix in left-to-right English (or whatever) here. I bet I'd have to manually hack the HTML to mark the "foreign" snippets for language and script per RFC4646 too ...

4 comments:

Dave Munger said...

"If the site hopes to mature beyond Anglophone scientific content, it's going to have to go multilingual. There's a whole world of humanistic scholarship out there just waiting to go digital; and much of it is interestingly more than English."

Absolutely. It's just that our current interface simply can't handle those demands. Reader A may understand Portuguese, English and Spanish, but reader B only understands English. It doesn't make sense to force reader B to wade through a bunch of posts he can't possibly understand.

The planned interface for the future will allow users to specify which language posts they want to see.

Shawn Graham said...

Hi -

I recently had to approach this same problem of getting multilingual content into a WordPress blog. The plugin that I eventually chose simply encloses text within a language tag; when the reader selects the language they want, the blog filters the content, displaying only the relevant language. Multiple language versions can thus be kept in one posting.
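The tag-and-filter idea can be sketched roughly as follows in Python. The [lang=xx]...[/lang] markup and the function name filter_language are invented for illustration; the actual plugin is not named here and its syntax may well differ.

```python
import re

# Hypothetical markup: [lang=xx]...[/lang]. The real plugin's syntax may differ.
LANG_BLOCK = re.compile(r"\[lang=([A-Za-z-]+)\](.*?)\[/lang\]", re.DOTALL)


def filter_language(body, wanted):
    """Keep the text of blocks tagged `wanted`; drop blocks in other languages."""
    def keep_or_drop(match):
        tag, text = match.group(1), match.group(2)
        return text if tag.lower() == wanted.lower() else ""
    return LANG_BLOCK.sub(keep_or_drop, body)


post = "[lang=en]Hello, readers.[/lang][lang=fr]Bonjour, lecteurs.[/lang]"
english = filter_language(post, "en")
french = filter_language(post, "fr")
print(english)
print(french)
```

Text outside any tagged block passes through untouched, so shared material such as images or links would only need to appear once in the posting.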

In my example, I'm using English and French. I don't write French very well, but with some automatic translation, a good dictionary, and a feel for language it becomes possible to have reasonably good multilingual material up in reasonably short order. My original post

Dhushy said...

You don't have to set your blog's language settings to Hebrew/Arabic/Persian to put text in those languages. The directionality buttons will also appear if you change your user settings (in the Blogger Dashboard).

Or you can just embed the text without using the buttons - they're a convenience, but not essential. If the direction doesn't flow as expected, you can edit the HTML to put <span dir="rtl"> (or dir="ltr") tags around the embedded text, as needed.

If the text is entered with Unicode characters, no other "hacking" should be needed.
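The manual markup described above can be generated rather than typed by hand. What follows is a rough Python sketch, not anything Blogger itself does: it guesses direction from the first strongly directional character, using the standard library's unicodedata.bidirectional, and wraps the snippet in a span. The helper names base_direction and wrap_dir are hypothetical, and the Hebrew sample is written with escapes so the listing itself stays left-to-right.

```python
import html
import unicodedata


def base_direction(text):
    """Return "rtl" or "ltr" from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":  # left-to-right letters
            return "ltr"
    return "ltr"  # no strong character found; fall back to left-to-right


def wrap_dir(text):
    """Wrap a snippet in a span carrying an explicit dir attribute."""
    return '<span dir="{}">{}</span>'.format(base_direction(text), html.escape(text))


shalom = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters shin, lamed, vav, final mem
hebrew = wrap_dir(shalom)
english = wrap_dir("peace")
print(english)
```

This only sets directionality. It says nothing about which language the snippet is in.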

Tom Elliott said...

Dhushy: thanks for your comments ... I'll check out the dashboard for the controls and report back on my experience.

Of course the HTML can be manually edited to indicate directionality; my comments were about the GUI-based method as described (with so much fanfare) on the Blogger blog.

Plopping in Unicode characters is not the same as encoding HTML with the appropriate language and script codes specified by RFC4646. Latin-1 is neither a language nor a writing system. Not all Unicode code ranges map to an unambiguous language/script pair.
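A small Python sketch of that last point: the same four Arabic-script code points spell a greeting in both Arabic and Persian, so the characters alone cannot say which language is meant, and the markup has to declare it. The helper mark_language is hypothetical. The tags "ar", "fa", "sr-Latn" and "sr-Cyrl" follow the language-plus-optional-script pattern of RFC 4646.

```python
import html


def mark_language(text, lang_tag, direction="ltr"):
    """Wrap text in a span that declares its language tag and direction."""
    return '<span lang="{}" dir="{}">{}</span>'.format(
        html.escape(lang_tag, quote=True), direction, html.escape(text)
    )


# Arabic letters seen, lam, alef, meem: the same code points in both languages.
word = "\u0633\u0644\u0627\u0645"
as_arabic = mark_language(word, "ar", "rtl")
as_persian = mark_language(word, "fa", "rtl")

# A script subtag distinguishes Serbian written in Latin from Serbian in Cyrillic.
serbian_latin = mark_language("Beograd", "sr-Latn")
print(serbian_latin)
```

The two spans for the shared word differ only in the lang attribute, which is exactly the information a bare Unicode string cannot carry.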