[Platforms] Principle decision needed

Neil Holdsworth NeilH at ices.dk
Wed Apr 25 07:47:06 BST 2012


Dear Don,

Thanks for your comments. We'll get back with a full answer, but I just wanted to clarify point 2). The link that you are missing is http://en.wikipedia.org/wiki/Typographic_ligature . I had a look at the ALA-LC site and couldn't quite see whether the Nordic special characters were assumed to be part of the Roman alphabet and, if so, where the transformation table was. Could you clarify this?

Thanks, Neil

-----Original Message-----
From: Donald Collins [mailto:donald.collins at noaa.gov] 
Sent: 24 April 2012 16:35
To: Neil Holdsworth
Cc: Sjur Ringheim Lid; Lowry, Roy K.; Marilynn Sørensen; platforms at mailman.nerc-liv.ac.uk; dick at maris.nl
Subject: Re: [Platforms] Principle decision needed

All,
I've been discussing this thread with my colleagues here and have a few 
ideas to include in the discussion.

Our database programmer noted that the discussion had not resolved the 
character encoding issue for the fields, but maybe that is implicitly 
understood to be UTF-8 for the 'other language' synonym field and 
US-ASCII for the 'english' entry. These choices would limit the 
'english' name to a very specific character set while the synonym name 
could be effectively any alphabet's character set.  Specifying the 
character encoding for the two fields should minimize some of the issues 
Roy raised about the havoc possible with different encodings for the 
same character.

- Agree with 1) keep 'english' as the default, although maybe 
identifying it as ascii_name (vice utf8_name for the synonym) would be 
more explicit. All records would require name/ascii_name, some records 
would have synonym/utf8_name.

- Agree with 2) set some rules for converting non-english alphabet 
characters. The link in Marilynn's email appears to be truncated, so I'm 
not quite sure of the exact wikipedia page she referenced. One of our 
librarians suggested the American Library Association - Library of 
Congress (ALA-LC) Romanization Tables at 
http://www.loc.gov/catdir/cpso/roman.html as an authoritative resource 
for character transliterations. It is my understanding from Linda Pikula 
that GE-MIM is also discussing the rules for transliteration of 
non-english alphabet characters, but they have not reached any agreement 
or recommendation.

- Agree with 3) allow a single instance 'other language' synonym 
associated with the correct language code. This does not necessarily 
have to be the primary language used in the 'country' database element.
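
A minimal sketch of the two-field convention described above, assuming the field names ascii_name and utf8_name suggested in this message. The utf8_lang field, the PlatformName class, and the 2- or 3-letter pattern for the language code are hypothetical and only illustrate how the encoding rule could be checked:

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumption: language codes are 2 or 3 lowercase letters (ISO 639 style).
LANG_CODE = re.compile(r"[a-z]{2,3}")


@dataclass
class PlatformName:
    ascii_name: str                    # required, US-ASCII only
    utf8_name: Optional[str] = None    # optional synonym, any Unicode text
    utf8_lang: Optional[str] = None    # hypothetical language code field


def validate(record: PlatformName) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.ascii_name:
        problems.append("ascii_name is required")
    elif not record.ascii_name.isascii():
        problems.append("ascii_name contains non-ASCII characters")
    if record.utf8_name is not None:
        lang = record.utf8_lang
        if lang is None or not LANG_CODE.fullmatch(lang):
            problems.append("utf8_name needs a 2- or 3-letter language code")
    return problems
```

For example, validate(PlatformName("Hoeegh", "Höegh", "no")) returns an empty list, while a record whose ascii_name is "Höegh" is reported as containing non-ASCII characters.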

Best regards,
Don


On 3/29/2012 5:16 AM, Neil Holdsworth wrote:
> Hi Sjur,
>
> Indeed, we would be able to programmatically identify the non-English name, which would be wise for validation. However, I think that if we are to do this then we should positively identify the language that has been used in the synonym. I think we all agree so far that we
> 1) keep English as the default
> 2) set some rules for converting non-English alphabet characters
> 3) allow a single instance 'other language' synonym associated with the correct language code
>
> Neil
>
> -----Original Message-----
> From: Lid Sjur Ringheim [mailto:sjur.ringheim.lid at imr.no]
> Sent: 29 March 2012 11:06
> To: Neil Holdsworth; Lowry, Roy K.; Marilynn Sørensen; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl; Donald.Collins at noaa.gov
> Subject: SV: [Platforms] Principle decision needed
>
> Hi Neil and Roy,
>
> Are there any scenarios where a ship would have more than two names registered? Usually the only reason to register a second name, as I see it, would be that the original name contains characters not in the English alphabet. If that is the case, it would be quite easy to pick out the right name to present when you need lang=en, as it would be the one without strange characters.
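
The heuristic in the paragraph above can be sketched as follows, assuming that "strange characters" means anything outside US-ASCII. The function name pick_english_name is hypothetical:

```python
from typing import Iterable, Optional


def pick_english_name(names: Iterable[str]) -> Optional[str]:
    """Return the first name made only of ASCII characters, or None."""
    for name in names:
        if name.isascii():
            return name
    return None
```

Given the two registered names "Höegh" and "Hoeegh", this returns "Hoeegh". It returns None when every registered name contains extended characters, which is the case that an explicit default name or language code would have to cover.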
>
> Another way this could be solved is to make one of the names the default name.
> We are at the moment updating our internal systems and allowing multilingual synonyms. The way we get around the problem of choosing the right synonym to present is to assign one of them as the default synonym.
>
> As I mentioned earlier, we would welcome the possibility of registering names containing non-English alphabet characters, as some of the ships we handle do have names with characters from the Norwegian alphabet.
> We should also agree on a standard way to represent the characters when translating them to English, as Marilynn stated earlier.
>
> Sjur
>
> ________________________________________
> From: Neil Holdsworth [NeilH at ices.dk]
> Sent: 29 March 2012 10:30
> To: Lowry, Roy K.; Marilynn Sørensen; Lid Sjur Ringheim; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl; Donald.Collins at noaa.gov
> Subject: RE: [Platforms] Principle decision needed
>
> Hi Roy,
>
> I think the first assumption of the default language is safe enough. I'm not quite 100% sure about the second assumption, but having scanned the ship names and their respective country flags I haven't seen any that break that rule.
>
> However, it is a decision that has some implications, so still waiting for some agreement/disagreement from others in the platform group.
>
> Neil
>
>
> -----Original Message-----
> From: Lowry, Roy K. [mailto:rkl at bodc.ac.uk]
> Sent: 29 March 2012 00:54
> To: Neil Holdsworth; Marilynn Sørensen; Sjur Ringheim Lid; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: RE: [Platforms] Principle decision needed
>
> Hi Neil,
>
> I'm only too well aware of the need to consider resource implications.  Should the language encoding approach not get the support, I feel that it's important to establish a convention so we know which of the synonyms is the lowest common denominator (which I would assume to be lang=en in any application generating SKOS or OWL).  I would probably also assume that any synonym found containing extended characters was in the language of the ship's flag.
>
> If everybody would be happy with this kind of assumption model then I guess we could go for the easy option.
>
> Cheers, Roy.
> ________________________________________
> From: Neil Holdsworth [NeilH at ices.dk]
> Sent: 28 March 2012 09:06
> To: Lowry, Roy K.; Marilynn Sørensen; Sjur Ringheim Lid; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: RE: [Platforms] Principle decision needed
>
> Hi Roy,
>
> I agree that the language encoding is the most thorough solution. We suggested the synonym approach on the basis that it was quite easy to implement and would take few resources.
>
> If we are to go for the language encoding approach then I think we need a bit more positive agreement from the platform group - is anyone able to speak up?
>
> Best, Neil
>
> -----Original Message-----
> From: platforms-bounces at mailman.nerc-liv.ac.uk [mailto:platforms-bounces at mailman.nerc-liv.ac.uk] On Behalf Of Lowry, Roy K.
> Sent: 27 March 2012 23:25
> To: Marilynn Sørensen; Sjur Ringheim Lid; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: Re: [Platforms] Principle decision needed
>
> Hi Marilynn,
>
> My concern with having a simple synonym approach is that using the language tag allows a mechanism for a single concept to have multiple labels, whereas with synonyms, each name is a separate concept.
>
> This may seem a strange comment if you haven't done any work with representing ship codes in standard knowledge encodings, such as SKOS or OWL.  The issue is that each synonym is encoded in RDF XML as a reference (such as a URL) and having multiple references for one ship code is an issue.  Some form of discriminator is required, which probably brings us back to language codes!
>
> Your concern about flag changes can be easily addressed - you have one entry for lang=en (the LCD), plus another for each country where the ship has been registered.  Note that this would mean total decoupling between the languages carried by a code and the flag of the ship.  However, in my opinion this is good.
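
The "one concept, several language-tagged labels" model described above can be sketched as follows. This is only an illustration: the concept URI is a made-up example.org address, the helper concept_to_turtle is hypothetical, and the output is plain Turtle text built by hand rather than through any RDF library:

```python
from typing import Dict

# The SKOS core namespace, declared once at the top of the Turtle output.
PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def concept_to_turtle(uri: str, labels: Dict[str, str]) -> str:
    """Render one SKOS concept with one skos:prefLabel per language tag."""
    if not labels:
        raise ValueError("at least one label is required")
    lines = [PREFIX, "<%s> a skos:Concept ;" % uri]
    items = sorted(labels.items())
    for i, (lang, text) in enumerate(items):
        # The last statement ends with '.', the others with ';'.
        end = " ." if i == len(items) - 1 else " ;"
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('    skos:prefLabel "%s"@%s%s' % (escaped, lang, end))
    return "\n".join(lines)


print(concept_to_turtle("http://example.org/platform/0001",
                        {"en": "Hoeegh", "no": "Höegh"}))
```

Both labels hang off the same concept URI, so the ship code stays a single concept and the language tag acts as the discriminator.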
>
> One final point.  I'm not sure if ISO 639 is the recommended standard for XML encodings. There's one we've used, but I've forgotten its name!  Hopefully Adam can help!
>
> Cheers, Roy.
> ________________________________________
> From: Marilynn Sørensen [marilynn at ices.dk]
> Sent: 27 March 2012 12:59
> To: Sjur Ringheim Lid; Lowry, Roy K.; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: Principle decision needed
>
> Dear Roy, Sjur and Platform group,
>
> Full multi-lingual support would mean creating a new field to hold the "language" code. This would also mean adopting ISO 639 (2 or 3 character variant) to use as the language code identifier. The responsibility would be on the platform management group to ensure that the language is correctly identified, which may not be easy if a vessel has changed ownership/flag. This still leaves us with the issue of dealing with the lowest common denominator, and even if we include local language support the "default" name attribute would still need to be easily read/translated by users of the webservices etc., and we would therefore still need a basic name with no extended language characters.
>
> A simpler solution could be to create a new attribute field for "Synonyms". The name would be translated to English in the "Name" field and the original spelling in the original language would be added to "Synonyms". This would require that we agree on a rule for English translation of special characters. See http://en.wikipedia.org/wiki/Typographic_ligat for a description of translations between the most common letters. We could create and agree a simple translation table based on this.
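
A simple translation table of the kind proposed above might look like the sketch below. The mapping is an assumption made for illustration and is not a table the group has agreed; only the "ö" to "oe" rule comes from this thread. The function name to_english_name is hypothetical:

```python
import unicodedata

# Hypothetical mapping for illustration only; not an agreed table.
TRANSLITERATION = {
    "æ": "ae", "ø": "oe", "å": "aa",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}
_TABLE = str.maketrans(TRANSLITERATION)


def to_english_name(name: str) -> str:
    """Replace the mapped special characters with their ASCII digraphs."""
    # Normalise to composed form first so "o" + combining diaeresis
    # is treated the same as the single character "ö".
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(_TABLE)
```

With this table, "Höegh" becomes "Hoeegh". Characters that are not in the table pass through unchanged, so the result is only guaranteed to be ASCII once the table covers every character that occurs in the ship names.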
>
> What do you think of this solution?
>
> Kind regards,
> Marilynn
>
> -----Original Message-----
> From: Lid Sjur Ringheim [mailto:sjur.ringheim.lid at imr.no]
> Sent: 26 March 2012 14:01
> To: Lowry, Roy K.; Marilynn Sørensen; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: SV: Principle decision needed
>
> Dear Roy and Marilynn,
>
> As we are one of the countries where the ships get named using those characters we would welcome the possibility very much.
>
> Roy's proposal for multilingual storage of ship names would actually be the very best, as it will make it possible to register the original name and an English-friendly (possibly others?) name for users not familiar with the letters.
>
> Cheers,
> Sjur
> ________________________________________
> From: platforms-bounces at mailman.nerc-liv.ac.uk [platforms-bounces at mailman.nerc-liv.ac.uk] on behalf of Lowry, Roy K. [rkl at bodc.ac.uk]
> Sent: 23 March 2012 22:44
> To: Marilynn Sørensen; platforms at mailman.nerc-liv.ac.uk
> Cc: dick at maris.nl
> Subject: Re: [Platforms] Principle decision needed
>
> Hi Marilynn,
>
> Whilst we have been able to store full Latin-1 characters for a long time (since Oracle introduced Unicode support), I prefer to avoid them for two reasons:
>
> 1) The characters are extremely difficult to type from the keyboards I use - there may be keyboard shortcuts, but I don't know them so I usually end up opening Word, inserting symbol and then copying and pasting the character to wherever I need it.
>
> 2) There are issues that we've hit several times with character encoding mismatches causing web applications to render the Latin-1 characters incorrectly - they usually end up as square boxes. I don't know the technical details - all I know is that I have had to submit multiple bug reports and have experienced fixes for Java applications causing Perl applications to break and vice versa.
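
The kind of encoding mismatch described in point 2) can be reproduced in a few lines. This is a sketch of the general mechanism, not a diagnosis of the specific bugs mentioned above, and the two helper names are hypothetical:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def misread_latin1_as_utf8(text: str) -> str:
    """Encode as Latin-1, then decode the bytes as if they were UTF-8."""
    # Invalid byte sequences become U+FFFD, the replacement character.
    return text.encode("latin-1").decode("utf-8", errors="replace")


print(misread_utf8_as_latin1("Höegh"))   # the "ö" turns into two characters
print(misread_latin1_as_utf8("Höegh"))   # the "ö" turns into U+FFFD
```

In the second case the single Latin-1 byte for "ö" is not valid UTF-8, so it is replaced by U+FFFD, which many fonts draw as a box or a question mark.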
>
> One way around the problem might be to introduce multilingual storage of ship names (obviously tagged with the appropriate languages) combined with the sort of technology Google uses to allow Latin-1 and various equivalents to be discovered interoperably.  That way we could cover all bases.  We could even use that to go beyond Latin-1 into full multilingual support. What do people think about that?
>
> Cheers, Roy.
> ________________________________
> From: platforms-bounces at mailman.nerc-liv.ac.uk [platforms-bounces at mailman.nerc-liv.ac.uk] On Behalf Of Marilynn Sørensen [marilynn at ices.dk]
> Sent: 23 March 2012 18:45
> To: platforms at mailman.nerc-liv.ac.uk
> Subject: [Platforms] Principle decision needed
>
>
> Dear Platform Group,
>
> You have seen the request for standardization of "ö" to "oe" so all vessels with "Höegh" become "Hoeegh".
>
> There is an alternative which the platform group needs to discuss.
>
> a)      Is it time to implement extended characters and allow "ö", "ø", "ä" etc? Both ICES and NOAA can handle this change.
>
> b)      If yes, how should this be implemented?
>
> *       Update all old ship names
>
> *       Make a "point in time" change
>
> Please send your views to me as soon as possible and by 23 April at the latest. We need answers from all members of the platform group. If other groups should be contacted for their views, please let us know.
>
> Kind regards,
>
> Marilynn
>
> ----------------------------------------------------------
>
> Marilynn Sørensen
>
> Data Manager
>
> International Council for the Exploration of the Sea
>
> H.C. Andersens Boulevard 44-46, 1553 Copenhagen V.
>
> Denmark
>
> marilynn.sorensen at ices.dk
>
> Direct tel: +45 33 38 67 20
>
> --
> This message (and any attachments) is for the recipient only. NERC is subject to the Freedom of Information Act 2000 and the contents of this email and any reply you make may be disclosed by NERC unless it is exempt from release under the Act. Any material supplied to NERC may be stored in an electronic records management system.
> ********************************************************************************
> This mail has been scanned by http://www.comendo.com and contains no viruses!
> ********************************************************************************
> _______________________________________________
> Platforms mailing list
> Platforms at mailman.nerc-liv.ac.uk
> http://mailman.nerc-liv.ac.uk/mailman/listinfo/platforms





