Leveraging Translation Techniques in Plone

Andreas Pauley-6
Hi all,

I've been looking at some techniques and technologies used in
translation, and some of them seem quite promising.

http://en.wikipedia.org/wiki/Computer-assisted_translation
http://en.wikipedia.org/wiki/Translation_memory

Here are some interesting examples:

Terminology
-----------

Certain translation software supports terminology/glossary matching.
A list of terms (similar to our Language Specific Terms) is used to
display relevant terms for the target language while you translate.


Translation Memory
------------------

Quite a lot of tools allow you to specify other existing translations as
a TM database.
This is then used to provide a list of possible matches.
Often this is implemented as an "automatic translation" feature where
matches above a certain percentage could be automatically used, but
marked fuzzy.
Using this approach a translator can leverage some of the work of other
big localization efforts.
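
To make the matching idea concrete, here is a minimal sketch in Python. It is not taken from any of the tools mentioned; the function name `tm_suggest`, the 0.75 threshold and the sample memory are assumptions made only for illustration, and difflib's similarity ratio stands in for whatever scoring a real TM tool uses.

```python
from difflib import SequenceMatcher


def tm_suggest(source, memory, threshold=0.75):
    """Return (translation, score, fuzzy) candidates for `source`, best first.

    `memory` maps previously translated source strings to translations.
    An exact match is not fuzzy; any other match at or above `threshold` is.
    """
    candidates = []
    for known_source, translation in memory.items():
        score = SequenceMatcher(None, source, known_source).ratio()
        if score >= threshold:
            candidates.append((translation, score, score < 1.0))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


# Hypothetical memory built from an existing translation of another project.
memory = {
    "Save changes": "Änderungen speichern",
    "Cancel": "Abbrechen",
}
suggestions = tm_suggest("Save change", memory)
```

A near match such as "Save change" comes back with the fuzzy flag set, so a translator still has to review it before it is accepted.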


Translation Validation
----------------------

Some parts of the quality of a translation can be automatically validated.
These include:
   - Checking that variables have not been translated.
   - Checking that XML/HTML tags are still valid.
   - Checking that quotes (") are correctly escaped.
   - Checking that capitalization and punctuation are correctly
preserved (for languages that have these concepts).
   - Checking for blank strings in the translation.
   - Checking that brackets match the source.
   - etc. etc. etc.
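
A few of these checks can be sketched in a handful of lines of Python. This is only an illustration, not the checker of any tool named here; `check_translation` is a hypothetical name, and the `${name}` variable syntax is an assumption about how variables appear in the strings.

```python
import re

# Assumed variable syntax: ${name}
VARIABLE = re.compile(r"\$\{[^}]+\}")
BRACKETS = "()[]{}"


def check_translation(source, translation):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # A blank translation of a non-blank source makes the other checks moot.
    if source.strip() and not translation.strip():
        problems.append("blank translation")
        return problems
    # Variables must survive untranslated, in any order.
    if sorted(VARIABLE.findall(source)) != sorted(VARIABLE.findall(translation)):
        problems.append("variables differ from source")
    # Each kind of bracket should occur as often as in the source.
    for bracket in BRACKETS:
        if source.count(bracket) != translation.count(bracket):
            problems.append("bracket count differs: %s" % bracket)
    return problems
```

Note that every one of these checks compares the translation with the English source text, which is why it matters whether that text is available in the po-file at all.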

I've tried some of the open source tools available that support these,
notably OmegaT, KBabel, poEdit, Pootle, Pootling and Gtranslator.
Unfortunately all of these make certain assumptions about the PO files
that are not necessarily the case with Plone.
As explained in the Guidelines for Translators, Plone po-files use
ID-based msgids (with good reason - other localization projects ran into
the same issues and also devised ways around the limitations of Gettext
.po files).

For any of the above functionality, however, it is assumed that the
msgid of a po-file will contain the English source string.

Given the fact that Gettext has since introduced msgctxt to provide
string context, should we not consider using it in Plone po-files?

It is my understanding that the KDE project is in the process of
replacing their own solution with msgctxt.

Regards,
Andreas Pauley.

--
http://translate.org.za/

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Plone-i18n mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-i18n

Re: Leveraging Translation Techniques in Plone

Alexander Limi
Administrator
On Mon, 12 Mar 2007 13:52:24 -0800, Andreas Pauley  
<[hidden email]> wrote:

> It is my understanding that the KDE project is in the process of
> replacing their own solution with msgctxt.

Yes, we could easily support this — however, I can't seem to find the  
proper specification of this, nor whether current releases of gettext  
actually ship with this support enabled.

Also, until poEdit (the only editor that works on all of our platforms)  
supports it, we can't really switch.

--
Alexander Limi · http://limi.net



Re: Leveraging Translation Techniques in Plone

Hanno Schlichting-2
In reply to this post by Andreas Pauley-6
Hi,

Andreas Pauley wrote:

> Translation Memory
> ------------------
>
> Quite a lot of tools allow you to specify other existing translations as
> a TM database.
> This is then used to provide a list of possible matches.
> Often this is implemented as an "automatic translation" feature where
> matches above a certain percentage could be automatically used, but
> marked fuzzy.
> Using this approach a translator can leverage some of the work of other
> big localization efforts.

poEdit has some limited support for translation memories. The practical
approach taken by the Welsh language team has been to convert our files
into fully gettext compatible ones, translate them and convert them
back. They have written a simple script to automate it and this is
provided with the PloneTranslations package in the i18n/utils directory.
The script is called fillmsgstr.py and has some good documentation in it.
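
The round trip can be pictured roughly as follows. This is a sketch of the idea only and is not the code of fillmsgstr.py; the function names and the dict layout (an ID-based "msgid", an English "default" text and a "msgstr") are assumptions made for illustration.

```python
def to_gettext_style(entries):
    """Swap each ID-based msgid for its English default text.

    `entries` is a list of dicts with the keys "msgid" (the ID),
    "default" (the English text) and "msgstr". Returns the converted
    entries plus a mapping from English text back to the original IDs.
    """
    converted, ids_by_text = [], {}
    for entry in entries:
        text = entry["default"]
        ids_by_text.setdefault(text, []).append(entry["msgid"])
        # Plain gettext allows each msgid only once, so emit it once.
        if len(ids_by_text[text]) == 1:
            converted.append({"msgid": text, "msgstr": entry["msgstr"]})
    return converted, ids_by_text


def to_id_style(translated, ids_by_text):
    """Copy each translation back onto every ID that shared that text."""
    restored = {}
    for entry in translated:
        for msgid in ids_by_text[entry["msgid"]]:
            restored[msgid] = entry["msgstr"]
    return restored
```

In this sketch two IDs with the same English text necessarily end up with the same translation after the round trip, which is the ambiguity that ID-based msgids avoid.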

> Translation Validation
> ----------------------
>
> Some parts of the quality of a translation can be automatically validated.
> These include:
>    - Checking that variables have not been translated.
>    - Checking that XML/HTML tags are still valid.
>    - Checking that quotes (") are correctly escaped.
>    - Checking that capitalization and punctuation are correctly
> preserved (for languages that have these concepts).
>    - Checking for blank strings in the translation.
>    - Checking that brackets match the source.
>    - etc. etc. etc.

We already have an automated test suite that takes care of checking
syntactic correctness, variables still being present and proper
quotation. XML/HTML should not be contained in our files at all, so we
don't need those checks. All the other checks are highly language
dependent and unless there are some tools out there which we could
leverage I don't think it's worth spending time on those.

> I've tried some of the open source tools available that support these,
> notably OmegaT, KBabel, poEdit, Pootle, Pootling and Gtranslator.
> Unfortunately all of these make certain assumptions about the PO files
> that are not necessarily the case with Plone.
> As explained in the Guidelines for Translators, Plone po-files uses
> ID-based msgids (with good reason - other localization projects ran into
> the same issues and also devised ways around the limitations of Gettext
> .po files).
>
> For any of the above functionality, however, it is assumed that the
> msgid of a po-file will contain the English source string.
>
> Given the fact that Gettext has since introduced msgctxt to provide
> string context, should we not consider using it in Plone po-files?
>
> It is my understanding that the KDE project is in the process of
> replacing their own solution with msgctxt.

The msgctxt support has landed in gettext version 0.15 which was
released in July 2006. While I think we should switch to that approach
in the mid-term, we do need support for it on the various levels of the
stack we are depending on. The same goes for the long needed
plural-forms support.

Neither of these is supported by the gettext module included in
Python. In turn, neither is supported on the Zope (3) level, which our
translation machinery is based on.

As long as we don't have support on the lower levels of our stack for
these two concepts, we simply cannot use them. Any help with
implementing these two features on the lower levels is much appreciated ;)

Hanno



Re: Leveraging Translation Techniques in Plone

Andreas Pauley-6
In reply to this post by Alexander Limi
Alexander Limi wrote:
> Yes, we could easily support this — however, I can't seem to find the  
> proper specification of this, nor whether current releases of gettext  
> actually ship with this support enabled.

I also couldn't find clear documentation on the web.
Luckily the info page for the gettext version on my system (0.16.1)
mentions it:
"""
It is also possible to have entries with a context specifier.
They look like this:

      WHITE-SPACE
      #  TRANSLATOR-COMMENTS
      #. EXTRACTED-COMMENTS
      #: REFERENCE...
      #, FLAG...
      #| msgctxt PREVIOUS-CONTEXT
      #| msgid PREVIOUS-UNTRANSLATED-STRING
      msgctxt CONTEXT
      msgid UNTRANSLATED-STRING
      msgstr TRANSLATED-STRING

    The context serves to disambiguate messages with the same
UNTRANSLATED-STRING.  It is possible to have several entries with the
same UNTRANSLATED-STRING in a PO file, provided that they each have a
different CONTEXT.  Note that an empty CONTEXT string and an absent
`msgctxt' line do not mean the same thing.
"""
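
As a rough illustration of that layout, here is a small Python sketch that writes one such entry. The helper names are hypothetical, and using the current Plone message ID as the context is only an assumption about how a conversion might look, not something that has been decided.

```python
def po_quote(text):
    """Quote a string for a PO file, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped


def format_entry(msgid, msgstr, msgctxt=None, comment=None):
    """Return one PO entry as text.

    msgctxt=None omits the msgctxt line entirely, while msgctxt="" writes
    an empty context, since the two do not mean the same thing.
    """
    lines = []
    if comment is not None:
        lines.append("#. " + comment)
    if msgctxt is not None:
        lines.append("msgctxt " + po_quote(msgctxt))
    lines.append("msgid " + po_quote(msgid))
    lines.append("msgstr " + po_quote(msgstr))
    return "\n".join(lines)


# Hypothetical entry: the former ID becomes the context and the English
# text becomes the msgid.
print(format_entry("Save", "Speichern", msgctxt="label_save"))
```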


>
> Also, until poEdit (the only editor that works on all of our platforms)  
> supports it, we can't really switch.

I've tried using Pootling a few times. It's still in beta but it looks
promising.
It runs on Linux, Windows and OS X (reportedly) and it uses the
translate-toolkit API so I guess it should support msgctxt.

Regards,
Andreas

--
http://translate.org.za/


Re: Leveraging Translation Techniques in Plone

Andreas Pauley-6
In reply to this post by Hanno Schlichting-2
Hanno Schlichting wrote:
> poEdit has some limited support for translation memories. The practical
> approach taken by the Welsh language team has been to convert our files
> into fully gettext compatible ones, translate them and convert them
> back. They have written a simple script to automate it and this is
> provided with the PloneTranslations package in the i18n/utils directory.
> The script is called fillmsgstr.py and has some good documentation in it.

Interesting, I'll have a look at that script.


> We already have an automated test suite that takes care of checking
> syntactic correctness, variables still being present and proper
> quotation. XML/HTML should not be contained in our files at all, so we
> don't need those checks. All the other checks are highly language
> dependent and unless there are some tools out there which we could
> leverage I don't think it's worth spending time on those.

Yes, I was also thinking along the lines of leveraging external tools.
Partly because I was not aware of the test suite and partly because I
would like to see how useful the translate-toolkit checks are when run
on Plone :-).

On an interesting side-note, we have recently implemented some
infrastructure in the translate-toolkit for language-specific features.
This allows a language to specify, e.g., its valid characters and what
character it uses for the end of a sentence, between words etc.
This can be used to write some language-specific checks.
But we have not actually started writing such checks :-)

> The msgctxt support has landed in gettext version 0.15 which was
> released in July 2006. While I think we should switch to that approach
> in the mid-term, we do need support for it on the various levels of the
> stack we are depending on. The same goes for the long needed
> plural-forms support.
>
> Neither of these is supported by the gettext module included in
> Python. In turn, neither is supported on the Zope (3) level, which our
> translation machinery is based on.

I am not 100% familiar with the complete cycle that happens in Plone i18n.
Are you talking about the end of the cycle, when Plone needs to read the
translated po-files?
Are both the Python gettext module and the Zope 3 infrastructure used
independently, or is Plone using Zope 3/Five, which in turn uses the
gettext module?

>
> As long as we don't have support on the lower levels of our stack for
> these two concepts, we simply cannot use them. Any help with
> implementing these two features on the lower levels is much appreciated ;)

Yes, this is somewhat of a bummer.
I could help by finding the bug trackers for these projects and
submitting some bug reports.

Regards,
Andreas

--
http://translate.org.za/


Re: Leveraging Translation Techniques in Plone

Weisglass Ofer
Hi,

Very good ideas here.
I have found bugs with the po files a few times and am still waiting for answers.
Also, there are parts of Plone 3 that I am not sure are already internationalized (i18n).

Here is an example email that still has no reply:
I am translating cmfeditions-he to Hebrew with poEdit, and after translating 16 values and trying to save, I get an error:
 
cmfeditions-he.po:49:120: invalid control sequence
 
he.po:49:120: invalid control sequence
 
Any ideas?
 
Ofer Weisglass


 

Re: Leveraging Translation Techniques in Plone

Hanno Schlichting-2
In reply to this post by Andreas Pauley-6
Hi,

Andreas Pauley wrote:

> Hanno Schlichting wrote:
>
>> The msgctxt support has landed in gettext version 0.15 which was
>> released in July 2006. While I think we should switch to that approach
>> in the mid-term, we do need support for it on the various levels of the
>> stack we are depending on. The same goes for the long needed
>> plural-forms support.
>>
>> Neither of these is supported by the gettext module included in
>> Python. In turn, neither is supported on the Zope (3) level, which our
>> translation machinery is based on.
>
> I am not 100% familiar with the complete cycle that happens in Plone i18n.
> Are you talking about the end of the cycle, when Plone needs to read the
> translated po-files?
> Are both the Python gettext module and the Zope 3 infrastructure used
> independently, or is Plone using Zope 3/Five, which in turn uses the
> gettext module?

I was only talking about the end of cycle when we already have a po file
and need to query it to return a translation.

The full process right now is:

- PlacelessTranslationService uses a customized version of the gettext
file from Python to compile the po file into a mo file on Zope startup.

- A page being rendered is usually written in TAL which includes markup
for marking translatable text. The current syntax is sufficient for the
msgctxt support but would need to be extended to cover plural forms.

- If the text being translated is defined in Python code it probably
uses the zope.i18nmessageid package, which would need to be extended as
well to support both of these new options.

- The TAL machinery calls a translation service. Currently a mixture of
Five/Zope3/PlacelessTranslationService, which all would need to be extended.

- The translation service looks up the mo file and uses the Python
GNUTranslations/gettext module to access it. This would need to be
extended as well.
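
To give an idea of what the final lookup step would have to do for msgctxt, here is a minimal Python sketch. It does not use the API of PlacelessTranslationService, Zope 3 or the Python gettext module; a plain dict stands in for the compiled mo file, and the function names are hypothetical. The one detail taken from GNU gettext is that a compiled catalog stores a message with context under the key context + EOT (\x04) + msgid.

```python
CONTEXT_SEPARATOR = "\x04"  # EOT, joins context and msgid in compiled catalogs


def catalog_key(msgid, msgctxt=None):
    """Build the lookup key; an absent context leaves the msgid unchanged."""
    if msgctxt is None:
        return msgid
    return msgctxt + CONTEXT_SEPARATOR + msgid


def translate(catalog, msgid, msgctxt=None, default=None):
    """Look up a message; fall back to `default`, then to the msgid itself."""
    translated = catalog.get(catalog_key(msgid, msgctxt))
    if translated:
        return translated
    return default if default is not None else msgid


# The same English text with two contexts gets two different translations.
catalog = {
    catalog_key("Open", "verb"): "Öffnen",
    catalog_key("Open", "adjective"): "Offen",
}
```

Every layer in the list above would have to pass the context through to a lookup of this kind, which is why support is needed at each level.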

For the process of creating pot files from source code, we mainly use
i18ndude which in turn is based on zope.tal and Python's gettext module
again. And again all those don't support the new options.

All of this is a fair amount of work and quite a lengthy process. With
Python, for example, the next version that could include a new feature
like this one will probably not be usable with Plone for the next one to
two years :( This makes it even more important to actually start on the
lower levels and add the support we want ;)

Hanno



Re: Leveraging Translation Techniques in Plone

Weisglass Ofer
Hi Hanno,

It seems that there are going to be changes in the RTL CSS as well.

I hope more people will contribute in this area. We are also still looking for more Arabic translators.
 
Ofer Weisglass

 