Hi,
Most users want the search to ignore accents, so that "économétrie" finds "econometrie", "Econométrie" and "Économétrie".

What do you think about a PLIP to give the Plone lexicon a case normalizer that would use the plone.i18n utilities to normalize ZCTextIndex lexical values (as the Lucene Latin normalizer does)?

That would also be the occasion to fix a bug: "économétrie" doesn't find "Économétrie" (plone.i18n handles this, the Plone lexicon doesn't).

(I can write the PLIP and implement it... but I know it is a major issue.)

Thanks
Thomas
--
Thomas Desvenain
Telephone: 09 51 37 35 18
_______________________________________________
Plone-i18n mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-i18n

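For reference, the accent-insensitive matching asked for above can be sketched with Python's standard `unicodedata` module. This is an assumption for illustration only: it is not the plone.i18n code, and the `fold` name is hypothetical. Each word is decomposed (NFKD), combining marks are dropped and the result is lowercased, so all four spellings collapse to one key.

```python
import unicodedata


def fold(word):
    """Return an accent- and case-insensitive key for ``word``.

    NFKD splits "É" into "E" + COMBINING ACUTE ACCENT; the combining
    marks are then dropped and the remainder is lowercased.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


variants = ["économétrie", "econometrie", "Econométrie", "Économétrie"]
print({fold(v) for v in variants})  # {'econometrie'}
```

This stdlib approach only removes combining marks: letters without a Unicode decomposition, such as the ligature "œ" or "ø", pass through unchanged, which is one reason to prefer a full transliteration table.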
On Mon, Jun 6, 2011 at 23:08, thomas desvenain
<[hidden email]> wrote:
> Hi,
>
> Most users want the search to ignore accents
>
> where "économétrie"
> finds "econometrie", "Econométrie", "Économétrie".

I can understand this, even if I don't use it myself. My translation students are capable, but they have to work on the University's locked-down systems. As we are in Australia, the keyboard is set to US English, with no accents etc. Like I said, they are capable enough to discover workarounds, but they are also looking for workflow pace, and being able to search accent-free would be a boon to their productivity.

cheers
L.
--
Betteridge's Law of Headlines states that "any headline which ends in a question mark can be answered by the word 'no'". (via factoringprimes) from The Best of Wikipedia http://bestofwikipedia.tumblr.com/

In reply to this post by thomasdesvenain
On 06/06/2011 15:08, thomas desvenain wrote:
> What do you think about a PLIP to give Plone lexicon a casenormalizer
> that would use plone.i18n stuff to normalize ZCTextIndex lexical
> values ?

It seems this has been done in products like PloneGlossary since 2006, or in http://pypi.python.org/pypi/collective.latin1Splitter, but it was never proposed in a PLIP. If you do this for Latin, you should do it for other alphabets too.

Regards,
--
Encolpe DEGOUTE
http://encolpe.degoute.free.fr/
Free software, ice hockey and other cerebral activities

In reply to this post by Lachlan Musicman
I would also like to see this feature available in out-of-the-box Plone 4.x.

The biggest hassle is to put together the mappings of "similar" characters, i.e. all the following Unicode code points could match each other:

    0x004F: [0x006F, 0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D6, 0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x00F6, 0x00D8, 0x00F8, 0x014C, 0x014D, 0x014E, 0x014F, 0x0150, 0x0151, 0x0152, 0x0153], # O

So, in case anybody implementing this wants to save some time: I collected these mappings and implemented an ISplitter that I apply to application-specific ZCatalog instances indexing elements in languages other than English (although it should also work for English). Note that this is running on Plone 2.5.

It allows entering search criteria with Cyrillic, Greek and Latin characters, and supports equivalences between plain and upper/lower-case letters, accented/diacritical letters, or otherwise similar letters. It has been used in production and tested only with narrow-Unicode Python builds (up to Unicode 65535).

It is available in the OSOR.eu repository forge, project gvSIG-i18n, http://forge.osor.eu/projects/gvsig-i18n/ :
TRASplitter.py
TRAUnicode.py
TRAUnicode_Constants.py

Antonio Carrasco Valero
[hidden email]
Model Driven Development, sl
Valencia, España (Spain)
www.ModelDD.org

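The equivalence-table approach described above can be sketched in a few lines of Python 3 with `str.translate`. Only the "O" group quoted in the message is filled in; the table layout and the names `EQUIVALENCE_GROUPS`, `build_translation_table` and `fold` are hypothetical and not taken from TRAUnicode.py.

```python
# Hypothetical table in the spirit of the message: one base letter per group.
# Only the "O" group quoted above is filled in; the key 0x004F is included too.
EQUIVALENCE_GROUPS = {
    "o": [0x004F, 0x006F, 0x00D2, 0x00D3, 0x00D4, 0x00D5, 0x00D6,
          0x00F2, 0x00F3, 0x00F4, 0x00F5, 0x00F6, 0x00D8, 0x00F8,
          0x014C, 0x014D, 0x014E, 0x014F, 0x0150, 0x0151, 0x0152, 0x0153],
}


def build_translation_table(groups):
    """Map every member code point to its group's base letter."""
    table = {}
    for base, members in groups.items():
        for code_point in members:
            table[code_point] = base
    return table


FOLD_TABLE = build_translation_table(EQUIVALENCE_GROUPS)


def fold(word):
    # Characters outside the table are left untouched.
    return word.translate(FOLD_TABLE)


print(fold("Ōslo"), fold("Øre"))  # oslo ore
```

Maintaining such tables by hand for every alphabet is the hassle mentioned above. Note also that this grouping folds "œ" to a single "o", whereas a transliteration table such as Unidecode's turns it into "oe".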
On Sun, Jul 31, 2011 at 5:20 AM, Antonio Carrasco Valero on gmail
<[hidden email]> wrote:
> The biggest hassle is to put together the mappings of "similar" characters
> I.e., all the following unicodes could match for each other:

Not really. We already have and ship such a list and use it as part of plone.i18n. It depends on the http://pypi.python.org/pypi/Unidecode library, which has a pretty comprehensive list and maps about 46000 characters from the entire Unicode range. So all we need to do is:

    from plone.i18n.normalizer.base import baseNormalize
    ascii = baseNormalize('some text')

The baseNormalize function only uses the Unidecode mappings below a certain upper limit, as the phonetic mappings for Asian languages aren't good enough. This makes sense for this use case as well, as Asian languages need different approaches for search anyway, like not doing whitespace-delimited splitting.

But thanks for the pointer :)

Hanno

In reply to this post by thomasdesvenain
On Mon, Jun 6, 2011 at 3:08 PM, thomas desvenain
<[hidden email]> wrote:
> Most users want the search to ignore accents
>
> where "économétrie"
> finds "econometrie", "Econométrie", "Économétrie".
>
> What do you think about a PLIP to give Plone lexicon a casenormalizer
> that would use plone.i18n stuff to normalize ZCTextIndex lexical
> values ?
>
> (I can write the PLIP and implement it... but i know it is a major issue)

+10 :)

The framework team is accepting PLIPs for 4.3, so please get it in :)

Hanno

On Sun, Jul 31, 2011 at 11:04 AM, Hanno Schlichting <[hidden email]> wrote:
> On Mon, Jun 6, 2011 at 3:08 PM, thomas desvenain

Great! :)
So I will (after I come back from holiday).
Thank you!

Thomas

Hi,
My PLIP draft:

Title: Plain text search ignores accents

'''Proposer:''' Thomas Desvenain
'''Seconder:''' Vincent Fretin

== Motivation ==
Most users want the search to ignore accents. This is a matter of comfort: a search for 'econometrie' should find documents containing the term 'économétrie'.
It would also fix an issue: most users don't type accents on upper-case characters. For example, a search for 'économétrie' doesn't find a document titled "Econométrie".

== Assumptions ==
We will improve the Plone lexicon so that it normalizes indexed and searched terms in plain text indexes (ZCTextIndex).
A document containing the words 'Econométrie' and 'économétrie' will be indexed under the term 'econometrie'.
A search for 'économétrie' or 'econometrie' will look up the index value 'econometrie'.
The normalization will be modeled on the way document ids are generated in Plone.
To avoid performance issues and extensions at levels below Plone, normalization will be independent of the site language.

== Proposal & Implementation ==
We have to add a new case normalizer named 'I18n Case Normalizer'.
This normalizer will use the plone.i18n tools to generate an ASCII string from any word to normalize.

== Deliverables ==
 * Code
   - Add a new class in Products.CMFPlone.UnicodeSplitter and register it as 'I18n Case Normalizer'.
   - plone_lexicon will use this normalizer.
 * Upgrades
   - Upgrade plone_lexicon to use this normalizer.
   - Reindex the ZCTextIndex indexes.
 * Unit tests
   - Test that searches with the criteria 'econometrie' and 'économétrie' find documents containing the words 'économétrie', 'Économétrie' and 'Econométrie'.
   - Equivalent unit tests with eastern languages.

== Risks ==
The main risks are:
 - it has to work with all languages, including eastern languages;
 - the consequences on overall performance have to be checked;
 - indexes have to be updated for backward compatibility.

== Participants ==
Thomas Desvenain
I need volunteers to add tests for eastern languages.

== Progress ==

-------------------------------
NB: what do you think about an 'experimental.plonei18nlexicon' package?

--
Thomas Desvenain

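To make the Deliverables above more concrete, here is a rough sketch of what an 'I18n Case Normalizer' element could look like. It assumes the ZCTextIndex convention that lexicon pipeline elements expose `process(words)` (and optionally `processGlob(words)` for wildcard queries) returning a new list of words. To stay self-contained it folds accents with the standard `unicodedata` module instead of calling plone.i18n's `baseNormalize`, and it leaves out the registration step; the class and helper names are hypothetical.

```python
import unicodedata


def _fold(word):
    # Stand-in for plone.i18n's baseNormalize: strip combining accents, lowercase.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


class I18nCaseNormalizer:
    """Hypothetical lexicon pipeline element (a sketch, not the PLIP code)."""

    def process(self, words):
        # Applied to words both at indexing time and at query time.
        return [_fold(w) for w in words]

    # Wildcards such as "écono*" survive folding unchanged, so reuse process.
    processGlob = process


# Toy check: index two documents, then query with and without accents.
normalizer = I18nCaseNormalizer()
documents = {1: "Économétrie appliquée", 2: "econometrie"}
index = {}
for doc_id, text in documents.items():
    for term in normalizer.process(text.split()):
        index.setdefault(term, set()).add(doc_id)

for query in ("économétrie", "econometrie"):
    (term,) = normalizer.process([query])
    print(query, sorted(index.get(term, set())))  # both queries find [1, 2]
```

Because the same element runs at indexing and at query time, accented and unaccented spellings meet on the same lexicon entry; existing indexes must be rebuilt, as the Upgrades section notes.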
Hi, Thomas
I'm a Japanese Plone developer and I made the UnicodeSplitter for Plone 4 (http://dev.plone.org/plone/ticket/9309). I can support this PLIP and I'll add tests for Japanese, although I think this change may not pose any risk for Japanese. Please let me know when it gets started.

2011/8/12 Jonathan Lewis <[hidden email]>:
> Thomas,
>
> Thanks for your efforts on this.
>
> On 2011/08/12, at 4:36, thomas desvenain wrote:
>
>> == Participants ==
>> Thomas Desvenain
>> I need volunteers to add tests for eastern languages.
>
> I'll contact the Japanese Plone developers regarding this.
>
> Jonathan Lewis
> Hitotsubashi University, Tokyo

Great!
I added the PLIP: https://dev.plone.org/plone/ticket/12110
I added you to the participants, Manabu.
See you
Thomas

On Wed, Aug 17, 2011 at 2:55 PM, Manabu TERADA <[hidden email]> wrote:
> Hi, Thomas

Hi,
I have implemented the PLIP, but I have a doubt.

As far as I know, my implementation works, is tested (I have tests in English, French and Japanese for now), and is backward compatible. But I don't like that the values stored in plone_lexicon are no longer human-readable for languages where transliteration into ASCII is not obvious (eastern languages).

It's not a problem if 'œuf' is stored as 'oeuf' in the lexicon, or 'économie' as 'economie'... but I don't feel very comfortable with the fact that, for example, "テス" will be stored as "30c630b9", even though I have not seen any side effect on Plone's behaviour related to this (especially considering that this normalization adds no value for those languages).

I don't know how to check, according to the site language, whether normalization is relevant or not. Anyway, checking the current language in the splitter is not reasonable, and the API doesn't allow us to pass it as an argument.

So I wonder whether it would be better to make the I18NNormalizer an option, enabled through an optional profile for our Plone site?

On Wed, Aug 17, 2011 at 3:55 PM, thomas desvenain <[hidden email]> wrote:
> Great!

--
Thomas Desvenain

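The value "30c630b9" mentioned above is simply the hexadecimal code points of テ (U+30C6) and ス (U+30B9), concatenated. Here is a minimal sketch that reproduces it, assuming (as that stored value suggests) that characters without an ASCII base letter fall back to `'%x' % ord(ch)`. The `ascii_key` function is hypothetical and uses `unicodedata` rather than plone.i18n's actual code.

```python
import unicodedata


def ascii_key(word):
    """Hypothetical normalizer: ASCII kept, accents stripped,
    anything else replaced by its hex code point."""
    out = []
    for ch in word:
        if ch.isascii():
            out.append(ch)
            continue
        # Keep only the ASCII part of the decomposition (e.g. "é" -> "e").
        base = "".join(c for c in unicodedata.normalize("NFKD", ch) if c.isascii())
        # No ASCII base letter (e.g. katakana): fall back to the hex code point.
        out.append(base if base else "%x" % ord(ch))
    return "".join(out)


print(ascii_key("économie"))  # economie
print(ascii_key("テス"))  # 30c630b9
```

Such keys still match consistently at index and query time, which is why search keeps working, but they are unreadable and add nothing for scripts without accents.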
Hi,
I'm sorry for the late response.

On 2012/01/18, at 7:10, David Glick (GW) wrote:
> On Jan 17, 2012, at 2:06 PM, thomas desvenain wrote:
>> That's not a problem if 'œuf' is stored as 'oeuf' in lexicon, or 'économie' as 'economie'...
>> but i don't feel very comfortable with the fact that, for example, "テス" will be stored stored as "30c630b9", even if i have not seen any border effect on plone behaviour related to this,
>> (especially considering that this normalization doesn't have any added value for those languages)

I think this is a big problem for eastern languages (Chinese, Korean and Japanese), and it may also be a problem for Arabic.

> I believe the bigram splitter activates itself for certain character ranges, so perhaps a solution along those lines would work?
>
> If not, making this a configurable option like some of the other text index options (lexicon etc.) makes sense.
>
> David
>
> ----------
> David Glick
> Web Developer
> [hidden email]
> 206.286.1235x32

+1
This system should use character ranges to apply normalization only to the languages that need it.

--
Manabu TERADA (@terapyon)
[hidden email]
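Along the lines David and Manabu suggest, the normalizer could decide per word from character ranges instead of from the site language. A rough sketch, assuming that words containing CJK characters should be left as-is; the ranges below are an illustrative, incomplete selection, and `SKIP_RANGES`, `needs_folding` and `normalize_word` are hypothetical names.

```python
import unicodedata

# Illustrative, incomplete list of ranges the normalizer leaves untouched.
SKIP_RANGES = [
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def needs_folding(word):
    """False if any character of the word falls in one of the skipped ranges."""
    return not any(lo <= ord(ch) <= hi for ch in word for lo, hi in SKIP_RANGES)


def normalize_word(word):
    if not needs_folding(word):
        return word  # keep the lexicon entry human-readable
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


print([normalize_word(w) for w in ["Économétrie", "テス", "日本語"]])
# ['econometrie', 'テス', '日本語']
```

Because the decision depends only on the characters of each word, the same rule applies at indexing and at query time without needing the current language, which the splitter API does not provide.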
