[Geany-Users] Regular expression, for Unicode characters

Tue Aug 2 11:59:02 UTC 2016

<p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?</p>)
I found this regex (Perl-style, for Unicode) somewhere and tried to modify it, but it does not work.

I have Geany 1.23.1. I looked through its regex syntax documentation, but it has no examples.

The text I want to parse has multiple spaces inside the paragraph tags. Sometimes the upper-case text inside a paragraph is mixed with lower-case characters or words; those paragraphs need to be skipped. So we need to match, and apply the bold class to, only those paragraphs whose text is entirely upper case, as in my examples.

I tried both of these regexes, but neither works:
<p(>.*?[[p{Lu}]].*?</p>)

(<p>).*?[[p{Lu}]].*?</p>
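A note on why these attempts may fail: without a backslash, `[[p{Lu}]]` is not a Unicode property at all. It reads as a character class holding the literal characters `[`, `p`, `{`, `L`, `u`, `}`, followed by a literal `]`. The PCRE-style escape for an upper-case letter is `\p{Lu}`. The sketch below shows that parse using Python's `re` module as a stand-in; that Geany's GLib engine reads the class the same way is an assumption, not something checked in Geany 1.23.1, and the `matches` helper is a hypothetical name.

```python
import re
import warnings

# Python warns about "[[" (possible nested set); silence that for this demo.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    # Parsed as: one character out of  [ p { L u }  then a literal "]".
    broken = re.compile(r"[[p{Lu}]]")


def matches(text):
    """Return True if the backslash-less pattern finds anything in text."""
    return broken.search(text) is not None


# No "]" anywhere in the paragraph, so the pattern cannot match it.
print(matches("<p>   USU EA EUISMOD HONESTATIS DETERRUISSET.</p>"))
# Matches only because a listed character ("p") is followed by "]".
print(matches("p]"))
```

Since the sample text contains no `]`, a pattern built around `[[p{Lu}]]` has nothing to match, whatever comes before or after it.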

Vesta
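For reference, the rule described above can be stated as: a paragraph qualifies when it contains at least one upper-case letter (`\p{Lu}`) and no lower-case letter (`\p{Ll}`). In PCRE syntax that could be written as `<p(>[^\p{Ll}<]*\p{Lu}[^\p{Ll}<]*</p>)` with the replacement `<p class="bold"\1`; this is an untested suggestion for Geany, and a paragraph that spans two lines may need multi-line matching. Below is a minimal Python sketch of the same rule, using `unicodedata` because the standard `re` module has no `\p{...}` support. The names `is_all_upper` and `add_bold_class` are hypothetical.

```python
import re
import unicodedata

# Lazy match of one paragraph at a time; DOTALL lets it span line breaks.
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def is_all_upper(text):
    """True if text has at least one Lu character and no Ll character."""
    categories = [unicodedata.category(ch) for ch in text]
    return "Lu" in categories and "Ll" not in categories


def add_bold_class(html):
    """Add class="bold" to every <p> whose letters are all upper case."""
    def repl(match):
        body = match.group(1)
        if is_all_upper(body):
            return '<p class="bold">' + body + "</p>"
        return match.group(0)  # leave mixed-case and empty paragraphs alone

    return PARAGRAPH.sub(repl, html)


sample = "<p>   </p>\n<p>  ПРИВЕТ МИР --\n  ЧАСТЬ II, 123 </p>\n<p>Привет МИР.</p>"
print(add_bold_class(sample))
```

Digits, punctuation and whitespace are neither `Lu` nor `Ll`, so they do not affect the decision; an all-blank paragraph is skipped because it has no upper-case letter.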

> Sent: Tuesday, August 02, 2016 at 12:03 PM
> From: "James Ginns" <starvagrant at yahoo.com>
> To: "Geany general discussion list" <users at lists.geany.org>
> Subject: Re: [Geany-Users] Regular expression, for Unicode characters
>
> Regular Expressions are a tad difficult to master.
> 
> Basic question: you're using lazy modifiers on purpose, right? Just 
> checking.
> 
> So, a dissection. The regex engine (I don't know which one you're using) 
> should hit \W*? and look for as few non-word characters as possible (in 
> some instances zero). Then it will look for ONE character in the 
> character class [p{Lu}] (Unicode?). Then it will look for zero or more 
> instances of [p{Lu}] or a non-word character. This continues until it 
> gets to the closing tag. Since you're only looking for a single capital 
> letter, why not try:
> 
> <p(>.*?[[p{Lu}]].*?</p>)
> 
> Or better yet, since you're only replacing the p tag with p class="bold", 
> why not just capture the initial p tag:
> 
> (<p>).*?[[p{Lu}]].*?</p>
> 
> Hope that gives you some starting ideas.
> 
> On 07/31/2016 08:19 AM, Vesta wrote:
> > Can anyone show what the regular expression should look like for this particular case?
> >
> > This does not work either:
> >
> > <p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?</p>)
> >
> > Regards,
> > Vesta
> >
> >> Sent: Sunday, July 31, 2016 at 3:32 PM
> >> From: "Lex Trotman" <elextr at gmail.com>
> >> To: "Geany general discussion list" <users at lists.geany.org>
> >> Subject: Re: [Geany-Users] Regular expression, for Unicode characters
> >>
> >> Geany uses the Glib regex library whose syntax is described at
> >> https://developer.gnome.org/glib/stable/glib-regex-syntax.html
> >>
> >> Cheers
> >> Lex
> >>
> >> 2016-07-31 22:03 GMT+10:00 Vesta <laguna-mc at mail.com>:
> >>> How do I create a regular expression to match all UPPER CASE text within paragraph tags, and replace those <p> tags with <p class="bold">?
> >>>
> >>>      <p>                                                   </p>
> >>>      <p>                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
> >>>      <p>Qualisque mnesarchum no nam, usu cu fastidii delicata. Eu mei nonumy libris, quas movet vivendo vim at. Prima epicuri conceptam pro ad, in suas nonumes similique duo. Qui mundi essent complectitur eu. Ei laudem veritus democritum vis, te ferri appareat eos. Ceteros pertinacia ea eum, quo integre theophrastus ex, eum et sint omnes detracto. </p>
> >>>      <p>Usu ea euismod honestatis deterruisset. Ne quo malis meliore, duo viris liberavisse no, mea an vide mutat quodsi. Vis an vidit debitis, et noster aliquam pri, case iudicabit te sea. </p>
> >>>      <p>                                                                             </p>
> >>>      <p>                       CU CONGUE IRIURE SCAEVOLA   --
> >>>         UT DOMING IRACUNDIA. </p>
> >>>      <p>                                  DICO TEMPOR HABEMUS - PART II, 123 </p>
> >>>      <p>Homero everti ei nam. An liber euripidis vis, pericula persecuti deseruisse ad mea. Dicant offendit sea et, per esse timeam deserunt ut. In pri enim sadipscing, ei movet soleat suavitate vim. Mea et omnesque phaedrum, paulo luptatum concludaturque vim ea. -- LIBER. </p>
> >>>
> >>> I want to apply the class to
> >>>
> >>> <p class="bold">                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
> >>> <p class="bold">                      CU CONGUE IRIURE SCAEVOLA   --
> >>>         UT DOMING IRACUNDIA. </p>
> >>> <p class="bold">                                  DICO TEMPOR HABEMUS - PART II, 123 </p>
> >>>
> >>> I need a Unicode solution for Cyrillic text. This does not work:
> >>>
> >>> Find what: <p(>\W*?[[:upper:]][[:upper:]\W]*?</p>)
> >>> Replace with: <p class="bold"\1
> >>> _______________________________________________
> >>> Users mailing list
> >>> Users at lists.geany.org
> >>> https://lists.geany.org/cgi-bin/mailman/listinfo/users