Regular expression, for Unicode characters

Vesta

31 Jul 2016

2:03 p.m.

How can I create a regular expression to match all UPPER CASE text within paragraph tags, and replace those tags with <p class="bold">?

USU EA EUISMOD HONESTATIS DETERRUISSET.
Qualisque mnesarchum no nam, usu cu fastidii delicata. Eu mei nonumy libris, quas movet vivendo vim at. Prima epicuri conceptam pro ad, in suas nonumes similique duo. Qui mundi essent complectitur eu. Ei laudem veritus democritum vis, te ferri appareat eos. Ceteros pertinacia ea eum, quo integre theophrastus ex, eum et sint omnes detracto.
Usu ea euismod honestatis deterruisset. Ne quo malis meliore, duo viris liberavisse no, mea an vide mutat quodsi. Vis an vidit debitis, et noster aliquam pri, case iudicabit te sea.

CU CONGUE IRIURE SCAEVOLA --
UT DOMING IRACUNDIA.
DICO TEMPOR HABEMUS - PART II, 123
Homero everti ei nam. An liber euripidis vis, pericula persecuti deseruisse ad mea. Dicant offendit sea et, per esse timeam deserunt ut. In pri enim sadipscing, ei movet soleat suavitate vim. Mea et omnesque phaedrum, paulo luptatum concludaturque vim ea. -- LIBER.

I want to apply the class to:

USU EA EUISMOD HONESTATIS DETERRUISSET. CU CONGUE IRIURE SCAEVOLA -- UT DOMING IRACUNDIA. DICO TEMPOR HABEMUS -PART II, 123

I need a Unicode solution for Cyrillic text. This does not work:

Find what: <p(>\W*?[[:upper:]][[:upper:]\W]*?)
Replace with: <p class="bold"\1

Lex Trotman

31 Jul

2:32 p.m.

Geany uses the Glib regex library whose syntax is described at https://developer.gnome.org/glib/stable/glib-regex-syntax.html

Cheers Lex

Users mailing list Users@lists.geany.org https://lists.geany.org/cgi-bin/mailman/listinfo/users

Vesta

3:19 p.m.

Can anyone show what the regular expression should look like for this particular case?

This does not work either:

<p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?)

Regards, Vesta

James Ginns

2 Aug

11:03 a.m.

Regular Expressions are a tad difficult to master.

Basic question: you're using lazy modifiers on purpose, right? Just checking.

So, a dissection: the regex engine (I don't know which one you're using) should hit \W*? and look for as few non-word characters as possible (in some instances zero). Then it will look for ONE character in the character class [p{Lu}] (Unicode?). Then it will look for zero or more instances of [p{Lu}] or a non-word character, until it gets to the closing tag. Since you're only looking for a single capital letter, why not try:

<p(>.*?[[p{Lu}]].*?)

Or better yet, since you're only replacing the p tag with p class="bold", why not just capture the initial p tag:

().*?[[p{Lu}]].*?

Hope that gives you some starting ideas.
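
The point about lazy quantifiers can be checked with a small sketch. It uses Python's `re` module rather than Geany's engine, and `[A-Z]` is an ASCII-only stand-in for `[[:upper:]]`. The sample strings and the variable names are made up for illustration. It shows that a lazy quantifier at the very end of a pattern matches nothing, so the pattern stops right after the first capital letter unless something (here the closing tag) forces it to continue.

```python
import re

mixed_case = "<p>Usu ea euismod honestatis deterruisset.</p>"
all_caps = "<p>USU EA EUISMOD.</p>"

# [A-Z] is an ASCII-only stand-in for [[:upper:]] in this sketch.
unanchored = re.compile(r"<p(>\W*?[A-Z][A-Z\W]*?)")
anchored = re.compile(r"<p(>\W*?[A-Z][A-Z\W]*?)</p>")

# With nothing after it, the trailing lazy quantifier matches zero characters,
# so a mixed-case paragraph still matches on its first capital letter.
unanchored_hit = unanchored.search(mixed_case).group(0)
print(unanchored_hit)

# Requiring the closing tag forces the lazy quantifier to extend up to </p>,
# which only succeeds when every character is uppercase or a non-word character.
print(anchored.search(mixed_case))
print(anchored.search(all_caps).group(1))
```

In other words, a pattern with no closing-tag anchor matches every paragraph that merely starts with a capital letter.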

Vesta

1:59 p.m.

<p(>\W*?[[p{Lu}]][[p{Lu}]\W]*?)

I found this Perl regex for Unicode somewhere and tried to modify it, but it does not work.

I have Geany 1.23.1. I browsed its regex syntax documentation, but there are no examples.

The text I want to parse has multiple spaces inside paragraph tags. Sometimes upper case text inside a paragraph is mixed with lower case characters or words; those paragraphs need to be omitted. So we need to match and apply the bold class only to paragraphs containing all upper case text, as in my examples.

I tried both regexes, but they do not work:

<p(>.*?[[p{Lu}]].*?)

().*?[[p{Lu}]].*?

Vesta

James Ginns

3:58 p.m.

Hmm. Could you be more specific then? When you say it doesn't work, what kinds of lines is it missing and what kinds of lines is it catching? You could use a tool like regexpal to see what is and isn't matching. From the lack of descriptiveness in your message you might have just forgotten a semicolon for all anyone knows.

Vesta

10:27 p.m.

I don't know why it does not work. Both regexes just don't match anything. Screenshots are below.

https://s31.postimg.org/myq22vtln/Screenshot_from_2016_08_02_23_17_55.png

https://s32.postimg.org/ktjn7ywp1/Screenshot_from_2016_08_02_23_19_15.png

Vesta

5 Aug

2:10 p.m.

New subject: Remove Extra Whitespace from text

The text has multiple whitespace characters between words within <p></p> and <h2></h2> tags.

How can I find multiple whitespace characters and replace them with a single space?

Regards, Alex

Colomban Wendling

2:12 p.m.

New subject: Remove Extra Whitespace from text

On 05/08/2016 at 14:10, Vesta wrote:

> The text has multiple whitespace characters between words within <p></p> and <h2></h2> tags.
>
> How can I find multiple whitespace characters and replace them with a single space?

Learn regexes? :) For basic stuff like that they aren't so complex, and they are very powerful. Though here you could also do it by just replacing two spaces with one until there's no more to replace.

Regards, Colomban

PS: [[:space:]]+
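
Both suggestions can be sketched outside Geany. This is an illustration in Python, not what Geany runs internally. The function names and the sample line are hypothetical, and Python's `\s` is only an approximation of `[[:space:]]`.

```python
import re

def collapse_by_repeated_replace(text: str) -> str:
    """Replace two spaces with one until no double space remains."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text

def collapse_with_regex(text: str) -> str:
    """Collapse every run of whitespace into a single space."""
    # \s+ plays the role of [[:space:]]+ here (an approximation).
    return re.sub(r"\s+", " ", text)

line = "<h2>Some    heading   text</h2>"
print(collapse_by_repeated_replace(line))
print(collapse_with_regex(line))
```

On a single line without tabs or line breaks the two approaches give the same result. They differ once the input contains other whitespace characters.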

Vesta

10:31 p.m.

New subject: Remove Extra Whitespace from text

Thank you for the support. Regex is quite tricky; however, if there is no other way, regex is the only solution.

[[:space:]]+

There is one small issue with this: it also removes the whitespace between </p> and <p> when a paragraph begins on a new line, i.e.

first line text
second line text

so the paragraphs merge into one line:

first line text second line text

The same happens for headers and paragraphs:

<h1> text </h2>
text

becomes

<h1> text </h2> text

How to avoid this?

Best Regards, Alex

Colomban Wendling

10:45 p.m.

New subject: Remove Extra Whitespace from text

On 05/08/2016 at 22:31, Vesta wrote:

> […]
>
> [[:space:]]+
>
> There is one small issue with this: it also removes the whitespace between </p> and <p> when a paragraph begins on a new line, i.e. […]
>
> How to avoid this?

Don't match newlines. " +" (without the quotes) is likely enough.
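
The difference between the two patterns is easy to see in a short sketch. It uses Python's `re` module instead of Geany, with `\s+` approximating `[[:space:]]+`. The sample markup and variable names are invented for the example.

```python
import re

html = "<p>first   line  text</p>\n<p>second    line text</p>"

# Any whitespace, including the newline between the paragraphs, becomes one space.
all_whitespace = re.sub(r"\s+", " ", html)

# Only runs of the space character are collapsed; the newline is left alone.
spaces_only = re.sub(r" +", " ", html)

print(all_whitespace)
print(spaces_only)
```

The first result joins both paragraphs on one line, while the second keeps them on separate lines.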

Colomban Wendling

2 Aug

2:17 p.m.

On 31/07/2016 at 15:19, Vesta wrote:

> Can anyone show what the regular expression should look like for this particular case?

This will work:

(<p)(>[^[:lower:]]*[[:upper:]][^[:lower:]]*</p>)

It matches anything *but* lowercase characters, then one uppercase character, then anything *but* lowercase characters. Using "not lowercase" is useful to allow punctuation and digits.

If you're interested in supporting uppercase tags, you'll need to make the quantifiers ungreedy too:

(<[pP])(>[^[:lower:]]*?[[:upper:]][^[:lower:]]*?</[pP]>)

Cheers, Colomban
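
For readers who want to test the "no lowercase, at least one uppercase" rule outside Geany, here is a minimal sketch in Python. Python's standard `re` module has no `[[:upper:]]` or `[[:lower:]]` classes, so this is an emulation and not the same regex: the paragraph body is checked with `str.isupper()` and `str.islower()`, which are Unicode-aware. The function names and the sample text are hypothetical, and the sketch assumes `<p>` elements without attributes or nested tags.

```python
import re

# One <p>...</p> element with no nested tags; the case check happens in Python.
P_ELEMENT = re.compile(r"<p>([^<]*)</p>")

def is_all_caps(body: str) -> bool:
    """True if body has at least one uppercase letter and no lowercase letter."""
    has_upper = any(ch.isupper() for ch in body)
    has_lower = any(ch.islower() for ch in body)
    return has_upper and not has_lower

def add_bold_class(html: str) -> str:
    """Rewrite <p> to <p class="bold"> for paragraphs whose text is all caps."""
    def repl(match: re.Match) -> str:
        body = match.group(1)
        if is_all_caps(body):
            return f'<p class="bold">{body}</p>'
        return match.group(0)  # leave mixed-case paragraphs untouched
    return P_ELEMENT.sub(repl, html)

sample = (
    "<p>ЗАГОЛОВОК -- ЧАСТЬ II, 123</p>\n"
    "<p>Обычный текст абзаца.</p>\n"
    "<p>DICO TEMPOR HABEMUS - PART II, 123</p>"
)
print(add_bold_class(sample))
```

Digits and punctuation are neither uppercase nor lowercase, so they are allowed, which mirrors the "not lowercase" idea described above.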

Vesta

10:29 p.m.

Regex works fine -- Thank you.

B.Regards, Alex

Vesta

3 Aug

12:49 a.m.

One note: how do I replace <p> with <p class="bold"> in all matched lines?

Colomban Wendling

1:16 a.m.

On 03/08/2016 at 00:49, Vesta wrote:

> One note: how do I replace <p> with <p class="bold"> in all matched lines?

\1 class="bold"\2

Or alter the RE to use whatever captures you like best.

Cheers, Colomban
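
How the two capture groups feed the replacement string can be sketched in Python. It assumes the search pattern ends at the closing `</p>` tag. Python's `re` lacks POSIX classes, so the explicit ranges below (basic Latin plus Russian Cyrillic) are a labeled stand-in for `[[:lower:]]` and `[[:upper:]]`. The constant names and sample text are hypothetical.

```python
import re

# Assumption: explicit ranges approximate [[:lower:]] / [[:upper:]] for
# basic Latin and Russian Cyrillic only.
LOWER = "a-zа-яё"
UPPER = "A-ZА-ЯЁ"

# Group 1 is "<p"; group 2 is everything from ">" through "</p>".
FIND = re.compile(rf"(<p)(>[^{LOWER}]*[{UPPER}][^{LOWER}]*</p>)")
REPLACE = r'\1 class="bold"\2'

html = "<p>ЧАСТЬ II, 123</p>\n<p>Обычный текст.</p>"

match = FIND.search(html)
print(match.group(1))  # text kept before the inserted attribute
print(match.group(2))  # text kept after the inserted attribute

result = FIND.sub(REPLACE, html)
print(result)
```

The replacement puts the attribute between the two captured pieces, so nothing from the original paragraph is lost.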

Vesta

2:13 a.m.

> \1 class="bold"\2

How can I alter this to apply <h2> </h2> tags in place of <p> </p> tags?

Best Regards, Vesta

Colomban Wendling

2:20 a.m.

On 03/08/2016 at 02:13, Vesta wrote:

> \1 class="bold"\2
>
> How can I alter this to apply <h2> </h2> tags in place of <p> </p> tags?

You should try and understand the regex instead of using it as a mere magic solution.

…

But here you go:

(<p>)([^[:lower:]]*[[:upper:]][^[:lower:]]*)(</p>)
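
With the opening tag, the text and the closing tag in three separate groups, only the middle group needs to be reused. The sketch below assumes the groups capture `<p>` and `</p>` and that the replacement is `<h2>\2</h2>`. That replacement string is not given in the thread. It is written in Python, with explicit Latin and Russian Cyrillic ranges standing in for the POSIX classes, and the names are hypothetical.

```python
import re

# Assumption: explicit ranges approximate [[:lower:]] / [[:upper:]].
LOWER = "a-zа-яё"
UPPER = "A-ZА-ЯЁ"

# Group 1: opening tag, group 2: all-caps text, group 3: closing tag.
FIND = re.compile(rf"(<p>)([^{LOWER}]*[{UPPER}][^{LOWER}]*)(</p>)")

def paragraphs_to_h2(html: str) -> str:
    """Wrap the text of all-caps paragraphs in <h2> tags instead of <p> tags."""
    return FIND.sub(r"<h2>\2</h2>", html)

sample = "<p>ЧАСТЬ II, 123</p>\n<p>Обычный текст.</p>"
converted = paragraphs_to_h2(sample)
print(converted)
```

Groups 1 and 3 are simply dropped from the output, which is why the tags can be swapped freely.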
