Looking for a regular expression to catch everything between < and >
Thread poster: Hans Lenting
Hi,

I've imported an SDL TMX and I get lots of tags in the language pairs: etc.

Can somebody please help me define a (non-greedy) regular expression to replace all tags between < and > with nothing? I've come this far:

Michael Beijer (United Kingdom) | Member (2009) | Dutch to English

Csaba Ban (Hungary) | Member (2002) | English to Hungarian | MemoQ 5 does this beautifully | Dec 9, 2011
MemoQ 5 offers a great and easy solution for turning such regular expressions into internal tags. They offer a 45-day free trial. BTW, the software package is currently offered at a 40% discount.

Good luck,
Csaba
The regular expression to find tags is: <&>

I think it is a little bit dangerous to do it like this (replacing all tags with nothing based on this regular expression): it may have side effects if your text contains real < or > characters that really mean "less than" or "greater than".

Best regards,
Jean-Marc
[Edited at 2011-12-09 11:31 GMT]
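To make the risk concrete, here is a small sketch in Python. Python's `re` module stands in for Transit's regex engine (an assumption), `strip_tags` is a hypothetical helper, and the sample strings are invented. A raw `<` or `>` in running text is swallowed along with everything between them. If the signs are stored as the entities `&lt;` and `&gt;`, which is how XML formats such as TMX store them, they pass through untouched and can be unescaped afterwards.

```python
import html
import re

# "<", then any run of characters that are not ">", then ">".
TAG = re.compile(r"<[^>]*>")

def strip_tags(segment: str) -> str:
    """Remove every <...> span from a segment (hypothetical helper)."""
    return TAG.sub("", segment)

# Raw comparison signs: everything from "<" to the next ">" is lost.
raw = "if a < b and c > d then"
print(strip_tags(raw))  # if a  d then

# Entity-encoded signs survive tag stripping; unescape only AFTER stripping.
escaped = "if a &lt; b and <b>c</b> &gt; d"
cleaned = strip_tags(escaped)
print(cleaned)                 # if a &lt; b and c &gt; d
print(html.unescape(cleaned))  # if a < b and c > d
```

The order matters: unescaping first would turn `&lt;` back into `<` and expose it to the pattern again.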
Adam Podstawczynski (X) | Polish to English | Your tags don't show :) | Dec 9, 2011
Please show the examples first; they are invisible in your post. However, off the top of my head, such an expression would go as follows: s/\<.?\>//

This is the Perl-style, non-greedy expression you need. Need a Word-style one? It will look a bit different; just let me know.
[Edited at 2011-12-09 11:16 GMT]

Adam Podstawczynski (X) | Polish to English | On second thoughts | Dec 9, 2011
This should read: s/<.+?>//
I'm writing from memory and can't check it now.
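A quick way to see what the corrected pattern does, sketched in Python (an assumption; the thread's examples are Perl-style, and the sample segment is invented). Two details are worth knowing: Perl's `s/<.+?>//` without the `g` modifier replaces only the first match, whereas Python's `re.sub` replaces all matches unless `count` is given; and the earlier `<.?>` allows at most one character between the brackets, so real tags are not matched at all.

```python
import re

segment = '<cf bold="True">Hello</cf> <ph x="1"/>world'

# Non-greedy: each match stops at the first ">" it reaches.
print(re.sub(r"<.+?>", "", segment))           # Hello world
# Greedy: ".+" runs to the LAST ">" in the string, eating the text too.
print(re.sub(r"<.+>", "", segment))            # world
# Perl's s/<.+?>// without /g touches only the first match; count=1 mimics that.
print(re.sub(r"<.+?>", "", segment, count=1))  # Hello</cf> <ph x="1"/>world
# "<.?>" needs ">" within one character of "<", so none of these tags match.
print(re.sub(r"<.?>", "", segment) == segment) # True
```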
If Transit does have regex support, I find the idea of switching to MemoQ for this reason... peculiar. Anyway, this should work:

<[^>]*>

This is essentially the same as the non-greedy expression with the ? above, but I find it a bit more transparent, easier to adapt, and more likely to work in more regex engines. [] stands for a character group, [^] stands for 'all characters except', and * stands for 'any number of'. So the expression translates to: <, then any number of characters that aren't >, then >.

If Transit can replace multiple regex matches in the same TU, it should be enough to run this once.

[^] is of course much better for this sort of thing than trying to match every conceivable character positively. For instance, [a-z] does not even cover characters like éáőúóü. They are covered by \w if the regex engine has \w (which I believe should match all letters, all digits and _), but even then, there are myriad special characters you are never going to remember.
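The `<[^>]*>` suggestion can be checked in a few lines of Python (Python standing in for whatever engine Transit uses, which is an assumption; the sample segment is invented). It gives the same result as the non-greedy form on ordinary tags without needing a lazy quantifier, and the last two lines illustrate the point about accented letters falling outside `[a-z]` but inside `\w`.

```python
import re

segment = '<cf bold="True">Hello</cf> <ph x="1"/>world'

negated = re.sub(r"<[^>]*>", "", segment)  # no lazy "?" quantifier required
lazy = re.sub(r"<.+?>", "", segment)
print(negated)          # Hello world
print(negated == lazy)  # True

# Accented letters: not in [a-z], but matched by \w
# (str patterns are Unicode-aware by default in Python 3).
word = "éáőúóü"
print(re.fullmatch(r"[a-z]+", word))     # None
print(bool(re.fullmatch(r"\w+", word)))  # True
```

One small design difference: `*` also accepts an empty `<>`, whereas `.+?` requires at least one character between the brackets.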
[Edited at 2011-12-09 13:00 GMT]

Hans Lenting (Netherlands) | Member (2006) | German to Dutch | TOPIC STARTER | Want to try it in Transit first | Dec 15, 2011
Hi Michael, and thanks for the suggestion. I want to try it in Transit first. Transit NXT: your one-stop solution.
Hans Lenting (Netherlands) | Member (2006) | German to Dutch | TOPIC STARTER | Don't compare a Ferrari with a Trabant | Dec 15, 2011
Csaba Ban wrote:
MemoQ 5 offers a great and easy solution for turning such regular expressions into internal tags. They offer a 45-day free trial. BTW, the software package is currently offered at a 40% discount. Good luck, Csaba

Thanks, Csaba, but surely you're not comparing Transit NXT with MemoQ? The people at Kilgray are nice guys, but they still have a long way to go before they offer all the beauties that Transit NXT offers.

Hans Lenting (Netherlands) | Member (2006) | German to Dutch | TOPIC STARTER | Greater than/less than don't show | Dec 15, 2011
Adam Podstawczynski wrote:
This should read s/// I'm writing from memory, can't check now.

I had already written to ProZ Support asking them to fix the display of greater-than/less-than signs ASAP. I had forgotten about that, and now the tags I posted don't show up.

Thanks for the suggestion, I'll try it. Hmm, I get "invalid syntax". Find: «machine123#4!!» With: «.+» doesn't work either...

Ah, this is probably what I need (as suggested in another reply here): «&» (where « stands for the less-than sign and » for the greater-than sign)

Hans
[Edited at 2011-12-15 13:08 GMT]

Hans Lenting (Netherlands) | Member (2006) | German to Dutch | TOPIC STARTER | Yep, that is the one | Dec 15, 2011
Warlock wrote:
The regular expression to find tags is: I think it is a little bit dangerous to do it like this (replacing all tags with nothing based on this regular expression): it may have side effects if your text contains real < or > characters that really mean "less than" or "greater than". Best regards, Jean-Marc [Edited at 2011-12-09 11:31 GMT]

Thanks for this one. Please tell me: how do you get « and » to show up in your message?

Hans
Hans Lenting wrote:
Adam Podstawczynski wrote: This should read s/// I'm writing from memory, can't check now.
I had already written to ProZ Support asking them to fix the display of greater-than/less-than signs ASAP. I had forgotten about that, and now the tags I posted don't show up. Thanks for the suggestion, I'll try it. Hmm, I get "invalid syntax". Find: «machine123#4!!» With: «.+» doesn't work either... Ah, this is probably what I need (as suggested in another reply here): «&» (where « stands for the less-than sign and » for the greater-than sign)

There's no need to write to technical support, as there is nothing for them to fix. The forum software parses anything in < ... > as HTML. If you want the signs to show up, type them as the entities &lt; and &gt;, as I did in my post above.

As for your problem, Transit's regex engine probably doesn't support the non-greedy ?. Try the solution I suggested above.
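What the entity trick amounts to can be sketched with Python's standard `html` module (using Python here is only an illustration; on the forum you simply type the escaped text into the post). Escaping turns each `<` into `&lt;` and each `>` into `&gt;`, which the forum then shows as literal signs instead of parsing them as HTML.

```python
import html

pattern = "<[^>]*>"
# quote=False leaves quotation marks alone; only &, < and > are escaped.
escaped = html.escape(pattern, quote=False)
print(escaped)                             # &lt;[^&gt;]*&gt;
print(html.unescape(escaped) == pattern)   # True
```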
Don't think so | Dec 15, 2011
Warlock wrote:
I think it is a little bit dangerous to do it like this (replacing all tags with nothing based on this regular expression): it may have side effects if your text contains real < or > characters that really mean "less than" or "greater than"...

The source text seems to be in some sort of tagged format, which means that "real" < and > characters will be encoded as character entities (&lt; and &gt;) and won't get caught in the crossfire.

Displaying < and > on the forum | Dec 15, 2011
Hans Lenting wrote:
Warlock wrote: The regular expression to find tags is: <&> I think it is a little bit dangerous to do it like this (replacing all tags with nothing based on this regular expression): it may have side effects if your text contains real < or > characters that really mean "less than" or "greater than". Best regards, Jean-Marc
[Edited at 2011-12-09 11:31 GMT] Thanks for this one. Please tell me: how do you get « and » to show up in your message? Hans

Hans,

To display less-than and greater-than signs, you have to write them as HTML entities.

Regards,
Jean-Marc

msoutopico (Ireland) | English to Galician
For the regexp in Transit NXT, I would use <#([!>]+)0> and replace it with <#0>. However, I don't see why you would need to do that in Transit.

Cheers,
Manuel
[Edited at 2012-03-28 11:00 GMT]
[Edited at 2012-03-28 11:01 GMT]