How do I remove the last character of an R-T-L string in Python?

I am trying to remove the last character of a string in a “right-to-left” language. When I do, however, the last character wraps to the beginning of the string.

I know that this is a fundamental issue with how I’m handling the R-T-L paradigm, but if someone could help me think through it, I’d very much appreciate it.


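import io

# io.open decodes the file as UTF-8, so each line is already a Unicode string
with io.open(r"file.txt", "r", encoding="utf-8") as f:
    for line in f:
        # keep the second tab-separated field
        the_text = line.split('\t')[1]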

Best answer

Some characters in Unicode are always LTR, some are always RTL, and some can be either depending on their surrounding context. In addition, the display context for bidirectional text will have a “predominant” directionality (e.g. a text editor configured for mainly-English text would be predominantly LTR and have a ragged right margin, while one configured for mainly-Hebrew text would be predominantly RTL with a ragged left margin).
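To see which of these categories a given character falls into, Python's standard `unicodedata` module exposes each code point's bidirectional class. The following is only a small sketch: the helper name `bidi_classes` and the sample string are made up for illustration and are not from the question.

```python
import unicodedata

def bidi_classes(text):
    """Return (character name, bidi class) pairs for each code point in text."""
    return [
        (unicodedata.name(ch, "UNKNOWN"), unicodedata.bidirectional(ch))
        for ch in text
    ]

# Hypothetical sample: a Latin letter, a Hebrew letter, and a closing bracket.
sample = "a\u05d0]"
for name, bidi in bidi_classes(sample):
    # "L" is strongly left-to-right, "R" is strongly right-to-left,
    # "ON" (other neutral) takes its direction from context.
    print(name, bidi)
```

The closing square bracket reports `ON`, which is why its placement depends on what surrounds it.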

It looks like what has happened here is that when a closing square bracket character appears between two RTL characters, it is rendered in its RTL form (your first example), but when it appears between an RTL and an LTR character (or at the end of the string; basically, anywhere it doesn’t have characters of the same directionality on both sides), it is treated as part of whichever run of text matches the predominant direction. If you try dragging your mouse over the string to select the characters, you’ll see that logically the closing ] still follows the ֶם even if visually it appears to have moved.
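The logical order can also be checked in code, without relying on how an editor or terminal draws the string. This is a rough sketch with a made-up sample string (a Hebrew word, a closing bracket, and one trailing character to remove); `logical_dump` is a hypothetical helper name.

```python
import unicodedata

def logical_dump(text):
    """List (index, character name) in logical (storage) order."""
    return [(i, unicodedata.name(ch, "UNKNOWN")) for i, ch in enumerate(text)]

# Hypothetical sample: four Hebrew letters, "]", then a trailing "a".
text = "\u05e9\u05dc\u05d5\u05dd]a"

# Slicing works on logical order, so this drops the trailing "a"
# no matter where the display happens to draw it.
trimmed = text[:-1]

for index, name in logical_dump(trimmed):
    print(index, name)
```

The bracket is listed last, after the final Hebrew letter, which matches what selecting the text with the mouse shows. Note that slicing removes one code point, so on text with vowel points the "last character" may be a combining mark rather than a letter.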

If the second-to-last character in your string were also a Hebrew character (or other strongly RTL character) rather than a ], or if the display context were predominantly RTL, then it would appear where you expect it to.
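If you cannot change the display context, one commonly used workaround is to give the trailing bracket a strongly RTL neighbour by appending U+200F RIGHT-TO-LEFT MARK, an invisible character. The sketch below is an assumption about what might help for display only, not something taken from the question: `anchor_trailing_neutral` is a hypothetical helper, and whether it changes what you see depends on the renderer's bidi support.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK: invisible, strongly RTL

def anchor_trailing_neutral(text):
    """Append RLM when text ends in a character that is not strongly directional.

    Intended for display only; the stored data should stay unmarked.
    """
    if text and unicodedata.bidirectional(text[-1]) not in ("L", "R", "AL"):
        return text + RLM
    return text

# Hypothetical sample: a Hebrew word followed by an unpaired closing bracket.
the_text = "\u05e9\u05dc\u05d5\u05dd]"
print(anchor_trailing_neutral(the_text))
```

Because the mark is an extra code point, strip it again (or apply it only at print time) before comparing or measuring the string.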