Ellipsization (as it stands) is somewhat broken: it doesn't properly take into account that one character isn't necessarily one byte (or, for that matter, one glyph), nor does it consider the width of each glyph. Also, replacement of tabs, LFs and CRs need not be done: the label could be put into single-line mode (with the effect that you'll get [000A] and [000D] instead).
Created attachment 710 Improve the ellipsization
Although I agree with you, there is one problem now: the title that is generated has the same length (and thus size) as the clip, and it gets fully UTF-8 validated. Both consume time and memory. Maybe it's better to use g_strndup (txt, chars+5), so all the 'visible' characters have all their needed bytes? If there is a decent solution for this, I'll apply the patch. Thanks in advance, Nick
chars+5 is not really any better. This really needs to be done according to the width of the rendered string. chars*6 is closer and is likely to be enough to cover use of combining marks, but is still (potentially) leaving the string short if you have a lot of narrow glyphs. Also, what if an option to control the position of the ellipsis is added? Generating the title once would seem to be more appropriate, despite the memory usage. As things stand, it turns out that clipman_regenerate_titles() can be removed without side-effects.
MAXCHARS*8 should cover it, at least with PANGO_ELLIPSIZE_END. *6 will probably cover it too, but *8 leaves a margin for error. (There's still at least the pathological case - lots of zero-width (non-)joiners - but it's probably safe to ignore that.) It *won't* cover it if PANGO_ELLIPSIZE_MIDDLE is used, at least not for long selections.
I've committed your patch, removed the regenerate_title function and added strndup (length*8), which could save some time/mem on huge amounts of text. I think an ellipsis position option is a bit overkill; remember we're still using Xfce :-). Thanks for your time, Nick
length*8 combined with the removal of regenerate_title is a bad choice. To demonstrate:
1. Set the length to 10 characters.
2. Select the whole of the initial description of this BTS entry.
3. Look at the clipman menu. Ellipsization is fine.
4. Set the length to 120 characters.
5. Look at the clipman menu. Whoops, no ellipsization.
The more non-ASCII glyphs are present, the sooner this will become obvious. From what I've seen of Chinese (simplified) text, each glyph requires three bytes (if encoded using UTF-8), so a length of 10 characters -> 10*8 = 80 bytes -> 26 glyphs and one truncated character. That's important, because g_locale_to_utf8() will return NULL due to the invalid character, and there's no check for that -> segfault.
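To illustrate the truncation problem, here is a minimal sketch in plain C of cutting a string at a byte limit without splitting a multi-byte character. The helper name utf8_safe_prefix_len is hypothetical (it is not part of clipman or GLib), and the sketch assumes the input is already valid UTF-8. It is not necessarily what the attached patch does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Returns the largest length <= max_bytes at which the string s
 * (assumed to be valid, NUL-terminated UTF-8) can be cut without
 * splitting a multi-byte character. */
size_t utf8_safe_prefix_len(const char *s, size_t max_bytes)
{
    size_t len = strlen(s);
    size_t cut;

    if (len <= max_bytes)
        return len;             /* the whole string fits */

    cut = max_bytes;
    /* UTF-8 continuation bytes have the bit pattern 10xxxxxx.
     * If the byte at the cut position is a continuation byte, the
     * cut falls inside a character, so back up to its lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;

    return cut;
}
```

With three-byte glyphs and an 80-byte limit, this returns 78, i.e. 26 whole glyphs, rather than leaving two stray bytes of a 27th. In GLib-based code the same effect could probably be had from g_utf8_validate(), whose end output parameter points at the first invalid byte; either way, the result of any later conversion should still be checked for NULL.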
Created attachment 712 Properly truncate UTF-8 strings & fix a null dereference.
Committed, thanks