Created attachment 5105 smarter file extensions
Right now, Thunar does very simple (and often incorrect) filename splitting: it finds the last "." in the filename and treats everything after it as the extension. This ignores compound extensions like .tar.gz (or, more generally, anything followed by .gz, .bz2, .z, etc.). The problem can't be solved perfectly without knowing the user's intentions for a filename. Here's a patch that adds a new function to thunar-util for smarter extension splitting (which will hopefully be wrong less often than the current simple implementation), and then uses it for both simple rename and bulk rename.
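For reference, the current "last dot" behavior boils down to a plain strrchr() call. The sketch below is a standalone illustration, not Thunar's actual code; `naive_extension` and `naive_split_offset` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the simple approach: everything from the
 * last '.' onwards is the extension. Returns a pointer into filename, or
 * NULL when the name contains no '.' at all. */
static const char *
naive_extension (const char *filename)
{
  return strrchr (filename, '.');
}

/* Byte offset where the extension starts, i.e. how much of the name a
 * rename dialog would select; the whole name if there is no extension. */
static size_t
naive_split_offset (const char *filename)
{
  const char *ext = naive_extension (filename);

  return ext != NULL ? (size_t) (ext - filename) : strlen (filename);
}
```

For "archive.tar.gz" this returns ".gz" and a split offset of 11, so only "archive.tar" would be selected, which is exactly the .tar.gz problem described above.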
Created attachment 5106 Function that acts like strrchr
I don't really like the idea of using a regex (regexes are slow). It's also quite convenient if the function behaves like strrchr() does, returning a pointer into the string. I only patched the rename dialog, to show it working. It's probably also wise to check it with non-UTF-8 names, but I think it's safe, even though it operates on raw strings.
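A minimal sketch of the strrchr()-style idea, assuming a short hard-coded list of compression suffixes and a 1 to 5 byte limit on the part in front of it. Both are assumptions for illustration, and `get_extension` is a hypothetical name, not the function in the attachment. Like strrchr(), it returns a pointer into the string instead of allocating a copy:

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical list of compression suffixes; an assumption for this sketch. */
static const char *const compressed[] = { "gz", "bz2", "xz", "z" };

/* strrchr()-style lookup: returns a pointer to where the extension starts
 * inside filename (e.g. ".tar.gz"), or NULL if there is none. No regex and
 * no allocation; the string is only scanned byte by byte. */
static const char *
get_extension (const char *filename)
{
  const char *dot = strrchr (filename, '.');

  /* no dot at all, or a name ending in '.', has no usable extension */
  if (dot == NULL || dot[1] == '\0')
    return NULL;

  for (size_t i = 0; i < sizeof compressed / sizeof compressed[0]; i++)
    {
      if (strcasecmp (dot + 1, compressed[i]) != 0)
        continue;

      /* compression suffix: look backwards for a container part like ".tar" */
      for (const char *p = dot; p > filename; )
        {
          p--;
          if (*p == '.')
            {
              /* accept it only if it is 1 to 5 bytes long (assumed limit) */
              size_t len = (size_t) (dot - p - 1);
              if (len >= 1 && len <= 5)
                dot = p;
              break;
            }
        }
      break;
    }

  return dot;
}
```

Because the result points into the original string, the split position comes for free as `ext - filename`, which is convenient when setting the selection in a rename dialog.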
Created attachment 5107 Nick's patch with a few tweaks
I can see the point in making this faster, especially for bulk renaming; I may run some tests to see how much faster it is. The patch looks good. I tweaked a few things (added a test for an extension that is only ".", added some comments, and applied it to bulk rename). I've done a bit of testing with Unicode characters, and I'm going to put together a good variety of filenames to test with sometime this weekend.
Pushed patch in 5e25c20 with some additional comments and remarks. Also fixed the selection in the properties dialog and new-file dialog.
Created attachment 5108 Ignore dotfiles
I reset to master, rebuilt, and found a couple of problems.
For dotfiles with no extension (e.g. ".filename"), the entire filename is treated as the extension. The simple renamer (where I did most of my testing) therefore still selects the whole name, because it doesn't change the selection when the offset is 0. The patch for this is "Ignore dotfiles" (attachment 5108).
Another problem is that multi-byte Unicode characters in the secondary extension (the part in front of a compression extension) are not split properly, because its length is measured in bytes. For example, "filename.שּשּ.gz" splits into "filename.שּשּ" and ".gz", since 'שּ' is three bytes wide in UTF-8. If wide Unicode characters are expected to be part of file extensions, this could be fixed by using g_utf8_pointer_to_offset() to calculate the extension length. That would be slower, though, so it is probably fine as-is.
Here are some filenames I used to test:
Should match the entire name:
  .filename
  .filename.
  filename.
Should match 3 extensions:
  .filename.templatefile.in.in
  filename.something.in.in
Should match 2 extensions:
  .filename.tar.gz
  .filename.שּ.gz
  .filename.שּשּ.gz
  filename.asdfg.gz
  .filename.templatefile.in
Should match 1 extension:
  .filename.gz
  filename.asdfgh.gz
  filename.asd
  filename.asdf
  filename.asdfg
  filename.asdfghij
  .tar.gz
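The sketch below combines both fixes discussed in this comment, under stated assumptions: a dot at the very start of the name never begins an extension, and the length of the secondary extension is counted in UTF-8 characters rather than bytes. In GLib that count is what g_utf8_pointer_to_offset() computes; to keep the example dependency-free it counts non-continuation bytes by hand instead, which gives the same result for valid UTF-8. The suffix list, the 1 to 5 character limit, the handling of up to two ".in" parts, and all function names are hypothetical, inferred from the test filenames listed above rather than taken from the pushed commit.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical compression suffixes; an assumption for this sketch. */
static const char *const compressed[] = { "gz", "bz2", "xz", "z" };

/* Number of UTF-8 characters in [start, end): counts every byte that is not
 * a continuation byte (10xxxxxx), which matches g_utf8_pointer_to_offset()
 * for valid UTF-8 input. */
static size_t
utf8_length (const char *start, const char *end)
{
  size_t n = 0;

  for (const char *p = start; p < end; p++)
    if (((unsigned char) *p & 0xC0) != 0x80)
      n++;
  return n;
}

/* Finds the last '.' before dot, ignoring a leading '.' at filename[0]
 * (hidden files); returns NULL if there is none. */
static const char *
previous_dot (const char *filename, const char *dot)
{
  for (const char *p = dot; p > filename + 1; )
    {
      p--;
      if (*p == '.')
        return p;
    }
  return NULL;
}

/* Returns a pointer to the start of the extension, or NULL if the whole
 * name should be treated as having no extension. */
static const char *
get_extension_utf8 (const char *filename)
{
  const char *dot = strrchr (filename, '.');

  /* no dot, a hidden file's leading dot, or a trailing dot: no extension */
  if (dot == NULL || dot == filename || dot[1] == '\0')
    return NULL;

  for (size_t i = 0; i < sizeof compressed / sizeof compressed[0]; i++)
    {
      if (strcasecmp (dot + 1, compressed[i]) != 0)
        continue;

      /* container part such as ".tar", 1 to 5 characters (assumed limit) */
      const char *dot2 = previous_dot (filename, dot);
      if (dot2 != NULL)
        {
          size_t len = utf8_length (dot2 + 1, dot);
          if (len >= 1 && len <= 5)
            return dot2;
        }
      return dot;
    }

  /* template files: ".in" takes up to two more parts, stopping after the
   * first one that is not itself ".in" */
  if (strcasecmp (dot + 1, "in") == 0)
    {
      for (int round = 0; round < 2; round++)
        {
          const char *dot2 = previous_dot (filename, dot);

          if (dot2 == NULL)
            break;
          dot = dot2;
          if (strncasecmp (dot, ".in.", 4) != 0)
            break;
        }
    }

  return dot;
}
```

Assuming 'שּ' is the single code point U+FB49 (bytes EF AD 89), ".filename.שּשּ.gz" now yields ".שּשּ.gz": the secondary part counts as 2 characters, where a byte count would see 6. The extra per-byte loop is the slowdown mentioned above, although it only runs when a compression suffix has already matched.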
I've pushed the patch to not match hidden names. The fact that hidden chars are not matched is good, since all extensions are always unicode. So I think we're good here now.