Ticket #2650 (closed defect: fixed)
unicode isn't used for filenames
| Reported by: | avorobey | Owned by: | kovidgoyal |
|---|---|---|---|
| Priority: | minor | Milestone: | |
| Component: | Default | Version: | |
| Keywords: | filenames unicode utf-8 | Cc: | |
Description
I'm running calibre 0.5.14 on Windows XP. When I import an LRF book in Russian, everything works fine but its author and title, in the storage tree of calibre, are mangled into (mostly) underscores. I see that calibre took the utf-8 author/title and ascii-fied them this way, replacing all non-ascii chars with underscores. The names are all unintelligible and it's impossible to find a file manually if I need to for whatever reason.
I'm storing the books on an NTFS disk that has no problem working with unicode dirnames/filenames, so it'd be great if calibre could just do that. If that's impossible for some reason, maybe a transliteration scheme for a few major alphabets?
Change History
comment:2 Changed 15 months ago by avorobey
Could it be an option in preferences, defaulting to off? If on, just don't underscorize the names and let python do what it can with utf-8 filenames.
I know how unicode filenames can be horrible on different platforms. But consider someone like me who wants to build a library of 100 or 500 non-English books in calibre. The disk hierarchy will be a completely useless mess so the user is locked into calibre and can't get their files out in any meaningful way. That's an uncomfortable feeling.
comment:3 Changed 15 months ago by kovidgoyal
Does the save to disk function in calibre also mangle the names? I don't recall if I escape non ascii characters in that function or not.
comment:5 Changed 15 months ago by yaroslav
I would like to second this request, since I've run into this problem too. Besides being confusing, especially when I want to remove books from the device, I expect it to cause conflicts for short author-title pairs in a large library, since the only difference between two titles is the number of underscores. It is even worse for authors: I expect there are quite a few authors whose names have the same length. Because of this problem, I only use the command-line tools from the calibre distribution for book conversion and don't use calibre for library management at all.
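To illustrate the collision concern, here is a minimal sketch of the underscore replacement as the ticket describes it. The function `underscore_ascii` and the sample author names are hypothetical; this is not calibre's actual code, only an approximation of the reported behaviour.

```python
import re


def underscore_ascii(name):
    """Replace every non-ASCII character with an underscore.

    Hypothetical stand-in for the mangling described in this ticket.
    """
    return re.sub(r"[^\x00-\x7f]", "_", name)


# Two different (made-up) authors whose names have the same word lengths.
first = underscore_ascii("Иван Петров")
second = underscore_ascii("Олег Иванов")

# Both names collapse to the same directory name.
print(first == second)
```

Any two names with the same pattern of word lengths end up in the same directory under this scheme, which is the conflict described above.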
comment:6 Changed 15 months ago by alleycat
I am running calibre on English Windows XP with Russian Cyrillic set as the default non-ASCII encoding. I have a mixture of English, Russian, French, German and Swedish books, more than 1200, and all works fine, i.e. I can see Russian books with Cyrillic file names in the file system as well as in calibre. Everything is consistent until I take the database to work, where my PC "does not know Cyrillic". When I manipulate Latin books from calibre, all is well. Cyrillic ones look fine too, until I click on them, which gives this:
[Error 123] The filename, directory name, or volume label syntax is incorrect: u'H:\\MY DOCUMENTS\\ML\\????? ????????'
Detailed traceback:
Traceback (most recent call last):
  File "calibre\gui2\library.pyo", line 350, in current_changed
  File "calibre\gui2\library.pyo", line 318, in get_book_display_info
  File "calibre\library\database2.pyo", line 490, in abspath
  File "os.pyo", line 150, in makedirs
  File "os.pyo", line 157, in makedirs
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'H:\\MY DOCUMENTS\\ML\\????? ????????'
However, looking at the same record in SQLite (table "books"):
"47","Призрак Александра Вольфа","Призрак Александра Вольфа","2009-03-28 15:14:11","NULL","1","Газданов; Гайто","","Гайто Газданов/Призрак Александра Вольфа (47)"
and similarly fine in calibre GUI, of course.
Confusing, isn't it?
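One plausible reading of the error, sketched below: if the path is squeezed through an ANSI code page that has no Cyrillic letters, each letter becomes `?`, and `?` is a reserved character in Windows file names. The choice of `cp1252` for the work PC is an assumption, and `is_valid_windows_component` is a hypothetical helper, not part of calibre or the standard library.

```python
# Assumption: the work PC uses the Western European ANSI code page (cp1252).
author = "Гайто Газданов"

# Characters missing from the code page are replaced with "?".
lossy = author.encode("cp1252", errors="replace").decode("cp1252")

# Characters that Windows does not allow in a file or directory name.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def is_valid_windows_component(component):
    """Return True if the name contains no reserved Windows characters."""
    return not any(ch in WINDOWS_RESERVED for ch in component)


print(lossy)
print(is_valid_windows_component(author), is_valid_windows_component(lossy))
```

The lossy result has the same shape as the directory name in the error message (five and eight question marks), while the database row is unaffected because SQLite stores the text as Unicode.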
comment:7 Changed 14 months ago by kovidgoyal
- Status changed from new to closed
- Resolution set to fixed
In calibre 0.6.1 all filenames will be "intelligently" converted to ASCII, which should fix these problems.

Supporting Unicode file names across 3 operating systems and a dozen different file systems is impossible, I'm afraid.
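The ticket does not show how the "intelligent" conversion is implemented. As a rough sketch of the general idea only, the following transliterates Russian letters with a small lookup table and strips accents from Latin letters. The table `_CYRILLIC` and the function `ascii_filename` are hypothetical names, and the romanization chosen here is just one of several common schemes.

```python
import unicodedata

# Hypothetical romanization table for lowercase Russian letters.
_CYRILLIC = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def ascii_filename(name):
    """Return an ASCII approximation of name (illustrative sketch only)."""
    out = []
    for ch in name:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
            continue
        lower = ch.lower()
        if lower in _CYRILLIC:
            repl = _CYRILLIC[lower]
            # Preserve the capitalisation of the original letter.
            out.append(repl.capitalize() if ch.isupper() else repl)
            continue
        # Decompose accented letters and drop the combining marks.
        base = unicodedata.normalize("NFKD", ch)
        base = base.encode("ascii", "ignore").decode("ascii")
        out.append(base or "_")  # fall back to an underscore
    return "".join(out)


print(ascii_filename("Гайто Газданов"))
print(ascii_filename("Призрак Александра Вольфа"))
```

Unlike plain underscore replacement, names produced this way stay readable and mostly distinct, though characters outside the table still fall back to underscores.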