Ticket #4721 (closed defect: fixed)

Opened 7 months ago

Last modified 7 months ago

Bulk import stops because of utf-8 errors

Reported by: EgnaledKnarf
Owned by: kovidgoyal
Priority: major
Milestone:
Component: Default
Version: trunk
Keywords: bulk import
Cc:

Description

I cannot get Calibre to import my book directories in bulk because it cancels the import as soon as it hits a filename it cannot interpret as UTF-8.

Correct behaviour would be to either:

  • skip the file and put its name in a list for the user to peruse
  • ask the user what to do with this file (use different encoding, new filename, skip file, etc)
  • replace the uninterpretable characters with something innocuous
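
The first option (skip the file and report it afterwards) could be sketched roughly as follows. This is not Calibre's code: `partition_filenames` is a hypothetical helper, and it assumes the importer has the filenames as raw bytes.

```python
def partition_filenames(raw_names, encoding="utf-8"):
    """Split raw byte filenames into decodable names and skipped ones.

    Hypothetical helper for illustration only, not part of Calibre.
    """
    imported, skipped = [], []
    for raw in raw_names:
        try:
            imported.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Do not abort the whole import; keep the name for a later report.
            skipped.append(raw)
    return imported, skipped


names = [b"plain.txt", b"S\xf8k_etter_b\xf8ker.txt"]
imported, skipped = partition_filenames(names)
```

After the run, `skipped` holds the undecodable names so they can be shown to the user instead of stopping the import.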

The real error here is that Calibre insists on interpreting all filenames as UTF-8, even when they are clearly encoded in Latin-1 or some other encoding. The real solution, therefore, would be to add some intelligence to the name-mangling code so that it checks for possible filename encodings and defaults to ASCII with placeholders if the encoding cannot be determined.
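
A minimal sketch of that idea, assuming the filename is available as bytes. The helper name `decode_filename` and the candidate list are assumptions, not anything from Calibre. Note that this is guessing rather than true detection: Latin-1 accepts every byte sequence, so it can never "fail", which is why Windows-1252 is tried instead.

```python
def decode_filename(raw, candidates=("utf-8", "cp1252")):
    """Try each candidate encoding in order; fall back to ASCII with placeholders.

    Hypothetical helper for illustration only.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the ASCII part, mark every other byte with U+FFFD.
    return raw.decode("ascii", errors="replace")
```

UTF-8 is tried first because it is strict: text in a legacy 8-bit encoding rarely happens to be valid UTF-8, so a successful UTF-8 decode is a fairly strong signal.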

An example of this type of error looks like this:

ERROR: Path error (on behemoth)

The specified directory could not be processed

details:

('utf8', '/path/Name_of_Author/S\xf8k_etter_b\xf8ker.txt', 37, 42, 'unsupported Unicode code range')

That looks like Latin-1 or Windows-1252 to me. It is clearly not UTF-8 (which would look like 'S\xc3\xb8k_etter_b\xc3\xb8ker.txt').
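
The byte-level difference between the two encodings can be checked directly in Python. The snippet below only assumes that the intended character is 'ø' (U+00F8), which is what byte 0xF8 means in Latin-1 and Windows-1252.

```python
name = "S\u00f8k_etter_b\u00f8ker.txt"  # '\u00f8' is the letter o with stroke

latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'S\xf8k_etter_b\xf8ker.txt'
print(utf8_bytes)    # b'S\xc3\xb8k_etter_b\xc3\xb8ker.txt'

# The Latin-1 bytes are not valid UTF-8: 0xF8 can never start a UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```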

Instead of dropping out immediately it could try to:

  • replace the unknown characters with something else: 'S#k_etter_b#ker.txt'
  • skip them altogether: 'Sk_etter_bker.txt'
  • ask the user
  • etc...
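
The replace and skip strategies, applied to the filename from the error message, might look like this. Both function names are hypothetical, and both approaches are lossy, so two different files could end up with the same cleaned name.

```python
raw = b"S\xf8k_etter_b\xf8ker.txt"


def replace_non_ascii(raw, placeholder="#"):
    """Replace every non-ASCII byte with a placeholder character."""
    return "".join(chr(b) if b < 0x80 else placeholder for b in raw)


def drop_non_ascii(raw):
    """Silently drop every byte that is not ASCII."""
    return raw.decode("ascii", errors="ignore")


replaced = replace_non_ascii(raw)  # 'S#k_etter_b#ker.txt'
dropped = drop_non_ascii(raw)      # 'Sk_etter_bker.txt'
```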

Change History

comment:1 Changed 7 months ago by kovidgoyal

While I can probably fix this, you should be aware that having files in different character encodings on the same filesystem is a bug, indicating either a misconfigured system or misbehaving applications.

comment:2 Changed 7 months ago by kovidgoyal

  • Status changed from new to closed
  • Resolution set to fixed

Fixed in branch trunk. The fix will be in the next release.
