Priority task #2809

closed

Improve import tasks from CSV - handle BOM

Added by Alena Peterová almost 3 years ago. Updated almost 3 years ago.

Status: Closed
Priority: High
Assignee: Peter Štrunc
Target version:
Start date: 05/12/2021
Due date:
% Done: 100%
Estimated time:
Owner:

Description

When a CSV file starts with a BOM (https://en.wikipedia.org/wiki/Byte_order_mark), which is very common for files created on Windows, the file can't be imported, and the exception is very misleading: it says that the column is not found.
The BOM is visible almost nowhere: not on Windows, not in vim, not in the "less" command. It can be seen in a diff.

Please also support files with a BOM, so that the BOM is ignored during import.

Workaround: add a dummy first column to the file; all other columns will then be handled correctly. This works only if the task supports dummy columns.
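
The misleading "column not found" error and the dummy-column workaround can both be reproduced with plain Java. The sketch below is only an illustration, not code from czechidm-extras: the class BomHeaderDemo, its helpers, the ";" separator and the column names "username" and "role" are all assumptions. It shows that a UTF-8 BOM survives decoding as the character U+FEFF and sticks to the first column name, so only that column stops matching.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of czechidm-extras.
final class BomHeaderDemo {

    // Returns the column names from the first line of a CSV file.
    // Naive split on ';' (assumed separator), no quoting support.
    static List<String> headerColumns(byte[] csvBytes) {
        // Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF.
        String text = new String(csvBytes, StandardCharsets.UTF_8);
        String firstLine = text.split("\r?\n", 2)[0];
        return Arrays.asList(firstLine.split(";"));
    }

    // Encodes content as UTF-8, optionally prefixed with the UTF-8 BOM (EF BB BF).
    static byte[] csv(boolean withBom, String content) {
        byte[] body = content.getBytes(StandardCharsets.UTF_8);
        if (!withBom) {
            return body;
        }
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        List<String> plain = headerColumns(csv(false, "username;role\njdoe;admin\n"));
        List<String> bom = headerColumns(csv(true, "username;role\njdoe;admin\n"));
        List<String> dummy = headerColumns(csv(true, "dummy;username;role\nx;jdoe;admin\n"));

        System.out.println(plain.indexOf("username")); // 0
        System.out.println(bom.indexOf("username"));   // -1, the name is "\uFEFFusername"
        System.out.println(bom.indexOf("role"));       // 1, later columns are unaffected
        System.out.println(dummy.indexOf("username")); // 1, the BOM is attached to "dummy"
    }
}
```

With the dummy column in place the BOM is attached to a name nobody looks up, which is why the remaining columns import correctly.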


Related issues

Related to csv-connector - Feature #1746: Configurable encoding, support for BOM (New, 07/10/2019)

Actions #1

Updated by Alena Peterová almost 3 years ago

  • Description updated (diff)
Actions #2

Updated by Alena Peterová almost 3 years ago

  • Related to Feature #1746: Configurable encoding, support for BOM added
Actions #3

Updated by Tomáš Doischer almost 3 years ago

  • Assignee set to Tomáš Doischer
  • Target version set to 3.2.0

This should be quite straightforward: BOMInputStream handles the BOM only if one is present, so there should be no need to catch an exception from the original reader and retry (https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BOMInputStream.html). But this needs to be tested.
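
To illustrate the "only if present" behaviour without pulling in Commons IO, here is a minimal JDK-only sketch of the same idea. It is a stand-in for BOMInputStream, not its implementation and not the code from the branch: the class BomSkipSketch and its methods are hypothetical, and only the UTF-8 BOM is handled. The stream peeks at the first three bytes and pushes them back unless they are exactly EF BB BF, so files with and without a BOM go through the same code path and no exception handling is needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical JDK-only stand-in for the idea behind BOMInputStream (UTF-8 only).
final class BomSkipSketch {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream positioned after the UTF-8 BOM if one is present,
    // or at the very first byte of the input otherwise.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int read = 0;
        // Read up to three bytes; the input may be shorter than a BOM.
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean hasBom = read == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && read > 0) {
            // Not a BOM: put the bytes back so the caller sees the input unchanged.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    // Reads the first line of UTF-8 CSV bytes, ignoring a leading BOM if there is one.
    static String firstLine(byte[] csvBytes) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                skipUtf8Bom(new ByteArrayInputStream(csvBytes)), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "username;role\n".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[plain.length + UTF8_BOM.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(plain, 0, withBom, UTF8_BOM.length, plain.length);

        System.out.println(firstLine(plain));   // username;role
        System.out.println(firstLine(withBom)); // username;role
    }
}
```

In a project that already depends on Commons IO, wrapping the input in BOMInputStream as suggested above would replace skipUtf8Bom entirely.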

Actions #4

Updated by Tomáš Doischer almost 3 years ago

  • Status changed from New to Needs feedback
  • % Done changed from 0 to 80

Support for BOM in CSV was added. I also added a test; there is now a test file with a BOM (created in Sublime Text). The test covers one of the tasks that extend AbstractCsvImportTask, not AbstractCsvImportTask itself, because that would be a bit of a pain to do.

I didn't touch ImportAutomaticRoleAttributesFromCSVExecutor because it is obsolete. I also didn't change the documentation - this was a bug which was not mentioned before.

Code in branch: https://github.com/bcvsolutions/czechidm-extras/compare/doischer/2809-fix-issues-with-bom-in-csv-imports

@apeterova, can you give me feedback please?

Actions #5

Updated by Peter Štrunc almost 3 years ago

  • Status changed from Needs feedback to Closed
  • Assignee changed from Tomáš Doischer to Peter Štrunc
  • % Done changed from 80 to 100

LGTM, thanks for this fix. Merged it to develop.
