Task #900

closed

Very slow synchronization of organizations structure

Added by Alena Peterová over 6 years ago. Updated about 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Ondřej Kopr
Category:
Synchronization
Target version:
Start date:
01/09/2018
Due date:
% Done:

100%

Estimated time:
Owner:

Description

Tested on 7.5.3. The organization structure has ~16 000 elements and only 1 root (defined by a null parent).

The synchronization of the organization structure is very slow: more than 1 minute per organization.
When the "parent" is not defined, so that all organizations are roots, it takes less than 1 second per organization.

Please could you look at it?


Files

vazby.csv.zip (101 KB), Alena Peterová, 01/09/2018 03:57 PM
Actions #1

Updated by Alena Peterová over 6 years ago

Sorry, only now in Redmine can I see my mistake: the Boolean values are interchanged. The root has a null parent; non-root elements have non-null parents.
Strangely, the computation of parents works well :-)

Actions #2

Updated by Radek Tomiška over 6 years ago

  • Category changed from Tree structures to Synchronization
  • Assignee changed from Radek Tomiška to Vít Švanda
Actions #3

Updated by Alena Peterová over 6 years ago

  • Subject changed from Very slow synchronization of organizations without groovy script to Very slow synchronization of organizations structure
  • Description updated (diff)

OK, so the difference really was caused by my mistake (marking each element as a root). Now that I have corrected the script, the synchronization is as slow as without the script.
So the aim of this ticket should be to check why the default sync of the structure of 16k organizations is so slow.

Actions #4

Updated by Vít Švanda over 6 years ago

Do you have a more complex structure, or are all 16000 items directly under one root?
Can you attach the source export with the organizations?

Actions #5

Updated by Vít Švanda over 6 years ago

  • Target version set to Garnet (7.7.0)
Actions #6

Updated by Alena Peterová over 6 years ago

The structure is complex, with several levels of organizations.
The compressed CSV is attached. I connected it through the CSV connector; the attribute CISLOSSMSPM is mapped to the code of the organization, and the attribute MANAGER_CISLOSSMSPM to the parent code. Other attributes are not important.

The names of the organizations were taken from a different source, so the organizations already exist in IdM. I only need to synchronize the structure from this CSV. I linked the accounts first without setting the parent (because Update entity is not an option yet - #878) and then ran the synchronization for LINKED -> Update Entity with computing the parents.

Actions #7

Updated by Vít Švanda over 6 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Vít Švanda over 6 years ago

  • I simulated the problem, and the sync of a tree with 16000 accounts really is slow.
  • The problem is in the transformation of the attribute value. This transformation is called for every account. In the tree sync, this search is evaluated again for every account (16000 * 16000 calls).
  • I implemented a cache for the method "AbstractSynchronizationExecutor.getValueByMappedAttribute(AttributeMapping attribute, List<IcAttribute> icAttributes)".
    • This cache does not support eviction by key.
    • This cache is cleared at the start and at the end of every sync.
    • Correct behavior assumes that every transformation on the attributes returns a static result. It means that a transformation on an attribute cannot generate "random" values that do not depend on the input values. This assumption needs further discussion.
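
The caching idea in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the class `CachedTransformation`, its methods, and the evaluation counter are hypothetical names, and a plain `Function` stands in for the real attribute transformation. The sketch has the same properties as described: no eviction by key, only a full clear at the start and end of a sync.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical sketch of a per-synchronization cache for transformed
 * attribute values. Not the CzechIdM implementation.
 */
class CachedTransformation<I, O> {

    // Stand-in for the (possibly expensive) attribute transformation.
    private final Function<I, O> transformation;
    // Cache keyed by the transformation input; inputs need equals/hashCode.
    private final Map<I, O> cache = new HashMap<>();
    // Counts how many times the underlying transformation really ran.
    private int evaluations = 0;

    CachedTransformation(Function<I, O> transformation) {
        this.transformation = Objects.requireNonNull(transformation);
    }

    /** Drops all cached values; intended to be called at sync start and end. */
    void clear() {
        cache.clear();
    }

    /** Returns the cached value, evaluating the transformation only on a miss. */
    O getValue(I input) {
        // containsKey is used so that a null result is cached as well.
        if (cache.containsKey(input)) {
            return cache.get(input);
        }
        evaluations++;
        O value = transformation.apply(input);
        cache.put(input, value);
        return value;
    }

    int evaluations() {
        return evaluations;
    }
}
```

With such a cache, a tree sync that looks up every account's value once per account (n * n lookups) evaluates the transformation only n times. For 16000 accounts that is 16000 evaluations instead of 256 000 000. The result is only correct if the transformation returns the same output for the same input, which is the assumption stated in the last bullet.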

Commit in the develop: https://github.com/bcvsolutions/CzechIdMng/commit/b8d3f4b5bba80dd473b1f4a2c3df90e6e92aad06

Actions #9

Updated by Alena Peterová over 6 years ago

Vít Švanda wrote:

Correct behavior assumes that every transformation on the attributes returns a static result. It means that a transformation on an attribute cannot generate "random" values that do not depend on the input values. This assumption needs further discussion.

Thanks.
This solution makes sense to me. I think that the only use case for "random" values could be generating some "ids" for later use in provisioning or similar. But in such a case, the value of the attribute for one organisation should be the same during the whole synchronization, so the cache is in fact correct.
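
The point about generated "ids" can be illustrated with a small sketch. Everything in it is hypothetical (the class `StableRandomIdSketch`, the id format, the use of `UUID`); it only shows that a per-synchronization cache keeps a randomly generated value stable for one organisation until the cache is cleared.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical illustration; not part of CzechIdM. */
class StableRandomIdSketch {

    // Per-synchronization cache keyed by organisation code (assumed key).
    private final Map<String, String> cache = new HashMap<>();

    // A non-deterministic transformation: a fresh random id on every call.
    private final Function<String, String> generateId =
            code -> code + "-" + UUID.randomUUID();

    /** Returns the id for the organisation, generating it only once per sync. */
    String idFor(String organisationCode) {
        return cache.computeIfAbsent(organisationCode, generateId);
    }

    /** Called at the start and end of a synchronization. */
    void clear() {
        cache.clear();
    }
}
```

Within one synchronization, repeated calls to `idFor` with the same code return the same id. After `clear()`, a new id is generated, so the value is stable during a sync but not across syncs.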

Actions #10

Updated by Vít Švanda about 6 years ago

  • Status changed from In Progress to Needs feedback
  • Assignee changed from Vít Švanda to Ondřej Kopr
  • % Done changed from 0 to 80
I added a new checkbox "Cached value" to the mapping attribute detail.

Ondra, could you please do a review and create a test for this feature? You are the best for this job.

Actions #11

Updated by Ondřej Kopr about 6 years ago

  • Status changed from Needs feedback to Closed
  • % Done changed from 80 to 100

Thank you for your review. The test is here: https://github.com/bcvsolutions/CzechIdMng/commit/c14c0b4847216c93d972ca2083c3b0268fbd2a4f
Thank you for your help.
