Task #900: Very slow synchronization of organizations structure - IdStory Identity Manager

Task #900

closed

Very slow synchronization of organizations structure

Added by Alena Peterová almost 7 years ago. Updated almost 7 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Ondřej Kopr

Category:

Synchronization

Target version:

Garnet (7.7.0)

Start date:

01/09/2018

Due date:

% Done:

100%

Estimated time:

Owner:

Description

Tested on 7.5.3. The organization structure has ~16 000 elements and only 1 root (defined by a null parent).

The synchronization of the organization structure is very slow: more than 1 minute per organization.
When the "parent" is not defined, so that all organizations are roots, it takes less than 1 second per organization.

Could you please look at it?

Files

vazby.csv.zip (101 KB)

Alena Peterová, 01/09/2018 03:57 PM

Updated by Alena Peterová almost 7 years ago

Sorry, only now in Redmine can I see my mistake: the Boolean values are interchanged. The root has a null parent; a non-root has a non-null parent.
Strangely, the computing of parents works well :-)

Updated by Radek Tomiška almost 7 years ago

Category changed from Tree structures to Synchronization
Assignee changed from Radek Tomiška to Vít Švanda

Updated by Alena Peterová almost 7 years ago

Subject changed from Very slow synchronization of organizations without groovy script to Very slow synchronization of organizations structure
Description updated (diff)

OK, so the difference really was caused by my mistake (marking each element as a root). Now that I have corrected the script, the synchronization is as slow as it is without the script.
So the aim of this ticket should be to check why the default sync of a structure of 16k organisations is so slow.

Updated by Vít Švanda almost 7 years ago

Do you have a more complex structure, or are all 16000 items directly under one root?
Can you attach the source export with the organizations?

Updated by Vít Švanda almost 7 years ago

Target version set to Garnet (7.7.0)

Updated by Alena Peterová almost 7 years ago

File vazby.csv.zip added

The structure is complex, with several levels of organizations.
The compressed CSV is attached. I connected it via the CSV connector; the attribute CISLOSSMSPM is mapped to the code of the organization, and the attribute MANAGER_CISLOSSMSPM to the parent code. Other attributes are not important.

The names of the organizations were taken from a different source, so the organizations already exist in IdM. I only need to synchronize the structure from this CSV. I linked the accounts first without setting the parent (because Update entity is not an option yet - #878) and then ran the synchronization for LINKED -> Update Entity with computing the parents.
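
For readers without the attachment, the mapping described above can be illustrated with a small sketch. It is not the CSV connector or any IdM code: the class `OrgStructureSketch`, its methods, and the `;` delimiter used in the usage note are assumptions made for illustration. Only the column names CISLOSSMSPM and MANAGER_CISLOSSMSPM and the "root = null parent" rule come from this ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: reads "code -> parent code" pairs from CSV lines. */
class OrgStructureSketch {

    /** Returns organization code -> parent code; an empty parent column becomes null (a root). */
    static Map<String, String> parseParents(List<String> lines, String delimiter) {
        String splitter = Pattern.quote(delimiter);
        String[] header = lines.get(0).split(splitter, -1);
        int codeIdx = indexOf(header, "CISLOSSMSPM");
        int parentIdx = indexOf(header, "MANAGER_CISLOSSMSPM");

        Map<String, String> parents = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split(splitter, -1);
            String code = cols[codeIdx].trim();
            String parent = parentIdx < cols.length ? cols[parentIdx].trim() : "";
            parents.put(code, parent.isEmpty() ? null : parent);
        }
        return parents;
    }

    /** Roots are the organizations whose parent code is null. */
    static List<String> roots(Map<String, String> parents) {
        List<String> roots = new ArrayList<>();
        for (Map.Entry<String, String> e : parents.entrySet()) {
            if (e.getValue() == null) {
                roots.add(e.getKey());
            }
        }
        return roots;
    }

    private static int indexOf(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Missing column: " + name);
    }
}
```

With the assumed `;` delimiter, calling `parseParents` on the lines `CISLOSSMSPM;MANAGER_CISLOSSMSPM`, `1;`, `2;1`, `3;2` gives a map in which organization 1 has a null parent, so `roots` returns only that one code.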

Updated by Vít Švanda almost 7 years ago

Status changed from New to In Progress

Updated by Vít Švanda almost 7 years ago

I simulated the problem, and the tree sync for 16000 accounts really is slow.
The problem is in the transformation of the attribute value. This transformation is called for every account. In tree sync, this search is evaluated again for every account (16000 * 16000 calls).
I implemented a cache for the method "AbstractSynchronizationExecutor.getValueByMappedAttribute(AttributeMapping attribute, List<IcAttribute> icAttributes)".
- This cache does not support eviction by key.
- This cache is cleared at the start and at the end of every sync.
- Correct function assumes that every transformation on the attributes will return a static result. It means a transformation on the attribute cannot generate "random" values without dependency on the input values. This assumption has to be discussed further.
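
The idea behind such a cache can be sketched as plain memoization. This is only an illustration, not the code from the commit: the class `TransformationCacheSketch`, its string key, and the `Function`-based transformation are hypothetical stand-ins for the real `AttributeMapping` and `IcAttribute` types.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-synchronization cache of transformed attribute values. */
class TransformationCacheSketch {

    private final Map<String, Object> cache = new HashMap<>();
    private final Function<String, Object> transformation;
    private int transformationCalls = 0;

    TransformationCacheSketch(Function<String, Object> transformation) {
        this.transformation = transformation;
    }

    /** Meant to be called at the start and end of a sync; there is no eviction by key. */
    void clear() {
        cache.clear();
    }

    /** Runs the transformation only on the first request for a given attribute and raw value. */
    Object getValue(String attributeName, String rawValue) {
        String key = attributeName + '\u0000' + rawValue;
        if (cache.containsKey(key)) {
            return cache.get(key); // cached result, which may legitimately be null
        }
        transformationCalls++;
        Object result = transformation.apply(rawValue);
        cache.put(key, result);
        return result;
    }

    /** Number of times the (expensive) transformation actually ran. */
    int getTransformationCalls() {
        return transformationCalls;
    }
}
```

Under this sketch, repeated lookups of the same raw value within one run trigger the transformation once, which is why the result must depend only on the input. `containsKey` is used instead of `computeIfAbsent` so that a null result is cached as well.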

Commit in the develop: https://github.com/bcvsolutions/CzechIdMng/commit/b8d3f4b5bba80dd473b1f4a2c3df90e6e92aad06

Updated by Alena Peterová almost 7 years ago

Vít Švanda wrote:

Correct function assumes that every transformation on the attributes will return a static result. It means a transformation on the attribute cannot generate "random" values without dependency on the input values. This assumption has to be discussed further.

Thanks.
This solution makes sense to me. I think that the only use case for "random" values could be generating some "ids" for later use in provisioning or similar. But in such a case, the value of the attribute for one organisation should be the same during the whole synchronization, so the cache is in fact correct.

#10

Updated by Vít Švanda almost 7 years ago

Status changed from In Progress to Needs feedback
Assignee changed from Vít Švanda to Ondřej Kopr
% Done changed from 0 to 80

I added a new checkbox "Cached value" on the mapping attribute detail.

Caching is now used only in the sync.
All existing attributes will be set as cached = false (for backward compatibility).
Newly created attributes will be cached by default.
Documentation is here: https://wiki.czechidm.com/devel/dev/system/system-mapping#attribute_cache.

Ondra, could you please do a review and create a test for this feature? You are the best for this job.

#11

Updated by Ondřej Kopr almost 7 years ago

Status changed from Needs feedback to Closed
% Done changed from 80 to 100

Thank you for your review. Here is the test: https://github.com/bcvsolutions/CzechIdMng/commit/c14c0b4847216c93d972ca2083c3b0268fbd2a4f. Thank you for your help.
