Defect #2686

Synchronization of identities and contracts sometimes left tasks in the Waiting state (the next synchronization then failed to start HR processes)

Added by Alena Peterová 6 months ago. Updated 5 months ago.

Status: Closed
Priority: Normal
Category: Synchronization
Target version:
Start date: 02/12/2021
Due date:
% Done: 100%
Estimated time:
Milestones:

Description

Tested on 10.7.2; it also happened on 10.7.0.

After a synchronization of identities and contracts that created one new identity with some automatic roles, some of the tasks (HrEnableContractProcess, HrEndContractProcess, or ProcessAllAutomaticRoleByAttributeTaskExecutor) remained in the "Waiting" state in the Scheduler. The next synchronization was unable to start them and therefore failed.

My test data are the same as in Squash (or #2636), except that the synchronizations are set to "Reconciliation". The synchronization of contracts is scheduled as dependent on the synchronization of identities. I start the synchronization task manually.

The problem occurred 4 times in a row, each time with a slightly different combination of Waiting tasks (see screenshots). Then I tested 3 times with 2 identities in the source data, and everything was correct. Then I removed one of them and tested 4 more times in a row; now everything was correct.

Details:
  • The 1st synchronization created a new identity. The synchronization, HrEnableContractProcess and HrEndContractProcess remained Waiting.
  • Cancel the tasks, remove the identity, run the synchronizations again. Now ProcessAllAutomaticRoleByAttributeTaskExecutor is Waiting, and the synchronization of contracts fails because it cannot start it again.
  • Cancel the tasks, remove the identity, run the synchronizations again. (ProcessSkippedAutomaticRoleByTreeForContract processes more flags, probably because it did not start in the previous run.)
  • Cancel the tasks, remove the identity, run the synchronizations again. Now the result was the same as in the 1st case.
  • Run some more tests with 2 identities: no problem.
  • Return to the original data and run the synchronization: now everything is green.

correct.png (80.5 KB), Alena Peterová, 02/12/2021 06:46 PM
ProcessSkippedAutomaticRoleByTreeForContract_detail.png (51.6 KB), Alena Peterová, 02/12/2021 06:46 PM
waiting_tasks.png (79.1 KB), Alena Peterová, 02/12/2021 06:46 PM
waiting_tasks_2.png (79.6 KB), Alena Peterová, 02/12/2021 06:46 PM
waiting_tasks_3.png (77.7 KB), Alena Peterová, 02/12/2021 06:46 PM
without_automatic_roles_after_identities.png (84.3 KB), Alena Peterová, 02/18/2021 10:16 AM
events.png (158 KB), Alena Peterová, 02/18/2021 10:16 AM

Related issues

Related to CzechIdM - Task #2444: Implement waiting for the completion of the LRT after all asynchronous events (Closed, 08/18/2020)

Related to CzechIdM - Defect #2743: Event: Start event remains in running state, when long running task ends with exception. (Closed, 03/31/2021)

History

#1 Updated by Alena Peterová 6 months ago

I have a snapshot of the virtual server made after the 1st run, if needed.

#2 Updated by Radek Tomiška 5 months ago

  • Status changed from New to Needs feedback
  • Assignee changed from Vít Švanda to Alena Peterová

I think the source of the issue is related to ProcessAllAutomaticRoleByAttributeTaskExecutor, which runs in each synchronization; it should run in the second synchronization only.
Could you please test it with this task removed from the synchronization of identities?

If that is the cause, I can look at what can be improved in dependent task execution: when the synchronization of contracts is scheduled as dependent on the synchronization of identities, the first ProcessAllAutomaticRoleByAttributeTaskExecutor is executed "in the middle" (i.e., the synchronization of contracts runs at the same time as ProcessAllAutomaticRoleByAttributeTaskExecutor).

#3 Updated by Alena Peterová 5 months ago

Unfortunately, it didn't help. I got a similar situation on the first attempt.

Here are the events on the identity.

#4 Updated by Radek Tomiška 5 months ago

  • Status changed from Needs feedback to In Progress

#5 Updated by Alena Peterová 5 months ago

Note: I realized that I had done all the testing while the server had only 1 CPU.
Task executor is initialized: corePoolSize [1], maxPoolSize [2], queueCapacity [20]
Event executor is initialized: corePoolSize [2], maxPoolSize [4], queueCapacity [50]

So I tried adding a 2nd CPU, but the issue still happens sometimes.

Also, I got one more variation of the issue: the only "Waiting" task was the SynchronizationSchedulableTaskExecutor of contracts; all the others were Executed.
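The logged executor settings can be read with standard `java.util.concurrent.ThreadPoolExecutor` semantics. Assuming the task executor behaves like a plain `ThreadPoolExecutor` with a bounded queue (as Spring's `ThreadPoolTaskExecutor` does), this sketch shows that a second thread is only created once 20 tasks are already queued. `PoolSizingDemo` is a hypothetical name, not CzechIdM code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSizingDemo {

    /**
     * Fills a pool configured like the logged task executor (core 1, max 2, queue 20)
     * with blocking tasks. Returns {pool size with queue full, pool size after one
     * more task, 1 if the next task was rejected}.
     */
    public static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(20));
        // Each task blocks until released, so no thread ever takes work from the queue.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            // Task 1 starts the core thread; tasks 2..21 fill the queue.
            for (int i = 0; i < 21; i++) {
                pool.execute(blocker);
            }
            int withQueueFull = pool.getPoolSize();
            // Only now, with the queue full, does the pool grow to maxPoolSize.
            pool.execute(blocker);
            int afterGrowth = pool.getPoolSize();
            int rejected = 0;
            try {
                pool.execute(blocker); // queue full and both threads busy
            } catch (RejectedExecutionException e) {
                rejected = 1; // default AbortPolicy
            }
            return new int[] {withQueueFull, afterGrowth, rejected};
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "1 2 1"
    }
}
```

In other words, with corePoolSize [1] and queueCapacity [20], long running tasks run on a single thread until the queue is full.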

#6 Updated by Radek Tomiška 5 months ago

Nice, thanks! I have a clue now; until now I wasn't able to reproduce it in my environment.

#7 Updated by Radek Tomiška 5 months ago

  • Related to Task #2444: Implement waiting for the completion of the LRT after all asynchronous events added

#8 Updated by Radek Tomiška 5 months ago

  • Status changed from In Progress to Needs feedback
  • Assignee changed from Radek Tomiška to Vít Švanda
  • Target version set to 10.8.0
  • % Done changed from 0 to 90
  • Affected versions 10.6.0, 10.6.1, 10.6.2, 10.6.3, 10.6.4, 10.7.1, 10.6.5, 10.6.6 added

The issue is related to event processing: when events from a synchronization are processed too quickly (i.e., before all long running tasks are saved into the queue), tasks are left in the waiting state. The number of tasks left waiting depends on how quickly the events were processed: tasks already saved into the queue are left waiting, while the newer ones are correctly marked as executed.
This issue occurs mainly when a synchronization "does almost nothing" (it creates only a few events in the queue) and the HR processes take a long time to run.
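To make the race concrete, here is a minimal single-threaded Java model of it. Everything in it is an assumption for illustration, not CzechIdM code: `EventTracker`, `save` and `stuckTasks` are hypothetical names, and the model assumes that a dependent task decides whether to wait from a check made before it is saved, while waiting tasks are released only when the last running event ends. The task names are just labels taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the lost wake-up; names are illustrative only. */
public class WaitingTaskRace {

    /** Counts running events of one synchronization and parks dependent tasks. */
    static final class EventTracker {
        private int runningEvents = 0;
        final List<String> waiting = new ArrayList<>();
        final List<String> executed = new ArrayList<>();

        void eventStarted() {
            runningEvents++;
        }

        /** When the last running event ends, all tasks parked so far are released. */
        void eventEnded() {
            runningEvents--;
            if (runningEvents == 0) {
                executed.addAll(waiting);
                waiting.clear();
            }
        }

        boolean isBusy() {
            return runningEvents > 0;
        }

        /** Saves a task using a "busy?" decision taken earlier (check-then-act). */
        void save(String task, boolean busyWhenChecked) {
            if (busyWhenChecked) {
                waiting.add(task); // relies on a future eventEnded() to release it
            } else {
                executed.add(task);
            }
        }
    }

    /** Replays one interleaving; returns the tasks left in the waiting state. */
    static List<String> stuckTasks(boolean withStartEvent) {
        EventTracker tracker = new EventTracker();
        if (withStartEvent) {
            tracker.eventStarted(); // assumed fix: an event owned by the LRT itself
        }
        tracker.eventStarted();                 // synchronization publishes one event
        boolean busyA = tracker.isBusy();       // task A decides it must wait
        tracker.eventEnded();                   // the event is processed very quickly
        tracker.save("HrEnableContractProcess", busyA);
        boolean busyB = tracker.isBusy();       // task B decides after the event ended
        tracker.save("HrEndContractProcess", busyB);
        if (withStartEvent) {
            tracker.eventEnded();               // the LRT ends, closing its own event
        }
        return new ArrayList<>(tracker.waiting);
    }

    public static void main(String[] args) {
        // prints "without START event: [HrEnableContractProcess]"
        System.out.println("without START event: " + stuckTasks(false));
        // prints "with START event: []"
        System.out.println("with START event: " + stuckTasks(true));
    }
}
```

Without the extra event, the task that checked first stays waiting forever while the later one runs, which mirrors the symptom above; with one event held by the LRT itself, the count cannot drop to zero until the LRT ends, so both tasks are released.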

Commit:
https://github.com/bcvsolutions/CzechIdMng/commit/ea1de47f0e5a4a954a6813d8e39b4e5aab8cae5e

Could you give me some feedback, please?

#9 Updated by Vít Švanda 5 months ago

  • Status changed from Needs feedback to Resolved
  • Assignee changed from Vít Švanda to Radek Tomiška
  • % Done changed from 90 to 100

I reviewed and tested it. It was a great brain exercise for me. I am no longer able to reproduce the issue.

To clarify: the solution is the new event IdmLongRunningTask.START, which is now created when an LRT starts (it did not exist before). As a result, every LRT now has at least one event that cannot end before the LRT itself does.
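The guarantee above amounts to a reference count that starts at one instead of zero. A minimal, hypothetical Java sketch of that idea follows; class and method names are illustrative, not the CzechIdM API. The LRT holds one count for its own START event, each asynchronous event adds one before it is handed to the event executor, and the "all events done" callback can therefore only fire after the LRT has released its own count.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical "START token" guard; not CzechIdM code. */
public class StartTokenGuard {

    private final AtomicInteger open = new AtomicInteger(1); // 1 = the LRT's own START event
    private final Runnable onAllCompleted;

    StartTokenGuard(Runnable onAllCompleted) {
        this.onAllCompleted = onAllCompleted;
    }

    /** Must be called by the creator before the event is handed to another thread. */
    void eventCreated() {
        open.incrementAndGet();
    }

    void eventCompleted() {
        // Fires once, on whichever thread hits zero, provided no event is created
        // after lrtFinished().
        if (open.decrementAndGet() == 0) {
            onAllCompleted.run();
        }
    }

    /** Completes the START token when the LRT itself ends. */
    void lrtFinished() {
        eventCompleted();
    }

    /** Returns {times the callback fired before the LRT ended, total times fired}. */
    static int[] demo(int events) {
        AtomicBoolean lrtEnded = new AtomicBoolean(false);
        AtomicInteger fired = new AtomicInteger();
        AtomicInteger firedEarly = new AtomicInteger();
        StartTokenGuard guard = new StartTokenGuard(() -> {
            fired.incrementAndGet();
            if (!lrtEnded.get()) {
                firedEarly.incrementAndGet();
            }
        });
        ExecutorService eventPool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < events; i++) {
            guard.eventCreated();                     // count first ...
            eventPool.execute(guard::eventCompleted); // ... then publish; may finish at once
        }
        lrtEnded.set(true);
        guard.lrtFinished();
        eventPool.shutdown();
        try {
            eventPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {firedEarly.get(), fired.get()};
    }

    public static void main(String[] args) {
        int[] r = demo(1000);
        // prints "fired early: 0, fired total: 1"
        System.out.println("fired early: " + r[0] + ", fired total: " + r[1]);
    }
}
```

If `open` started at zero instead, an event that completed before the next one was created would already trigger the callback, which corresponds to the premature "all done" signal described in #8.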

#10 Updated by Radek Tomiška 5 months ago

  • Status changed from Resolved to Closed

#11 Updated by Radek Tomiška 4 months ago

  • Related to Defect #2743: Event: Start event remains in running state, when long running task ends with exception. added

