Quantcast

Concurrency bugs in posting events?

classic Classic list List threaded Threaded
28 messages Options
12
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Concurrency bugs in posting events?

Mark Rotteveel-2
I have a test in Jaybird that post multiple events
(https://github.com/FirebirdSQL/jaybird/blob/master/src/test/org/firebirdsql/event/TestFBEventManager.java#L173)

Essentially it inserts on multiple threads (100 inserts per thread, 5
threads, a commit after each insert) which causes a trigger to post

POST_EVENT 'TEST_EVENT_A'
POST_EVENT 'TEST_EVENT_B'
POST_EVENT 'TEST_EVENT_A'

(yes, it is posting TEST_EVENT_A twice)

Which should lead to posting TEST_EVENT_A 1000 times and TEST_EVENT_B
500 times. This test occasionally fails (about 1 in 3 runs on Firebird
3). I initially thought I wasn't waiting long enough for all events to
be received.

After adding some extra debug logging, it looks like Firebird is
occasionally losing event registrations (not posting them, but the event
is unregistered), or posting an event with the id of TEST_EVENT_A, but
posting the count of TEST_EVENT_B (or vice versa), but both events are
unregistered, which seems to indicate some sort of race condition or
interleaved writes of the event.

I can reproduce this problem both with Jaybird 2.2 and Jaybird 3, which
have a totally different implementations.

As far as I can tell it is not version specific, but it is more easily
reproducible on Firebird 3 and 4 (probably because I'm using SuperServer).

Could someone take a look at this?

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
Anyone?

On 15-1-2017 11:56, Mark Rotteveel wrote:

> I have a test in Jaybird that post multiple events
> (https://github.com/FirebirdSQL/jaybird/blob/master/src/test/org/firebirdsql/event/TestFBEventManager.java#L173)
>
> Essentially it inserts on multiple threads (100 inserts per thread, 5
> threads, a commit after each insert) which causes a trigger to post
>
> POST_EVENT 'TEST_EVENT_A'
> POST_EVENT 'TEST_EVENT_B'
> POST_EVENT 'TEST_EVENT_A'
>
> (yes, it is posting TEST_EVENT_A twice)
>
> Which should lead to posting TEST_EVENT_A 1000 times and TEST_EVENT_B
> 500 times. This test occasionally fails (about 1 in 3 runs on Firebird
> 3). I initially thought I wasn't waiting long enough for all events to
> be received.
>
> After adding some extra debug logging, it looks like Firebird is
> occasionally losing event registrations (not posting them, but the event
> is unregistered), or posting an event with the id of TEST_EVENT_A, but
> posting the count of TEST_EVENT_B (or vice versa), but both events are
> unregistered, which seems to indicate some sort of race condition or
> interleaved writes of the event.
>
> I can reproduce this problem both with Jaybird 2.2 and Jaybird 3, which
> have a totally different implementations.
>
> As far as I can tell it is not version specific, but it is more easily
> reproducible on Firebird 3 and 4 (probably because I'm using SuperServer).
>
> Could someone take a look at this?
>
> Mark
>


--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
In reply to this post by Mark Rotteveel-2
To come back to this again, there seems to be a concurrency bug in
events posted by Firebird to the client. It looks like it overwrites
local event ids (shared buffer, race condition?).

This is triggered by running the entire Jaybird test suite. Running the
specific test, TestFBEventManager.testLargeMultiLoad, in isolation
significantly reduces the chance of it occurring.

For example a test run shows:

[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_A, localId:694, eventId:0, internalCount:897,
previousInternalCount:897 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=88 cap=2048]:
000000340000000000000012010C544553545F4556454E545F42C201000000000000000000000000000002B5000000340000000000000012010C544553545F4556454E545F418503000000000000000000000000000002B6
[V10AsynchronousChannel]Received event id 693, eventCount 450
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_B, localId:695, eventId:0, internalCount:450,
previousInternalCount:450 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]Received event id 694, eventCount 901
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_A, localId:696, eventId:0, internalCount:901,
previousInternalCount:901 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
000000340000000000000012010C544553545F4556454E545F42C301000000000000000000000000000002B8
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]Received event id 696, eventCount 451
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_A, localId:697, eventId:0, internalCount:451,
previousInternalCount:451 }

In other words, Firebird posts event data for TEST_EVENT_B (count 450 ->
451) with the local event id of TEST_EVENT_A. On occasion I also see
that it resends an earlier - already acknowledged - local event id.

As the event name is proper (although not 100% sure it always is), I
might be able to workaround this in the pure java implementation by
matching based on the event name instead, but that is hardly a good
workaround, because it is possible the same event name is registered
multiple times, and it won't solve the occurrence of the same bug with
the native client.

This seems to suggest a race condition of some kind when the events are
posted/written to the aux connection.

I can reproduce this with Jaybird master, Firebird 3.0.2.32703 on
Windows 10 64 bit, but I have also seen it with other Firebird versions,
and with Jaybird 2.2 (which has a significantly different implementation
of event handling), both with pure java and the native client use.

Any thoughts or ideas on this, or is it better if I just create a bug
report?

Other example: both A and B are acknowledged with id of event B:

[V10AsynchronousChannel]Received event id 640, eventCount 843
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_A, localId:642, eventId:0, internalCount:843,
previousInternalCount:843 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[FBManagedConnection]End called: Xid[773794790]
[V10AsynchronousChannel]Received event id 641, eventCount 422
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_B, localId:643, eventId:0, internalCount:422,
previousInternalCount:422 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=88 cap=2048]:
000000340000000000000012010C544553545F4556454E545F414D0300000000000000000000000000000283000000340000000000000012010C544553545F4556454E545F42A70100000000000000000000000000000283
[V10AsynchronousChannel]Received event id 643, eventCount 845
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_B, localId:644, eventId:0, internalCount:845,
previousInternalCount:845 }
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]Received event id 643, eventCount 423

On 15-1-2017 11:56, Mark Rotteveel wrote:

> I have a test in Jaybird that post multiple events
> (https://github.com/FirebirdSQL/jaybird/blob/master/src/test/org/firebirdsql/event/TestFBEventManager.java#L173)
>
> Essentially it inserts on multiple threads (100 inserts per thread, 5
> threads, a commit after each insert) which causes a trigger to post
>
> POST_EVENT 'TEST_EVENT_A'
> POST_EVENT 'TEST_EVENT_B'
> POST_EVENT 'TEST_EVENT_A'
>
> (yes, it is posting TEST_EVENT_A twice)
>
> Which should lead to posting TEST_EVENT_A 1000 times and TEST_EVENT_B
> 500 times. This test occasionally fails (about 1 in 3 runs on Firebird
> 3). I initially thought I wasn't waiting long enough for all events to
> be received.
>
> After adding some extra debug logging, it looks like Firebird is
> occasionally losing event registrations (not posting them, but the event
> is unregistered), or posting an event with the id of TEST_EVENT_A, but
> posting the count of TEST_EVENT_B (or vice versa), but both events are
> unregistered, which seems to indicate some sort of race condition or
> interleaved writes of the event.
>
> I can reproduce this problem both with Jaybird 2.2 and Jaybird 3, which
> have a totally different implementations.
>
> As far as I can tell it is not version specific, but it is more easily
> reproducible on Firebird 3 and 4 (probably because I'm using SuperServer).
>
> Could someone take a look at this?
>
> Mark
>


--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Vlad Khorsun-2
02.04.2017 14:59, Mark Rotteveel wrote:

> To come back to this again, there seems to be a concurrency bug in
> events posted by Firebird to the client. It looks like it overwrites
> local event ids (shared buffer, race condition?).
>
> This is triggered by running the entire Jaybird test suite. Running the
> specific test, TestFBEventManager.testLargeMultiLoad, in isolation
> significantly reduces the chance of it occurring.
>
> For example a test run shows:
>
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_A, localId:694, eventId:0, internalCount:897,
> previousInternalCount:897 }
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=88 cap=2048]:
> 000000340000000000000012010C544553545F4556454E545F42C201000000000000000000000000000002B5000000340000000000000012010C544553545F4556454E545F418503000000000000000000000000000002B6

   It is receive buffer with two op_event (52, or 0x34) packets:

00000034 0000 0000 00000012 01 0C 544553545F4556454E545F42 C2010000 0000 00000000 00000000 000002B5
TEST_EVENT_B: cnt = 0x1C2 = 450, id = 0x2B5 = 693

00000034 0000 0000 00000012 01 0C 544553545F4556454E545F41 85030000 0000 00000000 00000000 000002B6
TEST_EVENT_A: cnt = 0x385 = 901, id = 0x2B6 = 694


> [V10AsynchronousChannel]Received event id 693, eventCount 450
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_B, localId:695, eventId:0, internalCount:450,
> previousInternalCount:450 }
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]Received event id 694, eventCount 901
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_A, localId:696, eventId:0, internalCount:901,
> previousInternalCount:901 }

   I see no op_que_events (48, or 0x30) packets.

> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
> 000000340000000000000012010C544553545F4556454E545F42C301000000000000000000000000000002B8

   This is received packet with

TEST_EVENT_B: cnt = 0x1C3 = 451, id = 0x2B5 = 696

You claim it must have it 695, correct ?

> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]Received event id 696, eventCount 451
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_A, localId:697, eventId:0, internalCount:451,
> previousInternalCount:451 }
>
> In other words, Firebird posts event data for TEST_EVENT_B (count 450 ->
> 451) with the local event id of TEST_EVENT_A. On occasion I also see
> that it resends an earlier - already acknowledged - local event id.

   I can't confirm it without op_que_events content.

> As the event name is proper (although not 100% sure it always is),

   This is really interesting to check

...

> I can reproduce this with Jaybird master, Firebird 3.0.2.32703 on
> Windows 10 64 bit, but I have also seen it with other Firebird versions,
> and with Jaybird 2.2 (which has a significantly different implementation
> of event handling), both with pure java and the native client use.
>
> Any thoughts or ideas on this, or is it better if I just create a bug
> report?

   Ideally, reproducible test case needed. As simple, as possible. Also, we
could log every packet related to events on server side.

> Other example: both A and B are acknowledged with id of event B:

   Provide, please, op_que_events contents to confirm this.

Regards,
Vlad


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
On 2-4-2017 15:23, Vlad Khorsun wrote:

>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_A, localId:694, eventId:0, internalCount:897,
>> previousInternalCount:897 }
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=88 cap=2048]:
>> 000000340000000000000012010C544553545F4556454E545F42C201000000000000000000000000000002B5000000340000000000000012010C544553545F4556454E545F418503000000000000000000000000000002B6
>
>    It is receive buffer with two op_event (52, or 0x34) packets:
>
> 00000034 0000 0000 00000012 01 0C 544553545F4556454E545F42 C2010000 0000 00000000 00000000 000002B5
> TEST_EVENT_B: cnt = 0x1C2 = 450, id = 0x2B5 = 693
>
> 00000034 0000 0000 00000012 01 0C 544553545F4556454E545F41 85030000 0000 00000000 00000000 000002B6
> TEST_EVENT_A: cnt = 0x385 = 901, id = 0x2B6 = 694

That was one that went well, it is the next one that is the problem

>> [V10AsynchronousChannel]Received event id 693, eventCount 450
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_B, localId:695, eventId:0, internalCount:450,
>> previousInternalCount:450 }
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]Received event id 694, eventCount 901
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_A, localId:696, eventId:0, internalCount:901,
>> previousInternalCount:901 }
>
>    I see no op_que_events (48, or 0x30) packets.

That is because I don't normally log messages (given how Jaybird works,
that is even a little bit hard to do), logging the response in this
specific case is a bit easier, because I use a different way of
processing the event responses over the aux connection.

>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
>> 000000340000000000000012010C544553545F4556454E545F42C301000000000000000000000000000002B8
>
>    This is received packet with
>
> TEST_EVENT_B: cnt = 0x1C3 = 451, id = 0x2B5 = 696
>
> You claim it must have it 695, correct ?

Yes, because that is the one I queued.

>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]Received event id 696, eventCount 451
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_A, localId:697, eventId:0, internalCount:451,
>> previousInternalCount:451 }
>>
>> In other words, Firebird posts event data for TEST_EVENT_B (count 450 ->
>> 451) with the local event id of TEST_EVENT_A. On occasion I also see
>> that it resends an earlier - already acknowledged - local event id.
>
>    I can't confirm it without op_que_events content.

I will see if I can add logging in this specific case.

>> As the event name is proper (although not 100% sure it always is),
>
>    This is really interesting to check
>
> ...
>
>> I can reproduce this with Jaybird master, Firebird 3.0.2.32703 on
>> Windows 10 64 bit, but I have also seen it with other Firebird versions,
>> and with Jaybird 2.2 (which has a significantly different implementation
>> of event handling), both with pure java and the native client use.
>>
>> Any thoughts or ideas on this, or is it better if I just create a bug
>> report?
>
>    Ideally, reproducible test case needed. As simple, as possible. Also, we
> could log every packet related to events on server side.

So far I can only trigger it about once every three times if I start the
entire Jaybird test suite. Starting only this test makes it hard to do so.

>> Other example: both A and B are acknowledged with id of event B:
>
>    Provide, please, op_que_events contents to confirm this.

Will try to do that.

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
In reply to this post by Vlad Khorsun-2
On 2-4-2017 15:23, Vlad Khorsun wrote:
>    Ideally, reproducible test case needed. As simple, as possible. Also, we
> could log every packet related to events on server side.
>
>> Other example: both A and B are acknowledged with id of event B:
>
>    Provide, please, op_que_events contents to confirm this.

In this example event A (which should have id 212) is posted with the id
of B (213)

[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_A, localId:212, eventId:0, internalCount:245,
previousInternalCount:245 }
[V10AsynchronousChannel]Queue event data:
000000300000000000000012010C544553545F4556454E545F41F500000020200000000000000000000000D4
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
000000340000000000000012010C544553545F4556454E545F427C00000000000000000000000000000000D3
[V10AsynchronousChannel]Received event id 211, eventCount 124
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_B, localId:213, eventId:0, internalCount:124,
previousInternalCount:124 }
[V10AsynchronousChannel]Queue event data:
000000300000000000000012010C544553545F4556454E545F427C00000020200000000000000000000000D5
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
000000340000000000000012010C544553545F4556454E545F41F700000000000000000000000000000000D5
[FBManagedConnection]End called: Xid[1299800912]
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]Received event id 213, eventCount 247
[V10AsynchronousChannel]Queue event: WireEventHandle:{
name:TEST_EVENT_B, localId:214, eventId:0, internalCount:247,
previousInternalCount:247 }
[V10AsynchronousChannel]Queue event data:
000000300000000000000012010C544553545F4556454E545F42F700000020200000000000000000000000D6
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
[V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
000000340000000000000012010C544553545F4556454E545F427D00000000000000000000000000000000D5
[V10AsynchronousChannel]Received event id 213, eventCount 125

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Vlad Khorsun-2
02.04.2017 17:15, Mark Rotteveel wrote:

> On 2-4-2017 15:23, Vlad Khorsun wrote:
>>    Ideally, reproducible test case needed. As simple, as possible. Also, we
>> could log every packet related to events on server side.
>>
>>> Other example: both A and B are acknowledged with id of event B:
>>
>>    Provide, please, op_que_events contents to confirm this.
>
> In this example event A (which should have id 212) is posted with the id
> of B (213)
>
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_A, localId:212, eventId:0, internalCount:245,
> previousInternalCount:245 }
> [V10AsynchronousChannel]Queue event data:
> 000000300000000000000012010C544553545F4556454E545F41F500000020200000000000000000000000D4
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
> 000000340000000000000012010C544553545F4556454E545F427C00000000000000000000000000000000D3
> [V10AsynchronousChannel]Received event id 211, eventCount 124
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_B, localId:213, eventId:0, internalCount:124,
> previousInternalCount:124 }
> [V10AsynchronousChannel]Queue event data:
> 000000300000000000000012010C544553545F4556454E545F427C00000020200000000000000000000000D5
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
> 000000340000000000000012010C544553545F4556454E545F41F700000000000000000000000000000000D5
> [FBManagedConnection]End called: Xid[1299800912]
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]Received event id 213, eventCount 247
> [V10AsynchronousChannel]Queue event: WireEventHandle:{
> name:TEST_EVENT_B, localId:214, eventId:0, internalCount:247,
> previousInternalCount:247 }
> [V10AsynchronousChannel]Queue event data:
> 000000300000000000000012010C544553545F4556454E545F42F700000020200000000000000000000000D6
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
> 000000340000000000000012010C544553545F4556454E545F427D00000000000000000000000000000000D5
> [V10AsynchronousChannel]Received event id 213, eventCount 125

   So, we have following packets exchange

1. op_que_event TEST_EVENT_A, cnt = 0xF5 (245), id = 0xD4 (212)
2. op_event     TEST_EVENT_B, cnt = 0x7C (124), id = 0xD3 (211)
3. op_que_event TEST_EVENT_B, cnt = 0x7C (124), id = 0xD5 (213)
4. op_event     TEST_EVENT_A, cnt = 0xF7 (247), id = 0xD5 (213)
5. op_que_event TEST_EVENT_B, cnt = 0xF7 (247), id = 0xD6 (214)
6. op_event     TEST_EVENT_B, cnt = 0x7D (125), id = 0xD5 (213)

   Packet 4 should contain id 212, agree. Looks like a server bug.

   But how could you explain packet 5 which contains wrong count (247) ?
op_que_event packet is created by client, it should not be affected by
any server bug.

Regards,
Vlad


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
On 2-4-2017 16:48, Vlad Khorsun wrote:

> 02.04.2017 17:15, Mark Rotteveel wrote:
>> On 2-4-2017 15:23, Vlad Khorsun wrote:
>>>    Ideally, reproducible test case needed. As simple, as possible. Also, we
>>> could log every packet related to events on server side.
>>>
>>>> Other example: both A and B are acknowledged with id of event B:
>>>
>>>    Provide, please, op_que_events contents to confirm this.
>>
>> In this example event A (which should have id 212) is posted with the id
>> of B (213)
>>
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_A, localId:212, eventId:0, internalCount:245,
>> previousInternalCount:245 }
>> [V10AsynchronousChannel]Queue event data:
>> 000000300000000000000012010C544553545F4556454E545F41F500000020200000000000000000000000D4
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
>> 000000340000000000000012010C544553545F4556454E545F427C00000000000000000000000000000000D3
>> [V10AsynchronousChannel]Received event id 211, eventCount 124
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_B, localId:213, eventId:0, internalCount:124,
>> previousInternalCount:124 }
>> [V10AsynchronousChannel]Queue event data:
>> 000000300000000000000012010C544553545F4556454E545F427C00000020200000000000000000000000D5
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
>> 000000340000000000000012010C544553545F4556454E545F41F700000000000000000000000000000000D5
>> [FBManagedConnection]End called: Xid[1299800912]
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]Received event id 213, eventCount 247
>> [V10AsynchronousChannel]Queue event: WireEventHandle:{
>> name:TEST_EVENT_B, localId:214, eventId:0, internalCount:247,
>> previousInternalCount:247 }
>> [V10AsynchronousChannel]Queue event data:
>> 000000300000000000000012010C544553545F4556454E545F42F700000020200000000000000000000000D6
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [AbstractWireOperations]readStatusVector arg:isc_arg_gds int: 0
>> [V10AsynchronousChannel]java.nio.HeapByteBuffer[pos=0 lim=44 cap=2048]:
>> 000000340000000000000012010C544553545F4556454E545F427D00000000000000000000000000000000D5
>> [V10AsynchronousChannel]Received event id 213, eventCount 125
>
>    So, we have following packets exchange
>
> 1. op_que_event TEST_EVENT_A, cnt = 0xF5 (245), id = 0xD4 (212)
> 2. op_event     TEST_EVENT_B, cnt = 0x7C (124), id = 0xD3 (211)
> 3. op_que_event TEST_EVENT_B, cnt = 0x7C (124), id = 0xD5 (213)
> 4. op_event     TEST_EVENT_A, cnt = 0xF7 (247), id = 0xD5 (213)
> 5. op_que_event TEST_EVENT_B, cnt = 0xF7 (247), id = 0xD6 (214)
> 6. op_event     TEST_EVENT_B, cnt = 0x7D (125), id = 0xD5 (213)
>
>    Packet 4 should contain id 212, agree. Looks like a server bug.
>
>    But how could you explain packet 5 which contains wrong count (247) ?
> op_que_event packet is created by client, it should not be affected by
> any server bug.

That is because the current implementation in Jaybird only looks at the
id, so when in step 4 it receives id 213 with count 247, it assumes that
247 is the new count for TEST_EVENT_B. It then re-queues that event with
the updated count.

This BTW is also what I believe the native implementation does: only
looking at the id.

--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[SPAM] Re: Concurrency bugs in posting events?

Vlad Khorsun-2
02.04.2017 18:07, Mark Rotteveel wrote:
> On 2-4-2017 16:48, Vlad Khorsun wrote:
...

>>    So, we have following packets exchange
>>
>> 1. op_que_event TEST_EVENT_A, cnt = 0xF5 (245), id = 0xD4 (212)
>> 2. op_event     TEST_EVENT_B, cnt = 0x7C (124), id = 0xD3 (211)
>> 3. op_que_event TEST_EVENT_B, cnt = 0x7C (124), id = 0xD5 (213)
>> 4. op_event     TEST_EVENT_A, cnt = 0xF7 (247), id = 0xD5 (213)
>> 5. op_que_event TEST_EVENT_B, cnt = 0xF7 (247), id = 0xD6 (214)
>> 6. op_event     TEST_EVENT_B, cnt = 0x7D (125), id = 0xD5 (213)
>>
>>    Packet 4 should contain id 212, agree. Looks like a server bug.
>>
>>    But how could you explain packet 5 which contains wrong count (247) ?
>> op_que_event packet is created by client, it should not be affected by
>> any server bug.
>
> That is because the current implementation in Jaybird only looks at the
> id, so when in step 4 it receives id 213 with count 247, it assumes that
> 247 is the new count for TEST_EVENT_B. It then re-queues that event with
> the updated count.
>
> This BTW is also what I believe the native implementation does: only
> looking at the id.

   Not sure i understand it. Does it means that events buffer, used at packet 4,
was reused to reque same event at packet 5 ? If yes, why event name was changed
by client while events count - doesn't ? If client looked at id when received
event and re-queued it again - why it changed event name ? I could understand
if client, when re-queued same event, leave received buffer as is...

   Also, i thought that test is multy threaded. Now it looks like sample above
was produced by single thread, correct ?

   Asking all that questions i want to be sure that there is no mistake at client
side when working with event buffers.

   As for what "native implementation does" - fbclient does nothing whith events
buffers, it just passed it between application and the server.

Regards,
Vlad

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [SPAM] Re: Concurrency bugs in posting events?

Mark Rotteveel-2
On 2-4-2017 17:35, Vlad Khorsun wrote:

> 02.04.2017 18:07, Mark Rotteveel wrote:
>> On 2-4-2017 16:48, Vlad Khorsun wrote:
> ...
>>>    So, we have following packets exchange
>>>
>>> 1. op_que_event TEST_EVENT_A, cnt = 0xF5 (245), id = 0xD4 (212)
>>> 2. op_event     TEST_EVENT_B, cnt = 0x7C (124), id = 0xD3 (211)
>>> 3. op_que_event TEST_EVENT_B, cnt = 0x7C (124), id = 0xD5 (213)
>>> 4. op_event     TEST_EVENT_A, cnt = 0xF7 (247), id = 0xD5 (213)
>>> 5. op_que_event TEST_EVENT_B, cnt = 0xF7 (247), id = 0xD6 (214)
>>> 6. op_event     TEST_EVENT_B, cnt = 0x7D (125), id = 0xD5 (213)
>>>
>>>    Packet 4 should contain id 212, agree. Looks like a server bug.
>>>
>>>    But how could you explain packet 5 which contains wrong count (247) ?
>>> op_que_event packet is created by client, it should not be affected by
>>> any server bug.
>>
>> That is because the current implementation in Jaybird only looks at the
>> id, so when in step 4 it receives id 213 with count 247, it assumes that
>> 247 is the new count for TEST_EVENT_B. It then re-queues that event with
>> the updated count.
>>
>> This BTW is also what I believe the native implementation does: only
>> looking at the id.
>
>    Not sure i understand it. Does it means that events buffer, used at packet 4,
> was reused to reque same event at packet 5 ? If yes, why event name was changed
> by client while events count - doesn't ? If client looked at id when received
> event and re-queued it again - why it changed event name ? I could understand
> if client, when re-queued same event, leave received buffer as is...

No, that is not how Jaybird works.

* It received an event for id 213 with count 247. Internally it notifies
a listener of that (V10AsynchronousChannel.processSingleEvent)

* That listener will notify all 'event handles'
(AsynchronousChannelListenerDispatcher.eventReceived)

* The one with a matching event id (TEST_EVENT_B in this case) updates
its internal count (WireEventHandle.eventReceived)

* That in turn will cause the event handler to be notified, in this case
that will calculate the change between previous and current count and
post an update to an internal queue for asynchronous notification of
interested parties (FBEventManager.GdsEventHandler.eventOccurred)

* The handler of the previous step will then re-queue the event

The queueing of events is done by writing into a socket stream, it
doesn't reuse buffers for that. See
https://github.com/FirebirdSQL/jaybird/blob/master/src/main/org/firebirdsql/gds/ng/wire/version10/V10AsynchronousChannel.java#L192

>    Also, i thought that test is multy threaded. Now it looks like sample above
> was produced by single thread, correct ?

Depends on what you want to be single or multi-threaded. The inserts
that cause events to be posted are done on multiple threads (5 threads x
100 inserts).

The receiving of event notifications and re-queuing is a single thread
handled by AsynchronousProcessor (speaking about the wire protocol
implementation only).

>    Asking all that questions i want to be sure that there is no mistake at client
> side when working with event buffers.

The only buffer that is reused is the ByteBuffer for reading the
received events from the socket channel, that buffer is automatically
restricted to the number of bytes actually read, it is not possible to
read beyond that and for example read stale data from a previous read.

>    As for what "native implementation does" - fbclient does nothing whith events
> buffers, it just passed it between application and the server.

Then how does the native client identify the correct callback handle to
invoke?

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
In reply to this post by Vlad Khorsun-2
On 2-4-2017 15:23, Vlad Khorsun wrote:
> 02.04.2017 14:59, Mark Rotteveel wrote:
>> To come back to this again, there seems to be a concurrency bug in
>> events posted by Firebird to the client. It looks like it overwrites
>> local event ids (shared buffer, race condition?).
>>
>> This is triggered by running the entire Jaybird test suite. Running the
>> specific test, TestFBEventManager.testLargeMultiLoad, in isolation
>> significantly reduces the chance of it occurring.

[..]

>> Any thoughts or ideas on this, or is it better if I just create a bug
>> report?
>
>    Ideally, reproducible test case needed. As simple, as possible. Also, we
> could log every packet related to events on server side.

I have built a sample application that reproduces it a little bit more
consistently (although it still occasionally succeeds without a
mismatch). Note that this doesn't include the logging I showed in this
conversation, let me know if you need that.

You can download it from
https://www.dropbox.com/s/6jxfcadxtojodf8/event-race-condition-1.0-SNAPSHOT.zip?dl=0

Start with ./bin/event-race-condition --help for instructions. It
requires Java 8.

Playing around with the --threadCount and --insertsPerThread can help to
improve predictability to reproduce it. I used the same number of
threads as I have (HT) cores in my machine. Using more inserts per
thread can also increase the chance of it eventually occurring.

defaults are:
     private static String hostName = "localhost";
     private static int portNumber = 3050;
     private static String databasePath = "D:/data/db/fb3/eventrace.fdb";
     private static String user = "sysdba";
     private static String password = "masterkey";
     private static int threadCount = 8;
     private static int insertsPerThread = 200;

Full project: https://github.com/mrotteveel/event-race-condition

--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Vlad Khorsun-2
09.04.2017 13:00, Mark Rotteveel wrote:
> On 2-4-2017 15:23, Vlad Khorsun wrote:
>> 02.04.2017 14:59, Mark Rotteveel wrote:
...
>>> Any thoughts or ideas on this, or is it better if I just create a bug
>>> report?
>>
>>     Ideally, reproducible test case needed. As simple, as possible. Also, we
>> could log every packet related to events on server side.
>
> I have built a sample application that reproduces it a little bit more
> consistently (although it still occasionally succeeds without a
> mismatch).

   Thanks. I downloaded and run it. Note, i need to change WireCrypt setting to
Enabled to make it work, but this is not so important :)

   It is almost impossible to see the mismatch at my host. Only way to see it
once in a few runs is to set 64 threads and 5000 inserts per thread. Ok, it
is reproduced, but...

   To verify what i see i run FlameRobin and register same two events for
monitoring. When test failed i saved output of FlameRobin into a text file
and convert it into LibreOffice Calc sheet. Then i evaluate sum of every
event counters and check it for mismatch - they are equal. Also, every pair
of "TEST_EVENT_A" and "TEST_EVENT_B" have the same number of events - it looks
like

16:50:54  Monitoring started
16:50:54  TEST_EVENT_A (170)
16:50:54  TEST_EVENT_B (170)
16:50:54  TEST_EVENT_A (195)
16:50:54  TEST_EVENT_B (195)
16:50:54  TEST_EVENT_A (227)
16:50:54  TEST_EVENT_B (227)
16:50:55  TEST_EVENT_A (222)
16:50:55  TEST_EVENT_B (222)

and so on

   I.e. test app shows:

Using thread count 64, number of inserts per thread 5000
Inserts done 320000, matches expected 320000: true
Event EVENT_NAME_A, expected 320000, actual 320000: true
Event EVENT_NAME_B, expected 320000, actual 271541: false <================

while FlameRobin received 301782 of events "TEST_EVENT_A" and same number of
"TEST_EVENT_B". Yes, it is not 320000 as one could expect - it is because i need
some time to connect and to start event monitoring. I can't do it in advance as
test applications recreates database and fails on start if there is attachment
exists. Anyway, 301782 is much more than 271541.


   Then i repeat the test and results is:

Using thread count 64, number of inserts per thread 5000
Inserts done 320000, matches expected 320000: true
Event EVENT_NAME_A, expected 320000, actual 133394: false <================
Event EVENT_NAME_B, expected 320000, actual 320000: true

while FlameRobin received 312263 events "TEST_EVENT_A" and 312263 events
"TEST_EVENT_B".

I can send you sheet with data if you wish.

Regards,
Vlad

PS Every time test app show mismatsh i see that one event have expected counter.
I.e. there was no case when both counters was wrong.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
On 9-4-2017 16:07, Vlad Khorsun wrote:

> 09.04.2017 13:00, Mark Rotteveel wrote:
>> On 2-4-2017 15:23, Vlad Khorsun wrote:
>>> 02.04.2017 14:59, Mark Rotteveel wrote:
> ...
>>>> Any thoughts or ideas on this, or is it better if I just create a bug
>>>> report?
>>>
>>>     Ideally, reproducible test case needed. As simple, as possible. Also, we
>>> could log every packet related to events on server side.
>>
>> I have built a sample application that reproduces it a little bit more
>> consistently (although it still occasionally succeeds without a
>> mismatch).
>
>    Thanks. I downloaded and run it. Note, i need to change WireCrypt setting to
> Enabled to make it work, but this is not so important :)

Well, that is a general Jaybird requirement ;)

>    It is almost impossible to see the mismatch at my host. Only way to see it
> once in a few runs is to set 64 threads and 5000 inserts per thread. Ok, it
> is reproduced, but...

Are you using SuperServer?

>    To verify what i see i run FlameRobin and register same two events for
> monitoring. When test failed i saved output of FlameRobin into a text file
> and convert it into LibreOffice Calc sheet. Then i evaluate sum of every
> event counters and check it for mismatch - they are equal. Also, every pair
> of "TEST_EVENT_A" and "TEST_EVENT_B" have the same number of events - it looks
> like
>
> 16:50:54  Monitoring started
> 16:50:54  TEST_EVENT_A (170)
> 16:50:54  TEST_EVENT_B (170)
> 16:50:54  TEST_EVENT_A (195)
> 16:50:54  TEST_EVENT_B (195)
> 16:50:54  TEST_EVENT_A (227)
> 16:50:54  TEST_EVENT_B (227)
> 16:50:55  TEST_EVENT_A (222)
> 16:50:55  TEST_EVENT_B (222)
>
> and so on
>
>    I.e. test app shows:
>
> Using thread count 64, number of inserts per thread 5000
> Inserts done 320000, matches expected 320000: true
> Event EVENT_NAME_A, expected 320000, actual 320000: true
> Event EVENT_NAME_B, expected 320000, actual 271541: false <================
>
> while FlameRobin received 301782 of events "TEST_EVENT_A" and same number of
> "TEST_EVENT_B". Yes, it is not 320000 as one could expect - it is because i need
> some time to connect and to start event monitoring. I can't do it in advance as
> test applications recreates database and fails on start if there is attachment
> exists. Anyway, 301782 is much more than 271541.
>
>
>    Then i repeat the test and results is:
>
> Using thread count 64, number of inserts per thread 5000
> Inserts done 320000, matches expected 320000: true
> Event EVENT_NAME_A, expected 320000, actual 133394: false <================
> Event EVENT_NAME_B, expected 320000, actual 320000: true
>
> while FlameRobin received 312263 events "TEST_EVENT_A" and 312263 events
> "TEST_EVENT_B".
>
> I can send you sheet with data if you wish.

Interesting. Looking through the flame robin code
(https://github.com/mariuz/flamerobin/blob/master/src/ibpp/events.cpp),
it looks like it queues all events it is interested in in one
queue-event call, while Jaybird registers them individually.

This could explain the difference in behavior, maybe it works correctly
because there is only a single event buffer with two events, instead of
two event buffers with each one event.

> PS Every time test app show mismatsh i see that one event have expected counter.
> I.e. there was no case when both counters was wrong.

That is what I usually see as well, because the notification of an event
will re-queue that event.

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Adriano dos Santos Fernandes-3
Mark and Vlad,

So far, I can see that when the problem start to happen, server stopped
to receive op_que_events from client.


Adriano

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Adriano dos Santos Fernandes-3
Vlad,

I think have found the problem in server.

Look at this:

--------
ISC_STATUS rem_port::que_events(P_EVENT * stuff, PACKET* sendL)
{
...

        Rvnt* event;
        for (event = rdb->rdb_events; event; event = event->rvnt_next)
        {
                if (!event->rvnt_iface)
                {
                        event->rvnt_destroyed = 0;
                        break;
                }
        }

        if (!event)
        {
                event = FB_NEW Rvnt;
#ifdef DEBUG_REMOTE_MEMORY
                printf("que_events(server)        allocate event   %x\n", event);
#endif
                event->rvnt_next = rdb->rdb_events;
                rdb->rdb_events = event;
                event->rvnt_callback = FB_NEW Callback(rdb, event);
        }
--------

This code runs concurrently and find the same empty slot for
simultaneous events being registered.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Adriano dos Santos Fernandes-3
After the code block I quoted, there is this line:

        event->rvnt_id = stuff->p_event_rid;

Which makes a already reused event to be reassigned another id.

Later, engine calls the callback and remote respond both events with the
same id.


Adriano


Em 09/04/2017 16:33, Adriano dos Santos Fernandes escreveu:

> Vlad,
>
> I think have found the problem in server.
>
> Look at this:
>
> --------
> ISC_STATUS rem_port::que_events(P_EVENT * stuff, PACKET* sendL)
> {
> ...
>
> Rvnt* event;
> for (event = rdb->rdb_events; event; event = event->rvnt_next)
> {
> if (!event->rvnt_iface)
> {
> event->rvnt_destroyed = 0;
> break;
> }
> }
>
> if (!event)
> {
> event = FB_NEW Rvnt;
> #ifdef DEBUG_REMOTE_MEMORY
> printf("que_events(server)        allocate event   %x\n", event);
> #endif
> event->rvnt_next = rdb->rdb_events;
> rdb->rdb_events = event;
> event->rvnt_callback = FB_NEW Callback(rdb, event);
> }
> --------
>
> This code runs concurrently and find the same empty slot for
> simultaneous events being registered.
>
>
> Adriano
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Vlad Khorsun-2
In reply to this post by Adriano dos Santos Fernandes-3
09.04.2017 22:33, Adriano dos Santos Fernandes wrote:

> Vlad,
>
> I think have found the problem in server.
>
> Look at this:
>
> --------
> ISC_STATUS rem_port::que_events(P_EVENT * stuff, PACKET* sendL)
> {
> ...
>
> Rvnt* event;
> for (event = rdb->rdb_events; event; event = event->rvnt_next)
> {
> if (!event->rvnt_iface)
> {
> event->rvnt_destroyed = 0;
> break;
> }
> }
>
> if (!event)
> {
> event = FB_NEW Rvnt;
> #ifdef DEBUG_REMOTE_MEMORY
> printf("que_events(server)        allocate event   %x\n", event);
> #endif
> event->rvnt_next = rdb->rdb_events;
> rdb->rdb_events = event;
> event->rvnt_callback = FB_NEW Callback(rdb, event);
> }
> --------
>
> This code runs concurrently and find the same empty slot for
> simultaneous events being registered.

   Good catch, thanks !

   I'm testing a patch for this.

Regards,
Vlad

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Mark Rotteveel-2
In reply to this post by Adriano dos Santos Fernandes-3
On 9-4-2017 21:33, Adriano dos Santos Fernandes wrote:

> Vlad,
>
> I think have found the problem in server.
>
> Look at this:
>
> --------
> ISC_STATUS rem_port::que_events(P_EVENT * stuff, PACKET* sendL)
> {
> ...
>
> Rvnt* event;
> for (event = rdb->rdb_events; event; event = event->rvnt_next)
> {
> if (!event->rvnt_iface)
> {
> event->rvnt_destroyed = 0;
> break;
> }
> }
>
> if (!event)
> {
> event = FB_NEW Rvnt;
> #ifdef DEBUG_REMOTE_MEMORY
> printf("que_events(server)        allocate event   %x\n", event);
> #endif
> event->rvnt_next = rdb->rdb_events;
> rdb->rdb_events = event;
> event->rvnt_callback = FB_NEW Callback(rdb, event);
> }
> --------
>
> This code runs concurrently and find the same empty slot for
> simultaneous events being registered.

In the specific example, Jaybird uses a single connection to register
for events, and it takes a lock on the database handle, so there
wouldn't be concurrent que_events invocations on that connection. Is it
possible that a similar problem occurs when que_events call interleaves
with an even notification being posted back to the client?

Mark
--
Mark Rotteveel

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Adriano dos Santos Fernandes-3
On 10/04/2017 06:17, Mark Rotteveel wrote:

>
>> This code runs concurrently and find the same empty slot for
>> simultaneous events being registered.
> In the specific example, Jaybird uses a single connection to register
> for events, and it takes a lock on the database handle, so there
> wouldn't be concurrent que_events invocations on that connection. Is it
> possible that a similar problem occurs when que_events call interleaves
> with an even notification being posted back to the client?
>
>
You are right, the Jaybird code syncs the op_que_event send/response.

However, server is reading/writing same data in multiple threads without
using concurrency primitives.

So they may not be accessing "simultaneously", but caching may have bad
tricks in this condition.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Concurrency bugs in posting events?

Adriano dos Santos Fernandes-3
Here is a better description of the problem and a possible fix that Vlad
can use and discard, as he surely better understand this part.

Sorry for some imprecision in my previous description.

Jaybird synchronized the op_queue_events send/response.

Remote queue_events synchronizes with the *sync* port lock.

When event callback is called by the engine, before delivery the event,
remote marks it as dead (event->rvnt_iface = NULL) but using the *async*
port lock.

Even the different locks (sync / async) is not the underling problem by
the way events are marked and verified as dead.

But after event is marked as dead, que_events can concurrently reuse it,
and reuse its rvnt_id.

The problem is that the callback reads rvnt_id after mark it as dead.

So here is my proposed patch.


Adriano


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel

event-race1.diff (621 bytes) Download Attachment
12
Loading...