Networking IO and Threading
Recently I had to write a software component capable of communicating with a non-Java service. It was actually pretty straightforward, as I only needed to handle lines of text over a TCP socket. I needed to both send and receive data over the socket, including commands with replies as well as notifications. For the sake of testing I also wrote a small mock server.
It's clear that when dealing with incoming data over a network or file system, you'll want to wait for the next chunk in a separate thread; otherwise you'll block the application thread, which is usually undesirable. It didn't sound too hard, and I was able to write it in a rather short time. The tests were all passing... all was good.
When I ran the tests in another environment, I noticed they consistently failed, and I quickly realized this was due to the machine's multi-processor architecture. I tested on a few other platforms and could indeed confirm that the failures only occurred on multi-processor machines. Mind you, it worked well on Ubuntu Linux using a Pentium 4 with hyper-threading, but not on Ubuntu using a dual core (Dell D-820).
Since our production environment also runs on multi-processor machines, fixing this issue was a must, especially since a colleague of mine had also written some threading code that showed the same problem.
At first I thought the Maven2 Surefire plugin was somehow forking tests in parallel: the error I was getting said that the server socket I was trying to create was already in use (the well-known JVM_BIND error). However, that was not the case; the tests were being executed sequentially. The problem was that the code I wrote for both the client-side and server-side threads was not bulletproof: I could see in the log file that some threads would never die.
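The JVM_BIND symptom can be reproduced in isolation: as long as one ServerSocket still holds a port (for instance because the server thread that owns it never died), binding another ServerSocket to the same port fails with java.net.BindException. The following is a minimal sketch of that; the class and method names (PortStillInUse, bindFails) are made up for illustration, and it binds to the loopback address on a port chosen by the operating system.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Hypothetical demo: a port stays unavailable until the ServerSocket holding it is closed. */
public class PortStillInUse {

    /** Returns true if binding a second ServerSocket to the given loopback port fails. */
    static boolean bindFails(int port) throws IOException {
        try (ServerSocket second = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            return false; // bind succeeded; try-with-resources releases the port again
        } catch (BindException e) {
            return true;  // "Address already in use" (reported as JVM_Bind on some platforms)
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; this socket plays the part of a leaked server.
        ServerSocket leaked = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        int port = leaked.getLocalPort();
        System.out.println("second bind fails while first socket is open: " + bindFails(port));
        leaked.close();
        System.out.println("second bind fails after first socket is closed: " + bindFails(port));
    }
}
```

In other words, a test that leaves a server thread (and its socket) behind breaks every later test that wants the same port, even when the tests run strictly one after another.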
Now you will probably say "yeah, learn how to work with threads and do it right", and to a certain extent you're right :-) but my case involved a feature that made it a bit special: when a client connection broke, the client should automatically attempt to reconnect, effectively creating another thread. This required a bit more synchronization and a clear, solid way of starting and stopping threads.
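One way to keep reconnection under control is to let a single lock guard both "start a replacement connection thread" and "we are closing", so that a thread noticing a broken connection can never start a new one after close has begun. This is a minimal sketch under assumptions, not the original code: Reconnector and its Supplier<Runnable> connection factory are hypothetical, each Runnable stands for the lifetime of one connection and returns when that connection breaks, and a real client would also close the current socket in close and back off before reconnecting.

```java
import java.util.function.Supplier;

/**
 * Hypothetical supervisor that starts a new connection thread whenever the
 * previous one ends, until close() is called.
 */
public class Reconnector {
    private final Supplier<Runnable> connectionFactory; // hypothetical: one Runnable per connection
    private final Object lock = new Object();
    private boolean closing;  // guarded by lock
    private Thread current;   // guarded by lock: most recently started connection thread
    private int started;      // guarded by lock: number of connection threads started so far

    public Reconnector(Supplier<Runnable> connectionFactory) {
        this.connectionFactory = connectionFactory;
    }

    public void start() {
        startWorker();
    }

    // Checking 'closing' and starting the thread under the same lock means no
    // replacement thread can be started once close() has set the flag.
    private void startWorker() {
        synchronized (lock) {
            if (closing) {
                return;
            }
            Runnable connection = connectionFactory.get();
            current = new Thread(() -> {
                try {
                    connection.run();   // returns when the connection breaks
                } finally {
                    startWorker();      // reconnect in a fresh thread (a real client would back off first)
                }
            }, "connection-" + started);
            started++;
            current.start();
        }
    }

    /** Call from another thread; returns once the last started connection thread has died. */
    public void close() throws InterruptedException {
        Thread last;
        synchronized (lock) {
            closing = true;
            last = current;
        }
        // A real client would also close the current socket here so a blocked read is released.
        if (last != null) {
            last.join();
        }
    }

    public int startedCount() {
        synchronized (lock) {
            return started;
        }
    }
}
```

Note that join is called outside the lock: the dying connection thread needs that lock to discover it must not reconnect, so holding it while joining would deadlock.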
By following the rules of thumb below, I was able to write stable threading code easily:
- client threads can rely on the socket throwing a SocketException when they are blocked in a read on the socket's input stream and the socket gets closed; this is what happens when the close method below closes the socket, and a connection dropped by the server may surface the same way (or simply as the end of the stream)
- have two shared flags, 'closing' and 'closed', declared volatile or only accessed from synchronized methods, so that other threads (possibly on other processors) actually see the updates
- define a method that closes the thread; this method should set 'closing' to true
- at the end of the thread's run method, 'closed' must be set to true; take care of any exceptions so that this always happens, and synchronize this method
- the thread's run method should check whether 'closing' is true and, if it is, break out of any loop it's currently in
- after calling the thread's close method from another thread (!), it's generally a good idea to also call its join method, which makes the calling thread wait for the closed thread to die
By using these two flags and relying on the SocketException when blocked in an I/O operation, I could handle any situation in which the thread needed to be closed. The thread calling the close method should ideally wait for the thread to finish, which can be done with the join method mentioned above. Following these rules, the thread will have finished by the time the close method returns.
I use two variables because I also provide an 'isClosed' method, which only returns true once the thread has actually finished completely.
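Put together, these rules might look like the following sketch of a client-side reader thread. It is an illustration under assumptions rather than the original code: the class name LineReaderThread and the Consumer<String> callback are hypothetical, and it uses volatile fields for the two flags instead of synchronized methods (either gives other threads a consistent view of them). Closing the socket from another thread makes a blocked readLine throw SocketException; an orderly shutdown by the server usually shows up as readLine returning null instead, so both paths end the loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical reader thread that hands each received line to a callback. */
public class LineReaderThread extends Thread {
    private final Socket socket;
    private final Consumer<String> onLine;    // hypothetical callback for each received line
    private volatile boolean closing = false; // set by close(), checked by run()
    private volatile boolean closed = false;  // set at the very end of run()

    public LineReaderThread(Socket socket, Consumer<String> onLine) {
        super("line-reader");
        this.socket = socket;
        this.onLine = onLine;
    }

    @Override
    public void run() {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // Leave the loop when asked to close or when the peer ends the stream (null).
            while (!closing && (line = in.readLine()) != null) {
                onLine.accept(line);
            }
        } catch (SocketException e) {
            // Expected when close() closes the socket while readLine() is blocked.
        } catch (IOException e) {
            // Unexpected I/O failure; a real implementation would log it.
        } finally {
            closed = true; // always reached, whatever ended the loop
        }
    }

    /** Must be called from another thread: signals, unblocks the read, then waits for run() to end. */
    public void close() throws InterruptedException {
        closing = true;
        try {
            socket.close(); // makes a blocked readLine() throw SocketException
        } catch (IOException ignored) {
            // nothing more we can do; the thread will still observe 'closing'
        }
        join();
    }

    /** True only once run() has completely finished. */
    public boolean isClosed() {
        return closed;
    }
}
```

Because close ends with join, calling it from the reader thread itself would wait forever, which is exactly why the rules insist on calling it from another thread.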