Thursday, April 19, 2018

Apache Spark: Performance Tuning Notes

I've been working with Apache Spark for the past two years. I'm building a data processing engine rather than just running the kinds of jobs that show up in all the examples and tutorials. The basics of the app are something like this:
    1. read a batch of messages from kafka
    2. make a REST call to a server to register information about each message in the batch
    3. download a file
    4. upload the file to a new location
    5. make another REST call to a server indicating the message was processed
    6. store metrics about everything in openTSDB

Originally everything happened serially. This worked ok, since Spark was feeding work out across multiple executors. However, with a little extra work I was able to drastically improve performance.

First, I changed steps 2 and 5 above so that they use an ExecutorService to queue up all the requests, and then I wait on the futures so that processing doesn't happen out of order. Now if I'm going to make 10 separate REST calls, they are all in flight at about the same time instead of running one after the other. I'm going to do something very similar for steps 3 and 4.
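For reference, the pattern for steps 2 and 5 looks roughly like this (a minimal sketch, not the real code; registerMessage(), the String message type, and the pool size of 10 are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRestCalls {

    // registerMessage() stands in for the real REST call made in step 2
    static void registerMessage(String message) {
        // ... REST client call would go here ...
    }

    static void registerBatch(List<String> batch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        List<Future<?>> futures = new ArrayList<>();
        for (String msg : batch) {
            // fan the calls out so they are all in flight at once
            futures.add(pool.submit(() -> registerMessage(msg)));
        }
        for (Future<?> f : futures) {
            f.get(); // block until every call has finished before moving on
        }
        pool.shutdown();
    }
}

The same shape works for the download/upload steps: submit each transfer to the pool and collect the futures before moving on.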
Next, I have a separate thread storing the metrics in openTSDB. None of this needs to happen in sync with anything else.
Finally, the most important change I made was to the Spark configuration. After some research I figured out how the following settings work in Spark.
    spark.task.cpus
    spark.num.executors
    spark.executor.cores
    spark.executor.total.cores
There are lots of other settings I used, but these are the ones that improved performance the most.

    spark.executor.total.cores - This controls the total number of cores that will be assigned to your Spark application. This is important if you are running in a cluster where you can't just use everything. Since I'm running in a shared Mesos environment, I can't request all available resources for my long-running Spark application.

    spark.executor.cores - This controls the number of cores that each executor will be assigned.

    spark.num.executors - This controls the number of executors you want to have running. This number multiplied by the spark.executor.cores should equal the number set in spark.executor.total.cores.

    spark.task.cpus - This number lets Spark know how many CPUs you think each task will use. Since my tasks are mostly I/O bound, with multiple REST calls, downloads, and uploads, I assigned 1 CPU to each task.

By assigning each task 1 CPU and making sure I had 6 executors with at least 2 cores each, I was able to get each batch to process 12 tasks at a time. Processing that many tasks concurrently reduced the processing time for large batches (2000+ records) from hours to minutes.
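For illustration, here is roughly how that could be expressed when building the SparkConf, using the setting names exactly as listed above (a sketch only; the app name is made up and the values mirror the 6-executor, 2-core layout):

import org.apache.spark.SparkConf;

public class TuningConfig {

    public static SparkConf build() {
        // 6 executors x 2 cores = 12 tasks in flight per batch, 1 CPU per task
        return new SparkConf()
                .setAppName("message-processor")           // hypothetical app name
                .set("spark.task.cpus", "1")               // each task claims a single CPU
                .set("spark.executor.cores", "2")          // cores per executor
                .set("spark.num.executors", "6")           // number of executors
                .set("spark.executor.total.cores", "12");  // cap on total cores in the shared cluster
    }
}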

Wednesday, March 11, 2015

Mockito RuntimeException: test should never throw an exception to this level

So I was writing some tests with JUnit and Mockito and started getting the following error after a few tests:

java.lang.RuntimeException: test should never throw an exception to this level
    at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:97)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:294)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:127)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:282)
    at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:84)
    at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:49)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:207)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:146)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:120)
    at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
    at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:44)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:122)
    at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:106)
    at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:53)
    at org.powermock.modules.junit4.PowerMockRunner.run(PowerMockRunner.java:59)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)

After doing some googling I found that there aren't a lot of useful resources for figuring out why this particular error shows up.
I decided to try to narrow it down. I removed all tests except the one that was throwing this error, ran the tests, and everything passed as expected. Great, that wasn't much help, so I started adding tests back in one at a time and found that it was a combination of two tests that made things start failing. After comparing the tests I realized what was wrong. I'm mocking a few classes at the test-class level, which is fine as long as the Mockito.when() stubs don't overlap. The two tests that caused the failure both called Mockito.when() on the same mock and method. So to fix the problem, all I had to do was call Mockito.reset() on the mocks, and now all my tests run and I don't get the RuntimeException anymore.
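The fix boils down to something like this (a sketch, not my actual test class; SomeDependency and its lookup() method are made up):

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.reset;
import static org.mockito.Mockito.when;

import org.junit.After;
import org.junit.Test;

public class OverlappingStubTest {

    // hypothetical collaborator being mocked
    interface SomeDependency {
        String lookup(String key);
    }

    // mock shared across tests at the class level
    private final SomeDependency dependency = mock(SomeDependency.class);

    @After
    public void clearStubs() {
        // wipe out any when(...) stubbing left over from the previous test
        reset(dependency);
    }

    @Test
    public void firstTest() {
        when(dependency.lookup("a")).thenReturn("first");
        // ... exercise the code under test ...
    }

    @Test
    public void secondTest() {
        when(dependency.lookup("a")).thenReturn("second");
        // ... exercise the code under test ...
    }
}

With the reset() in @After, each test starts from a clean mock, so the overlapping when() stubs no longer interfere with each other.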

Thursday, February 26, 2015

PowerMockito: java.lang.RuntimeException: PowerMock internal error

java.lang.RuntimeException: PowerMock internal error
    at org.powermock.api.mockito.internal.expectation.DelegatingToConstructorsOngoingStubbing$InvokeStubMethod.invoke(DelegatingToConstructorsOngoingStubbing.java:133)
    at org.powermock.api.mockito.internal.expectation.DelegatingToConstructorsOngoingStubbing.thenThrow(DelegatingToConstructorsOngoingStubbing.java:71)


I was working on some tests for a project that required PowerMock and Mockito when one of my tests started producing the above error.

The test code looks something like this:
@Test
public void testThrowIOException() throws Exception
{
    File file = mock(File.class);
    when(file.getAbsolutePath()).thenReturn("name.gz");
    IOException exception = new IOException("test message");
    PowerMockito.whenNew(FileInputStream.class).withAnyArguments().thenThrow(exception);

    try
    {
        Service service = new Service();
        service.method(file);
        fail("expected exception");
    }
    catch(Exception e)
    {
        assertTrue(e.getMessage().contains("test message"));
    }
}

After banging my head against the wall and several hundred Google searches that turned up nothing, I decided to just start tweaking things. The first thing I tried was to change the line
PowerMockito.whenNew(FileInputStream.class).withAnyArguments().thenThrow(exception);
to
PowerMockito.whenNew(FileInputStream.class).withArguments(file).thenThrow(exception);
I did that because the code I'm testing does actually take an argument. I had originally figured that, since it didn't matter what the argument was for the test's sake, I would use withAnyArguments(), but changing it to the method that takes arguments fixed my problem and let my tests complete successfully.

Unfortunately I'm not familiar enough with the underlying code to know, or easily figure out, why the first version caused problems, but clearly something about how the stub was created without the argument caused the exception, and creating it with the argument addressed that issue.

Tuesday, September 9, 2014

Plugins and Plugin Security Basics

So I thought it would be fun to create a simple plugin ..... framework? (framework sounds too big for what I did) Regardless, I wanted to play around with dynamically loading classes into the JVM and running code in them just to see what's involved.

So to start, you need to create an interface for the plugin classes to implement. This is pretty simple; I'm using Java 8, so my interface also has a default method that controls how the plugin runs. Here is the interface:

public interface Plugin
{
    default public void run() {
        setup();
        performAction();
        tearDown();
    }

    public void setup();
    public void tearDown();
    public void performAction();
}

So the implementing classes just have to define any setup code needed and then do something in the performAction method. Obviously this is very simple and doesn't lend itself to very dynamic or extensive plugins, but it let me learn what I wanted and gets us started.

Now the interesting code: we have to dynamically load code from JARs (or just class files, it doesn't matter). This method searches the "." directory for any .jar files, loads every class found in each jar, and if a class is an instance of our Plugin interface it gets stored in the list that is returned:

public List<Plugin> loadPlugins() {

    List<Plugin> plugins = new ArrayList<Plugin>();

    // only look at .jar files in the current directory
    FileFilter jarFilter = new FileFilter() {
        @Override
        public boolean accept(File pathname) {
            return pathname.getName().toUpperCase().endsWith("JAR");
        }
    };

    File filePath = new File(".");
    File[] files = filePath.listFiles(jarFilter);
    List<File> filesList = Arrays.stream(files).collect(Collectors.toList());

    for (File file : filesList) {

        if (file.isFile()) {
            try {
                // first pass: collect the names of all .class entries in the jar
                List<String> classNames = new ArrayList<String>();
                ZipInputStream zip = null;
                try {
                    zip = new ZipInputStream(new FileInputStream(file.getAbsolutePath()));

                    for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                        if (entry.getName().endsWith(".class") && !entry.isDirectory()) {
                            classNames.add(entry.getName().substring(0, entry.getName().length() - ".class".length()));
                        }
                    }
                } finally {
                    if (zip != null) {
                        zip.close();
                    }
                }

                // second pass: load each class and keep the ones that implement Plugin
                URL[] urls = {new URL("jar:file:" + file.getAbsolutePath() + "!/")};
                MyLoader loader = null;
                try {
                    loader = new MyLoader(urls, this.getClass().getClassLoader());
                    try {
                        for (String className : classNames) {

                            className = className.replace("/", ".");
                            Class<?> cls = loader.loadClass(className);

                            Object classInstance = cls.newInstance();
                            if (classInstance instanceof Plugin) {
                                plugins.add((Plugin) classInstance);
                            }
                        }
                    } catch (ClassNotFoundException e) {
                        e.printStackTrace();
                    } catch (IllegalAccessException | InstantiationException e) {
                        e.printStackTrace();
                    }
                } finally {
                    if (loader != null) {
                        loader.close();
                    }
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    return plugins;
}

Some interesting parts to point out: when reading a ZipInputStream you can pull entries from it and just check the extension on each entry; that's almost too easy!
The cool part here is the loader. It takes those class entries from the jar file and loads them into the JVM so that we can run code in them; you can see we call cls.newInstance(), which works the same as calling the class's no-argument constructor with new.
The MyLoader class is pretty simple; it just extends URLClassLoader and loads classes from the jar file. It first asks the class loader that loaded this class, then checks whether the class has already been loaded, then tries to load it locally from the jar, and finally falls back to loading the class normally. (This is based on code I found in someone else's project; check it out at https://github.com/decebals/pf4j/)

private class MyLoader extends URLClassLoader {

    public MyLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /**
     * This implementation of loadClass uses a child-first delegation model
     * rather than the standard parent-first. If the requested class cannot
     * be found in this class loader, the parent class loader will be
     * consulted via the standard ClassLoader.loadClass(String) mechanism.
     */
    @Override
    public Class<?> loadClass(String className) throws ClassNotFoundException {

        // first ask the class loader that loaded this class
        try {
            return getClass().getClassLoader().loadClass(className);
        } catch (ClassNotFoundException e) {
            // fall through
        }

        // second, check whether it's already been loaded
        Class<?> clazz = findLoadedClass(className);
        if (clazz != null) {
            return clazz;
        }

        // nope, try to load locally from the jar
        try {
            clazz = findClass(className);
            return clazz;
        } catch (ClassNotFoundException e) {
            // try the next step
        }

        // use the standard URLClassLoader (which follows normal parent delegation)
        return super.loadClass(className);
    }

    public void close() throws IOException {
        super.close();
    }
}

So now that we have the Plugin classes loaded into the running JVM we can actually do something with them. For my simple purposes all I did was use a for loop to execute the run method.

PluginManager pm = new PluginManager();
List<Plugin> plugins = pm.loadPlugins();
for (Plugin plugin : plugins) {
    plugin.run();
}

Done, simple plugin framework!

So, there are some interesting things to keep in mind at this point. First off, you need to realize that each loaded plugin runs with fully trusted code permissions. Since we haven't done anything to explicitly limit the plugins' permissions, they have every permission the rest of the code has. Who cares? Well, if this were some kind of external system where we let anyone create and add plugins, then someone could write a plugin that loads other classes, accesses the file system, shuts down the JVM, accesses databases, really anything your trusted code can do. For example, I created a simple plugin that, when run, will shut down the JVM running the PluginManager.

public class PluginTest implements Plugin {

    @Override
    public void setup() {
        try {
            SecurityManager m = System.getSecurityManager();
            if (m != null) {
                m.checkPermission(new RuntimePermission("createClassLoader"));
            } else {
                System.out.println("no security manager");
            }
            System.out.println("I can create class loaders");
        } catch (Exception e) {
            System.out.println("Unable to create class loaders in unsecured code");
        }

        try {
            SecurityManager m = System.getSecurityManager();
            if (m != null) {
                m.checkPermission(new RuntimePermission("exitVM.{3}"));
            } else {
                System.out.println("no security manager");
            }
            System.out.println("I can shut down the system");
        } catch (Exception e) {
            System.out.println("Unable to exit in unsecured code");
        }

        System.out.println("Setup Plugin 1");
    }

    @Override
    public void tearDown() {
        try {
            SecurityManager m = System.getSecurityManager();
            if (m != null) {
                m.checkPermission(new RuntimePermission("exitVM.{3}"));
            } else {
                System.out.println("no security manager");
            }
            System.out.println("system shut down started");
            System.exit(3);
        } catch (Exception e) {
            System.out.println("Unable to exit in unsecured code");
        }
        System.out.println("Tear Down Plugin 1");
    }

    @Override
    public void performAction() {
        System.out.println("Plugin ACTION!!!!");
    }
}

You can see I check for a Security Manager, test whether I have permission for two different risky operations, and then in the tearDown method actually try to shut down the JVM. All the try/catches are there to make sure the plugin keeps running in case it doesn't have permission to do its dastardly deeds.

So what can we do to protect our trusted code from being hacked by this malicious plugin? For starters, we can add a Security Manager to the running JVM before any plugins are run. There are some problems with this. First, you can only set the Security Manager once, so we can't add one before running the plugins and then remove it when we are done. We also don't want to add one at the beginning that has all permissions locked down, or we severely limit what our trusted code can do. So really we want to turn security up before running the plugins and turn it back down once we've ensured all the plugins have finished running. Some limits need to be considered with threads: a malicious plugin could start a thread that waits until it has permission before doing something. I'm not going to address this or many other side cases here, just the basics. Perhaps in a future post I'll dive in a bit deeper.
So for my purposes I created a simple Security Manager that lets me turn all permission checks on and off.

public class MySecurityManager extends SecurityManager {

    private boolean allowAll = false;

    public void setAllowAll(boolean value) {
        allowAll = value;
    }

    @Override
    public void checkPermission(Permission perm) {
        if (!allowAll) {
            super.checkPermission(perm);
        }
    }
}

Now my code that runs the plugins changes to the following:

MySecurityManager m = new MySecurityManager();
System.setSecurityManager(m);

m.setAllowAll(true);

PluginManager pm = new PluginManager();
List<Plugin> plugins = pm.loadPlugins();

m.setAllowAll(false);
for (Plugin plugin : plugins) {
    plugin.run();
}
m.setAllowAll(true);

Now the plugins have no security permissions while they are running: they can't access the file system, they can't call methods like System.exit(), and they can't do anything my Security Manager is set up to prevent. The Java API lists the various permissions that the Security Manager handles.
(http://docs.oracle.com/javase/7/docs/api/java/lang/SecurityManager.html SecurityPermission and RuntimePermission list lots of interesting permissions.)

Find a slightly refactored version of the complete project at https://github.com/hardyc3/SimplePluginFramework

Friday, February 21, 2014

High Performance Web Sites by Steve Souders

This was another great book I read last year. It covers several ways to get better performance from your website without optimizing your code. Surprisingly, as Steve shows, you get better performance gains by making these non-code related tweaks than you do by optimizing code.

Below is my summary of the chapters and what I thought was important; for his own summaries go to his blog at http://stevesouders.com/efws/blogposts.php


Chapter A:
80%-90% of response time is due to the front end, not the back end.

Chapter B:

Chapter 1: Make fewer http requests
There are several things that can be done here.
-Combine images and use image maps or css sprites so that for a page only one image is downloaded instead of multiple small images.
-Combine javascript into one minified file
-Combine CSS into one file.
The majority of a page's download time is spent downloading images, scripts, and stylesheets, so by limiting the number of files a page has to download you eliminate the expensive overhead of multiple HTTP requests for the individual files.

Chapter 2: Use a CDN
A CDN is a set of servers dedicated to static content. These servers should be located closer to users, and since they only host static content, fewer of them are required to serve the same load. Page load times decrease because the data is closer to the user.

Chapter 3: Add Expires Headers
Adding a max-age or expires header tells the browser how long the file should be cached for. On Apache you can set a default that will be used for all files of a certain type,
<FilesMatch "\.(gif|jpg|js|css)$"> ExpiresDefault "access plus 2 months" </FilesMatch>

Chapter 4: Use gZip
Compress all HTML, scripts, and CSS. Don't compress images or PDFs because they are already compressed, and compressing them again could increase file size. Configure Apache to compress automatically if file sizes are greater than 1-2k. The improvement seen depends on the size of the file, the user's connection speed, and the distance the packets have to travel. Gzipping does add load to the server.
Apache 2.x uses the mod_deflate module. Use AddOutputFilterByType DEFLATE text/html text/css application/x-javascript to compress html, css and js files.

Chapter 5: Put stylesheets at the top
All stylesheets should be inside the HTML <head></head> tags. If they aren't, you risk one of two problems: either the user won't see any of the page until everything is downloaded, which makes the page look like it's frozen, or you show the user portions of the page before the style is downloaded, so they see it unstyled and then restyled once the CSS arrives. Both are bad user experiences. If stylesheets are kept in the <head> tag, the page can be loaded progressively and styled correctly.

Chapter 6: Put scripts at the bottom
Scripts block parallel downloading, so having a script at the top or middle of the page makes the entire page wait for it to download. Putting scripts at the bottom lets all the content that can be downloaded in parallel finish before the scripts start, so most if not all of the content on the page is displayed before the scripts begin downloading. Sometimes you can't move a script because the layout of the page depends on it; if you can move it, you should.

Chapter 7: Avoid CSS Expressions
If you have to use an expression, have it call a javascript function that overwrites the expression so it is only evaluated once. Expressions can be evaluated thousands of times if they aren’t overwritten.

Chapter 8: Make JavaScript and CSS external
Move JavaScript into external files so that the files can be cached, which reduces the number of HTTP requests for primed caches. Using a far-future Expires header with this helps speed things up even more, because a user's cache stays primed longer. Inlining JavaScript makes for fewer HTTP requests, but no caching is possible and the page size is bigger.
If you have to inline, you could try dynamic inlining. In your PHP/JSP/.NET code you copy the JS/CSS file contents into a <script>/<style> tag when the page is requested, then write some JavaScript that downloads the files separately after the page has loaded and sets a cookie; in the JSP/PHP/ASP file you check for that cookie and only inline the JavaScript if the cookie doesn't exist. This way the user gets the files while they aren't doing anything else anyway, but the next time they hit the page they won't have to download the JavaScript again, resulting in faster page loads and fewer HTTP requests when users access the same page multiple times.

Chapter 9: Reduce DNS Lookups
Have fewer domains referenced on a page; try to put everything on the same domain so the browser only has to do one lookup.
Use keep-alive so more data can be retrieved on a connection

Chapter 10: Minify JavaScript
Gzip combined with JSMin can reduce JavaScript file sizes by almost 80%.
JSMin probably has a version that will work with JSPs so we can minify inline scripts.

Chapter 11: Avoid Redirects
Instead of using server redirects, server aliases, mod_rewrite and DirectorySlash can be used to accomplish the same task.
Programming to the root instead of the current folder helps because then urls like something.com/something don’t have to be changed to something.com/something/
To track web page use, instead of using redirects you can use referrer logging, where you log the referrer site for all traffic.

Chapter 12: Remove duplicate scripts
If you have the same script included twice IE will download it twice and won’t allow it to be cached, increasing page load time every time.
Functions in duplicated scripts will be executed as many times as the script is duplicated. So slower run times, and potentially invalid results.

Chapter 13: ETags
Make sure ETags are set up right. ETags allow you to give content a unique ID; however, they aren't consistent across servers, so multi-server setups probably shouldn't use them.

Chapter 14: Make Ajax Cacheable
Use query strings when making ajax requests and make sure they have far future expire dates so that ajax requests that do the same thing each time don’t have to go all the way to the server to get that data. Some obviously can’t be cached because they return different data each time.



Use the YSlow Firebug plug-in to get help improving page load times.

Java Threads by Scott Oaks

This was a great book, I highly recommend it for anyone who wants a thorough introduction to threads, how to use them and the terminology. If you don't have time to read the whole book then here is my summary of the main points made in each chapter:

Chapter 1 Intro:

Chapter 2:
Thread: a separate task within a running application. Every app has at least the main thread, which is where most or all of the code runs; many applications have several threads the developer might not know about, for things like a GUI.
Lifecycle: call start(); start() creates a new thread, which executes the run() method. When run() finishes, the thread is shut down, leaving only the threads that were running before start() was called. The most common ways to stop a thread are a flag and the interrupt() method.

Chapter 3 Data Synchronization:
When two different threads could access the same method declare the method synchronized and then the JVM will manage it for you by only allowing one thread in at a time.
What if, however, you have a flag that stops the run method? If you declared the run method and the method that sets the flag as synchronized, you could never set the flag. So there are other options: you could synchronize just the section of code that checks the flag, or you could declare the flag as volatile. A volatile variable is always read from main memory instead of from a cached copy; synchronized getters and setters have the same effect, but volatile removes the boilerplate (see the stop-flag sketch at the end of this chapter's notes). Also, in Java, reading and writing variables is atomic except for long and double.
JVM scheduling changes can occur at any time; atomic operations are the only thing guaranteed to finish before the scheduler swaps one thread out for another.
There are a few ways to synchronize a block of code: use a class that implements the Lock interface, calling its lock() and unlock() methods in a try/finally, or use the synchronized keyword on the block and provide an object to synchronize on. Usually any object is fine, unless there is a specific object the block uses, in which case use that object. Make the scope as small as possible.
Deadlock: two or more threads waiting on the same locks to be freed, and the circumstances of the program are such that the locks will never be freed.
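Here is a minimal sketch of the stop-flag and try/finally locking ideas above (the class and the counter it protects are made up for illustration):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class StoppableWorker implements Runnable {

    // volatile so the write in stop() is immediately visible to the worker thread
    private volatile boolean running = true;

    // lock protecting the shared counter; the locked scope is kept as small as possible
    private final Lock lock = new ReentrantLock();
    private long counter = 0;

    public void stop() {
        running = false;
    }

    @Override
    public void run() {
        while (running) {
            lock.lock();
            try {
                counter++; // stand-in for the real shared-state work
            } finally {
                lock.unlock();
            }
        }
    }
}

Start it with new Thread(worker).start() and call worker.stop() from any other thread to shut it down cleanly.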

Chapter 4 Thread Notification:
Wait: must be used in a synchronized block; when called it frees the lock, and when it finishes it reacquires the lock.
Notify: must be used in a synchronized block; when called it notifies a waiting thread that the condition has occurred. There are no guarantees about which waiting thread gets notified; it could be any of them.
NotifyAll: makes sure all waiting threads get the notification. It wakes up all threads, but they still have to wait for the lock, so they won't run in parallel.
Notify and notifyAll have no specific condition they notify threads about. When a notify() or notifyAll() is called, threads wake up and check whether they can proceed; because of this, each thread that called wait() needs to have the wait in a loop so it can continue waiting if the notification wasn't for it (see the sketch after this chapter's notes).
You can create a lock object, Object someObj = new Object(), and use that for synchronization, waiting, and notification. This allows code to be more parallel, since other parts of an object can be used while one part waits on a notification; without this, the whole object is locked up while one part waits.
If you use lock objects for synchronization then you have to use Condition objects for wait and notify, because the lock object already overrides the object.wait and object.notify methods to implement the lock.
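The wait-in-a-loop pattern described above looks roughly like this (a toy single-slot mailbox; the class and method names are mine, not from the book):

public class Mailbox {

    private final Object lock = new Object();
    private String message; // null means the slot is empty

    public void put(String msg) throws InterruptedException {
        synchronized (lock) {
            while (message != null) {
                lock.wait();      // releases the lock while waiting
            }
            message = msg;
            lock.notifyAll();     // wake every waiter; each one re-checks its own condition
        }
    }

    public String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) {
                lock.wait();
            }
            String result = message;
            message = null;
            lock.notifyAll();
            return result;
        }
    }
}

Because the waits are inside while loops, a thread woken by a notification meant for someone else just goes back to waiting.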

Chapter 5 Minimal Synchronization Techniques:
Not understanding synchronization can lead to slower code than single threaded code. A lock that is highly contended will slow down the app because many threads get stuck waiting at the same spot.
Try to minimize the scope of synchronization blocks.
Use atomic classes to reduce the need for synchronization: AtomicInteger, AtomicLong, AtomicBoolean, AtomicReference, AtomicIntegerArray, AtomicLongArray, AtomicReferenceArray, AtomicIntegerFieldUpdater, AtomicLongFieldUpdater, AtomicReferenceFieldUpdater, AtomicMarkableReference, and AtomicStampedReference. The int, long, boolean, and reference classes work just like their regular counterparts, they are just atomic. The array classes provide atomic access to one element. The field updater classes let you atomically access a variable that wasn't declared atomic and that you can't change to atomic; use the newUpdater() method on the field updater classes. The reference class lets you take some non-atomic object and treat it as atomic. (A small example follows this chapter's notes.)
The purpose of synchronization is not to prevent all race conditions; it is to prevent problem race conditions.
There are trade-offs with minimal synchronization, it might remove slowness caused by synchronization but it makes the code harder to read and maintain.
The java memory model usually puts variables in a register when a method is loaded to continue running, declaring a variable volatile makes the jvm read that value directly instead of loading it into a register first. This is important because if the code reads it out of a local variable stored in a register instead of directly then the value isn’t shared between threads and changes are local to each thread.
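For example, a counter that would normally need a synchronized block can be written with AtomicInteger instead (a sketch; the class name is made up):

import java.util.concurrent.atomic.AtomicInteger;

public class HitCounter {

    // no synchronized block needed; incrementAndGet() is atomic
    private final AtomicInteger hits = new AtomicInteger(0);

    public int recordHit() {
        return hits.incrementAndGet();
    }

    public int current() {
        return hits.get();
    }
}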

Chapter 6 Advanced Synchronization Topics:
Semaphore: a lock with a counter. If the permit limit is set to 1 it's just like a lock; if the permit limit is set to more than one, it lets that number of threads in before blocking.
Barrier: some point where all threads must meet so results can be combined. Conditions, or wait and notify do almost the same thing.
Countdown Latch: lets you set a countdown counter; when it reaches zero all waiting threads are released (see the sketch after this chapter's notes).
Exchanger: class that lets two threads meet and exchange data, more like a datastructure.
Reader/Writer Locks: lock that allows multiple reads but only a single write.
Deadlock: when two or more threads are waiting on conflicting conditions. The best defense is to create a lock hierarchy when designing the program. Deadlock is difficult to debug because there can be multiple layers acquiring or requesting a lock, and it's not always easy to know which one is causing it. The book has a class that can be used to replace all calls to synchronized and any other Java lock; it reports errors when a deadlock condition happens. It's slow, so not good for production, but good for testing.
Lock Starvation: This happens when multiple threads contend for one lock and one or more threads never get scheduled when they can acquire the lock. Can be fixed with a fair lock which makes sure each thread gets it eventually.
Reader/Writer lock starvation: happens if the readers aren’t prevented from acquiring the lock when a writer wants it. If readers just keep getting the lock even when a writer is waiting then the writer could never get its turn.
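A small sketch of the countdown latch mentioned above (the worker count and messages are made up):

import java.util.concurrent.CountDownLatch;

public class LatchExample {

    public static void main(String[] args) throws InterruptedException {
        int workers = 3;
        CountDownLatch done = new CountDownLatch(workers);

        for (int i = 0; i < workers; i++) {
            final int id = i;
            new Thread(() -> {
                System.out.println("worker " + id + " finished");
                done.countDown();  // decrement the latch
            }).start();
        }

        done.await();              // released once the count reaches zero
        System.out.println("all workers done, combine results here");
    }
}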

Chapter 7
Skipped because it only dealt with GUIs and Swing

Chapter 8 Threads and Collection Classes:
Thread safe collections: Vector, Stack, Hashtable, ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet, ConcurrentLinkedQueue
Just because those classes are thread safe, doesn’t mean they can be used in any threaded application.
There are two cases for managing synchronization with collection classes, in the collection or in your program.
The easy case is to let the collection manage it by either using a thread safe collection, or creating a wrapper for the collection or by using the Collections.synchronized<Collection Type> methods.
The harder case is when you need to do more than one thing on the collection atomically, like a get and an update. In this case we have to use synchronized blocks (see the sketch after this chapter's notes).
Thread notification classes: these can be used with threads and simplify the use of collections by providing methods that handle out-of-space and out-of-data conditions: ArrayBlockingQueue, LinkedBlockingQueue, SynchronousQueue, PriorityBlockingQueue, DelayQueue.
When using iterators you have to consider carefully what will happen: either synchronize the object and the block that uses the iterator, or use a class that makes a copy of the collection. The copy approach can lead to race conditions, so only use it if you don't care that the data might be slightly out of date.
Producer/consumer pattern: specific threads create data and different threads use the data, more separation, less concerns with race conditions.
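The "more than one thing atomically" case looks something like this (a sketch; the word-count map is made up):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class WordCounts {

    // the map is wrapped, so single calls are safe on their own...
    private final Map<String, Integer> counts =
            Collections.synchronizedMap(new HashMap<String, Integer>());

    public void increment(String word) {
        // ...but get-then-put is a compound operation, so it needs its own synchronized block
        synchronized (counts) {
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
    }
}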

Chapter 9 Thread Scheduling:
The JVM or the operating system has to manage which threads run on the CPU when there are more threads than CPUs. There is no specification for how the JVM must manage the threads, so different JVMs may manage them differently. The only requirement is that the JVM implement some kind of priority-based scheduling, which allows the developer to give threads different priorities so higher-priority threads get run more often.
Preemption: higher priority threads take control of the CPU from lower priority threads and get to do their work before the lower priority threads get scheduled again.
Time-slicing: Threads of the same priority get to run on the CPU for a short amount of time before another thread of the same priority gets to run. Kind of like kids in line for a slide, the kids get to ride the slide one at a time, but just because they went down once doesn’t mean they are done, they may get back in line to ride again.
Priority Inversion: When a low priority thread has a lock that a high priority thread is waiting on, the lower priority thread’s priority is temporarily changed to be the same as the high priority thread so that it can do its thing and release the lock its holding.
Complex Priorities: OS’s usually do something more to deal with thread priorities, like they may add the time waiting to the threads priority so that low priority threads eventually get a turn to work even though other higher priority threads are still waiting.
Thread priorities in Java are really just suggestions to the JVM. It doesn’t guarantee anything to set one thread as max priority and another as min. It will most of the time make a difference but there are no guarantees.
Java has various system-dependent threading models. In the green-thread model everything is managed by the JVM; threads are an idea that doesn't extend past the single thread the JVM is running in, so even though threads are in use, everything is effectively still single threaded. The Windows model is one-to-one: if you create a Java thread on a Windows JVM, the JVM passes that thread on to the OS, which creates a new native thread, so this is truly multithreaded. Linux is similar to Windows. Solaris is quite different and too involved to explain here, so read up on it if you care.

Chapter 10 Thread Pools:
A pool can increase the throughput of an application by managing the threads more intelligently.
You need to have more threads available than CPUs so that if a thread blocks, another one can work in its place.
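As a sketch of that sizing idea (the 2x multiplier is just an illustration, not a rule from the book):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing {

    public static ExecutorService newWorkerPool() {
        // a few more threads than CPUs, so a blocked thread leaves a CPU free for another
        int cpus = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(cpus * 2);
    }
}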

Chapter 11 Task Scheduling:
The Timer class is like a single-threaded pool for tasks that should be run after some amount of time.
The TimerTask class needs to be extended to allow other classes to be scheduled with the Timer class. Instances should check whether they still need to run, because a task can get scheduled multiple times without a chance to run, so when it does run it might be running after another instance that already ran and was rescheduled.
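Basic usage looks like this (the delay and the printed message are made up):

import java.util.Timer;
import java.util.TimerTask;

public class TimerExample {

    public static void main(String[] args) {
        Timer timer = new Timer();

        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // a real task should double-check that it still needs to run,
                // since it may fire after another instance has already done the work
                System.out.println("running scheduled task");
            }
        }, 5000); // run once, five seconds from now
    }
}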

Chapter 12 Threads and I/O:
In older versions of Java, I/O requests were always blocking, which hurts the ability of threads and the system to get work done when entire threads halt waiting for data. More recent versions of Java have a new I/O package that doesn't block on I/O requests. The new I/O uses a single thread to step through all the I/O connections, check for ones that are ready, process those, and then go back to looking for ready connections. The main difference is that in blocking I/O a read or write finishes completely before the call returns, while in non-blocking I/O a read or write handles all it can at the time, and your program has to account for the cases where not all the data was read or written.
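The single-thread-stepping-through-ready-connections idea looks roughly like this with java.nio (a bare-bones echo server sketch, not production code; the port and buffer size are arbitrary):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NonBlockingEcho {

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();

        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);

        while (true) {
            selector.select(); // one thread waits here for any channel to become ready
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();

                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer); // reads only what is available right now
                    if (read == -1) {
                        client.close();
                    } else {
                        buffer.flip();
                        client.write(buffer); // may write less than everything; a real app tracks the leftover
                    }
                }
            }
        }
    }
}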

Chapter 13 Miscellaneous Thread Topics:
Thread Groups: every thread created belongs to a group; the default is the main group. You can interrupt all threads in a group by calling the interrupt method on the group.
Daemon threads: threads that serve other threads, like the garbage collector. When all user threads have finished, the daemon threads are closed and the JVM can shut down.
Thread stacks aren't stored on the heap; they are stored in general JVM memory. So creating new threads can cause an out-of-memory error if there isn't enough memory set aside for the JVM.

Chapter 14 Thread Performance:
Java programs are optimized as they are run, so before testing for performance you need to run the code a bunch.
Don’t optimize early, you will make things overly complex and probably won’t gain any performance improvements. Wait until the application is in development and have regular benchmark tests and only optimize when something isn’t within the performance standards.
There is almost no performance difference between synchronized and unsynchronized collections when you are in a single-threaded environment.
Switching from regular variables to atomic variables gives a significant performance boost. Code complexity increases though.
Don’t overuse thread pools, if the application design makes sense with a thread pool use it, otherwise don’t. The performance improvement easily gets lost in the execution of the thread’s content.

Chapter 15 Parallelizing Loops for Multiprocessor Machines:
Parallelize the outer loop, re-write if needed so you can.
Do parallelization where CPU intensive work is happening. Don’t worry about other places. This is usually going to be in some type of loop.
 

Friday, October 11, 2013

Eclipse Content Assist

So last week, as far as I know, no changes were made to my machine, no updates were installed in Eclipse, and nothing was changed on my system overall, but Friday morning when I started up Eclipse, content assist would show an error window and wouldn't assist me with development anymore.
The error says, "The 'org.eclipse.jdt.ui.JavaAllCompletionProposalComputer' proposal computer from the 'org.eclipse.jdt.ui' plug-in did not complete normally. The extension has thrown a runtime exception."
When I look in the .metadata/.log file I see the following stack traces:

!ENTRY org.eclipse.ui 4 0 2013-10-11 11:38:28.802
!MESSAGE Unhandled event loop exception
!STACK 0
java.lang.ClassCastException: org.eclipse.jdt.internal.core.SourceType cannot be cast to java.lang.String
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypesSearching(JavaModelManager.java:4579)
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypes(JavaModelManager.java:4438)
at org.eclipse.jdt.internal.core.NameLookup.findSecondaryType(NameLookup.java:591)

!ENTRY org.eclipse.jdt.ui 2 0 2013-10-11 11:38:29.270
!MESSAGE The 'org.eclipse.jdt.ui.JavaAllCompletionProposalComputer' proposal computer from the 'org.eclipse.jdt.ui' plug-in did not complete normally. The extension has thrown a runtime exception.
!STACK 0
java.lang.ClassCastException: org.eclipse.jdt.internal.core.SourceType cannot be cast to java.lang.String
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypesSearching(JavaModelManager.java:4579)
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypes(JavaModelManager.java:4438)
at org.eclipse.jdt.internal.core.NameLookup.findSecondaryType(NameLookup.java:591)
at org.eclipse.jdt.internal.core.NameLookup.findType(NameLookup.java:697)

!ENTRY org.eclipse.jdt.ui 4 2 2013-10-11 11:38:29.425
!MESSAGE Problems occurred when invoking code from plug-in: "org.eclipse.jdt.ui".
!STACK 0
java.lang.ClassCastException: org.eclipse.jdt.internal.core.SourceType cannot be cast to java.lang.String
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypesSearching(JavaModelManager.java:4579)
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypes(JavaModelManager.java:4438)

!ENTRY org.eclipse.jdt.ui 4 0 2013-10-11 11:38:29.427
!MESSAGE Error in JDT Core during AST creation
!STACK 0
java.lang.ClassCastException: org.eclipse.jdt.internal.core.SourceType cannot be cast to java.lang.String
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypesSearching(JavaModelManager.java:4579)
at org.eclipse.jdt.internal.core.JavaModelManager.secondaryTypes(JavaModelManager.java:4438)
at org.eclipse.jdt.internal.core.NameLookup.findSecondaryType(NameLookup.java:591)
at org.eclipse.jdt.internal.core.NameLookup.findType(NameLookup.java:697)


I don't know what they mean and I can't find anything online to solve the problem. The workaround I've found is to delete the following folder:
<workspace>/.metadata/.plugins/org.eclipse.jdt.core

Then the next time I start Eclipse, content assist works.