Vocabulary: 25970
Docs: 1000
Tokens: 106776
Topics: 1000
The cluster has 20 servers; each server has an 8-core CPU and 48 GB of memory.
When --psCount is set to 20, LDA works well.
With --psCount set to 40, LDA also works well.
But when I try --psCount 60, some tasks of the parameter server job fail.
The log is as follows:
java.lang.NegativeArraySizeException
Job aborted due to stage failure: Task 119 in stage 4.0 failed 4 times, most recent failure: Lost task 119.3 in stage 4.0 (TID 147, node-26): java.lang.NegativeArraySizeException
at com.intel.distml.util.store.IntArrayStore.init(IntArrayStore.java:31)
at com.intel.distml.util.DataStore.createStore(DataStore.java:56)
at com.intel.distml.util.DataStore.createStores(DataStore.java:44)
at com.intel.distml.platform.ParamServerDriver.paramServerTask(ParamServerDriver.scala:44)
at com.intel.distml.platform.ParamServerDriver$$anonfun$run$3.apply(ParamServerDriver.scala:75)
at com.intel.distml.platform.ParamServerDriver$$anonfun$run$3.apply(ParamServerDriver.scala:75)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:285)
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
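A `NegativeArraySizeException` at `IntArrayStore.init` usually means the computed partition size went negative. One common way this happens is a naive range partitioner that gives every server a fixed ceiling-sized chunk: once `psCount` is large enough, the last server's start offset overshoots the total row count and its size (`end - start`) becomes negative. The sketch below is hypothetical (it is not DistML's actual partitioning code, and `partitionSize` is an invented helper), but with 1000 rows it reproduces the reported pattern: fine at 20 and 40 servers, negative at 60.

```java
// Hypothetical sketch of a naive range partitioner that can yield a
// negative array size when psCount grows. NOT DistML's actual code.
public class PartitionSketch {
    // Split `total` rows across `psCount` servers, giving each server a
    // fixed chunk of ceil(total / psCount) rows (a common naive scheme).
    static int partitionSize(int total, int psCount, int serverIndex) {
        int chunk = (total + psCount - 1) / psCount; // ceil(total / psCount)
        int start = serverIndex * chunk;
        int end = Math.min(total, start + chunk);
        return end - start; // goes negative once start > total
    }

    public static void main(String[] args) {
        int total = 1000; // illustrative row count (e.g. 1000 topics)
        for (int ps : new int[]{20, 40, 60}) {
            int last = partitionSize(total, ps, ps - 1); // last server's share
            System.out.println("psCount=" + ps + " lastServerSize=" + last);
            // new int[last] would throw NegativeArraySizeException when last < 0
        }
    }
}
```

With `total = 1000` this prints sizes 50, 25, and -3 for psCount 20, 40, and 60: allocating `new int[-3]` is exactly the `NegativeArraySizeException` in the trace. If DistML partitions this way, a workaround is keeping psCount small enough that every server's range stays within bounds.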