Dataiku's DSS Increases Scale and Improves Speed of Analytics with Apache Spark

Apache Spark Integration Brings 10 to 100 times faster processing to DSS

September 29, 2015 - New York, NY - Dataiku Inc, the maker of Data Science Studio (DSS), announced today the integration of DSS with the advanced data processing engine Apache Spark. By adopting Spark, data analysts can process much larger Hadoop data sets, ranging into the terabytes, and process that information much more quickly.

Pairing the capabilities of Apache Spark with the advanced analytics features of DSS creates significant opportunities for those looking to leverage very large data sets.

Visual Recipes, a core component of DSS, can now be executed on the Apache Spark framework, leveraging the SparkSQL language and data processing engine. This helps DSS users perform tasks such as joins and aggregations dozens if not hundreds of times faster than could be accomplished with Hadoop using Apache Hive.

Apache Spark integration also gives DSS the ability to work with SparkR, SparkSQL, and PySpark, which bring R, SQL, and Python-based programming to the Spark environment. Much like the other components of Spark, PySpark and SparkR ease and speed the native capabilities found in DSS and make Spark a viable alternative to the traditional Hadoop/Hive stack, while also allowing analysts to share data engineering recipes and limit the need to recode or redevelop algorithms.

The addition of Apache Spark to the extensive number of datastores already supported by DSS allows analysts to create large-scale big data analytics projects without the risk of reaching beyond the capabilities offered by the data engines currently in use.

Apache Spark will generate an array of advantages:

  • Data Volume: Spark will help DSS deploy intricate algorithms across vast data sets.
  • Collaboration: The PySpark framework eases the sharing of cluster resources.
  • Education: DSS unifies the interfaces of various frameworks, so users no longer need to learn complicated frameworks, languages and dialects.
  • Future Proof: Enhancements to the Spark project can be rolled into the DSS/Spark arena.

About Dataiku

Dataiku develops the unique advanced analytics software solution that enables companies to build and deliver their own data products more efficiently. Its collaborative, team-based user interface works for all profiles, from data scientists to beginner analysts, and the unified framework allows for both development and deployment of data projects.
Addison Huegel, Media Relations
USA: +1 415-315-9629
Germany: +49 (0) 152 152 731 89