April 29, 2024 By Ali LeClerc 3 min read

VeloxCon 2024, the premier developer conference dedicated to the Velox open-source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management. Hosted by IBM® in partnership with Meta, VeloxCon showcased the latest innovations in Velox, including the project roadmap, Prestissimo (Presto-on-Velox), Gluten (Spark-on-Velox), hardware acceleration, and much more.

An overview of Velox

Velox is a unified execution engine that is built and open-sourced by Meta, aimed at accelerating data management systems and streamlining their development. One of the biggest benefits of Velox is that it consolidates and unifies data management systems so you don’t need to keep rewriting the engine. Today Velox is in various stages of integration with several data systems including Presto (Prestissimo), Spark (Gluten), PyTorch (TorchArrow), and Apache Arrow. You can read more about why Velox was built in Meta’s engineering blog.

Velox at IBM

Presto is the engine for watsonx.data, IBM’s open data lakehouse platform. Over the last year, we’ve been working hard on advancing Velox for Presto – Prestissimo – at IBM. Presto Java workers are being replaced by a C++ process based on Velox. We now have several committers to the Prestissimo project and continue to partner closely with Meta as we work on building Presto 2.0.

Some of the key benefits of Prestissimo include:

  • Huge performance boost: query processing can be done with much smaller clusters
  • No performance cliffs: no Java processes, JVM, or garbage collection, and memory arbitration improves efficiency
  • Easier to build and operate at scale: Velox gives you reusable and extensible primitives across data engines (like Spark)

This year, we plan to do even more with Prestissimo, including:

  • The Iceberg reader
  • Production readiness (metrics collection with Prometheus)
  • New Velox system implementation
  • TPC-DS benchmark runs

VeloxCon 2024

We worked closely with Meta to organize VeloxCon 2024, and it was a fantastic community event. We heard speakers from Meta, IBM, Pinterest, Intel, Microsoft, and others share what they’re working on and their vision for Velox over two dynamic days.

Day 1 highlights

The conference kicked off with sessions from Meta, including Amit Purohit reaffirming Meta’s commitment to open source and community collaboration. Pedro Pedreira, alongside Manos Karpathiotakis and Deblina Gupta, delved into the concept of composability in data management, showcasing Velox’s versatility and its alignment with Arrow.

Amit Dutta of Meta explored Prestissimo’s batch efficiency at Meta, shedding light on the advancements made in optimizing data processing workflows. Remus Lazar, VP Data & AI Software at IBM, presented Velox’s journey within IBM and the vision for its future. Aditi Pandit of IBM followed with insights into Prestissimo’s integration at IBM, highlighting feature enhancements and future plans.

The afternoon sessions were equally insightful. Jimmy Lu of Meta unveiled the latest optimizations and features in Velox, and Binwei Yang of Intel discussed the integration of Velox with the Apache Gluten project, emphasizing its global impact. Engineers from Pinterest and Microsoft shared their experiences of unlocking data query performance by using Velox and Gluten, showcasing tangible performance gains.

The day concluded with two sessions from Meta: Xiaoxuan Meng on Velox’s memory management, and Wei He with a glimpse into the new simple aggregation function interface.

Day 2 highlights

The second day began with a keynote from Orri Erling, co-creator of Velox. He shared insights into Velox Wave and Accelerators, showcasing their potential for hardware acceleration. Krishna Maheshwari from NeuroBlade highlighted their collaboration with the Velox community, introducing NeuroBlade’s SPU (SQL Processing Unit) and its transformative impact on Velox’s computational speed and efficiency.

Sergei Lewis from Rivos explored the potential of offloading work to accelerators to enhance Velox’s pipeline performance. William Malpica and Amin Aramoon from Voltron Data introduced Theseus, a composable, scalable, distributed data analytics engine, using Velox as a CPU backend.

Yoav Helfman from Meta unveiled Nimble, a cutting-edge columnar file format designed to enhance data storage and retrieval. Pedro Pedreira and Sridhar Anumandla from Meta elaborated on Velox’s new technical governance model, emphasizing its importance in guiding the project’s sustainable development.

The day also featured sessions on Velox’s I/O optimizations by Deepak Majeti from IBM, strategies for safeguarding against Out-Of-Memory (OOM) kills by Vikram Joshi from ComputeAI, and a hands-on demo on debugging Velox applications by Deepak Majeti.

What’s next with Velox

VeloxCon 2024 was a testament to the vibrant ecosystem surrounding the Velox project, showcasing groundbreaking innovations and fostering collaboration among industry leaders and developers alike. The conference provided attendees with valuable insights, practical knowledge, and networking opportunities, solidifying Velox’s position as a leading open source project in the data management ecosystem.

If you’re interested in learning more and joining the Velox community, here are some resources to get started:

Stay tuned for more updates and developments from the Velox community, as we continue to push the boundaries of data management and accelerate innovation together.

Try Presto with a free trial of watsonx.data