
Databricks brings its Delta Lake project to the Linux Foundation

Databricks, the big data analytics service founded by the original developers of Apache Spark, today announced that it is bringing its Delta Lake open-source project for building data lakes to the Linux Foundation, where it will be developed under an open governance model. The company announced the launch of Delta Lake earlier this year, and even though it’s still a relatively new project, it has already been adopted by many organizations and has found backing from companies like Intel, Alibaba and Booz Allen Hamilton.

“In 2013, we had a small project where we added SQL to Spark at Databricks […] and donated it to the Apache Foundation,” Databricks CEO and co-founder Ali Ghodsi told me. “Over the years, slowly people have changed how they actually leverage Spark and only in the last year or so it really started to dawn upon us that there’s a new pattern that’s emerging and Spark is being used in a completely different way than maybe we had planned initially.”

This pattern, he said, is that companies are taking all of their data and putting it into data lakes, and then doing a couple of things with this data, machine learning and data science being the obvious ones. But they are also doing things that are more traditionally associated with data warehouses, like business intelligence and reporting. The term Ghodsi uses for this kind of usage is ‘Lake House.’ More and more, Databricks is seeing that Spark is being used for this purpose, not just to replace Hadoop and do ETL (extract, transform, load). “This kind of Lake House patterns we’ve seen emerge more and more and we wanted to double down on it.”
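To make the ‘Lake House’ pattern concrete, here is a minimal PySpark sketch of the usage Ghodsi describes: the same files in a data lake feed both a warehouse-style report and a machine learning workflow. The storage path and column names are illustrative placeholders, not details from the announcement.

    # Hypothetical lake path and schema; assumes a working Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("lakehouse-sketch").getOrCreate()

    events = spark.read.parquet("s3a://example-lake/events")

    # Warehouse-style reporting: an aggregation that would normally live in BI.
    report = events.groupBy("country").agg(F.sum("amount").alias("revenue"))
    report.show()

    # The same data doubles as input for data science / machine learning.
    features = events.select("amount", "country").na.drop()
    train, test = features.randomSplit([0.8, 0.2], seed=42)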

Read more

The LF's press release

  • The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes

    Amsterdam and San Francisco, October 16, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.

    Every organization aspires to get more value from data through data science, machine learning and analytics, but they are massively hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant, enabling concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and non-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.
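As a rough illustration of the two capabilities the press release highlights, ACID transactions and schema enforcement, the following PySpark sketch writes and appends to a Delta table and shows a mismatched append being rejected. It assumes the delta-spark package is installed; the table path and sample rows are made up for the example.

    # Assumes `pip install pyspark delta-spark`; path and rows are illustrative.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-acid-sketch")
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    path = "/tmp/delta/events"

    spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"]) \
         .write.format("delta").mode("overwrite").save(path)

    # Appends are transactional: concurrent readers see either the previous
    # snapshot or the new one, never a partially written table.
    spark.createDataFrame([(3, "click")], ["id", "event"]) \
         .write.format("delta").mode("append").save(path)

    # Schema enforcement: an append whose columns do not match the table's
    # schema is rejected instead of silently corrupting the data lake.
    bad = spark.createDataFrame([("oops",)], ["unexpected_column"])
    try:
        bad.write.format("delta").mode("append").save(path)
    except Exception as err:
        print("rejected by schema enforcement:", type(err).__name__)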

Delta Lake finds new home at Linux Foundation

  • Delta Lake finds new home at Linux Foundation

    Databricks used the ongoing Spark + AI Summit Europe to announce a change in the governance of Delta Lake.

    The storage layer was introduced to the public in April 2019 and is now in the process of moving to the Linux Foundation, which also fosters software projects such as the Linux kernel and Kubernetes.

    The new home is meant to drive the adoption of Delta Lake and establish it as a standard for managing big data. Databricks’ cofounder Ali Ghodsi commented on the move in a canned statement. “To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”

Open source Delta Lake project moves to the Linux Foundation

  • Open source Delta Lake project moves to the Linux Foundation

    Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.

    Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by thousands of organizations, including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.

    Data lakes are popular with large enterprises because they provide a reliable way of ensuring that data can be accessed by anyone within an organization. They can be used to store any kind of data, including both structured and unstructured information in its native format, and they also support analysis that helps provide real-time insights on business matters.

Databricks contributes Delta Lake to the Linux Foundation

  • Databricks contributes Delta Lake to the Linux Foundation

    The Databricks-led open source Delta Lake project is getting a new home and a new governance model at the Linux Foundation.

    In April, the San Francisco-based data science and analytics vendor open sourced the Delta Lake project, in an attempt to create an open community around its data lake technology. After months of usage and feedback from a community of users, Databricks decided that a more open model for development, contribution and governance was needed and the Linux Foundation was the right place for that.

Databricks’ Delta Lake Moves To Linux Foundation

Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

  • Unifying cloud storage and data warehouses: Delta Lake project hosted by the Linux Foundation

    Going cloud for your storage needs comes with some baggage. On the one hand, it's cheap, elastic, and convenient - it just works. On the other hand, it's messy, especially if you are used to working with data management systems like databases and data warehouses.

    Unlike those systems, cloud storage was not designed with things such as transactional support or metadata in mind. If you work with data at scale, these are pretty important features. This is why Databricks introduced Delta Lake to add those features on top of cloud storage back in 2017.
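As a sketch of what that layering looks like in practice: Delta Lake keeps a transaction log (a _delta_log directory of commit files) next to the data in the object store, which is what provides the transactional support and metadata that plain cloud storage lacks. The bucket path below is a placeholder, and the snippet assumes the delta-spark package plus the relevant cloud storage connector are available.

    # Hypothetical bucket; assumes delta-spark and an s3a/cloud connector.
    from delta import configure_spark_with_delta_pip
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("delta-on-object-storage")
    spark = configure_spark_with_delta_pip(builder).getOrCreate()

    path = "s3a://example-bucket/delta/orders"

    # The commit history is the metadata layer recorded in _delta_log/.
    DeltaTable.forPath(spark, path).history() \
        .select("version", "timestamp", "operation").show()

    # Time travel: read the table exactly as it was at an earlier commit.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)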

SDxCentral coverage


More in Tux Machines

Bringing PostgreSQL to Government

  • Crunchy Data, ORock Technologies Form Open Source Cloud Partnership for Federal Clients

    Crunchy Data and ORock Technologies have partnered to offer a database-as-a-service platform by integrating the former's open source database with the latter's managed offering, which is designed to support deployment of containers in multicloud or hybrid computing environments. The partnership aims to implement PostgreSQL as a service within ORock's Secure Containers as a Service, which is certified for government use under the Federal Risk and Authorization Management Program, Crunchy Data said Tuesday.

  • Crunchy Data and ORock Technologies Partnership Brings Trusted Open Source Cloud Native PostgreSQL to Federal Government

    Crunchy Data and ORock Technologies, Inc. announced a partnership to bring Crunchy PostgreSQL for Kubernetes to ORock’s FedRAMP authorized container application Platform as a Service (PaaS) solution. Through this collaboration, Crunchy Data and ORock will offer PostgreSQL-as-a-Service within ORock’s Secure Containers as a Service with Red Hat OpenShift environment. The combined offering provides a fully managed Database as a Service (DBaaS) solution that enables the deployment of containerized PostgreSQL in hybrid and multi-cloud environments.

    Crunchy PostgreSQL for Kubernetes has achieved Red Hat OpenShift Operator Certification and provides Red Hat OpenShift users with the ability to provision trusted open source PostgreSQL clusters, elastic workloads, high availability, disaster recovery, and enterprise authentication systems. By integrating with the Red Hat OpenShift platform within ORock’s cloud environments, Crunchy PostgreSQL for Kubernetes leverages the ability of the Red Hat OpenShift Container Platform to unite developers and IT operations on a single FedRAMP-compliant platform to build, deploy, and manage applications consistently across hybrid cloud infrastructures.

Hardware, Science and History

  • An Open Source Toolbox For Studying The Earth

    Fully understanding the planet’s complex ecosystem takes data, and lots of it. Unfortunately, the ability to collect detailed environmental data on a large scale with any sort of accuracy has traditionally been something that only the government or well-funded institutions have been capable of. Building and deploying the sensors necessary to cover large areas or remote locations simply wasn’t something the individual could realistically do. But by leveraging modular hardware and open source software, the FieldKit from [Conservify] hopes to even the scales a bit. With an array of standardized sensors and easy to use software tools for collating and visualizing collected data, the project aims to empower independent environmental monitoring systems that can scale from a handful of nodes up to several hundred.

  • The Early History of Usenet, Part II: Hardware and Economics

    There was a planning meeting for what became Usenet at Duke CS. We knew three things, and three things only: we wanted something that could be used locally for administrative messages, we wanted a networked system, and we would use uucp for intersite communication. This last decision was more or less by default: there were no other possibilities available to us or to most other sites that ran standard Unix. Furthermore, all you needed to run uucp was a single dial-up modem port. (I do not remember who had the initial idea for a networked system, but I think it was Tom Truscott and the late Jim Ellis, both grad students at Duke.)

    There was a problem with this last option, though: who would do the dialing? The problems were both economic and technical-economic. The latter issue was rooted in the regulatory climate of the time: hardwired modems were quite unusual, and ones that could automatically dial were all but non-existent. (The famous Hayes Smartmodem was still a few years in the future.)

    The official solution was a leased Bell 801 autodialer and a DEC DN11 peripheral as the interface between the computer and the Bell 801. This was a non-starter for a skunkworks project; it was hard enough to manage one-time purchases like a modem or a DN11, but getting faculty to pay monthly lease costs for the autodialer just wasn't going to happen. Fortunately, Tom and Jim had already solved that problem.

  • UNIX Version 0, Running On A PDP-7, In 2019

    With the 50th birthday of the UNIX operating system being in the news of late, there has been a bit of a spotlight shone upon its earliest origins. At the Living Computers museum in Seattle, though, they’ve gone well beyond a bit of historical inquiry, because they’ve had UNIX (or should we in this context say unix instead?) version 0 running on a DEC PDP-7 minicomputer. This primordial version on the original hardware is all the more remarkable because, unlike its younger siblings, very few PDP-7s have survived. The machine running UNIX version 0 belongs to [Fred Yearian], a former Boeing engineer who bought his machine from the company’s surplus channel at the end of the 1970s. He restored it to working order and it sat in his basement for decades, while the vintage computing world labored under the impression that, including the museum’s existing machine, only four had survived — of which only one worked. [Fred’s] unexpected appearance with a potentially working fifth machine, therefore, came as something of a surprise.

Audiocasts/Shows: Linux Action News and Open Source Security Podcast

Red Hat and Containers

  • Queensland government looks to open source for single sign-on project

    Red Hat Single Sign-On, which is based on the open source Keycloak project, and the Apollo GraphQL API Gateway platform will be the two key software components underpinning a Queensland effort to deliver a single login for access to online government services. Queensland is implementing single sign-on capabilities for state government services, including ‘tell us once’ capabilities that will allow basic personal details of individuals to be shared between departments and agencies where consent is given by the individual.

  • Red Hat Releases Open Source Project Quay Container Registry
  • Red Hat open sources Project Quay container registry

    Yesterday, Red Hat introduced the open source Project Quay container registry, which is the upstream project representing the code that powers Red Hat Quay and Quay.io. Open sourced as part of Red Hat's commitment to open source, Project Quay “represents the culmination of years of work around the Quay container registry since 2013 by CoreOS, and now Red Hat,” the official post reads. The Red Hat Quay container image registry provides storage and enables users to build, distribute, and deploy containers. It also helps users secure their image repositories with automation, authentication, and authorization systems. It is compatible with most container environments and orchestration platforms and is also available as a hosted service or on-premises.

  • Red Hat declares Quay code open

    Red Hat has open sourced the code behind Project Quay, the six-year-old container registry it inherited through its purchase of CoreOS. The code in question powers both Red Hat Quay and Quay.io, and also includes the Clair open source security project, which was developed by the Quay team and integrated with the registry back in 2015. In the blog post announcing the move, Red Hat principal software engineer – and CoreOS alumnus – Joey Schorr wrote, “We believe together the projects will benefit the cloud-native community to lower the barrier to innovation around containers, helping to make containers more secure and accessible.”

  • New Open Source Offerings Simplify Securing Kubernetes

    In advance of the upcoming KubeCon 2019, the flagship event for all things Kubernetes and the Cloud Native Computing Foundation (CyberArk booth S55), CyberArk is adding several new Kubernetes offerings to its open source portfolio to improve the security of application containers within Kubernetes clusters running enterprise workloads.

  • Java Applications Go Cloud-Native with Open-Source Quarkus Framework

    "With Quarkus, Java developers are able to continue to work in Java, the language they are proficient in, even when they are working with new, cloud-native technologies," John Clingan, senior principal product manager of middleware at Red Hat, told IT Pro Today. "With memory utilization measured in 10s of MB and startup time measured in 10s of milliseconds, Quarkus enables organizations to continue with their significant Java investments for both microservices and serverless." Many organizations have been considering alternative runtimes to Java, like Node.js and Go, due to high memory utilization of Java applications, according to Clingan. In addition, Java’s startup times are generally too slow to be an effective solution for serverless environments. As such, Clingan said that even if an organization decided to stick with Java for microservices, it would be forced to switch to an alternative runtime for serverless, or functions-as-a-service (FaaS), deployment.

  • Styra Secures $14M in Funding Led by Accel to Expand Open Source and Commercial Solutions for Kubernetes/Cloud-native Security

    New technologies like Kubernetes, containers, service mesh, and CI/CD automation speed application delivery and development. However, they lack a common framework for authorization to determine where access should be allowed and where it should be denied. Styra’s commercial and open source solutions, purpose-built for the scale of cloud-native development, provide this authorization layer to mitigate risk across cloud application components, as well as the infrastructure they are built upon.
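Styra's open source project is the Open Policy Agent (OPA), which exposes policy decisions over a small REST API. As a hedged sketch of the authorization layer described above, the snippet below asks a locally running OPA whether a request should be allowed; the policy path, input fields and OPA address are illustrative assumptions, not details from the announcement.

    # Assumes an OPA server on localhost:8181 with a policy loaded under
    # httpapi/authz; the input document is a made-up example.
    import requests

    OPA_URL = "http://localhost:8181/v1/data/httpapi/authz/allow"

    decision = requests.post(OPA_URL, json={
        "input": {
            "user": "alice",
            "method": "GET",
            "path": ["finance", "salary", "alice"],
        },
    }).json()

    # OPA answers {"result": true} or {"result": false}; every component
    # (service mesh, CI/CD job, admission controller) can ask the same question.
    allowed = decision.get("result", False)
    print("allowed" if allowed else "denied")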