Databricks

KYCO: Know Your Company
Reveal Profile
30 October 2025

1) Overview of the Service Provider

Databricks, Inc. provides a unified data and artificial intelligence platform that serves over 15,000 organizations worldwide, including more than 60% of the Fortune 500, among them Block, Comcast, Condé Nast, Rivian, and Shell. Founded in 2013 by the original creators of Apache Spark, Delta Lake, and MLflow, the company operates as a private enterprise software provider headquartered in San Francisco with a global workforce of approximately 8,000 employees.

The company’s core offering is the Databricks Data Intelligence Platform, built on a lakehouse architecture that combines the flexibility of data lakes with the performance and governance of data warehouses. This unified platform integrates data engineering, data science, machine learning, and business intelligence capabilities, enabling organizations to process structured, semi-structured, and unstructured data at scale. The platform operates across major cloud providers including Amazon Web Services, Microsoft Azure, and Google Cloud Platform, offering multi-cloud flexibility without vendor lock-in.

Databricks serves diverse industries including financial services, healthcare, manufacturing, retail, media, telecommunications, and public sector organizations. In financial services specifically, the platform supports use cases such as real-time fraud detection, risk management, regulatory compliance, and customer personalization through advanced analytics and AI capabilities. The company has established a significant presence in this sector, with clients including JPMorgan Chase, Barclays, HSBC, and numerous other global financial institutions.

The company has achieved remarkable financial performance, reaching a $4 billion annual revenue run rate as of 2025 with over 50% year-over-year growth. Databricks maintains strong customer economics with net revenue retention exceeding 140% and over 650 customers generating more than $1 million in annual recurring revenue. The platform has been recognized as a Leader in multiple Gartner Magic Quadrants, including the 2024 Magic Quadrant for Cloud Database Management Systems and the 2025 Magic Quadrant for Data Science and Machine Learning Platforms.

Databricks competes primarily with Snowflake in the data warehousing space, while also facing competition from cloud providers’ native services such as Amazon Redshift, Google BigQuery, and Azure Synapse Analytics. The company differentiates itself through its unified approach to data and AI, open-source foundation, and strong capabilities in machine learning and artificial intelligence workloads. Recent strategic acquisitions, including MosaicML for $1.3 billion and Tabular for over $1 billion, have strengthened its position in the generative AI and data management markets.

2) History

Databricks was founded in 2013 by seven computer science academics from the University of California, Berkeley, who were the original creators of Apache Spark, an open-source distributed computing framework that emerged from UC Berkeley’s AMPLab research project. The founding team included Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji, who had collaborated on developing Apache Spark starting in 2009 to address the limitations of Hadoop MapReduce for big data processing.

The company’s origins trace back to Matei Zaharia’s PhD research at UC Berkeley, where he developed the concept of Resilient Distributed Datasets (RDDs) that enabled Apache Spark to achieve speeds up to 100 times faster than Hadoop by caching data in memory rather than reading from disk. The founding team’s initial motivation was purely academic, notably competing in the Netflix Prize challenge to build better recommendation algorithms, where they achieved significant success by demonstrating Spark’s superior performance.

In 2013, after Spark became an Apache incubator project, the founders established Databricks with initial Series A funding of $14 million led by Andreessen Horowitz. The company’s first commercial product launched in 2015 as a cloud-based notebook interface that simplified Spark deployment and management for enterprises. However, the early years proved challenging, with the company generating only $1 million in revenue by 2015 despite Spark’s growing popularity in the open-source community.

A pivotal transformation occurred in January 2016 when Ali Ghodsi became CEO after Ion Stoica stepped down to return to his Berkeley professorship. Under Ghodsi’s leadership, Databricks shifted from a self-service model to an enterprise sales strategy, hired experienced executives, and began developing proprietary features alongside its open-source offerings. This strategic pivot proved successful, with the company securing its first million-dollar deal in 2017 and growing recurring revenue from $40 million in 2017 to $200 million by Q3 2019.

The company achieved significant milestones through strategic partnerships, most notably the November 2017 announcement of Azure Databricks, a first-party service integration with Microsoft Azure that exposed Databricks to Microsoft’s large enterprise customer base. This partnership marked a crucial expansion phase, as did subsequent integrations with Google Cloud in February 2021.

From 2020 to 2023, Databricks pioneered the “lakehouse” architecture concept, combining data lake flexibility with data warehouse performance and governance. Building on Delta Lake, which it launched in 2019 and contributed to the Linux Foundation, the company released Databricks SQL in November 2020 to capture data warehousing workloads. During this period, Databricks also made several strategic acquisitions, including Redash for data visualization in June 2020 and 8080 Labs in October 2021, to strengthen its platform capabilities.

The company’s most transformative phase began in June 2023 with its acquisition of MosaicML for $1.3 billion, positioning Databricks as a leader in generative AI and large language model training. This acquisition was followed by the launch of the Data Intelligence Platform concept, integrating AI capabilities throughout the unified analytics platform. Recent major acquisitions include Tabular for over $1 billion in June 2024, bringing the creators of Apache Iceberg in-house to strengthen data format interoperability.

Databricks has demonstrated exceptional financial growth, progressing through multiple funding rounds from its initial $14 million Series A to a $10 billion Series J in December 2024 that valued the company at $62 billion. In September 2025, the company announced an additional $1 billion Series K funding round, pushing its valuation above $100 billion and making it the fourth private company to achieve this milestone.

3) Key Executives

Ali Ghodsi has served as Chief Executive Officer of Databricks since January 2016 and is a co-founder of the company, having previously held the role of Vice President of Engineering and Product Management. Ghodsi earned his PhD in Computer Science from KTH Royal Institute of Technology in Sweden in 2006 in the area of Distributed Computing, and holds an MBA from Mid-Sweden University from 2003. He continues to serve as an adjunct professor at UC Berkeley, where he was one of the original creators of Apache Spark, and sits on the board of UC Berkeley’s RISELab. Prior to founding Databricks, Ghodsi co-founded Peerialism AB, a Stockholm-based company developing a peer-to-peer data transfer system, and served as an assistant professor at KTH from 2008 to 2009.

Dave Conte has served as Chief Financial Officer since October 2019, bringing over 30 years of finance and administration experience in multinational public and private companies within the technology industry. Conte holds a Bachelor’s degree in Business Economics from UC Santa Barbara and previously served as CFO at Splunk for eight years, where he took the company public and helped it grow from $100 million in annual revenue to more than $2 billion annually. Prior to Splunk, he served as CFO at Opsware, an IT automation software company that was acquired by HP for $1.65 billion, and served as a board member at Anaplan from 2016 until its acquisition by Thoma Bravo in 2022.

Hatim Shafique currently serves as Chief Operating Officer, having been promoted from his previous role as Chief Customer Officer in 2022. Shafique brings more than 14 years of experience building enterprise software products to Databricks, having previously served as Chief Customer Officer and Senior Vice President of Tech Ops for AppDynamics, where he ran all customer-facing and technology functions. He joined AppDynamics as one of the first 10 employees and was instrumental in helping the company grow from a handful of customers to a market leader worth $3.7 billion. He holds a Master’s in Computer Science from the University of Southern California.

Trâm Phi serves as Senior Vice President and General Counsel, appointed in May 2022. Phi brings decades of experience scaling high-growth companies in both public and private markets, having previously served as SVP, General Counsel at DocuSign where she helped scale the legal function and led the transition to a mature public company. Prior to DocuSign, she served as Chief Legal Officer and Chief of Staff at Imperva for nearly eight years and was Vice President, General Counsel of ArcSight, leading both cybersecurity software providers’ legal teams through transitions from private to public markets.

Naveen Zutshi has served as Chief Information Officer since January 2022, leading the global information technology functions at Databricks. Zutshi spent six years as CIO at Palo Alto Networks overseeing IT strategy and operations, helping the company scale rapidly and expand into new security categories. Previously, he served as Senior Vice President of Technology at Gap Inc., where he oversaw global infrastructure, operations and security for the company, championing the transition to a cloud-based infrastructure and helping scale the retailer’s e-commerce business.

Fermín Serna joined as Chief Security Officer in August 2021, bringing over 20 years of enterprise information security expertise to the leadership team. Serna most recently served as Chief Information Security Officer at Citrix, where he led a robust global security and data privacy practice, and previously served as Head of Product Security at Google, where he led the team responsible for Google Product Security and worked with external technology partners to build highly secure APIs and development libraries. He also held senior security roles at Microsoft and was previously the CSO for Semmle, which was acquired by GitHub.

4) Ownership

Databricks operates as a privately held company with ownership distributed among founders, venture capital firms, institutional investors, and strategic technology partners. The company was founded in 2013 by seven co-founders from UC Berkeley – Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji – who maintain significant influence through their leadership roles and board representation. While the exact equity distribution among founders is not publicly disclosed, the collaborative founding structure and continued executive involvement suggest substantial founder ownership retention.

The company’s capital structure has evolved significantly through multiple funding rounds, culminating in a Series K round in September 2025 that valued Databricks at over $100 billion. This followed a record-breaking $10 billion Series J round in December 2024 at a $62 billion valuation, complemented by $5.25 billion in debt financing arranged by major investment banks including JPMorgan Chase, Barclays, Citi, Goldman Sachs, and Morgan Stanley. The Series K round was co-led by Andreessen Horowitz, Insight Partners, MGX, Thrive Capital, and WCM Investment Management, demonstrating continued confidence from both existing and new investors.

Databricks has attracted over 90 institutional investors throughout its funding history, including prominent venture capital firms and asset managers such as Andreessen Horowitz (which led the initial Series A in 2013), Tiger Global Management, Fidelity Investments, and T. Rowe Price. Strategic technology investors have played an increasingly important role, with Microsoft participating since the Series E round in 2019, followed by Amazon Web Services and Salesforce Ventures in subsequent rounds. More recent investors include Meta, Nvidia, Qatar Investment Authority, Temasek, and Ontario Teachers’ Pension Plan, reflecting the company’s positioning as critical AI infrastructure.

The board of directors includes founder representation with Ali Ghodsi as CEO, Ion Stoica as Executive Chairman, and Matei Zaharia as Chief Technology Officer, alongside representatives from major investment firms. Board members include Ben Horowitz from Andreessen Horowitz, Pete Sonsini from New Enterprise Associates, and industry executives Elena Donio and Jonathan Chadwick, creating a balance between founder control and investor governance. The company’s share structure likely concentrates voting power among founders and early investors, a common arrangement for high-growth private technology companies seeking to maintain strategic control while accessing growth capital.

Total funding across all rounds has exceeded $15.7 billion in equity financing, making Databricks one of the most heavily capitalized private companies globally. The company has demonstrated strong unit economics with over 140% net revenue retention and gross margins exceeding 80%, supporting its high valuation multiples. With over 650 customers generating more than $1 million in annual recurring revenue and positive free cash flow achieved in 2025, Databricks has built a sustainable financial foundation that reduces pressure for immediate public market access.

5) Legal Claims and Actions

Databricks faces significant legal challenges stemming primarily from copyright infringement allegations related to AI model training. Multiple class-action lawsuits filed in 2024 and 2025 allege that Databricks and its subsidiary MosaicML used copyrighted books to train their MPT and DBRX large language models without permission. Active proceedings include cases in US federal court and a Canadian class-action suit filed in July 2025. A June 2025 ruling allowed plaintiffs to expand their complaint to target the DBRX models; in August 2025, some direct infringement claims regarding the newer DBRX model were dismissed, while claims regarding the MPT model remain active.

The company has also engaged in patent litigation, filing a lawsuit in September 2024 against patent monetization firm ByteWeavr (formerly Ascend IP) alleging an “extortion scheme.” Additionally, Databricks is named as a defendant in a patent infringement suit filed by R2 Solutions LLC in December 2023. An employment discrimination case filed against the company in June 2024 was dismissed in November 2024.

These ongoing legal matters could result in substantial financial settlements, ongoing legal costs, and potential restrictions on AI model development processes that could impact Databricks’ competitive positioning in the generative AI market.

6) Recent Media Coverage

Media coverage from 2023 to 2025 has been dominated by Databricks’ aggressive fundraising, strategic acquisitions, and emerging legal challenges related to its AI development. In September 2025, the company announced it had crossed a $4 billion revenue run-rate, with its AI products contributing over $1 billion to that figure, and was closing a $1 billion Series K funding round at a valuation exceeding $100 billion. This followed a $10 billion Series J round announced in December 2024 at a $62 billion valuation, which was supplemented by $5.25 billion in debt financing, and a Series I round of over $500 million in September 2023 that valued the company at $43 billion.

Databricks has executed a series of high-value acquisitions to build out its Data Intelligence Platform. Major deals include generative AI startup MosaicML for $1.3 billion (announced in June 2023 and completed that July), data replication provider Arcion for over $100 million in October 2023, and data management company Tabular for over $1 billion in June 2024. The acquisition spree continued into 2025 with the purchases of data migration firm BladeBridge in February, database startup Neon for approximately $1 billion in May, machine learning startup Tecton in August, and database technology provider Mooncake Labs in October. These acquisitions are aimed at unifying data management and accelerating the development of AI agents on its platform.

The company faces significant legal scrutiny concerning copyright infringement in the training of its AI models. In March and May of 2024, multiple class-action lawsuits were filed by authors alleging that Databricks and its subsidiary MosaicML used copyrighted books to train their MPT and DBRX large language models without permission. Court dockets from July 2025 indicate that a Canadian class-action suit was also filed against Databricks and MosaicML for copyright infringement related to training their LLMs. In June 2025, a ruling in one case allowed plaintiffs to expand their complaint to target the DBRX models. In August 2025, a court in a separate case dismissed certain direct infringement claims related to the newer DBRX model, while claims regarding the MPT model remain active.

Databricks has actively expanded its strategic partnerships and global footprint. In 2025, the company announced an extended partnership with Microsoft for Azure Databricks, a new alliance with Palantir focused on government clients, and integrations with Kyndryl, S&P Global, LSEG, NTT DATA, and OpenAI. The company is also expanding internationally, announcing a $250 million investment in India in April 2025 and taking its first stake in a Latin American startup, Indicium, in September 2025. In November 2024, the company’s federal subsidiary achieved Department of Defense (DoD) Impact Level 5 (IL5) authorization, enabling it to manage highly sensitive government data.

Key executive changes include the September 2025 departure of Naveen Rao, the head of AI who joined through the MosaicML acquisition, to launch a new AI hardware startup in which Databricks plans to invest. Leadership in its India division also saw turnover, with the vice president for the region stepping down in January 2025, followed by the appointment of Kamalkanth Tummala as the new country manager in September 2025.

7) Strengths

Industry Leadership and Market Recognition

Databricks has established itself as a leader in data and AI platforms, earning recognition as a Leader in multiple Gartner Magic Quadrants, including the 2025 Magic Quadrant for Data Science and Machine Learning Platforms, where it placed highest on both the execution and vision axes, and the 2024 Magic Quadrant for Cloud Database Management Systems. The company has also been recognized as a Leader in the 2025 Forrester Wave for Data Management for Analytics, the 2024 Forrester Wave for Data Lakehouses, and the 2024 Forrester Wave for AI Foundation Models. This breadth of analyst recognition supports Databricks’ market positioning across the data and AI spectrum.

Unified Lakehouse Architecture Innovation

Databricks pioneered the lakehouse architecture in 2020, which combines the flexibility and cost-effectiveness of data lakes with the performance and governance capabilities of data warehouses. This unified approach has reportedly been adopted by 74% of global enterprises, and it eliminates the traditional data silos that separate data engineering, analytics, business intelligence, data science, and machine learning. The lakehouse architecture enables organizations to manage structured, semi-structured, and unstructured data types within a single platform, significantly reducing complexity and operational overhead while enabling advanced AI and machine learning capabilities that traditional data warehouses cannot support.

Open Source Foundation and Innovation Leadership

The company’s technology foundation is built on open-source projects that originated with Databricks and its founding team, including Apache Spark, Delta Lake, MLflow, and Unity Catalog. This open-source heritage provides customers with flexibility and avoids vendor lock-in, while Databricks’ commercial platform adds enterprise-grade features, security, and support. The company continues to lead innovation in the open-source community, contributing major enhancements to flagship projects and maintaining its position as a thought leader in data processing and machine learning technologies.

Exceptional Financial Performance and Customer Economics

Databricks has demonstrated remarkable financial growth, reaching a $4 billion annual revenue run rate in 2025 with over 50% year-over-year growth, while maintaining strong customer economics with net revenue retention exceeding 140%. The company serves over 15,000 organizations worldwide, including more than 60% of the Fortune 500, with over 650 customers generating more than $1 million in annual recurring revenue. These metrics demonstrate both the platform’s enterprise appeal and its ability to expand within existing customer accounts, indicating strong product-market fit and customer satisfaction.

Comprehensive AI and Generative AI Capabilities

Through strategic acquisitions including MosaicML for $1.3 billion and significant investment in AI infrastructure, Databricks has positioned itself at the forefront of the generative AI revolution. The company’s AI capabilities include the Mosaic AI platform for building and deploying AI agents, Vector Search for retrieval-augmented generation applications, and native support for foundation models from leading providers. The recent $100 million partnership with OpenAI brings GPT-5 and other frontier models natively to the Databricks platform, providing customers with cutting-edge AI capabilities integrated with their enterprise data.

Strong Partner Ecosystem and Multi-Cloud Strategy

Databricks has cultivated a robust ecosystem of over 6,000 partners worldwide, including strategic relationships with major cloud providers, consulting firms, and technology vendors. The platform operates natively across AWS, Microsoft Azure, and Google Cloud Platform, providing customers with multi-cloud flexibility and avoiding vendor lock-in. Strategic partnerships with major consulting firms like Accenture, Deloitte, and others enable comprehensive implementation support, while technology partnerships with firms like Tableau, Fivetran, and dbt create an integrated modern data stack.

8) Potential Risks and Areas for Further Due Diligence

Active Legal Challenges in AI Model Training

Multiple class-action lawsuits filed in 2024 and 2025 represent significant ongoing legal risks related to copyright infringement in AI model training. Authors have alleged that Databricks and its subsidiary MosaicML used copyrighted books to train their MPT and DBRX large language models without permission, with court proceedings still active as of 2025. While some direct infringement claims regarding the newer DBRX model were dismissed in August 2025, claims regarding the MPT model remain active, and a separate ruling in June 2025 allowed plaintiffs to expand their complaint to target the DBRX models. These lawsuits could result in substantial financial settlements, ongoing legal costs, and potential restrictions on AI model development processes that could impact Databricks’ competitive positioning in the generative AI market.

Technology Platform Limitations and Complexity

The platform exhibits significant technical limitations that could impact scalability and user adoption. Databricks notebooks have strict sizing constraints with individual cells limited to 6 MB input and maximum notebook sizes of 10 MB for revision snapshots, creating potential workflow bottlenecks for complex analytics projects. The serverless compute environment has numerous restrictions including no support for Scala and R programming languages, limited data source support, and constraints on machine learning capabilities with no GPU support for model serving endpoints. Query performance limitations include the lack of native full-text search support and restricted alerting capabilities, forcing many organizations to implement additional tools like OpenSearch or Elasticsearch alongside Databricks. The platform’s complexity, evidenced by extensive documentation across three separate cloud environments and steep learning curves reported by users, could hinder adoption among non-technical teams and increase implementation timelines.

Infrastructure Vulnerability and Security Incidents

Despite strong security frameworks, Databricks has experienced notable security vulnerabilities that highlight potential infrastructure risks. In January 2023, a vulnerability was disclosed that could allow low-privileged users to bypass cluster isolation, requiring the company to migrate affected scripts. Security researchers have demonstrated the ability to access sensitive system files and environment variables through file upload vulnerabilities, though Databricks’ incident response team detected and remediated these issues. Recent operational incidents include Delta Sharing service outages in January 2025 that affected the UK South and North Europe regions and were attributed to load balancing errors. The platform’s architecture separating control plane and data plane operations, while designed for security, creates complexity that could introduce additional attack vectors as the platform scales globally across multiple cloud environments.

Executive Turnover and Leadership Stability

Recent leadership changes raise questions about organizational stability during a critical growth phase. In September 2025, Naveen Rao, the head of AI who joined through the MosaicML acquisition, departed to launch a new AI hardware startup. This departure comes at a pivotal time as Databricks positions itself in the generative AI market and approaches a potential IPO. Changes in the India division include the departure of the vice president for the region in January 2025, followed by the appointment of a new country manager in September 2025. While the founding team remains involved with Ali Ghodsi as CEO since 2016 and other co-founders in key technical roles, the departure of senior AI leadership during rapid expansion could signal potential challenges in retaining top talent in the competitive AI sector.

High Valuation and Market Dependencies

The company’s valuation exceeding $100 billion creates substantial pressure for sustained execution and growth delivery. With a valuation approximately 25 times its current revenue run-rate, Databricks faces heightened expectations, and any growth deceleration or competitive pressure could trigger significant valuation corrections. The company’s business model depends heavily on cloud partnerships with AWS, Microsoft Azure, and Google Cloud Platform, creating potential risks if these relationships change or if cloud providers prioritize their own competing analytics services. Customer concentration risks exist, with approximately 50 of the company’s more than 15,000 customers each spending over $10 million annually, meaning the loss of major accounts could materially impact revenue growth. The company operates in an intensely competitive market facing established players like Snowflake, cloud giants offering native services, and emerging AI-focused platforms, requiring continuous innovation and substantial R&D investment to maintain market leadership.
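The valuation multiple and the concentration exposure cited above can be checked with simple arithmetic. The sketch below is illustrative only and uses the round figures stated in this profile. It treats the $100 billion valuation and the $10 million spend threshold as lower bounds, the 15,000 customer count as exact, and assumes that annual customer spend is comparable to the revenue run-rate. The variable names are hypothetical.

```python
# Illustrative check of the figures quoted in this profile.
# All inputs are the report's round numbers, not audited data.
VALUATION_USD = 100_000_000_000   # valuation "exceeding $100 billion" (floor)
RUN_RATE_USD = 4_000_000_000      # "$4 billion" annual revenue run-rate
LARGE_CUSTOMERS = 50              # customers spending over $10 million a year
MIN_SPEND_USD = 10_000_000        # lower bound on each large customer's spend
TOTAL_CUSTOMERS = 15_000          # "over 15,000" organizations, taken as exact

# Valuation as a multiple of the revenue run-rate.
multiple = VALUATION_USD / RUN_RATE_USD

# Share of the customer base made up of the largest accounts.
customer_share = LARGE_CUSTOMERS / TOTAL_CUSTOMERS

# Minimum share of the run-rate those accounts represent, assuming
# annual spend and run-rate are measured on a comparable basis.
revenue_floor_share = LARGE_CUSTOMERS * MIN_SPEND_USD / RUN_RATE_USD

print(f"Valuation multiple: {multiple:.0f}x run-rate")
print(f"Large accounts as share of customers: {customer_share:.2%}")
print(f"Large accounts as minimum share of run-rate: {revenue_floor_share:.1%}")
```

On these assumptions, roughly one third of one percent of customers would account for at least one eighth of the run-rate, which is the basis for treating the loss of major accounts as a material risk. The true share is likely higher, since $10 million is only the minimum spend per account.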

Operational and Regulatory Compliance Challenges

As a provider serving regulated industries including financial services and healthcare, Databricks must navigate evolving regulatory frameworks regarding AI governance, data privacy, and export controls that could impact operations and compliance costs. The platform’s multi-tenant architecture and collaborative features, while enabling broad adoption, create inherent risks around data isolation and potential cross-contamination between customer environments. Disaster recovery capabilities, while documented, require complex configuration and testing procedures that many organizations may find challenging to implement effectively, potentially exposing them to business continuity risks. The company’s rapid global expansion across more than 18 countries introduces operational complexity in managing diverse regulatory requirements, local data residency laws, and varying cybersecurity standards that could increase compliance costs and execution risks.

Sources

  1. Databricks, Inc.: Homepage
  2. Databricks closes $1 billion round, projects $4 billion in annualized revenue
  3. Databricks eyes over $100 billion valuation as investors back AI growth plans
  4. Databricks continues M&A spree, will buy Neon for $1 billion in AI-agent push
  5. Exclusive: Databricks to buy Sequoia-backed Tecton in AI agent push
  6. Databricks takes stake in Indicium, marking first investment in Latin American startup
  7. Databricks sues patent holders over alleged ‘extortion’ scheme
  8. Nvidia, Databricks Sued by Authors Over AI Model Training (1)
  9. Databricks Beats Authors’ Copyright Claims Over One AI Model
  10. Databricks AI Chief to Exit, Launch a New Computer Startup
  11. Databricks says annualized revenue to reach $3.7 billion by … – CNBC
  12. Databricks confirms new $100B valuation on $4B ARR – TechCrunch
  13. Sources: Naveen Rao’s new AI hardware startup targets $5 …
  14. S&P Global Expands Collaboration with Databricks with the addition of S&P Capital IQ Pro datasets via Delta Sharing
  15. Databricks Announces $15B in Financing to Attract Top AI Talent and Accelerate Global Expansion
  16. Databricks Names Elena Donio and Jonathan Chadwick to Board of Directors
  17. Databricks Achieves Authorization for DoD IL5 on AWS GovCloud
  18. Latham Advises on Databricks US$5.25 Billion Credit Facilities
  19. Databricks, Inc. Large Language Model Litigation
  20. In Re Mosaic LLM Litigation, 3:24-cv-01451