• Cloudera
  • CDP resources

    Build skills to deliver innovation with Cloudera Data Platform

    Keep abreast of the latest Cloudera technology with content and tools tailored for developers, analysts, data scientists, architects, and admins.

    WEEKLY DEMOS

    Join our experts for live weekly demos 

    Find out about CDP's latest features and capabilities, and get answers to pressing questions, by joining Cloudera product experts in live weekly demos.

    Build AI-based web Applications with Cloudera Machine Learning

    Thursday, August 04, 2022

    Learn how Cloudera Machine Learning (CML) enables data science practitioners to quickly deliver an ML-based web application to business users.

    The demo shows how users can discover and ingest data sets, train ML models with whatever library or language they are most comfortable with, quickly deploy the ML model with an API, and build a web application for business users to interact with the ML model’s API.


    CDP Demos
    • Exploratory data analytics to uncover answers to burning business questions [Complete Recording]
    • Exploratory Data Science to discover and visualize data for building machine learning models [Complete Recording]
    • Universal Data Distribution to connect data from any source to any destination [Complete Recording]
    • Multistage Data Pipelines with Cloudera Data Platform (CDP) [highlight]
    • Security & Governance with Cloudera Shared Data Experience (SDX) [highlight]
    • Streaming Data with Cloudera DataFlow (CDF) [highlight]
    • Enterprise Machine Learning with Cloudera Machine Learning (CML) [highlight]
    • Analytics with Cloudera Data Warehouse (CDW) [highlight]
    • Application Development with Cloudera Operational Database (COD) [highlight]
    VIDEOS

    See the benefits of CDP through how-to videos. 

    Understand the use cases that CDP solves and learn how to successfully deploy and use the full range of the Cloudera Data Platform.


     

    TOURS

    Experience CDP for yourself

    Click below to begin an interactive CDP product tour

     

    TUTORIALS

    Tutorials to help build, deploy and scale 

    Optimize your time with detailed tutorials that clearly explain the best way to deploy, use, and manage Cloudera products.

    How to Create a CDP Private Cloud Base Development Cluster

    Walk through the installation process for CDP Private Cloud Base (trial version).

    Create a Simple Web Application using Cloudera Operational Database

    Use Cloudera Operational Database (COD) and Machine Learning (CML) to create a simple web application.

    Processing DICOM Files with Spark on CDP

    Use Cloudera Data Engineering (CDE) on Cloudera Data Platform (CDP) to transform the DICOM files produced by an MRI into PNG images.

    Using NVIDIA RAPIDS to Accelerate AI Training in CDP Hybrid Cloud

    Explore how you can leverage NVIDIA's RAPIDS framework using Cloudera Machine Learning (CML), on the Cloudera Data Platform (CDP).

     

    EVENTS
    Meetup

    Special Hybrid Event – Apache Iceberg: Looking Below the Waterline


    Thursday, December 8, 2022

    In the short time since its advent, Apache Iceberg has become the most popular, fastest-growing, and most widely adopted open table format in the big data space. It addresses known big data pain points around data consistency, scalability, performance, and schema and partition evolution. In this meetup, you'll hear from the key partners in the open source community who are leading and driving Iceberg enhancements and its roadmap. We have a full agenda; here's a summary of the four talks we plan to deliver:

    Apache Iceberg for BI use cases

    This talk will cover the integration of the Iceberg open table format with the Apache Hive and Impala compute engines, support for Iceberg v1 and v2 capabilities, customer use cases, and future Iceberg enhancements and innovations in the works at Cloudera. We'll take a detailed look at the following capabilities supported in Hive and Impala:

    • Critical functional and performance enhancements
    • Materialized views support
    • In-place table migration of Hive external tables to Iceberg tables
    • Row level update/delete
    • Table rollback
    • Table maintenance

    Learn how Teranet keeps up with the changing growth and requirements of their business using Apache Iceberg for their change data capture use case leveraging Spark and Impala.

     

    Multi-function Analytics with Apache Iceberg

    This session will present a demonstration of using Spark with Iceberg tables, highlighting key Iceberg features. We'll show the interoperability of Spark with Hive and Impala. Along the way, we'll cover Cloudera's contributions for improving Spark and Impala support on Iceberg.

     

    Apache Iceberg's REST Catalog - Real and Potential Uses Beyond Data Workflows

    Iceberg's new REST catalog provides a friendly access point for the rich metadata and functionality that come with an Iceberg-powered data warehouse. This makes Iceberg even easier to integrate into compute engines and makes catalog operations available from pretty much any client you can imagine. However, the power of the REST catalog doesn't stop there. There are a myriad of tools and features that sit on the edge of the data platform and benefit greatly from the REST catalog design. In this talk, we want to cover a few creative uses that currently exist as well as some imaginative uses that could exist.


    Incremental compaction using Apache Iceberg

    At LinkedIn, streaming data in the form of Kafka topics is ingested into the data lake by low-latency ingestion pipelines powered by Apache Gobblin. This often produces small files that can contain duplicate records due to at-least-once delivery semantics, which in turn requires another set of pipelines that deduplicate the data for correctness and compact it into larger files for storage and query efficiency.

    Those compaction pipelines are bursty and compute-intensive, and they have higher latency due to their batch processing nature. As data volume grows, it becomes increasingly important to process data incrementally for optimal resource utilization and lower latency. In this talk, we present how LinkedIn leverages Iceberg to migrate its compaction pipelines from batch to incremental processing models and solve these latency and compute problems. We also show how that leads to an improvement in overall cluster resource utilization and a more uniform workload distribution. Finally, we will focus on how we optimize compaction and data deduplication in light of late data.

    The registration link below will prompt you to log in to your LinkedIn account:


    Past Virtual Meetups

    Watch YouTube recordings of recent virtual meetups held by our Future of Data network of local meetup groups to see why more than 49,000 of the world's data practitioners choose to work with Cloudera products and services.

    CDP TECHNICAL BLOGS

    Switching from CPUs to GPUs for NYC Taxi Fare Predictions with NVIDIA RAPIDS

    By Jacob Bengtson

    This blog demonstrates how easy it is to adapt a script built with popular CPU-based Python libraries, like pandas and scikit-learn, to run instead with GPU-based Python libraries, like cuDF and cuML.

    Next Stop – Predicting on Data with Cloudera Machine Learning

    By Robert Hryniewicz

    This blog series follows the manufacturing and operations data lifecycle stages (Predictive Analytics) of an electric car manufacturer, as typically experienced in large, data-driven manufacturing companies.

    Next Stop - Building a Data Pipeline from Edge to Insight

    By Tui Leauanae and Nicolas Pelaez

    This blog series follows the manufacturing, operations and sales data for a connected vehicle manufacturer as the data goes through stages and transformations typically experienced in a large manufacturing company on the leading edge of current technology.

    Digital Transformation is a Data Journey From Edge to Insight

    By Tui Leauanae, David LeGrand, and Nicolas Pelaez

    This is the first in a six-part blog series that outlines the data journey from edge to AI and the business value data produces along the journey. The data journey is not linear, but it is an infinite loop data lifecycle – initiating at the edge, weaving through a data platform, and resulting in business imperative insights applied to real business-critical problems that result in new data-led initiatives.

    COMMUNITY

    Explore the Cloudera Community

    Join the Cloudera Community and connect with more than 69,000 of your peers, discussing more than 18,000 solutions.

    How to Connect Go Applications to Cloudera Operational Database

    The Cloudera Operational Database (COD) experience is a managed dbPaaS solution. It can auto-scale based on the workload utilization of the cluster and will be adding the ability to auto-tune (better performance within the existing infrastructure footprint) and auto-heal (resolve operational problems automatically) later this year.

    See article

    Spark Structured Streaming Example with CDE

    This demo will pull from the Twitter API using NiFi and write the payload to a Kafka topic named "twitter".

    See article

    How to Configure K9s for Cloudera Data Engineering

    After covering how to use K9s to fetch metrics and logs for Cloudera Data Warehouse Experience, I decided to create the same tutorial for Cloudera Data Engineering. The process is very similar.

    See article

    CLOUDERA EDUCATIONAL SERVICES

    CDP training

    Hone your big data skills with the world’s leading experts through Cloudera Educational Services’ curriculum.

    Get certified. Stand out.
    PROFESSIONAL SERVICES

    Accelerate success with Cloudera SmartServices expertise

    Move from pilot to production quickly, cost-effectively, and securely with hands-on technical insight from Cloudera experts. Our comprehensive portfolio of services helps you shorten time to value from CDP by providing the right offerings and support for everything from launching to accelerating and expanding your deployment.


    CloudSmart: CDP Public Cloud adoption service

    Evaluate cloud options, optimize data, and scale analytics, moving workloads to the public cloud with confidence and minimal risk.

    Get CloudSmart

    SmartMigrate: Services for moving to Cloudera Data Platform

    Upgrade existing CDH and HDP deployments and migrate to CDP Data Center while minimizing risk, business disruptions, and SLA violations. 

    Get SmartMigrate

    SmartHealth: Platform health check for optimal performance

    Ensure peak performance with a comprehensive health check of your platform deployment and use case implementation.

    Get SmartHealth

    DOCUMENTATION
     

    Central repository for technical content on all Cloudera products.

    Find guides, quick starts, manuals, and best practices broken down by product and by task.
