Spark Streaming Playground

Welcome to the Spark Streaming Playground! This project showcases various aspects of Apache Spark Streaming, providing a hands-on environment to explore stateful computations, structured streaming, utility functions, and integrations.

Project Structure

The project is organized into several packages, each focusing on different functionalities within Spark Streaming. Below is a detailed overview of each Java class and its purpose:

1. Advanced Package

StatefulComputations.java
This class demonstrates how to perform stateful computations using Spark's MapGroupsWithStateFunction. It utilizes Spark's stateful operations to maintain and update state information across streaming batches, enabling complex processing tasks like aggregations and transformations that rely on historical data.
Watermarks.java
In this class, watermarks are implemented to handle late data in streaming applications. It allows Spark to manage event time and define thresholds for when data can be considered too late for processing. This is crucial in scenarios where data arrival times are unpredictable.
EventTimeWindows.java
This class focuses on processing data within defined event time windows. It leverages Spark's time-based windowing functions to group data by specified time intervals, allowing for aggregate computations over the window of time in which the events occur.

2. Structured Streaming Package

StreamingDatasets.java
This class introduces the concept of streaming datasets, illustrating how to create and manipulate datasets that are continually updated. It integrates Spark's Dataset API to facilitate operations on streaming data in a structured format, enhancing readability and performance.
StreamingJoins.java
Here, the class implements streaming joins between datasets. It demonstrates how to perform real-time joins on streaming data, ensuring that related data from different sources can be combined seamlessly during the streaming process.
StreamingDataFrames.java
This class provides functionality to work with streaming DataFrames, focusing on the differences between static and streaming DataFrames. It emphasizes the importance of managing the schema and applying transformations to the streaming data efficiently.
StreamingAggregations.java
This class covers various aggregation functions applied to streaming data. It showcases how to compute aggregations like counts, sums, and averages in real-time, emphasizing the continuous nature of streaming data processing.

3. Utilities Package

SparkUtils.java
The utility class that simplifies the creation and management of Spark sessions. It provides a flexible approach to configure Spark applications, ensuring ease of scalability and testability for future extensions.

4. Integrations Package

IntegratingKafka.java
This class illustrates the integration of Spark Streaming with Kafka. It demonstrates how to consume and produce messages from Kafka topics, facilitating the seamless handling of streaming data sourced from a message broker.

5. Common Package

Schemas.java
This class defines schemas used across various DataFrames and Datasets. It encapsulates the structure of the data, ensuring consistency and reliability when processing structured data.
Stock.java
This class represents a model for stock data, encapsulating the properties and behaviors related to stock information within the application.
Car.java
Similar to the Stock class, this class models car data, defining its attributes and any necessary methods to manipulate car-related information in the streaming context.

6. Low Level Package

DStreams.java
This class provides a low-level abstraction for working with DStreams (Discretized Streams) but it has been deprecated.

Conclusion

The Spark Streaming Playground serves as a valuable resource for exploring the capabilities of Apache Spark Streaming. By examining the various classes and their functionalities, users can gain insights into stateful processing, structured streaming, and integrations with external data sources like Kafka.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
sql		sql
src		src
.gitignore		.gitignore
README.md		README.md
build.sbt		build.sbt
cql.sh		cql.sh
docker-clean.sh		docker-clean.sh
docker-compose.yml		docker-compose.yml
psql.sh		psql.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Spark Streaming Playground

Project Structure

1. Advanced Package

2. Structured Streaming Package

3. Utilities Package

4. Integrations Package

5. Common Package

6. Low Level Package

Conclusion

About

Releases

Packages

Languages

CodingMatte/spark-streaming-playground

Folders and files

Latest commit

History

Repository files navigation

Spark Streaming Playground

Project Structure

1. Advanced Package

2. Structured Streaming Package

3. Utilities Package

4. Integrations Package

5. Common Package

6. Low Level Package

Conclusion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages