There are several ways to monitor Spark applications: web UIs, metrics, and external instrumentation.
Web Interfaces

Every SparkContext launches a Web UI, by default on port 4040, that displays useful information about the application. This includes:

- A list of scheduler stages and tasks
- A summary of RDD sizes and memory usage
- Environmental information
- Information about the running executors
You can access this interface by simply opening http://<driver-node>:4040 in a web browser. If multiple SparkContexts are running on the same host, they will bind to successive ports beginning with 4040 (4041, 4042, etc).

Note that this information is only available for the duration of the application by default. To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application. This configures Spark to log the events that encode the information displayed in the UI to persisted storage.

Viewing After the Fact

It is still possible to construct the UI of an application through Spark's history server, provided that the application's event logs exist. You can start the history server by executing:
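```
./sbin/start-history-server.sh
```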
This creates a web interface at http://<server-url>:18080 by default, listing incomplete and completed applications and attempts.

When using the file-system provider class (see spark.history.provider below), the base logging directory must be supplied in the spark.history.fs.logDirectory configuration option, and should contain sub-directories that each represent an application's event logs.

The Spark jobs themselves must be configured to log events, and to log them to the same shared, writable directory. For example, if the server was configured with a log directory of hdfs://namenode/shared/spark-logs, then the client-side options would be:
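```
spark.eventLog.enabled true
spark.eventLog.dir hdfs://namenode/shared/spark-logs
```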
The history server can be configured as follows:

Environment Variables
Applying compaction on rolling event log files

A long-running application (e.g. a streaming job) can produce a huge single event log file, which costs a lot to maintain and also requires substantial resources to replay on each update in the Spark History Server. Enabling rolling event logs splits the log into multiple files, and the Spark History Server can apply compaction on those rolling event log files to reduce the overall size of the logs, via the configuration spark.history.fs.eventLog.rolling.maxFilesToRetain (a minimal configuration sketch follows the list below).

Details are described below, but please note up front that compaction is a LOSSY operation. Compaction discards some events, which will no longer be visible in the UI; you may want to check which events will be discarded before enabling the option.

When compaction happens, the History Server lists all the available event log files for the application, and selects as compaction targets the event log files whose index is smaller than that of the smallest-index file to be retained. For example, if application A has 5 event log files and spark.history.fs.eventLog.rolling.maxFilesToRetain is set to 2, then the first 3 log files are selected for compaction.

Once it selects the targets, it analyzes them to figure out which events can be excluded, and rewrites them into one compact file, discarding the events it has decided to exclude. Compaction tries to exclude events that point to outdated data. As of now, the candidates for exclusion are:

- Events for jobs that have finished, and their related stage/task events
- Events for executors that have been terminated
- Events for SQL executions that have finished, and their related job/stage/task events
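A minimal sketch of the relevant settings, assuming rolling event logs are enabled on the application side and compaction on the History Server side (the maxFileSize value is illustrative):

```
# Application side: write rolling event log files instead of one large file
spark.eventLog.enabled true
spark.eventLog.rolling.enabled true
spark.eventLog.rolling.maxFileSize 128m

# History Server side: retain at most 2 non-compacted event log files per application
spark.history.fs.eventLog.rolling.maxFilesToRetain 2
```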
Once rewriting is done, the original log files are deleted on a best-effort basis. The History Server may not be able to delete the original log files, but this does not affect its operation.

Note that the Spark History Server may skip compacting old event log files if it determines that not much space would be reclaimed. For a streaming query we normally expect compaction to run, since each micro-batch triggers one or more jobs that finish shortly; for a batch query, compaction often won't run.

Please also note that this is a new feature introduced in Spark 3.0, and may not be completely stable. Under some circumstances, compaction may exclude more events than you expect, leading to UI issues on the History Server for the application. Use it with caution.

Spark History Server Configuration Options

Security options for the Spark History Server are covered in more detail on the Security page.
Note that in all of these UIs, the tables are sortable by clicking their headers, making it easy to identify slow tasks, data skew, etc.
REST API

In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available both for running applications and in the history server. The endpoints are mounted at /api/v1. For example, for the history server they would typically be accessible at http://<server-url>:18080/api/v1, and for a running application at http://localhost:4040/api/v1.

In the API, an application is referenced by its application ID, [app-id]. When running on YARN, each application may have multiple attempts, and in YARN cluster mode [app-id] is actually [base-app-id]/[attempt-id], where [base-app-id] is the YARN application ID.
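As a quick illustration, here is a minimal Scala sketch that queries the REST API of a locally running application; the host, port, and the assumption that an application is currently running are assumptions, not part of the document:

```scala
import scala.io.Source

object SparkRestExample {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark application is running locally with its UI on the default port 4040.
    val base = "http://localhost:4040/api/v1"

    // The /applications endpoint lists the applications known to this UI, including
    // their IDs, which can then be used in per-application endpoints such as
    // /applications/[app-id]/jobs.
    val json = Source.fromURL(s"$base/applications").mkString
    println(json)
  }
}
```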
The number of jobs and stages which can be retrieved is constrained by the same retention mechanism as the standalone Spark UI: spark.ui.retainedJobs defines the threshold value triggering garbage collection on jobs, and spark.ui.retainedStages does the same for stages. Note that the garbage collection takes place on playback: it is possible to retrieve more entries by increasing these values and restarting the history server.

Executor Task Metrics

The REST API exposes the values of the Task Metrics collected by Spark executors at the granularity of task execution. The metrics can be used for performance troubleshooting and workload characterization. A list of the available metrics, with a short description:
Executor Metrics

Executor-level metrics are sent from each executor to the driver as part of the Heartbeat, and describe the performance of the executor itself, such as JVM heap memory and GC information. Executor metric values and their measured per-executor memory peak values are exposed via the REST API in JSON format and in Prometheus format. The JSON endpoint is exposed at /applications/[app-id]/executors, and the Prometheus endpoint at /metrics/executors/prometheus; the Prometheus endpoint is conditional on the configuration spark.ui.prometheus.enabled being set to true (it is false by default).
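For example, a short Scala sketch that scrapes the Prometheus-format executor metrics from a running application (assuming the default local UI port and that spark.ui.prometheus.enabled is true):

```scala
import scala.io.Source

object ExecutorMetricsScrape {
  def main(args: Array[String]): Unit = {
    // Prometheus text exposition format: one metric per line, ready for scraping.
    val text = Source.fromURL("http://localhost:4040/metrics/executors/prometheus").mkString
    println(text)
  }
}
```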
The computation of RSS and Vmem is based on proc(5).

API Versioning Policy

These endpoints have been strongly versioned to make it easier to develop applications on top. In particular, Spark guarantees:

- Endpoints will never be removed from one version
- Individual fields will never be removed for any given endpoint
- New endpoints may be added
- New fields may be added to existing endpoints
- New versions of the API may be added in the future as a separate endpoint (e.g., api/v2); new versions are not required to be backwards compatible
- API versions may be dropped, but only after at least one minor release of co-existing with a new API version
Note that even when examining the UI of running applications, the applications/[app-id] portion of the path is still required, even though there is only one application available; for example, the list of jobs for the running app is at http://localhost:4040/api/v1/applications/[app-id]/jobs. This keeps the paths consistent in both modes.

Metrics

Spark has a configurable metrics system based on the Dropwizard Metrics Library. This allows users to report Spark metrics to a variety of sinks including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark code base; they provide instrumentation for specific activities and Spark components. The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties; a custom file location can be specified via the spark.metrics.conf configuration property.

Spark's metrics are decoupled into different instances corresponding to Spark components. Within each instance, you can configure a set of sinks to which metrics are reported. The following instances are currently supported:

- master: The Spark standalone master process
- applications: A component within the master which reports on various applications
- worker: A Spark standalone worker process
- executor: A Spark executor
- driver: The Spark driver process (the process in which your SparkContext is created)
- shuffleService: The Spark shuffle service
- applicationMaster: The Spark ApplicationMaster when running on YARN
- mesos_cluster: The Spark cluster scheduler when running on Mesos
Each instance can report to zero or more sinks. Sinks are contained in the org.apache.spark.metrics.sink package:

- ConsoleSink: Logs metrics information to the console
- CSVSink: Exports metrics data to CSV files at regular intervals
- JmxSink: Registers metrics for viewing in a JMX console
- MetricsServlet: Adds a servlet within the existing Spark UI to serve metrics data as JSON
- PrometheusServlet: Adds a servlet within the existing Spark UI to serve metrics data in Prometheus format
- GraphiteSink: Sends metrics to a Graphite node
- Slf4jSink: Sends metrics to slf4j as log entries
- StatsdSink: Sends metrics to a StatsD node
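As an illustration, a minimal metrics.properties sketch enabling the console sink for all instances (the period and unit values are illustrative of the per-sink parameters):

```
# Enable the ConsoleSink for all instances
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink

# Polling period for the ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
```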
Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions:

- GangliaSink: Sends metrics to a Ganglia node or multicast group
To install the GangliaSink you'll need to perform a custom build of Spark. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building; for Maven users, enable the -Pspark-ganglia-lgpl profile.

The syntax of the metrics configuration file and the parameters available for each sink are defined in an example configuration file, $SPARK_HOME/conf/metrics.properties.template.

When using Spark configuration parameters instead of the metrics configuration file, the relevant parameter names are composed of the prefix spark.metrics.conf. followed by the configuration details, i.e. the parameters take the form spark.metrics.conf.[instance|*].sink.[sink_name].[parameter_name].
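For example, the same console sink could be enabled for a single application at submit time, with no metrics configuration file (a sketch; the application class and JAR names are placeholders):

```
./bin/spark-submit \
  --class com.example.MyApp \
  --conf "spark.metrics.conf.*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink" \
  --conf "spark.metrics.conf.*.sink.console.period=10" \
  my-app.jar
```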
Default values of the Spark metrics configuration are as follows:
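A sketch of these defaults as shipped in recent Spark versions (treat the exact values as indicative):

```
"*.sink.servlet.class" = "org.apache.spark.metrics.sink.MetricsServlet"
"*.sink.servlet.path" = "/metrics/json"
"master.sink.servlet.path" = "/metrics/master/json"
"applications.sink.servlet.path" = "/metrics/applications/json"
```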
Additional sources can be configured using the metrics configuration file or the configuration parameter spark.metrics.conf.[component_name].source.jvm.class=[source_name]. Currently the JVM source is the only available optional source; for example, the following parameter activates it for all instances: spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource.

List of available metrics providers

Metrics used by Spark are of multiple types: gauge, counter, histogram, meter and timer; see the Dropwizard library documentation for details. The following list of components and metrics reports the name and some details about the available metrics, grouped per component instance and source namespace. The most common types of metrics used in Spark instrumentation are gauges and counters. Counters can be recognized by their .count suffix.

Component instance = Driver

This is the component with the largest number of instrumented metrics.
Component instance = Executor

These metrics are exposed by Spark executors.

Source = JVM Source

Component instance = applicationMaster

Note: applies when running on YARN.

Component instance = mesos_cluster

Note: applies when running on Mesos.

Component instance = master

Note: applies when running in Spark standalone as master.

Component instance = ApplicationSource

Note: applies when running in Spark standalone as master.

Component instance = worker

Note: applies when running in Spark standalone as worker.

Component instance = shuffleService

Note: applies to the shuffle service.
Advanced Instrumentation

Several external tools can be used to help profile the performance of Spark jobs:

- Cluster-wide monitoring tools, such as Ganglia, can provide insight into overall cluster utilization and resource bottlenecks
- OS profiling tools such as dstat, iostat, and iotop can provide fine-grained profiling on individual nodes
- JVM utilities such as jstack (stack traces), jmap (heap dumps), jstat (time-series statistics), and jconsole (visually exploring JVM properties) are useful for those comfortable with JVM internals
Spark also provides a plugin API so that custom instrumentation code can be added to Spark applications. There are two configuration keys available for loading plugins into Spark:

- spark.plugins
- spark.plugins.defaultList
Both take a comma-separated list of class names that implement the org.apache.spark.api.plugin.SparkPlugin interface. The two names exist so that one list can be placed in the Spark default configuration file, allowing users to easily add other plugins from the command line without overwriting the config file's list. Duplicate plugins are ignored.
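A minimal sketch of such a plugin in Scala; the package, class, and metric names are illustrative, not part of Spark:

```scala
package com.example

import java.util.{Collections, Map => JMap}

import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class MyPlugin extends SparkPlugin {

  // Runs inside the driver process.
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // Whatever is returned here is passed to each executor plugin's init().
      Collections.singletonMap("example.greeting", "hello from the driver")
    }
  }

  // Runs inside each executor process.
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Register a custom counter with the executor's Dropwizard metric registry;
      // it is then reported through the metrics system like the built-in metrics.
      ctx.metricRegistry().counter("examplePluginInits").inc()
    }
  }
}
```

Packaged on the application's classpath, such a plugin would be enabled with, e.g., --conf spark.plugins=com.example.MyPlugin.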