doc: update kustomize README & add FAQ about impact on IDE perf
yaohui-wyh committed Aug 28, 2022
1 parent e28acf4 commit 2eb96de
Showing 9 changed files with 134 additions and 7 deletions.
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -38,4 +38,5 @@ bin/
### Mac OS ###
.DS_Store

stat.log
stat.log
*-dev.md
45 changes: 39 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,11 +49,11 @@ StatisticsView subscribes to [listeners](https://plugins.jetbrains.com/docs/inte
Whenever an event is received, StatisticsView records the event type, timestamp, and fileUri (only if the event is file related). StatisticsView won't process events for files that are not valid source files (e.g. binary files, library folders, intentionally "marked as excluded" directories would be ignored).

### Data Storage
### Data storage

StatisticsView saves events directly to a disk file to ensure minimal dependency. This brings some limitations, such as it could be impossible to make complex queries on the file-based datasets, e.g. filter by time range, group by directory, etc. Since we need to show summarized information in real-time in the IDE Project view, the query could only be performed against some pre-aggregated data structure in memory.
StatisticsView saves events directly to a file on disk to keep dependencies minimal. This brings some limitations: complex queries on the file-based dataset (e.g. filter by time range, group by directory) are impractical. Since summarized information must be shown in real time in the IDE Project view, queries can only be performed against a pre-aggregated data structure in memory.

The write load is relatively low (10+ writes per second at most after debounced / dedup) and events could be safely queued on a file writer and saved to disk periodically in the background thread (EDT) which would not impact IDE performance.
The write load is relatively low (10+ writes per second at most after debouncing and deduplication), so events can be safely queued on a file writer and saved to disk periodically on a background thread, which does not impact IDE performance.

> This pre-aggregated hashMap + async file writer approach can't handle back-filling: e.g. when the user renames a file in the IDE, events recorded with the previous fileUri should be updated.
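
The queue-and-flush approach described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual implementation: `Event`, `QueuedEventWriter`, `countsByFile`, and the `FILE_OPENED` action name are hypothetical, and only the JDK standard library is used. Like the approach it illustrates, the sketch does not handle back-filling after a rename.

```kotlin
import java.io.File
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical event shape, mirroring the ts/action/file fields of the log format.
data class Event(val ts: Long, val action: String, val file: String)

class QueuedEventWriter(private val logFile: File) {
    private val queue = ConcurrentLinkedQueue<Event>()
    private val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Pre-aggregated view (event count per file) that can be read without touching disk.
    val countsByFile = ConcurrentHashMap<String, Int>()

    // Schedules a periodic flush on a single background thread.
    fun start(periodMillis: Long) {
        scheduler.scheduleWithFixedDelay({ flush() }, periodMillis, periodMillis, TimeUnit.MILLISECONDS)
    }

    // No I/O happens here, so this is cheap to call from an event handler.
    fun record(event: Event) {
        queue.add(event)
        if (event.file.isNotEmpty()) countsByFile.merge(event.file, 1) { a, b -> a + b }
    }

    // Drains the queue and appends one JSON line per event.
    @Synchronized
    fun flush() {
        val batch = generateSequence { queue.poll() }.toList()
        if (batch.isEmpty()) return
        logFile.appendText(batch.joinToString("") { toJsonLine(it) })
    }

    // Stops the background thread, then writes whatever is still queued.
    fun stop() {
        scheduler.shutdown()
        scheduler.awaitTermination(1, TimeUnit.SECONDS)
        flush()
    }

    // Sketch only: values are not JSON-escaped.
    private fun toJsonLine(e: Event): String =
        """{"ts":${e.ts},"action":"${e.action}","file":"${e.file}","tags":{}}""" + "\n"
}

fun main() {
    val writer = QueuedEventWriter(File.createTempFile("stat", ".log"))
    writer.start(periodMillis = 1_000)
    writer.record(Event(System.currentTimeMillis(), "FILE_OPENED", "src/Main.kt"))
    writer.stop()
}
```

Keeping `record` free of I/O is the point of the design: the handler only touches in-memory structures, while disk writes are batched on the scheduler thread.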
Expand All @@ -79,15 +79,43 @@ Example of log file:
{"ts":1661075850804,"action":"IDE_DEACTIVATED","file":"","tags":{}}
```
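
For readers who want to inspect the log outside the IDE, a minimal sketch of reading one such line back is shown here. `LogEvent` and `parseLine` are hypothetical names, and the regular expression assumes exactly the flat field order of the example line; a real JSON parser would be the safer choice for anything beyond a quick look.

```kotlin
// Hypothetical record type; field names follow the example log line.
data class LogEvent(val ts: Long, val action: String, val file: String)

// Matches only the flat layout {"ts":...,"action":"...","file":"...",...}.
val linePattern = Regex("""\{"ts":(\d+),"action":"([^"]*)","file":"([^"]*)",""")

// Returns null for lines that do not match (e.g. unexpected empty lines).
fun parseLine(line: String): LogEvent? {
    val match = linePattern.find(line) ?: return null
    val (ts, action, file) = match.destructured
    return LogEvent(ts.toLong(), action, file)
}

fun main() {
    val line = """{"ts":1661075850804,"action":"IDE_DEACTIVATED","file":"","tags":{}}"""
    println(parseLine(line))
}
```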

## Data Analysis & IDE Productivity [WIP]
## Impact on IDE (FAQ)

### Performance

1. If I `enable logging` all the time, will it slow down IDE startup?
- No. Long-running, IO-bound tasks (e.g. serializing/deserializing event data, reading/writing the log file) are executed on a pooled thread and do not affect IDE responsiveness. You can check the startup cost of plugins using the IntelliJ IDE's built-in diagnostic tools: `IDE Help | Diagnostic Tools | Analyze Plugin Startup Performance`:
- <img src="docs/ide-diagnostic.png" width="480" alt="diagnostic"/>
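- IDE event handling is performed on the event dispatch thread (EDT), i.e. the UI thread, but each handler only records a lightweight event and leaves file I/O to the pooled thread, so it does not block the UI.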
2. Since events keep accumulating, will they eat up my memory?
- The raw events are queued in memory and written to the disk file periodically. Once the write is done, they are removed from the queue and garbage collected.
3. The only negative performance impact is on the Project view, and only if you turn on `Show File Statistics`/`Show Directory Statistics`, which add statistical information next to each file/directory node. Project view rendering finishes instantly most of the time; however, if the project contains a huge number of files, the total rendering cost can be significant. If the Project view feels sluggish, turn off the `Show ... Statistics` actions and turn them back on when you need to see the statistical information.

### Data

1. Where is the raw event log stored?
- The raw event log (`stat.log`) is saved in the project's `.idea` folder, so the IDE doesn't need permission to read or write anywhere outside the current project's folder. The log is project-wide by nature, and there is nothing extra to clean up once you delete the IDE project. It is also unlikely to clutter your VCS, since most projects already ignore the `.idea` folder.
2. Can I safely share the log file with someone else?
- Sure. No personally identifiable information (e.g. username, hostname) is logged, and every `fileUri` is saved as a path relative to the project root.
3. Any telemetry or data reporting?
- No and never.
4. Log rotation / data compression
- Will be provided in v1.0.2.

## Data analysis & IDE productivity [WIP]

> Note: most of this part is not closely related to the duty of the StatisticsView plugin. There are many similarities between IDE code activities and microservices observability concepts.
### Import event data into PostgreSQL [Experimental]

checkout [data-analysis-poc](./data-analysis-poc) for a k8s manifest for importing event logs into PostgreSQL and visualizing in Grafana dashboard (WIP)
Check out [data-analysis-poc](./data-analysis-poc) for a k8s manifest that imports event logs into PostgreSQL and visualizes them in a Grafana dashboard.

<img src="docs/grafana.png" width="800"/>
<img src="docs/grafana-dashboard.png" width="800" alt="grafana dashboard"/>

## Acknowledgment

- Thanks to [@dkandalov] for creating the awesome plugin [activity-tracker](https://github.com/dkandalov/activity-tracker).
- Thanks to [@unknwon] for helping with documentation and guides for the OSS project structure.

<!-- Badges -->

Expand All @@ -97,3 +125,8 @@ checkout [data-analysis-poc](./data-analysis-poc) for a k8s manifest for importi
[plugin-downloads-svg]: http://img.shields.io/jetbrains/plugin/d/19747
[plugin-rating-svg]: http://img.shields.io/jetbrains/plugin/r/stars/19747
[plugin-version-svg]: https://img.shields.io/jetbrains/plugin/v/19747?label=version

<!-- Badges -->

[@dkandalov]: https://github.com/dkandalov
[@unknwon]: https://github.com/unknwon
87 changes: 87 additions & 0 deletions data-analysis-poc/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
## About

This is a proof-of-concept data analytics stack (using [Kustomize](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/)), which includes deployments of:

- [TimescaleDB](https://www.timescale.com/) (an extension on top of PostgreSQL) with an [SQL script](./kustomize/postgres/100_init.sql) converting raw IDE event logs to the normalized event table
- [Grafana](https://grafana.com/) for data visualization

> To run this, you will need a working Kubernetes cluster, and access via [kubectl](https://kubernetes.io/docs/tasks/tools/) (at least version 1.14, otherwise you need to install the standalone [kustomize](https://kustomize.io/) command-line tool). To deploy and run all the workloads, you will need about 2 CPU cores and 2GB RAM available in your k8s cluster.
### Setup

1. Copy your IDE project's `stat.log` to `<project-root>/data-analysis-poc/kustomize/postgres/stat.log`. You can locate the `stat.log` file in your system file manager via the plugin's `Show Data File` action.
2. Apply the manifests by running `kubectl apply -k <project-root>/data-analysis-poc/kustomize`:

```
$ kubectl apply -k ./data-analysis-poc/kustomize
configmap/ide-data-log-kfkbf22df7 created
configmap/init-sql-cmfcc6kc2k created
service/grafana created
service/postgres created
deployment.apps/grafana created
deployment.apps/postgres created
```
3. Wait for all workloads to be ready: `kubectl get pod -w`
```
NAME READY STATUS RESTARTS AGE
grafana-669f94f445-9cg9n 1/1 Running 0 8m44s
postgres-96bd574d-2pgdj 1/1 Running 0 8m44s
```
4. Connect to Grafana web server via port-forward: `kubectl port-forward service/grafana 3000`
```
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
```
5. Grafana is now reachable from your browser at `http://localhost:3000`. Enter `admin` as both the username and the password (see [Sign in to Grafana](https://grafana.com/docs/grafana/latest/setup-grafana/sign-in-to-grafana/) for details).
6. Create a PostgreSQL Datasource with settings:
- PostgreSQL Connection
- Host: `postgres:5432`
- Database: `postgres`
- User: `postgres` | Password: `postgres`
- TLS/SSL Mode: `disabled`
- PostgreSQL details
- versions: `12+`
- TimescaleDB: `enabled`
- <img src="../docs/postgres-config.png" width="640"/>
7. Now you can explore the datasets by running some SQL
- <img src="../docs/grafana-explore.png" width="520"/>
8. You can create a Grafana dashboard, such as using a State Timeline panel to visualize how IDE file open events change over time.
- <img src="../docs/grafana-dashboard.png" width="520"/>
- a sample query with TimescaleDB hyperfunctions `time_bucket`, `first`, `last`. Some great tutorials can be found at [Getting Started with Grafana and TimescaleDB](https://docs.timescale.com/timescaledb/latest/tutorials/grafana/)
```sql
SELECT
time_bucket('30s', ts) AS time,
file,
count(*) AS cnt
FROM
(SELECT first(ts, ts) AS s, last(ts, ts) AS e FROM ide_event) AS time_range,
ide_event
WHERE
file != ''
AND ts >= time_range.s
AND ts <= time_range.e
GROUP BY time, file
ORDER BY time
```
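
To make the aggregation in the sample query concrete, here is a rough Kotlin equivalent of what `time_bucket('30s', ts)` combined with `GROUP BY time, file` computes. It is a sketch under assumptions: `IdeEvent`, `bucketStart`, and `countPerBucket` are hypothetical names, timestamps are taken as epoch milliseconds as in the raw log, and buckets are aligned to the epoch.

```kotlin
// Hypothetical row type standing in for the ide_event table (ts in epoch milliseconds).
data class IdeEvent(val ts: Long, val file: String)

// Floors a timestamp to the start of its bucket, similar to time_bucket('30s', ts).
fun bucketStart(ts: Long, widthMillis: Long = 30_000L): Long =
    Math.floorDiv(ts, widthMillis) * widthMillis

// Rough equivalent of: WHERE file != '' GROUP BY time, file with count(*) AS cnt.
fun countPerBucket(events: List<IdeEvent>): Map<Pair<Long, String>, Int> =
    events.filter { it.file.isNotEmpty() }
        .groupingBy { bucketStart(it.ts) to it.file }
        .eachCount()

fun main() {
    val events = listOf(
        IdeEvent(1_000L, "src/A.kt"),
        IdeEvent(29_999L, "src/A.kt"),
        IdeEvent(30_000L, "src/A.kt"),
        IdeEvent(5_000L, ""), // not file related, filtered out
    )
    countPerBucket(events).forEach { (key, cnt) -> println("${key.first} ${key.second} $cnt") }
}
```

Each (bucket, file) pair becomes one row of the result, which is the shape a State Timeline panel consumes.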
### Debug
#### 1. TimescaleDB (Postgres) Pod failed to start
- This is mostly caused by an invalid `stat.log` (e.g. unexpected empty lines). Check the log via `kubectl logs <postgres-pod>`. If the `stat.log` is parsed and imported properly, you should see the init script executed with logs:
```
...
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/100_init.sql
CREATE TABLE
COPY 814
SELECT 814
...
```
#### 2. Grafana failed to run queries
- Check whether the query (raw SQL) can be executed properly. You can `kubectl exec` into the postgres Pod and use `psql` to open a SQL console.
- In case Grafana web service is unreachable: rerun service `port-forward` and try visiting Grafana again (the kubectl port-forwarding connection could be broken/dropped).
File renamed without changes
Binary file added docs/grafana-explore.png
Binary file added docs/ide-diagnostic.png
Binary file added docs/postgres-config.png
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ import org.yh.statistics.StatisticsTag.FILE_LINE_OF_CODE
import org.yh.statistics.model.StatisticsEvent


/**
* Listen to Editor fileOpen / fileClose event
*/
class MyFileEditorManagerListener(val project: Project) : FileEditorManagerListener {

private val settings = project.service<PluginSettings>()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@ import org.yh.statistics.StatisticsData
import org.yh.statistics.model.StatisticsEvent


/**
 * Listen to IDE activated/deactivated events (i.e., the IDE moves back to the foreground or loses focus)
*/
class MyFrameStateListener : FrameStateListener {

override fun onFrameDeactivated() {
Expand Down
