Massive slowdown of DAG File processor due to JSON schema upgrade in Airflow's core. #28445
Replies: 3 comments
-
Thanks for opening your first issue here! Be sure to follow the issue template!
-
Thanks for the detailed report. jsonschema had a similar issue reported before, where 4.0.1 degraded performance when refs were used (python-jsonschema/jsonschema#853), but apparently they solved it in 4.3.1. Could you please provide some examples of the generated JSON schemas being validated and open an issue in the jsonschema repository detailing the problem? You have all the test scenarios, so without that we will not know whether the problem gets fixed. They seem to react very fast and fix such problems, and they will likely need additional information and iterations, so it makes sense for you to open that issue (you can refer to this one and even copy its content, but perhaps provide more information on the validated content). Before that, you could also try 4.3.1 and see whether it solves the problem; maybe the latest version has a regression. If so, it should then be easy for you to bisect the version that introduces the issue.
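One way to provide such examples without needing Airflow at all is a standalone timing script. The sketch below rests on assumptions: the params are strings, integers and small nested objects, and each one is validated with its own `jsonschema.validate(..., format_checker=...)` call, which is how `Param` validation appears to work. `build_params`, `time_calls` and `main` are hypothetical helpers. Running it once under jsonschema 3.2.0 and once under 4.17.3 (or under each release in between, for bisection) should show whether validation alone accounts for the slowdown.

```python
import time
from importlib import metadata
from typing import Any, Callable, List, Tuple


def build_params(n_params: int) -> List[Tuple[dict, Any]]:
    """Return hypothetical (schema, value) pairs: strings, integers, nested dicts."""
    pairs: List[Tuple[dict, Any]] = []
    for i in range(n_params):
        kind = i % 3
        if kind == 0:
            pairs.append(({"type": "string"}, f"value_{i}"))
        elif kind == 1:
            pairs.append(({"type": "integer"}, i))
        else:
            nested = {"type": "object", "properties": {"inner": {"type": "string"}}}
            pairs.append((nested, {"inner": "x"}))
    return pairs


def time_calls(func: Callable[[], Any], repeats: int) -> float:
    """Return the total wall-clock seconds spent calling func `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return time.perf_counter() - start


def main(n_params: int = 30, repeats: int = 100) -> None:
    try:
        import jsonschema  # the version under test, e.g. 3.2.0 vs 4.17.3
    except ImportError:
        print("jsonschema is not installed; nothing to benchmark")
        return

    pairs = build_params(n_params)
    checker = jsonschema.FormatChecker()

    def validate_all() -> None:
        # One validate() call per param (assumed to mirror per-Param validation).
        for schema, value in pairs:
            jsonschema.validate(value, schema, format_checker=checker)

    # 100 repeats x 30 params loosely mirrors parsing 100 DAGs with 30 params each.
    elapsed = time_calls(validate_all, repeats)
    version = metadata.version("jsonschema")
    print(f"jsonschema {version}: {elapsed:.3f} s for {repeats} x {n_params} validations")


if __name__ == "__main__":
    main()
```

Installing each candidate version in a fresh virtualenv and re-running the script gives one timing line per version, which is the kind of data an upstream report can include.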
-
Since we have not heard back from the user, I am converting this into a discussion. If more information or data is provided, we can consider what to do with it.
-
Apache Airflow version
2.4.3
What happened
We recently updated our dev environment from Airflow 2.2.5 (Python 3.9) to 2.4.3 (Python 3.10). For our workloads, we use a DAG factory that parses JSON files and converts them into DAGs. In Airflow 2.2.5, the DAG factory needed approximately 30-140 seconds to generate 100 DAGs. In Airflow 2.4.3, the same DAGs took considerably longer to load (from 3x to over 5x longer in some tests).
We investigated with Scalene (a Python profiler) by running the DAG file processor directly, and discovered the following:
A large share of CPU time was spent on JSON validation at line 91 of models/params. Our factory does generate quite a few params per DAG, so it makes sense that validating all of them takes some time. However, upgrading Airflow shouldn't cause such a large, across-the-board increase in parsing time, so we suspected jsonschema.
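For readers without Scalene, a comparable breakdown can be sketched with the standard library's `cProfile` and `pstats` modules. `profile_call` is a hypothetical helper that runs any callable under the profiler and returns the top entries sorted by cumulative time; if validation dominates as described above, jsonschema frames would be expected near the top.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_call(func: Callable[..., Any], *args: Any, top: int = 15, **kwargs: Any) -> str:
    """Run func(*args, **kwargs) under cProfile; return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        # Stop profiling even if the profiled call raises.
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time includes time spent in callees, so validate() shows up
    # with the full cost of the schema checks it triggers.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

Inside the Airflow image, one possible use is `from airflow.models.dagbag import DagBag` followed by `print(profile_call(DagBag, dag_folder="<path to the factory file>", include_examples=False))`; the path is a placeholder.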
To verify that JSON validation caused the increase, we checked Airflow's dependencies and found that the official image for 2.2.5 uses jsonschema 3.2.0, while the image for 2.4.3 uses jsonschema 4.17.3.
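To confirm which jsonschema version a given image actually ships, a small check with `importlib.metadata` (Python 3.8+) can be run inside the container. This is a sketch; `installed_version` and `parse_release` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '4.17.3' into (4, 17, 3); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


if __name__ == "__main__":
    for pkg in ("apache-airflow", "jsonschema"):
        print(pkg, installed_version(pkg))
```

Comparing `parse_release(...)` tuples makes it easy to flag, for example, any jsonschema release at or above `(4,)` in an image.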
As a final test, we uninstalled jsonschema 4.17.3 from our image and replaced it with 3.2.0. The DAG factory immediately ran as expected, taking approximately 30 seconds to load 100 DAGs when the cluster was under little load, or about 100-140 seconds when it was under heavy load.
Example logs:
Version 2.4.3:
{{processor.py:176}} INFO - Processing /opt/airflow/dags/{other_folders}/{file_name}.py took 125.556 seconds
Version 2.2.5:
{{processor.py:249}} WARNING - Killing DAGFileProcessorProcess (PID=5943)
This occurred constantly with a timeout setting of 300 seconds.
What you think should happen instead
Airflow should require the same time to parse 100 DAGs in both versions.
How to reproduce
Create a DAG with many params (ideally more than 20-30; the more the better), mainly of string, integer and nested dict types. Check how long it takes to load in Airflow 2.2.5, then in Airflow 2.4.3. There should be a noticeable difference in loading times (at least 3x).
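A minimal reproduction DAG file along these lines might look like the sketch below. It assumes the Airflow 2.x `DAG` class and `airflow.models.param.Param`; the DAG ids, param names, counts and the helpers `make_param_specs` and `build_dags` are all hypothetical. `schedule_interval=None` is used so the same file should load on both 2.2.5 and 2.4.3.

```python
from datetime import datetime
from typing import Any, Dict, List


def make_param_specs(n_params: int) -> List[Dict[str, Any]]:
    """Return hypothetical Param settings: a mix of string, integer and nested dict."""
    specs = []
    for i in range(n_params):
        kind = i % 3
        if kind == 0:
            specs.append({"name": f"str_{i}", "default": f"value_{i}", "type": "string"})
        elif kind == 1:
            specs.append({"name": f"int_{i}", "default": i, "type": "integer"})
        else:
            specs.append({"name": f"dict_{i}", "default": {"inner": {"level": i}}, "type": "object"})
    return specs


def build_dags(n_dags: int, n_params: int) -> Dict[str, Any]:
    # Imported lazily so make_param_specs can be used without Airflow installed.
    from airflow import DAG
    from airflow.models.param import Param

    dags = {}
    for d in range(n_dags):
        # Extra Param kwargs such as type= become the param's JSON schema.
        params = {
            spec["name"]: Param(spec["default"], type=spec["type"])
            for spec in make_param_specs(n_params)
        }
        dag_id = f"params_repro_{d}"
        dags[dag_id] = DAG(
            dag_id,
            start_date=datetime(2022, 1, 1),
            schedule_interval=None,
            params=params,
        )
    return dags


try:
    # DAG objects bound at module level are what DagBag discovers in the file.
    globals().update(build_dags(n_dags=100, n_params=30))
except ImportError:
    pass  # Airflow is not installed, so there is nothing to register.
```

Placing this file in the DAGs folder of each version and comparing the "Processing ... took N seconds" log lines should make the difference visible.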
Operating System
Debian GNU Linux 11
Versions of Apache Airflow Providers
No relevant providers used
Deployment
Other Docker-based deployment
Deployment details
We use an AKS cluster in combination with a customised Docker image based on the official full Docker image (not slim).
Anything else
This problem may be especially noticeable for us because of the way we build DAGs (many params), but it should affect any DAG generation that uses params.
Are you willing to submit PR?
Code of Conduct