Update to new nwp-consumer #697

Open
5 of 10 tasks
peterdudfield opened this issue Nov 28, 2024 · 18 comments
@peterdudfield
Contributor

peterdudfield commented Nov 28, 2024

Detailed Description

Update to new NWP-consumer, after major refactor

Context

  • Some new env vars? Done ✅ in secrets; the non-secret ones in Terraform still need updating ❌
  • The forecast apps might need updating too ❌

Possible Implementation

  • India dev ECMWF
  • India dev GFS
  • India dev MO
  • India pro ECMWF
  • India pro GFS
  • India pro MO
  • UK dev UKV
  • UK dev ECMWF
  • UK pro ECMWF
  • UK pro UKV
@peterdudfield
Contributor Author

Started doing this in #715

@peterdudfield
Contributor Author

This is for version 1.0.5.

@peterdudfield
Contributor Author

Version 1.0.5 is 331 MB (4556 files), whereas 0.5.33 is 34.3 MB (182 files).
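For a sense of scale, the ratios implied by those figures can be worked out directly. This is just a quick sketch using the numbers quoted in the comment; the variable names are mine.

```python
# Sizes and file counts quoted in the comment.
new_mb, new_files = 331.0, 4556   # nwp-consumer 1.0.5
old_mb, old_files = 34.3, 182     # nwp-consumer 0.5.33

size_ratio = new_mb / old_mb        # how many times larger the new output is
file_ratio = new_files / old_files  # how many times more files it contains

print(f"size: {size_ratio:.1f}x, files: {file_ratio:.1f}x")
print(
    f"mean file size: new {new_mb / new_files * 1000:.0f} kB, "
    f"old {old_mb / old_files * 1000:.0f} kB"
)
```

So the new output is roughly an order of magnitude larger, spread over many more (and smaller) files.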

@devsjc
Collaborator

devsjc commented Dec 16, 2024

This is because it pulls in all the ECMWF live data, which covers both the UK and India. Crops are taken when the data is loaded anyway by the forecasters so it shouldn't be a problem downstream.

@peterdudfield
Contributor Author

> This is because it pulls in all the ECMWF live data, which covers both the UK and India. Crops are taken when the data is loaded anyway by the forecasters so it shouldn't be a problem downstream.

OK, I could clip to the UK before it does any regridding, which should help.
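A minimal sketch of what clipping to a bounding box before regridding could look like, in plain Python on a regular lat/lon grid. The bounding box values, the helper name `clip_slices`, and the 1-degree grid are all assumptions for illustration; they are not taken from the forecast apps.

```python
from bisect import bisect_left, bisect_right

# Illustrative bounding box only; the real UK crop used downstream may differ.
UK_BBOX = {"lat_min": 49.0, "lat_max": 61.0, "lon_min": -11.0, "lon_max": 2.0}


def clip_slices(lats_desc, lons_asc, bbox):
    """Return (lat_slice, lon_slice) selecting the grid points inside bbox.

    lats_desc: latitudes sorted north to south (descending).
    lons_asc:  longitudes sorted west to east (ascending), in [-180, 180).
    """
    # bisect needs ascending input, so search on negated latitudes.
    neg_lats = [-lat for lat in lats_desc]
    lat_start = bisect_left(neg_lats, -bbox["lat_max"])
    lat_stop = bisect_right(neg_lats, -bbox["lat_min"])
    lon_start = bisect_left(lons_asc, bbox["lon_min"])
    lon_stop = bisect_right(lons_asc, bbox["lon_max"])
    return slice(lat_start, lat_stop), slice(lon_start, lon_stop)


# A coarse 1-degree global grid, purely for illustration.
lats = [90.0 - i for i in range(181)]     # 90 .. -90
lons = [-180.0 + i for i in range(360)]   # -180 .. 179

lat_sl, lon_sl = clip_slices(lats, lons, UK_BBOX)
uk_lats, uk_lons = lats[lat_sl], lons[lon_sl]
print(len(lats) * len(lons), "->", len(uk_lats) * len(uk_lons), "grid points")
```

With an xarray dataset whose latitude coordinate is descending, the equivalent label-based selection would typically be `ds.sel(latitude=slice(61, 49), longitude=slice(-11, 2))`.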

@peterdudfield
Contributor Author

peterdudfield commented Jan 2, 2025

NWP Consumer 1.0.7 works with PVNet 2.4.18 and PVNet DA 2.4.18 for ECMWF

EDIT: not yet!

@peterdudfield
Contributor Author

Filed a bug report at joblib/joblib#1637 to see if anyone can help.

@peterdudfield
Contributor Author

PVNet and PVNet DA work for 2.4.19

@peterdudfield
Contributor Author

PVNet on dev makes the forecast look weird. Need to investigate.

@peterdudfield
Contributor Author

tested ECMWF India and saved to s3://india-nwp-development/ecmwf/data//2025010706.zarr/

@peterdudfield
Contributor Author

This should solve PVnet 19.30 bug

@peterdudfield
Contributor Author

UKV does not work yet with the new nwp-consumer.

@peterdudfield
Contributor Author

peterdudfield commented Jan 28, 2025

Currently all deployed on dev India, but the forecast looks different; investigating why.

This was due to the GFS data looking different.

Looking at the raw data, prate looks fine and wind speeds look fine. Temperature looks very different.

dev: [Image]

pro: [Image]

@peterdudfield
Contributor Author

u10 values look quite different too. I wonder if we aren't taking surface variables, or something like that?

@peterdudfield
Contributor Author

In the batches

dev: [Image]

pro: [Image]

@devsjc
Collaborator

devsjc commented Jan 28, 2025

Pre-refactor consumer (https://github.com/openclimatefix/nwp-consumer/blob/612bb6f9dbd09e52283f966485a1415338826ccb/src/nwp_consumer/internal/inputs/noaa/aws.py#L93):

# URLs

filename=f"gfs.t{it.hour:02}z.pgrb2.1p00.f{step:03}"
url=f"{self.baseurl}/gfs.{it.strftime('%Y%m%d')}/{it.hour:02}/atmos"

# Process
# * Splits files, then re merges

surface = [d for d in ds if "surface" in d.coords]
heightAboveGround = [d for d in ds if "heightAboveGround" in d.coords]
isobaricInhPa = [d for d in ds if "isobaricInhPa" in d.coords]

for i, d in enumerate(surface):
    unwanted_variables = [v for v in d.data_vars if v not in self.parameters]
    surface[i] = d.drop_vars(unwanted_variables)
for i, d in enumerate(heightAboveGround):
    unwanted_variables = [v for v in d.data_vars if v not in self.parameters]
    heightAboveGround[i] = d.drop_vars(unwanted_variables)
for i, d in enumerate(isobaricInhPa):
    unwanted_variables = [v for v in d.data_vars if v not in self.parameters]
    isobaricInhPa[i] = d.drop_vars(unwanted_variables)

surface_merged = xr.merge(surface, compat="override").drop_vars(
    ["unknown_surface_instant", "valid_time"],
    errors="ignore",
)
del surface
hag_merged = xr.merge(heightAboveGround).drop_vars("valid_time", errors="ignore")
del heightAboveGround
iso_merged = xr.merge(isobaricInhPa).drop_vars("valid_time", errors="ignore")
del isobaricInhPa

total_ds = (
    xr.merge([surface_merged, hag_merged, iso_merged])
    .rename({"time": "init_time"})
    .expand_dims("init_time")
    .expand_dims("step")
    .transpose("init_time", "step", ...)
    .sortby("step")
    .chunk({"init_time": 1, "step": 1})
)
del surface_merged, hag_merged, iso_merged
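To make the URL part of the pre-refactor snippet concrete, here is a small standalone sketch of how the filename and directory fit together. The base URL is a placeholder (the value of `self.baseurl` isn't shown in the thread), the function name `gfs_file_url` is mine, and joining the directory and filename with a `/` is an assumption.

```python
from datetime import datetime, timezone

# Placeholder only: the real value of `self.baseurl` is not shown in the thread.
BASE_URL = "https://example.invalid/gfs"


def gfs_file_url(it: datetime, step: int, baseurl: str = BASE_URL) -> str:
    """Build the URL of one GFS 1-degree GRIB file for init time `it` and forecast hour `step`."""
    filename = f"gfs.t{it.hour:02}z.pgrb2.1p00.f{step:03}"
    directory = f"{baseurl}/gfs.{it.strftime('%Y%m%d')}/{it.hour:02}/atmos"
    # Assumption: the two pieces are simply joined with a slash.
    return f"{directory}/{filename}"


init_time = datetime(2025, 1, 28, 0, tzinfo=timezone.utc)
print(gfs_file_url(init_time, step=0))
```

The hour is zero-padded to two digits and the step to three, so a 00Z run at step 0 gives a filename ending in `t00z.pgrb2.1p00.f000`.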

Refactored consumer:

# Process

dss: list[xr.Dataset] = cfgrib.open_datasets(
    path.as_posix(),
    backend_kwargs={
        "squeeze": True,
        "ignore_keys": {
            "levelType": ["isobaricInhPa", "depthBelowLandLayer", "meanSea"],
        },
        "errors": "raise",
        "indexpath": "",  # TODO: Change when above TODO is resolved
    },
)

processed_das: list[xr.DataArray] = []
for i, ds in enumerate(dss):
    ds = entities.Parameter.rename_else_drop_ds_vars(
        ds=ds,
        allowed_parameters=NOAAS3RawRepository.model().expected_coordinates.variable,
    )
    # Ignore datasets with no variables of interest
    if len(ds.data_vars) == 0:
        continue
    # Ignore datasets with multi-level variables
    # * This would not work without the "squeeze" option in the open_datasets call,
    #   which reduces single-length dimensions to scalar coordinates
    if any(x not in ["latitude", "longitude", "time"] for x in ds.dims):
        continue
    da: xr.DataArray = (
        ds
        .drop_vars(names=[
            c for c in ds.coords if c not in ["time", "step", "latitude", "longitude"]
        ])
        .rename(name_dict={"time": "init_time"})
        .expand_dims(dim="init_time")
        .expand_dims(dim="step")
        .to_dataarray(name=NOAAS3RawRepository.model().name)
    )
    da = (
        da.drop_vars(
            names=[
                c for c in da.coords
                if c not in NOAAS3RawRepository.model().expected_coordinates.dims
            ],
        )
        .transpose(*NOAAS3RawRepository.model().expected_coordinates.dims)
        .assign_coords(coords={"longitude": (da.coords["longitude"] + 180) % 360 - 180})
        .sortby(variables=["step", "variable", "longitude"])
        .sortby(variables="latitude", ascending=False)
    )
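As a side note on the `assign_coords` step in the refactored code, the expression `(lon + 180) % 360 - 180` moves longitudes from a 0 to 360 convention onto -180 to 180. A plain-Python sketch of just that arithmetic (the helper name and sample values are mine):

```python
def wrap_longitude(lon: float) -> float:
    """Map a longitude in degrees from [0, 360) onto [-180, 180)."""
    return (lon + 180) % 360 - 180


# Sample longitudes in the 0..360 convention.
lons_0_360 = [0.0, 90.0, 179.0, 180.0, 270.0, 359.0]
wrapped = [wrap_longitude(lon) for lon in lons_0_360]

print(wrapped)          # wrapped values, still in the original order
print(sorted(wrapped))  # ascending order, which is why a sort on longitude follows
```

After wrapping, the values are no longer monotonic, which is presumably why the code sorts by longitude afterwards.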

There's a difference in the processing step between the two: the pre-refactor consumer split the datasets and then re-merged them, whereas the refactored consumer can avoid this, since it writes regionally and so each dataset can be written individually. I'm investigating whether this logic difference could be the cause.
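To illustrate the multi-level filter in the refactored code, here is a toy version of the dimension check on its own. The dimension tuples are hypothetical examples of what datasets might look like after `squeeze=True`; this only shows how the filter behaves and is not a confirmed diagnosis.

```python
ALLOWED_DIMS = ("latitude", "longitude", "time")


def is_single_level(dims) -> bool:
    """True if every dimension is one of the allowed horizontal/time dimensions."""
    return all(d in ALLOWED_DIMS for d in dims)


# Hypothetical dimension tuples, as they might look after `squeeze=True`.
candidates = {
    "t2m at heightAboveGround=2 (squeezed)": ("latitude", "longitude"),
    "t at surface (squeezed)": ("latitude", "longitude"),
    "t at heightAboveGround=[80, 100]": ("heightAboveGround", "latitude", "longitude"),
}

for label, dims in candidates.items():
    print(f"{label}: {'kept' if is_single_level(dims) else 'skipped'}")
```

The point is that any single-level dataset passes the check, regardless of which level type it came from, so the filter alone does not distinguish one temperature field from another.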

@devsjc
Collaborator

devsjc commented Jan 28, 2025

Temperature exists at multiple levels in the grib of course, so I'm wondering whether the wrong one (i.e. not the surface temperature) is being surfaced in the consumer.

$ grib_ls -w shortName=t gfs.t00z.pgrb2.1p00.f000.grib
gfs.t00z.pgrb2.1p00.f000.grib
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  1            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  2            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  4            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  7            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  10           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  20           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  40           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInPa  70           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  1            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  2            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  3            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  5            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  7            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  10           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  15           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  20           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  30           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  40           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  50           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  70           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  100          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  150          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  200          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  250          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  300          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  350          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  400          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  450          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  500          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  550          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  600          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  650          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  700          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  750          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  800          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  850          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  900          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  925          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  950          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  975          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            isobaricInhPa  1000         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            surface      0            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            tropopause   0            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            maxWind      0            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            heightAboveGround  80           t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            heightAboveGround  100          t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            heightAboveSea  1829         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            heightAboveSea  2743         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            heightAboveSea  3658         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            pressureFromGroundLayer  3000         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            sigma        1            t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            potentialVorticity  2000         t            grid_complex_spatial_differencing
2            kwbc         20250128     fc           regular_ll   0            potentialVorticity  2147485648   t            grid_complex_spatial_differencing
53 of 696 messages in gfs.t00z.pgrb2.1p00.f000.grib

53 of 696 total messages in 1 files
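A listing like this can be tallied per level type with a few lines of Python. This is a rough sketch that parses whitespace-separated `grib_ls` rows; the sample rows are copied from the output above, and the function name and column constants are mine.

```python
from collections import Counter

# A few rows copied from the `grib_ls` output (whitespace-separated columns).
SAMPLE_ROWS = """\
2  kwbc  20250128  fc  regular_ll  0  isobaricInhPa  1000  t  grid_complex_spatial_differencing
2  kwbc  20250128  fc  regular_ll  0  surface  0  t  grid_complex_spatial_differencing
2  kwbc  20250128  fc  regular_ll  0  heightAboveGround  80  t  grid_complex_spatial_differencing
2  kwbc  20250128  fc  regular_ll  0  heightAboveGround  100  t  grid_complex_spatial_differencing
"""

TYPE_OF_LEVEL_COL = 6  # zero-based index of the typeOfLevel column
SHORT_NAME_COL = 8     # zero-based index of the shortName column


def count_level_types(text: str, short_name: str) -> Counter:
    """Count rows per typeOfLevel for the given shortName."""
    counts: Counter = Counter()
    for line in text.splitlines():
        cols = line.split()
        # Data rows have exactly ten columns; summary lines are skipped.
        if len(cols) == 10 and cols[SHORT_NAME_COL] == short_name:
            counts[cols[TYPE_OF_LEVEL_COL]] += 1
    return counts


print(count_level_types(SAMPLE_ROWS, "t"))
```

In the full listing, 41 of the 53 `t` messages are on isobaric levels and only one is at `surface`.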

@devsjc
Collaborator

devsjc commented Jan 28, 2025

See openclimatefix/nwp-consumer#232. No longer pulls "t" to avoid overriding "t2m".
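For anyone reading later, here is a toy illustration of one way such an override can happen: if two GRIB short names are mapped to the same output parameter and datasets are written one at a time, whichever is written last wins. The rename tables, parameter names, and `write_all` helper below are hypothetical and are not the consumer's actual code.

```python
# Hypothetical rename tables: before the fix, both short names map to one parameter.
# The names here are illustrative, not the consumer's actual parameter names.
RENAMES_BEFORE_FIX = {"t2m": "temperature", "t": "temperature", "u10": "wind_u_10m"}
RENAMES_AFTER_FIX = {"t2m": "temperature", "u10": "wind_u_10m"}  # "t" no longer pulled


def write_all(datasets, renames):
    """Write each (short_name, value) pair into a store keyed by output parameter."""
    store = {}
    for short_name, value in datasets:
        target = renames.get(short_name)
        if target is None:
            continue  # not a parameter of interest: dropped
        store[target] = value  # a later write to the same key replaces the earlier one
    return store


# Stand-in values: a 2 m temperature field followed by a "t" field from another level.
datasets = [
    ("t2m", "2m temperature"),
    ("u10", "10m u-wind"),
    ("t", "temperature at another level"),
]

print(write_all(datasets, RENAMES_BEFORE_FIX)["temperature"])
print(write_all(datasets, RENAMES_AFTER_FIX)["temperature"])
```

With the first table the stored temperature comes from the `t` field; with the second it stays as the `t2m` field.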
