Skip to content
This repository has been archived by the owner on Oct 2, 2024. It is now read-only.

Trying to fix issue with NaN, Inf, -Inf #45

Merged

Conversation

dsuhinin
Copy link
Collaborator

@dsuhinin dsuhinin commented Sep 20, 2024

Current solution

to be fully compatible with original MLflow we have to use the same marshaller/unmarshaller -> protojson from google.golang.org/protobuf/encoding/protojson package. By using such marshaller/unmarshaller helps out of the box correctly handle unusual cases related to NaN, Inf and -Inf values.

Previous description (for context)

A bit of context and answers:

  • Why bring in the custom json source code?
    -- custom json serialiser allows to correctly wrap/unwrap such a constants like: Nan, Inf, +Inf, -Inf. Unfortunately original custom json serialiser worked a bit differently then MLFlow behaves. It allows parsing Nan, Inf, +Inf, -Inf when they are literal values like:
{
  "value_1": NaN,
  "value_2": Inf,
  "value_3": -Inf
}

but unfortunately Mlflow sends such values as a string values like:

{
  "value_1": "NaN",
  "value_2": "Inf",
  "value_3": "-Inf'
}

that's why I've changed it to support what Mlflow wants and as a result pull it to our codebase.

  • Why did response serialization need a change?
    -- probably the same answer like first item. All the Nan, Inf, +Inf, -Inf values should be serialised as a string values.
  • Is any python unit test solved with any of this?
    -- yes, now all the rest tests are green.

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
@dsuhinin dsuhinin marked this pull request as ready for review September 23, 2024 14:14
Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
@nojaf
Copy link
Collaborator

nojaf commented Sep 23, 2024

@dsuhinin could you add a high level walkthrough of this in the description please?
By the code alone, I cannot tell why you are doing what you are doing.

Why bring in the custom json source code?
Why did response serialization need a change? Is any python unit test solved with any of this?

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
@dsuhinin
Copy link
Collaborator Author

@dsuhinin could you add a high level walkthrough of this in the description please? By the code alone, I cannot tell why you are doing what you are doing.

Why bring in the custom json source code? Why did response serialization need a change? Is any python unit test solved with any of this?

Is any python unit test solved with any of this?

  • probably this one you can check in README.md file. Yes, the last rest endpoint test has been fixed by this.

@dsuhinin
Copy link
Collaborator Author

@dsuhinin could you add a high level walkthrough of this in the description please? By the code alone, I cannot tell why you are doing what you are doing.

Why bring in the custom json source code? Why did response serialization need a change? Is any python unit test solved with any of this?

answered in a description.

@nojaf
Copy link
Collaborator

nojaf commented Sep 23, 2024

Thanks, Dmitry! I really appreciate the extra context.

You probably checked this, but was it not possible to hook into the unmarshalling of a single property? It feels a bit cumbersome to include all the JSON source code. We should probably document this in the README. Any new contributor with Go knowledge will most likely be put off by this JSON code. It is also unclear which parts we changed and whether it's possible to update the JSON code if needed.

If there's truly no better way, then I think this is fine, but it would be best to summarize this in an ADR note.

@dsuhinin
Copy link
Collaborator Author

dsuhinin commented Sep 23, 2024

Thanks, Dmitry! I really appreciate the extra context.

You probably checked this, but was it not possible to hook into the unmarshalling of a single property? It feels a bit cumbersome to include all the JSON source code. We should probably document this in the README. Any new contributor with Go knowledge will most likely be put off by this JSON code. It is also unclear which parts we changed and whether it's possible to update the JSON code if needed.

If there's truly no better way, then I think this is fine, but it would be best to summarize this in an ADR note.

emmm, Im not sure, about another approach really. Maybe, maybe it could be but it is hard to say. Logical way to solve this custom behaviour of MLFlow just to have own marshaler/unmarshaller. This guy also created own package for that -> https://github.com/xhhuango/json so it means that this is not really possible to simply inject something into a parser.

If there's truly no better way, then I think this is fine, but it would be best to summarize this in an ADR note.

  • I think that this is most resonable way if we want to handle custom surprises from MLFlow.

@dsuhinin
Copy link
Collaborator Author

I can't see any problem to use own json marshaller/unmarshaller, because json itself is not something that changes pretty often. unfortunately we can't extend std lib.

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
@nojaf
Copy link
Collaborator

nojaf commented Sep 23, 2024

Can't we do something along the lines of

package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Custom struct with a string field that should be converted to float during JSON marshaling/unmarshaling
type CustomStruct struct {
	Name  string  `json:"name"`
	Value FloatAsString `json:"value"`
}

// Define a custom type that implements json.Marshaler and json.Unmarshaler
type FloatAsString float64

// Custom JSON marshal function
func (f FloatAsString) MarshalJSON() ([]byte, error) {
	// Convert the float value to a string and return it as a JSON string
	return json.Marshal(fmt.Sprintf("%.2f", f))
}

// Custom JSON unmarshal function
func (f *FloatAsString) UnmarshalJSON(data []byte) error {
	// Unmarshal the data into a string, then parse the string as a float
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}

	// Convert the string to a float
	parsed, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return err
	}

	*f = FloatAsString(parsed)
	return nil
}

func main() {
	// Example JSON with a string value for "value"
	data := []byte(`{"name": "example", "value": "123.45"}`)

	var cs CustomStruct
	err := json.Unmarshal(data, &cs)
	if err != nil {
		fmt.Println("Error unmarshaling:", err)
		return
	}

	fmt.Printf("Unmarshaled struct: %+v\n", cs)

	// Marshal it back to JSON
	jsonData, err := json.Marshal(cs)
	if err != nil {
		fmt.Println("Error marshaling:", err)
		return
	}

	fmt.Println("Marshaled JSON:", string(jsonData))
}

@jgiannuzzi
Copy link
Owner

what I had in mind last time we discussed this is exactly what Florian proposed above in #45 (comment)

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
Comment on lines 34 to 48
func (p *HTTPRequestParser) ParseBody(ctx *fiber.Ctx, input proto.Message) *contract.Error {
if protojsonErr := protojson.Unmarshal(ctx.Body(), input); protojsonErr != nil {
if jsonErr := json.Unmarshal(ctx.Body(), input); jsonErr != nil {
var unmarshalTypeError *json.UnmarshalTypeError
if errors.As(jsonErr, &unmarshalTypeError) {
result := gjson.GetBytes(ctx.Body(), unmarshalTypeError.Field)

return contract.NewError(
protos.ErrorCode_INVALID_PARAMETER_VALUE,
fmt.Sprintf("Invalid value %s for parameter '%s' supplied", result.Raw, unmarshalTypeError.Field),
)
return contract.NewError(
protos.ErrorCode_INVALID_PARAMETER_VALUE,
fmt.Sprintf("Invalid value %s for parameter '%s' supplied", result.Raw, unmarshalTypeError.Field),
)
}
}

return contract.NewError(protos.ErrorCode_BAD_REQUEST, err.Error())
return contract.NewError(protos.ErrorCode_BAD_REQUEST, protojsonErr.Error())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nojaf @jgiannuzzi protojson doesn't provide any information about the filed, that's why I've decided to fallback to json validation. a bit ugly and strange but there is no new failing tests. what do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, we want to know what is up when things fail.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. I tried to only keep protojson but it is absolutely trick then to find the field name where it failed. We can document that probably, because it is a bit confusing.

@@ -1,4 +1,4 @@
//nolint:ireturn
// nolint
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to change in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by some reason it started failing. Ill double check that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this before, we should update Go and/or the linter. I believe this was fixed somewhere.

.golangci.yml Outdated Show resolved Hide resolved
.golangci.yml Outdated Show resolved Hide resolved
Copy link
Owner

@jgiannuzzi jgiannuzzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please update the PR description with the current approach? and maybe keep the previous description under a heading "Previous description (for context)"?

@dsuhinin
Copy link
Collaborator Author

could you please update the PR description with the current approach? and maybe keep the previous description under a heading "Previous description (for context)"?

sure, Ill update description.

nojaf
nojaf previously approved these changes Sep 24, 2024
Copy link
Collaborator

@nojaf nojaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!
I'm glad this turned out so elegantly!
Nice job, both of you!

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
@dsuhinin dsuhinin merged commit 5d4641d into jgiannuzzi:main Sep 24, 2024
19 checks passed
jgiannuzzi pushed a commit that referenced this pull request Sep 24, 2024
#45

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
jgiannuzzi pushed a commit to mlflow/mlflow-go-backend that referenced this pull request Sep 24, 2024
jgiannuzzi/mlflow-go#45

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
jgiannuzzi pushed a commit to mlflow/mlflow-go-backend that referenced this pull request Sep 24, 2024
jgiannuzzi/mlflow-go#45

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
jgiannuzzi pushed a commit to mlflow/mlflow-go-backend that referenced this pull request Oct 1, 2024
jgiannuzzi/mlflow-go#45

Signed-off-by: Software Developer <7852635+dsuhinin@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants