Extend Find-IshDocumentObj and Find-IshPublicationOutput cmdlets for server-side out-of-memory protection by time slicing #49

Open
3 of 12 tasks
ddemeyer opened this issue Dec 17, 2018 · 5 comments

Comments

@ddemeyer

ddemeyer commented Dec 17, 2018

Shorter crisper interactive experience is nice. Programming-wise, an explicit -IshSession is still preferred. Remember you can still use two sessions to compare or migrate content. Attempted as part of #45

  • Requires Add Get-IshBackgroundTask #45 merge for New-IshSession adaptions, etc
  • Get-IshEvent protected by -ModifiedSince defaulting to last day
  • Get-IshBackgroundTask protected by -ModifiedSince defaulting to last day
  • Find-IshBaseline used to return everything, low risk on bringing the server down
  • Find-IshEDT used to return everything, low risk on bringing the server down
  • Find-IshOutputFormat used to return everything, low risk on bringing the server down
  • Find-IshUserGroup used to return everything, low risk on bringing the server down
  • Find-IshUserRole used to return everything, low risk on bringing the server down
  • Find-IshUser used to return everything, medium risk on bringing the server down
  • Find-IshDocumentObj used to return everything, high risk on bringing the server down but would mean breaking behavior compatibility
  • Find-IshPublicationOutput used to return everything, high risk on bringing the server down but would mean breaking behavior compatibility
  • Find-IshAnnotation Add cmdlets *-IShAnnotation #78 will return everything, medium risk on bringing the server down but would mean breaking behavior compatibility

Thinking out loud... options are...

  1. Keep backward-compatible behavior, even though with an implicit IshSession a single parameterless Find-IshDocumentObj could bring everything to its knees. This is the current 0.x behavior, no code change required.
  2. Keep backward-compatible behavior, but time slice by adding optional -ModifiedSince (DeltaDateTimeStart, the year 2000 or so), -ModifiedUntil (DeltaDateTimeEnd, so Now+1day) and -ModifiedStep (DeltaTimeSpan, so per year?). In practice each API call would carry a MODIFIED-ON filter so that a single call returns less, but if the result is not pipelined in PowerShell the client-side memory could still explode. Preferably with Write-Progress-like behavior. Preferred option if I have the time, as it cleans up the ISHInsights DeltaCrawl code base as well.
    • Note that only Find-IshDocumentObj and Find-IshPublicationOutput need this protection I feel. All others are optional for consistency but can be implemented already over -MetadataFilter
  3. Break compatibility. Do the above -ModifiedSince (DeltaDateTimeStart, defaulting to last day), -ModifiedUntil (DeltaDateTimeEnd, so Now+1day) and -ModifiedStep (DeltaTimeSpan, so more than one day).
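
To make the time slicing of option 2 concrete, here is a minimal sketch that only computes the windows. `Get-TimeSlice` is a hypothetical helper, not an ISHRemote cmdlet, and the 365-day step stands in for "per year". Each emitted window would then back one Find call with a MODIFIED-ON filter; that call is left out on purpose.

```powershell
# Hypothetical helper, not part of ISHRemote: splits a date range into
# consecutive windows, oldest first.
function Get-TimeSlice {
    param(
        [datetime]$ModifiedSince = [datetime]'2000-01-01',   # assumed database birth date
        [datetime]$ModifiedUntil = (Get-Date).AddDays(1),    # Now+1day
        [timespan]$ModifiedStep  = [timespan]::FromDays(365) # assumption: "per year"
    )
    if ($ModifiedStep -le [timespan]::Zero) {
        throw 'ModifiedStep must be a positive time span.'
    }
    $from = $ModifiedSince
    while ($from -lt $ModifiedUntil) {
        $to = $from + $ModifiedStep
        # Clamp the last window so it never runs past the requested end.
        if ($to -gt $ModifiedUntil) { $to = $ModifiedUntil }
        [pscustomobject]@{ From = $from; To = $to }
        $from = $to
    }
}

# Usage: one year in steps of 100 days.
$slices = @(Get-TimeSlice -ModifiedSince ([datetime]'2018-01-01') `
                          -ModifiedUntil ([datetime]'2019-01-01') `
                          -ModifiedStep  ([timespan]::FromDays(100)))
```

Because the windows are written to the pipeline one by one, a caller can stream them instead of collecting them first, which is the same property the Find cmdlets would need to keep client-side memory flat.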
@ddemeyer ddemeyer added this to the v0.7 milestone Dec 17, 2018
@ddemeyer ddemeyer added the could label Dec 17, 2018
ddemeyer added a commit that referenced this issue Dec 17, 2018
…PublicationOutput work without parameters potentially returning full database, ROLLING BACK the explicit last-day-MetadataFilter that was added in this branch. Created separate issue #49 for tracking
@ddemeyer ddemeyer modified the milestones: v0.7, v0.8 Mar 21, 2019
ddemeyer added a commit that referenced this issue Mar 22, 2019
* Extended Get-IshTypeFieldDefinition with ISHBackgroundTask type

* #45 Basic skeleton of Get-IshBackgroundTask is there, all code requires clean up. Up next, the IshBackgroundTask xml parsing and console formatting.

* #45 Basic formatting, basic test, and xml object parsing is there. Next check and test all cmdlet parameters

* #45 Get-IshMetadataField accepts IshBackgroundTask

* #45 Get-IshBackgroundTask is there with doubtfully useful pipelining with test.

* #45 Corrected test and added Get-Help

* #45 More sturdy test

* #45 Introduced NameHelper where I chose underscore as separator. Currently Wrap-function is in the ISHType cmdlet like ISHBackgroundTask (to push down to TrisoftCmdlet all CardField type objects need a generic root class)

* #45 Review by hvermeiren on IsBasic and IsDescriptive for ISHBackgroundTask. Removed object.EventType. Implemented 13.0.2+ check. Wide format table layout.

* #45 IshSession is no longer mandatory. New-IshSession will save in SessionStore every time. Next optional -RequestedMetadata transformation like Descriptive/Basic/System/All

* #45 Parameter value autocompletion on field Name and LovId now respects SessionState

* #45 Optional -RequestedMetadata that now initializes to IShSession.DefaultRequestedMetadata (defaults to Basic, old behavior Descriptive, don't care about performance use All). Tweaked some tests. Up next Get-IshEvents over IshTypeFieldSetup or IshSession not mandatory everywhere...

* #46 Optional IshSession in cmdlets for Application, Baseline and Field... slower test though because of superfluous New-IshSession

* #46 Optional IshSession in cmdlets for Application, Baseline, Field, DocumentObj...More Get-Help examples... but slower tests though, because of superfluous New-IshSession

* #46 Optional IshSession in cmdlets for Application, BackgroundTask, Baseline, DocumentObj, EDT, Event, Feature, Field, Folder, ListOfValues, OutputFormat,...More Get-Help examples... but slower tests though, because of superfluous New-IshSession

* #46 Optional IshSession in cmdlets for Application, BackgroundTask, Baseline, DocumentObj, EDT, Event, Feature, Field, Folder, ListOfValues, OutputFormat, PublicationOutput, Settings, User, UserGroup, UserRole...More Get-Help examples... but slower tests though, because of superfluous New-IshSession

* #48 Implemented WrapAsPSObjectAndAddNoteProperties-Switch-block for DocumentObj... next is Format.ps1xml (see Find-IshDocumentObj example)... then all other classes

* #48 Removed WrapAsPSObjectAndAddNoteProperties-Switch-block for BackgroundTask/DocumentObj... implemented IshBaseObject allowing a one-time implementation of WriteObject in TrisoftCmdlt. Next is Format.ps1xml (see Find-IshDocumentObj example)... then all other classes

* #48 TrisoftCmdlet::WriteObject using IshBaseObject implemented for BackgroundTask, Baseline, DocumentObj. Next is Format.ps1xml (see Find-IshDocumentObj example)... then all other classes

* #48 TrisoftCmdlet::WriteObject using IshBaseObject implemented for BackgroundTask, Baseline, DocumentObj, EDT, Folder, OutputFormat, PublicationOutput, User, UserGroup, UserRole ...Next is more derived IshObject types to allow Format.ps1xml specialized rendering (see Find-IshDocumentObj example)... and Get-IshEvent

* #46 Optional IshSession in Get-IshTypeFieldDefinition

* #48 TrisoftCmdlet::WriteObject strips hyphens from a field name like DOC-LANGUAGE to avoid clumsy code like $ishObject.'doc-language' ...Next is more derived IshObject types to allow Format.ps1xml specialized rendering (see Find-IshDocumentObj example)... and Get-IshEvent

* #46 Optional IshSession makes single Find-IshDocumentObj and Find-IshPublicationOutput work without parameters potentially returning full database, putting in explicit last-day-MetadataFilter

* #46 Optional IshSession makes single Find-IshDocumentObj and Find-IshPublicationOutput work without parameters potentially returning full database, ROLLING BACK the explicit last-day-MetadataFilter that was added in this branch. Created separate issue #49 for tracking

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshDocumentObj cmdlets

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshDocumentObj cmdlets, tuned to use cmdlets ISHType for code consistency

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshPublicationOutput cmdlets

* #46 Optional IshSession in Get-IshTypeFieldDefinition, making sure that TriDKXmlSetupFilePath overload still works

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshBaseline cmdlets and IshBaselineItem prints sortable date

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshEDT cmdlets

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshFolder cmdlets

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshOutputFormat cmdlets

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshOutputFormat cmdlets, fixed OutputType, added samples and allowed empty value

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshUser cmdlets and cleanup of double IshFolder ps1xml entry

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshUserGroup cmdlets

* #48 Factory wrapping base type IshObject allowing .Format.ps1xml formatting now implemented for *-IshUserRole cmdlets

* #48 Find-IshDocumentObj over DocumentObj25.Find still returns ISHReusedObj, so altered default to avoid this. Quick performance test.

* #4 Reduce warning count in solution by replacing ArgumentNullException with ArgumentException, add readonly list,

* #4 Reduce warning count in solution by replacing ArgumentNullException with ArgumentException, add readonly list, added globalsuppressions.cs, wrapped ShouldProcess, refactored using statements

* #4 Adding AddIshPublicationOutput.Tests.ps1

* #47 Add GetIshEvent.Tests.ps1 testing the current old-style cmdlet, needs more work, broken at the moment

* #47 Add GetIshEvent.Tests.ps1 testing the cmdlet, ready for the code changes. Extended Get-IshTypeFieldDefinition with ISHEvent type.

* #47 Finally broke and rewrote Get-IshEvent to use PSNoteProperty with standardized property names. Matching GetIshEvent.Tests.ps1. And proper Format.ps1xml

As part of the v0.7 milestone, this request 
- Closes #44 
- Closes #45 
- Closes #46 
- Closes #47 
- Closes #48 
- Closes #50
@ddemeyer ddemeyer removed this from the v0.8 milestone May 9, 2019
@HildeVermeiren

First of all, I think we want to keep backward-compatible behavior, but still want to protect the application and database server.

So I would do something like...

  • introduce 2 new optional parameters: -ModifiedSince and -ModifiedStep (possibly also -ModifiedBefore)
  • if only a metadata filter is provided, then we have the current behavior
    • So if they did not filter wisely that might still give an issue
  • if no metadata filter and none of the new optional parameters is provided, I would throw an exception to protect the system
    • I don't see how you can have a good default for the new optional parameters that will make sense for all customers
  • if only -ModifiedSince is provided, I would either use a smart default value for -ModifiedStep (per month if -ModifiedSince is less than 2 years ago, per year if -ModifiedSince is more than 2 years ago) or throw an exception stating that you also need to specify -ModifiedStep
  • if the metadata filter is provided and -ModifiedSince (and -ModifiedStep) are also provided, then I would throw if MODIFIED-ON is already present in the metadata filter
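
The rules above can be sketched as a small guard function. `Resolve-FindGuard` and its parameters are hypothetical names used only for illustration; the metadata filter is reduced to a list of field names, and "per month" and "per year" are approximated as 30 and 365 days, which is an assumption.

```powershell
# Hypothetical guard, not part of ISHRemote. Returns the mode to run in,
# or throws when the call would be unprotected or ambiguous.
function Resolve-FindGuard {
    param(
        [string[]]$FilterFieldName = @(),      # field names found in -MetadataFilter
        [Nullable[datetime]]$ModifiedSince,
        [Nullable[timespan]]$ModifiedStep,
        [datetime]$Now = (Get-Date)
    )
    $hasFilter = $FilterFieldName.Count -gt 0
    $hasSlice  = ($null -ne $ModifiedSince) -or ($null -ne $ModifiedStep)

    if (-not $hasFilter -and -not $hasSlice) {
        throw 'Specify -MetadataFilter or -ModifiedSince to protect the server.'
    }
    if (-not $hasSlice) {
        # Only a metadata filter: current behavior, no slicing.
        return [pscustomobject]@{ Mode = 'FilterOnly'; ModifiedStep = $null }
    }
    if ($FilterFieldName -contains 'MODIFIED-ON') {
        throw 'MODIFIED-ON cannot be combined with -ModifiedSince/-ModifiedStep.'
    }
    if (($null -eq $ModifiedStep) -and ($null -ne $ModifiedSince)) {
        # Smart default: per month for recent ranges, per year for older ones.
        $days = if (($Now - $ModifiedSince).TotalDays -lt 730) { 30 } else { 365 }
        $ModifiedStep = [timespan]::FromDays($days)
    }
    [pscustomobject]@{ Mode = 'Sliced'; ModifiedStep = $ModifiedStep }
}

# Usage with a fixed "now" so the outcome is reproducible.
$now    = [datetime]'2019-06-01'
$recent = Resolve-FindGuard -ModifiedSince ([datetime]'2019-01-01') -Now $now
$old    = Resolve-FindGuard -ModifiedSince ([datetime]'2010-01-01') -Now $now
```

The case where only -ModifiedStep is passed is not covered by the rules above; the sketch simply passes the step through.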

@ddemeyer

Thanks, more food for thought... It looks like we are heading for option 2: backward compatible, only doing x times more API calls than before, so theoretically somewhat slower but much more predictable for larger setups. On bigger databases the Find cmdlet without any filter went wrong anyway, as you attempt to pull the full database over.

  • The MODIFIED-ON filter will be on language level, not logical level, otherwise you might miss updates of blobs
  • The -ModifiedSince default value would be the year 2000 for now, the birth date of any database
  • The -ModifiedStep default value for PublicationOutput is 1 year, while for DocumentObj it should be smaller, like 2 months. Note that on very big databases, or actually databases where a big legacy import happened within those 2 months, it could still go wrong server-side or client-side. In those scenarios you can override the defaults provided.

Now a legacy conversion could be something better than below, the Find cmdlet could even show a progress bar

Find-IshDocumentObj -MetadataFilter (Set-IshMetadataFilterField -Level Lng -Name MODIFIED-ON -FilterOperator GreaterThan -Value 01/09/2019) |
Set-IshMetadataField -Name FCOMMENTS -Level Lng -Value "Hilde was here" | 
Set-IshDocumentObj

@ddemeyer

In all scenarios so far the -ModifiedStep counts up, but you could also count down, so from very recent back to the birth date of the database. This way you get recent results first, which often makes more sense.

@ddemeyer

Was looking for more standardized terminology and a way to make querying from Now to database birth date the default. So still pursuing backward compatible option 2.

  • -ModifiedBefore (instead of -ModifiedUntil) would default to Now+1day (DeltaDateTimeEnd)
  • -ModifiedAfter (instead of -ModifiedSince) would default to database birth date, so year 2000 (DeltaDateTimeStart). Theoretically the last server-side Find operations will return empty results quite quickly.
  • -ModifiedStep default value for PublicationOutput is a TimeSpan of 1 year, while for DocumentObj it should be smaller, like 3 months (DeltaTimeSpan). Note that on very big databases, or actually databases where a big legacy import happened within those months, it could still go wrong server-side or client-side. In those scenarios you can override the defaults provided. The step would always be used to step back into history.
  • The three above parameters are all optional, and all have defaults protecting the server-side system. No need to throw. In case -MetadataFilter is offered, then we suggest to simply merge, if that causes 3+ MODIFIED-ON filters, so be it - potentially push a Write-Warning out.
  • Document the potential performance slowdown which can be bypassed by explicitly passing a massive -ModifiedStep, but would need that
  • Write-Progress is a must; showing the exact count of server-side Find operations and a progress bar.
  • Only the implementation for Find-IshDocumentObj and Find-IshPublicationOutput is really required. The MODIFIED-ON filter will be on language level, not logical level, otherwise you might miss updates of blobs
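
A minimal sketch of the stepping-back-into-history loop with Write-Progress, under stated assumptions: `Invoke-TimeSlicedFind` is a hypothetical wrapper, not an ISHRemote cmdlet, the 90-day step approximates "3 months", and the actual server-side Find call is passed in as a script block that receives the window boundaries.

```powershell
# Hypothetical wrapper, not part of ISHRemote: walks from ModifiedBefore back
# to ModifiedAfter and invokes $Find once per window, newest window first.
function Invoke-TimeSlicedFind {
    param(
        [Parameter(Mandatory)][scriptblock]$Find,             # called as: & $Find $after $before
        [datetime]$ModifiedBefore = (Get-Date).AddDays(1),    # Now+1day
        [datetime]$ModifiedAfter  = [datetime]'2000-01-01',   # assumed database birth date
        [timespan]$ModifiedStep   = [timespan]::FromDays(90)  # assumption: about 3 months
    )
    if ($ModifiedStep -le [timespan]::Zero) {
        throw 'ModifiedStep must be a positive time span.'
    }
    # Exact number of server-side Find operations, known up front.
    $total = [int][math]::Ceiling(($ModifiedBefore - $ModifiedAfter).Ticks / $ModifiedStep.Ticks)
    $call  = 0
    $before = $ModifiedBefore
    while ($before -gt $ModifiedAfter) {
        $after = $before - $ModifiedStep
        if ($after -lt $ModifiedAfter) { $after = $ModifiedAfter }
        $call++
        Write-Progress -Activity 'Find (time sliced)' -Status "Call $call of $total" `
                       -PercentComplete ([math]::Min(100, 100 * $call / $total))
        # Results stream straight to the pipeline, window by window.
        & $Find $after $before
        $before = $after
    }
    Write-Progress -Activity 'Find (time sliced)' -Completed
}

# Usage: a stand-in script block that only echoes the window it was given.
$windows = @(Invoke-TimeSlicedFind -ModifiedAfter  ([datetime]'2019-01-01') `
                                   -ModifiedBefore ([datetime]'2020-01-01') `
                                   -Find { param($After, $Before)
                                           [pscustomobject]@{ After = $After; Before = $Before } })
```

In a real implementation the script block would merge two MODIFIED-ON filter fields on language level into the caller's -MetadataFilter before calling Find. How the window boundaries are made inclusive or exclusive decides whether an object modified exactly on a boundary is returned once, twice, or not at all, so that choice needs care.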

Considered but not required for closing this issue

  • Align parameter set across all Find-* cmdlets, probably Find-IshAnnotation first using MODIFIED-ON on annotation level
  • Customize to other date fields, requiring -ModifiedFieldName and -ModifiedFieldLevel (on multi-card object types, always None on single card types)

@ddemeyer ddemeyer added this to the v0.12 milestone Mar 31, 2020
@ddemeyer ddemeyer changed the title Find-cmdlets without mandatory parameters need out-of-memory protection by time slicing Extend Find-IshDocumentObj and Find-IshPublicationOutput cmdlets for server-side out-of-memory protection by time slicing Mar 31, 2020
@ddemeyer

ddemeyer commented Apr 5, 2020

Investigating further, the idea is good, but the performance and accuracy guarantees are not. ISHRemote tries to be version-agnostic where possible, and for #49 there are two reasons to put this idea on hold:

  1. On older Content Manager versions only one date filter (so MODIFIED-ON) will be passed to the initial database query for an API Find operation. This means that potentially all objects are retrieved from the database server to the application server, before they get filtered again to be pushed to the client (so ISHRemote).
  2. On older Content Manager versions, on initial object creation (e.g. Add-IshDocumentObj), the MODIFIED-ON field is not filled in, only the CREATED-ON field as they are in essence the same. So a null on MODIFIED-ON simply complicates matters.

As a reminder, the main problem is how to iterate over all data, even for large enterprise data sets. Where this idea was to iterate over time, we are going back to iterating over the folder structure. Continuing with #92 and #91: together they allow iterating the folder structure and in turn finding content objects and publication outputs based on filter criteria like language or recently changed.

@ddemeyer ddemeyer removed this from the v0.12 milestone Apr 5, 2020