Add delay option to data_generator operation (#899)
This PR makes the following changes:

### New Features
- Adds a new `delay` option to the `data_generator` operation.
- This option lets the user specify, in seconds, how long a worker delays
the completion of each slice.
- _Example:_ If **delay** is set to `60` and **size** is set to `10000`
with only 1 worker, the job will generate approximately 10,000 records per
minute. You can achieve about the same record output with more, smaller
slices by decreasing both size and delay in the same proportion.
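
The arithmetic behind the example can be sketched as follows. This is a rough estimate only: `estimateRecordsPerMinute` is a hypothetical helper, not part of the asset, and it assumes that record generation time is negligible next to the delay and that each worker completes one slice per `delay` seconds.

```typescript
// Hypothetical helper, not part of the asset: estimates records per minute
// for a data_generator job. Assumes generation time is negligible compared
// to the delay, so each worker finishes one slice every `delaySeconds`.
function estimateRecordsPerMinute(size: number, delaySeconds: number, workers = 1): number {
    if (delaySeconds <= 0) {
        throw new Error('delaySeconds must be greater than 0 for a rate estimate');
    }
    const slicesPerMinute = 60 / delaySeconds;
    return size * slicesPerMinute * workers;
}

const fromExample = estimateRecordsPerMinute(10000, 60); // 10000
const smallerSlices = estimateRecordsPerMinute(5000, 30); // 10000
```

Halving both `size` and `delay` leaves the estimate unchanged, which is why the two configurations produce about the same output.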
### Documentation 
- Bumps **standard-assets** from `v1.0.3` to `v1.1.0`
- Updated docs to include use case and description for `delay` param in
`data_generator`

#898
sotojn authored Sep 13, 2024
1 parent 5868448 commit 3b086ed
Showing 8 changed files with 61 additions and 34 deletions.
2 changes: 1 addition & 1 deletion asset/asset.json
@@ -1,5 +1,5 @@
{
"name": "standard",
"version": "1.0.3",
"version": "1.1.0",
"description": "Teraslice standard processor asset bundle"
}
2 changes: 1 addition & 1 deletion asset/package.json
@@ -1,7 +1,7 @@
{
"name": "standard",
"displayName": "Asset",
"version": "1.0.3",
"version": "1.1.0",
"private": true,
"description": "Teraslice standard processor asset bundle",
"repository": {
8 changes: 7 additions & 1 deletion asset/src/data_generator/fetcher.ts
@@ -1,5 +1,5 @@
import {
Fetcher, Context, TSError, AnyObject
Fetcher, Context, TSError, AnyObject, pDelay
} from '@terascope/job-components';
import { ExecutionConfig } from '@terascope/types';
import { Mocker } from 'mocker-data-generator';
@@ -41,6 +41,12 @@ export default class DataGeneratorFetcher extends Fetcher<DataGenerator> {
})
.catch((err) => Promise.reject(new TSError(err, { reason: 'could not generate mocked data' })));
}
// the default is zero, which is falsy, so no delay is applied
if (this.opConfig.delay) {
// convert the delay value from seconds to milliseconds
const time = this.opConfig.delay * 1000;
await pDelay(time);
}
return mocker
.addGenerator('faker', faker)
.addGenerator('chance', chance)
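
The delay step in this hunk can be tried in isolation with a sketch like the one below. It is a simplified stand-in, not the fetcher itself: `sleep` is a local substitute for `pDelay` (assumed here to resolve after the given number of milliseconds), and `fetchWithDelay` is a hypothetical function that returns placeholder records instead of mocked data.

```typescript
// Local stand-in for pDelay from '@terascope/job-components'; assumed to
// resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical, simplified fetch step: wait first, then build the records.
async function fetchWithDelay(size: number, delaySeconds = 0): Promise<{ id: number }[]> {
    // 0 is falsy, so the default skips the wait entirely
    if (delaySeconds) {
        // the option is given in seconds, timers expect milliseconds
        await sleep(delaySeconds * 1000);
    }
    return Array.from({ length: size }, (_, id) => ({ id }));
}
```

Because the wait happens before the records are built, the slice takes at least `delaySeconds` to complete, plus whatever time generation itself needs.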
16 changes: 16 additions & 0 deletions asset/src/data_generator/schema.ts
@@ -18,6 +18,10 @@ export default class Schema extends ConvictSchema<DataGenerator> {
throw new Error('Invalid data_generator configuration, id_start_key must be used with set_id parameter, please set the missing parameters');
}

if (opConfig.stress_test && opConfig.delay !== 0) {
throw new Error('Invalid data_generator configuration, setting "delay" while "stress_test" is true is not permitted.');
}

if (opConfig.start && opConfig.end) {
const startingTime = new Date(opConfig.start).getTime();
const endingTime = new Date(opConfig.end).getTime();
@@ -74,6 +78,18 @@ export default class Schema extends ConvictSchema<DataGenerator> {
default: false,
format: Boolean
},
delay: {
doc: 'Time in seconds that a worker will delay the completion of a slice. Great '
+ 'for generating controlled amounts of data within a loose time window.',
default: 0,
format(val: any) {
if (isNaN(val)) {
throw new Error('Invalid delay parameter for data_generator, must be a number');
} else if (val < 0) {
throw new Error('Invalid delay parameter for data_generator, must not be negative');
}
}
},
date_key: {
doc: 'key value on schema where date should reside',
default: 'created',
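
The two `delay` checks in this file (the format check and the `stress_test` cross-check) can be exercised together with a standalone sketch. `validateDelay` and `DelayConfig` are hypothetical names used only for illustration, and the type check is deliberately stricter than the diff's `isNaN(val)`, which would accept numeric strings such as `'30'`.

```typescript
// Hypothetical standalone version of the two checks; field names follow the diff.
interface DelayConfig {
    delay?: unknown;
    stress_test?: boolean;
}

function validateDelay(config: DelayConfig): number {
    // mirror the schema default of 0 when delay is not set
    const delay = config.delay ?? 0;
    // assumption: only real numbers are accepted here, unlike isNaN(val),
    // which coerces its argument before testing it
    if (typeof delay !== 'number' || Number.isNaN(delay)) {
        throw new Error('Invalid delay parameter for data_generator, must be a number');
    }
    if (delay < 0) {
        throw new Error('Invalid delay parameter for data_generator, must not be negative');
    }
    // cross-field rule: a non-zero delay contradicts "as fast as possible"
    if (config.stress_test && delay !== 0) {
        throw new Error('Setting "delay" while "stress_test" is true is not permitted');
    }
    return delay;
}
```

The rejected cases match the ones added to the schema spec in this commit: a negative delay, and a non-zero delay combined with `stress_test: true`.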
26 changes: 26 additions & 0 deletions docs/operations/data_generator.md
@@ -159,6 +159,31 @@ results[0] === {
}
```

### Generate controlled stream of data over a period of time

Example of a job using the `data_generator` processor to generate approximately 10,000 records per minute, or about 600,000 records per hour. Results may vary, as this is a loose approximation.

```json
{
"name" : "testing",
"workers" : 1,
"lifecycle" : "persistent",
"assets" : [
"standard"
],
"operations" : [
{
"_op": "data_generator",
"size": 5000,
"delay": 30
},
{
"_op": "noop"
}
]
}
```

## Parameters

| Configuration | Description | Type | Notes |
@@ -170,6 +195,7 @@ results[0] === {
| start | Start of date range | String | optional, only used with format `isoBetween` or `utcBetween`, defaults to Thu Jan 01 1970 00:00:00 GMT-0700 (MST) |
| end | End of date range | String | optional, only used with format `isoBetween` or `utcBetween`, defaults to new Date() |
| stress_test | If set to true, it will send non-unique documents following your schema as fast as possible. Helpful to determine downstream performance limits or constraints | Boolean | optional, defaults to false |
| delay | Time in seconds that a worker will delay the completion of a slice. Good for generating controlled amounts of data within a loose time window. | Number | optional, but can't be used when `stress_test` is set to `true` |
| date_key | Name of the date field. If set, it will remove the `created` field on the default schema. | String | optional, defaults to created |
| set_id | Sets an `id` field on each record whose value is formatted according to the option given. The options are `base64url`, `hexadecimal`, `HEXADECIMAL` | String | optional, it does not set any metadata fields, i.e. `_key`. See the `set_key` processor on how to set the `_key` in the metadata. |
| id_start_key | Set if you would like to force the first part of the `id` to a certain character or set of characters | String | optional, must be used in tandem with `set_id`. `id_start_key` is essentially a regex. If you set it to "a", then the first character of the id will be "a". It can also be set to a range such as [a-f], or to "[ab]" to randomly alternate between "a" and "b" |
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "standard-assets-bundle",
"displayName": "Standard Assets Bundle",
"version": "1.0.3",
"version": "1.1.0",
"private": true,
"description": "Teraslice standard processor asset bundle",
"type": "module",
4 changes: 4 additions & 0 deletions test/data-generator/schema-spec.ts
@@ -35,6 +35,8 @@ describe('data-generator schema', () => {
expect(schema.set_id).toBeNull();
expect(schema.id_start_key).toBeNull();
expect(schema.stress_test).toBeFalse();
expect(schema.delay).toBeNumber();
expect(schema.delay).toEqual(0);
expect(schema.date_key).toEqual('created');
expect(schema.size).toEqual(5000);
});
@@ -48,6 +50,8 @@ describe('data-generator schema', () => {
await expect(makeSchema({ start: 'asdf987asdf' })).toReject();
await expect(makeSchema({ end: 'asdf987asdf' })).toReject();
await expect(makeSchema({ format: 12341234 })).toReject();
await expect(makeSchema({ delay: -555 })).toReject();
await expect(makeSchema({ delay: 555, stress_test: true })).toReject();
});

it('should throw if start is later than end', async () => {
35 changes: 5 additions & 30 deletions yarn.lock
@@ -831,14 +831,14 @@
typedoc-plugin-markdown "~4.0.3"
yargs "^17.7.2"

"@terascope/types@^1.0.1", "@terascope/types@^1.1.0":
"@terascope/types@^1.1.0":
version "1.1.0"
resolved "https://registry.yarnpkg.com/@terascope/types/-/types-1.1.0.tgz#0c87a0340e45b2e387caab27017b8c1e3ea10028"
integrity sha512-bmPl5w/X4ZIcWK8uqqbCMRObd9QaR52/S9TTZbgQNQA3QNz5KH4/HuC1kNMwdzL7rGWKu7KsqSM2Q4yiPRqmXw==
dependencies:
prom-client "^15.1.3"

"@terascope/utils@^1.0.0", "@terascope/utils@^1.0.1", "@terascope/utils@^1.1.0":
"@terascope/utils@^1.1.0":
version "1.1.0"
resolved "https://registry.yarnpkg.com/@terascope/utils/-/utils-1.1.0.tgz#92a34e757ac3c381b87444bb0b76bf42932c9dbc"
integrity sha512-cDuivbvHxtKFQ9iKJI9h7bmQT/2iKoCbmxC9E0J85RkFnGUGqnOH5B1cNCVJQuOIsLT+BhiE/1B0MaTbt99OfA==
@@ -6305,16 +6305,7 @@ string-length@^4.0.1:
char-regex "^1.0.2"
strip-ansi "^6.0.0"

"string-width-cjs@npm:string-width@^4.2.0":
version "4.2.3"
resolved "https://registry.yarnpkg.com/string-width/-/string-width-4.2.3.tgz#269c7117d27b05ad2e536830a8ec895ef9c6d010"
integrity sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==
dependencies:
emoji-regex "^8.0.0"
is-fullwidth-code-point "^3.0.0"
strip-ansi "^6.0.1"

string-width@^4.1.0, string-width@^4.2.0, string-width@^4.2.3:
"string-width-cjs@npm:string-width@^4.2.0", string-width@^4.1.0, string-width@^4.2.0, string-width@^4.2.3:
version "4.2.3"
resolved "https://registry.yarnpkg.com/string-width/-/string-width-4.2.3.tgz#269c7117d27b05ad2e536830a8ec895ef9c6d010"
integrity sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==
@@ -6408,14 +6399,7 @@ string_decoder@~1.1.1:
dependencies:
safe-buffer "~5.1.0"

"strip-ansi-cjs@npm:strip-ansi@^6.0.1":
version "6.0.1"
resolved "https://registry.yarnpkg.com/strip-ansi/-/strip-ansi-6.0.1.tgz#9e26c63d30f53443e9489495b2105d37b67a85d9"
integrity sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==
dependencies:
ansi-regex "^5.0.1"

strip-ansi@^6.0.0, strip-ansi@^6.0.1:
"strip-ansi-cjs@npm:strip-ansi@^6.0.1", strip-ansi@^6.0.0, strip-ansi@^6.0.1:
version "6.0.1"
resolved "https://registry.yarnpkg.com/strip-ansi/-/strip-ansi-6.0.1.tgz#9e26c63d30f53443e9489495b2105d37b67a85d9"
integrity sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==
@@ -7023,16 +7007,7 @@ word-wrap@^1.2.5:
resolved "https://registry.yarnpkg.com/word-wrap/-/word-wrap-1.2.5.tgz#d2c45c6dd4fbce621a66f136cbe328afd0410b34"
integrity sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA==

"wrap-ansi-cjs@npm:wrap-ansi@^7.0.0":
version "7.0.0"
resolved "https://registry.yarnpkg.com/wrap-ansi/-/wrap-ansi-7.0.0.tgz#67e145cff510a6a6984bdf1152911d69d2eb9e43"
integrity sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==
dependencies:
ansi-styles "^4.0.0"
string-width "^4.1.0"
strip-ansi "^6.0.0"

wrap-ansi@^7.0.0:
"wrap-ansi-cjs@npm:wrap-ansi@^7.0.0", wrap-ansi@^7.0.0:
version "7.0.0"
resolved "https://registry.yarnpkg.com/wrap-ansi/-/wrap-ansi-7.0.0.tgz#67e145cff510a6a6984bdf1152911d69d2eb9e43"
integrity sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==
