fix bug in sample_random when probability_to_keep is 0 (#989)
This PR makes the following changes:
- Fixes a bug in the `sample_random` processor when `probability_to_keep` is set to 0%. If the random number generator returned 0, the record was kept (`0 <= 0` is true). Raising the minimum value passed to `random()` to 1, and shifting the range from 0-99 to 1-100, ensures that no records are returned at 0% (`1 <= 0` is false).
- Updates the docs
- Adds a test with a large dataset and `probability_to_keep` set to 0%
- Bumps the standard asset from v1.3.1 to v1.3.2

ref: #988
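
The boundary behaviour described above can be checked by enumerating every possible draw. This is only a sketch: `countKeptDraws` is a hypothetical helper, not part of the asset, and it assumes `random(min, max)` returns an integer with both bounds inclusive.

```typescript
// Hypothetical helper (not part of the asset): counts how many integer draws
// in [min, max] pass the `draw <= probabilityToKeep` check.
// Assumption: random(min, max) is inclusive on both ends.
function countKeptDraws(min: number, max: number, probabilityToKeep: number): number {
    let kept = 0;
    for (let draw = min; draw <= max; draw++) {
        if (draw <= probabilityToKeep) kept++;
    }
    return kept;
}

// Old range [0, 99]: a draw of 0 passes `0 <= 0`, so one draw in 100 is kept at 0%.
const oldKeptAtZero = countKeptDraws(0, 99, 0);
// New range [1, 100]: no draw passes at 0%, and every draw passes at 100%.
const newKeptAtZero = countKeptDraws(1, 100, 0);
const newKeptAtFifty = countKeptDraws(1, 100, 50);
const newKeptAtHundred = countKeptDraws(1, 100, 100);

console.log({ oldKeptAtZero, newKeptAtZero, newKeptAtFifty, newKeptAtHundred });
```

Under the same assumption, the old range kept one extra draw at every setting below 100 (for example 51 of 100 draws at 50), so the shift to 1-100 also lines the kept fraction up with the configured percentage.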
busma13 authored Jan 17, 2025
1 parent 2ba1022 commit a6c5246
Showing 6 changed files with 15 additions and 6 deletions.
2 changes: 1 addition & 1 deletion asset/asset.json
@@ -1,6 +1,6 @@
{
"name": "standard",
- "version": "1.3.1",
+ "version": "1.3.2",
"description": "Teraslice standard processor asset bundle",
"minimum_teraslice_version": "2.0.0"
}
2 changes: 1 addition & 1 deletion asset/package.json
@@ -1,7 +1,7 @@
{
"name": "standard",
"displayName": "Asset",
- "version": "1.3.1",
+ "version": "1.3.2",
"private": true,
"description": "Teraslice standard processor asset bundle",
"repository": {
2 changes: 1 addition & 1 deletion asset/src/sample_random/processor.ts
@@ -6,7 +6,7 @@ export default class SampleRandom extends BatchProcessor<SampleRandomConfig> {
const outData: DataEntity[] = [];

for (const doc of dataArray) {
- if (random(0, 99) <= this.opConfig.probability_to_keep) {
+ if (random(1, 100) <= this.opConfig.probability_to_keep) {
outData.push(doc);
}
}
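
The corrected check in this hunk can be exercised outside Teraslice with a small stand-in. This is a minimal sketch, not the asset's API: `sampleRandom`, `Draw`, and `uniformDraw` are hypothetical names, and `uniformDraw` assumes the real `random(min, max)` yields inclusive integers.

```typescript
// Hypothetical stand-in for the processor loop. The draw function is injected
// so the boundary cases can be checked deterministically.
type Draw = (min: number, max: number) => number;

function sampleRandom<T>(records: T[], probabilityToKeep: number, draw: Draw): T[] {
    const out: T[] = [];
    for (const record of records) {
        // Same comparison as the fixed processor: draws come from [1, 100].
        if (draw(1, 100) <= probabilityToKeep) out.push(record);
    }
    return out;
}

// Inclusive integer draw built on Math.random (assumed to mirror `random`).
const uniformDraw: Draw = (min, max) =>
    Math.floor(Math.random() * (max - min + 1)) + min;

const records = Array.from({ length: 10000 }, (_, i) => ({ id: i }));

const keptAtZero = sampleRandom(records, 0, uniformDraw).length;
const keptAtHundred = sampleRandom(records, 100, uniformDraw).length;
// Worst case for 0%: the generator always returns its minimum value.
const keptWithLowestDraw = sampleRandom(records, 0, (min) => min).length;

console.log({ keptAtZero, keptAtHundred, keptWithLowestDraw });
```

Injecting the draw function is a choice made for this sketch only. It lets the lowest possible draw be forced, which is the exact case the commit fixes.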
5 changes: 3 additions & 2 deletions docs/operations/sample_random.md
@@ -1,6 +1,6 @@
# sample_random

- given an array of JSON documents will return an array containing a subset of those input documents. It iterates through the array and generates a random number between 0 and 100 for each record, and if the number <= probability it is kept. Must be between 0 and 100, with 100 keeping all records and 0 rejecting all records.
+ given an array of JSON documents will return an array containing a subset of those input documents. It iterates through the array and generates a random number between 1 and 100 for each record, and if the number <= probability it is kept. Must be between 0 and 100, with 100 keeping all records and 0 rejecting all records.

## Usage

@@ -29,6 +29,7 @@ Example of a job using the `sample_random` processor
}

```

Example of the data and the expected results

```javascript
@@ -52,4 +53,4 @@ results === [
| Configuration | Description | Type | Notes |
| ------------- | ------------------------------------------------------------- | ------ | ---------------------------- |
| _op | Name of operation, it must reflect the exact name of the file | String | required |
- | probability_to_keep | The probability of the record being kept. It iterates through the array and generates a random number between 0 and 100, and if the number <= probability it is kept. Must be between 0 and 100, with 100 keeping all records and 0 rejecting all records | required, defaults to 100 |
+ | probability_to_keep | The probability of the record being kept. It iterates through the array and generates a random number between 1 and 100, and if the number <= probability it is kept. Must be between 0 and 100, with 100 keeping all records and 0 rejecting all records | Number, defaults to 100 | required |
2 changes: 1 addition & 1 deletion package.json
@@ -1,7 +1,7 @@
{
"name": "standard-assets-bundle",
"displayName": "Standard Assets Bundle",
- "version": "1.3.1",
+ "version": "1.3.2",
"private": true,
"description": "Teraslice standard processor asset bundle",
"type": "module",
8 changes: 8 additions & 0 deletions test/sample_random/processor-spec.ts
@@ -77,6 +77,14 @@ describe('sample_random', () => {
expect(results.length).toBeLessThan(5400);
expect(results.length).toBeGreaterThan(4600);
});
+
+ it('with large datasets and 0%', async () => {
+ const data = makeData(10000);
+ harness = await makeTest({ probability_to_keep: 0 });
+ const results = await harness.runSlice(data);
+
+ expect(results.length).toEqual(0);
+ });
});

interface FakeData {