AWS S3 Transport with Full Test Suite (#30)
* S3 Transport
* S3 Integration Tests
  - itests - add both kinesis & s3
  - add sleep after docker-compose down
  - dynamic context discovery with formatting
* transporter constructor - remove batchSize as not needed
* Unit Tests
  - first transporter_test
  - test error handling - better messaging
  - more tests
* itests - remove added sleeps
* revert back Dockerfile debug modification
* comments
* PR comments 0
* poller-s3 - use an incremental wait instead of expect counts
* revert changes to mock_kinesis.go
* docs - s3 transport doc for key construction
* s3 transporter - use all underscores for filename
* Remove (revert addition of) start sleep
* itests - poller-s3 - update moving_deadline
* itests - ability to run single test
* s3 - use retryPolicy and not client retries
* s3 - retries unit tests with restart io.Reader
* dockerfile - make test only in CI
* s3 - fatal on seek reset error
* itests - exit on error, dependency ordering, test placement
  - itests - bugfix - only move deadline on new keys
  - itests - docker-compose rm after success
  - itests - runner - use exit on error
  - itests - have containers clean up their generated & mounted files
  - circleci - parallelism - increase to 2
  - itests - docker-compose - dependency ordering with yaml lint
  - itests - s3-poller - config for single file
  - itests - pollers - configurations for specific tests
  - itests - test placement - fix base & kinesis tests
* itests - s3-poller - retry on bucket creation
* PR comments 2018-05-08
* itests - docker-compose - data-poller ordering
* PR comments 2018-05-08 part2
* PR comments 2018-05-08 part3
Showing 43 changed files with 5,159 additions and 290 deletions.
@@ -0,0 +1,10 @@
```yaml
# This docker-compose can be used to bring up a basic postgres container for manual testing
version: '3'
services:
  postgres:
    container_name: postgres
    build: itests/containers/postgres
    environment:
      - POSTGRES_PASSWORD=pgbifrost
    ports:
      - 5432:5432
```
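For manual testing, a quick connectivity check against this container might look like the sketch below. This assumes the compose stack is up locally and `psycopg2` is installed; the `postgres` user and database names are assumed defaults, and only the password comes from the compose file.

```python
# Minimal connectivity check for the compose service above.
# Assumptions: psycopg2 is installed, the container's port is mapped to
# localhost:5432, and the default 'postgres' user/database are in use.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="postgres",        # assumed default user
    password="pgbifrost",   # POSTGRES_PASSWORD from the compose file
    dbname="postgres",      # assumed default database
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()
```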
@@ -0,0 +1,28 @@

# transport: s3

The S3 transporter puts logical WAL messages as *gzipped* objects in an S3 bucket, partitioned by year, month, and day in increasing order of time. The timestamp used for partitioning is taken when the PUT request is created, *not* when the data was written to Postgres or read by the pg-bifrost client.

## Key Construction

The basic construction of an S3 key for this transporter is defined as:
```
<bucket name>/<key space>/<year as YYYY>/<month as MM>/<day as DD>/<datetime>_<first wal record id>.gz
```

### Example

For a given configuration:
```
BIFROST_S3_BUCKET=test_bucket
BIFROST_S3_KEY_SPACE=test_files
```

A batch written at `2000-01-02 12:03:04` whose first record has the WAL record ID `1336674354448` will be put at:
```
test_bucket/test_files/2000/01/02/2000_01_02_12_03_04_1336674354448.gz
```

You may also omit `BIFROST_S3_KEY_SPACE`, which results in the object being put at:
```
test_bucket/2000/01/02/2000_01_02_12_03_04_1336674354448.gz
```
@@ -0,0 +1,157 @@
```python
"""
This script obtains records from S3 and writes them to a local file as
defined by OUT_FILE. It will exit when no new keys have been seen for WAIT_TIME.
"""

import os
import sys
import time

import boto3
from botocore import exceptions
from gzip import GzipFile
from io import BytesIO
from retry import retry

# Variables
OUT_FILE = os.getenv('OUT_FILE', '/output/test')
BUCKET_NAME = os.getenv('BUCKET_NAME', 'itests')
CREATE_BUCKET = bool(os.getenv('CREATE_BUCKET', '1'))  # any non-empty value enables; set to '' to disable
ENDPOINT_URL = os.getenv('ENDPOINT_URL', 'http://localstack:4572')
AWS_REGION = os.getenv('AWS_REGION', 'us-east-1')
EXPECTED_COUNT = int(os.getenv('EXPECTED_COUNT', '1'))  # expected number of records (only used for logging)
INITIAL_WAIT_TIME = int(os.getenv('S3_POLLER_INITIAL_WAIT_TIME', '90'))  # time to wait for initial list of keys
WAIT_TIME = int(os.getenv('S3_POLLER_WAIT_TIME', '10'))  # incremental time to wait for new keys if none have been seen
MAP_KEYS_TO_OUTPUT_FILES = bool(os.getenv('S3_POLLER_MAP_KEYS_TO_OUTPUT_FILES', ''))  # write one output file per S3 key instead of a single file

client = boto3.client('s3',
                      endpoint_url=ENDPOINT_URL,
                      region_name=AWS_REGION)


# Create a bucket, retrying while the endpoint comes up
@retry(exceptions.EndpointConnectionError, tries=10, delay=.5)
def _create_bucket(name):
    print("Trying to create bucket {}".format(name))
    return client.create_bucket(Bucket=name)


@retry(ValueError, tries=10, delay=.5)
def _get_all_s3_keys(bucket):
    """Get a list of all keys in an S3 bucket."""
    keys = []

    resp = client.list_objects(Bucket=bucket)
    file_list = resp['Contents']

    for s3_key in file_list:
        keys.append(s3_key['Key'])

    return keys


if CREATE_BUCKET:
    # Create the bucket
    print("Creating a bucket")
    try:
        _create_bucket(BUCKET_NAME)
    except exceptions.EndpointConnectionError:
        print("Unable to contact endpoint at {}".format(ENDPOINT_URL))
        exit(1)
    except exceptions.ClientError as e:
        if e.response['Error']['Code'] != 'ResourceInUseException':
            raise e


# Get the initial set of keys with a deadline of INITIAL_WAIT_TIME
all_keys = []
timeout_for_first_keys = time.time() + INITIAL_WAIT_TIME

while True:
    if time.time() > timeout_for_first_keys:
        print("No data received by poller. Exiting.")
        exit(1)

    print("Getting initial keys list...")
    sys.stdout.flush()
    try:
        all_keys = _get_all_s3_keys(BUCKET_NAME)
        break
    except KeyError:
        # The bucket exists but has no 'Contents' yet; try again
        time.sleep(1)

all_keys.sort()

key_i = 0
total = 0

print("Records expected: {}".format(EXPECTED_COUNT))

# Start the moving deadline and iterate over new keys
moving_deadline = time.time() + WAIT_TIME

while time.time() <= moving_deadline:
    if key_i >= len(all_keys):
        # Our pointer is past the end of the keys we have seen, so we wait for more...
        print("Waiting for more keys...")
        sys.stdout.flush()
        time.sleep(1)

        remote_keys = _get_all_s3_keys(BUCKET_NAME)
        if len(remote_keys) > len(all_keys):
            # If there are new keys, update our all_keys list and process them
            all_keys = list(set(all_keys + remote_keys))
            all_keys.sort()

            # Update the deadline since we have new keys
            moving_deadline = time.time() + WAIT_TIME
        else:
            # Else, loop back around
            continue

    record_count = 0

    # Get object data
    resp = client.get_object(
        Bucket=BUCKET_NAME,
        Key=all_keys[key_i],
    )

    bytestream = BytesIO(resp['Body'].read())
    got_text = GzipFile(None, 'rb', fileobj=bytestream).read().decode('utf-8')
    records = got_text.split('\n')

    # Filter out any empty lines (materialized to a list so len() works in Python 3)
    records = list(filter(None, records))

    sys.stdout.flush()

    # By default we only create a single output file no matter how many S3 keys we have
    _file_num = 0
    if MAP_KEYS_TO_OUTPUT_FILES:
        _file_num = key_i

    with open(OUT_FILE + "." + str(_file_num), "a") as fp:
        for record in records:
            fp.write(record)
            fp.write('\n')
        fp.flush()

    record_count += len(records)

    # Update the pointer into the keys we have read
    key_i += 1

    total += record_count
    print("total so far: {}".format(total))

    if record_count == 0:
        time.sleep(1)
    sys.stdout.flush()

print("Records read: {}".format(total))
sys.stdout.flush()
```
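A note on the design: rather than exiting after a fixed expected record count, the poller uses a moving deadline. Each time new keys appear, the deadline is pushed out by WAIT_TIME; once the bucket has been quiet for that long, the poller stops. This reflects the "use an incremental wait instead of expect counts" change called out in the commit message.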
@@ -1,3 +1,6 @@
```
TRANSPORT_SINK=kinesis

LOCALSTACK_PORT=4568
ENDPOINT=http://localstack:4568

BIFROST_KINESIS_STREAM=itests
```
@@ -0,0 +1,7 @@
```
TRANSPORT_SINK=s3

LOCALSTACK_PORT=4572
ENDPOINT=http://localstack:4572
CREATE_BUCKET=1

BIFROST_S3_BUCKET=itests
```