Where are the pitfalls for adding a live stream zmq publisher service to imagenode? #3
Comments
Hi Mark,

Per your question #1: In the currently posted version of imagenode on GitHub, threading is used for two things: image capture and sensor capture. Threading is a good idea, and in my newer versions of imagenode I am experimenting with both threading and multiprocessing for the detectors, in addition to image capture and sensor capture. As I have done those experiments, I have learned that multiprocessing may be a better choice, since the RPi has 4 cores and, with Python threading, only one core is used for all the threads. But multiprocessing has its own drawbacks: Python objects cannot be shared directly between processes. So my design is evolving to use more threading and even more multiprocessing, especially in imagenode. I hope to push some of the multiprocessing stuff to the imagenode repository in the next month or so.

Per your question #2: live streaming using imageZMQ is possible, but larger image sizes benefit from compressing to jpgs. Multiple simultaneous streams can slow down depending on the number of senders, network load and the speed of the imagehub. I am using the ZMQ REQ/REP messaging pattern in all my applications, which requires the hub to send a REP acknowledgement for every frame received. That is a design choice; I want my RPi's to react to hub slowdowns. Other imageZMQ users have used the PUB/SUB messaging pattern and they have had some issues with slow subscribers growing larger & larger ZMQ queues -- see this imageZMQ issue: jeffbass/imagezmq#27. I am not streaming frames as video in my applications.

I would love it if you would fork one or more of my project repositories. Thanks again for sharing your great design. Post your GitHub repo links in this thread as you push your code updates, if you'd like to. I (and others reading these questions) can learn a lot from your work.

Jeff

(P.S. to PyCon 2020 viewers seeing this question: Please feel free to post additional comments & questions that follow on to this question by commenting on this issue thread. Please post a new or unrelated question by starting a new issue. Thanks!)
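For anyone following along, here is a minimal sketch of the jpg-compressed REQ/REP sending described above. The hostname, port, camera source and jpg quality are placeholders, not imagenode code:

```python
# Minimal jpg-compressed REQ/REP sender sketch (illustrative only).
import cv2
import imagezmq

sender = imagezmq.ImageSender(connect_to="tcp://imagehub-host:5555")  # placeholder host
cap = cv2.VideoCapture(0)  # placeholder camera source

while True:
    ok, image = cap.read()
    if not ok:
        break
    # compress to jpg before sending; helps with larger image sizes
    ret_code, jpg_buffer = cv2.imencode(
        ".jpg", image, [int(cv2.IMWRITE_JPEG_QUALITY), 95])
    # send_jpg blocks until the hub sends its REP acknowledgement
    reply = sender.send_jpg("node_name", jpg_buffer)
```
|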
Great talk, @jeffbass! Just one comment on what you said about multiprocessing:

This is largely true, but the multiprocessing module's shared memory support lets processes work on the same data without copying it between them. You may already be aware of this option, but I thought I'd pass it along just in case, as I don't see too much discussion of the shared memory features we get for free in the standard library.
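A minimal sketch of the idea, assuming Python 3.8+ (the array shape and names are just placeholders):

```python
# Share a NumPy image between processes via multiprocessing.shared_memory (Python 3.8+).
import numpy as np
from multiprocessing import Process, shared_memory

def worker(shm_name, shape, dtype):
    shm = shared_memory.SharedMemory(name=shm_name)
    image = np.ndarray(shape, dtype=dtype, buffer=shm.buf)  # no copy, same buffer
    print("mean pixel value:", image.mean())
    shm.close()

if __name__ == "__main__":
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for an OpenCV image
    shm = shared_memory.SharedMemory(create=True, size=frame.nbytes)
    shared = np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)
    shared[:] = frame  # one copy in; readers and writers then work in place
    p = Process(target=worker, args=(shm.name, frame.shape, frame.dtype))
    p.start()
    p.join()
    shm.close()
    shm.unlink()
```
|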
Thanks, @mivade! |
Hi @shumwaymark, I recently implemented a multithreaded PUB/SUB subscriber as an example in imagezmq, which enables one-to-many broadcast with the receivers doing realtime (potentially heavy load) processing. Implementation here: https://github.com/philipp-schmidt/imagezmq. Open imagezmq pull request for further discussion: jeffbass/imagezmq#34. This was initially a response to the slow subscriber problem described here. Cheers
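The gist, in a very rough sketch (this is not the pull request code, just an illustration of a subscriber thread that always keeps only the latest frame so slow processing never backs up the ZMQ queue; address and class name are placeholders):

```python
# Rough sketch: a background thread receives jpg frames over PUB/SUB,
# keeping only the most recent one for the (possibly slow) main loop.
import threading
import cv2
import numpy as np
import imagezmq

class LatestFrameSubscriber:
    def __init__(self, address="tcp://publisher-host:5556"):  # placeholder address
        self._hub = imagezmq.ImageHub(open_port=address, REQ_REP=False)
        self._ready = threading.Event()
        self._msg, self._jpg = None, None
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            self._msg, self._jpg = self._hub.recv_jpg()
            self._ready.set()  # signal that a fresh frame is available

    def latest(self, timeout=5.0):
        if not self._ready.wait(timeout):
            raise TimeoutError("no frame received within timeout")
        self._ready.clear()
        return self._msg, cv2.imdecode(np.frombuffer(self._jpg, dtype=np.uint8), -1)

sub = LatestFrameSubscriber()
name, frame = sub.latest()  # heavy processing of `frame` happens here
```
|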
Thanks @jeffbass and @philipp-schmidt ! I think I'll do exactly that. Am using the production imageZMQ library, but have forked both imagenode and imagehub with the intent to build from there. Initially will just flesh out my framework with minimal changes to imagenode: adding the async logging over ZMQ, and experimenting with a new detector. Once that's standing on its own, will layer in the video capture and video streaming features. Will eventually get code posted, and will reply back on this thread with my findings. This is a nights and weekends project, but hope to have version 0.1 working in the next month or two. Looking forward to the challenge, and once again: a sincere thank you for the very strong leg-up! Mark |
Hi @shumwaymark , I merged @philipp-schmidt's pull request into imagezmq. He provided a great code example for implementing a multithreaded PUB/SUB subscriber. It works really well. You may want to take a look at that: PUB/SUB Multithreaded Fast Subscribers for Realtime Processing. Jeff |
Thanks @jeffbass, That's one of the very first things I did, and have been very pleased with the results. It seemed to slip right into the imagenode framework without much fuss. I thought it should be optional by camera, with a single publisher for the node, so added a couple of settings for this.

This code then follows the initialization of the sender in the ImageNode class:

```python
# if configured, bind to specified port as imageZMQ publisher
# this provides for optional continuous image publishing by camera
if settings.publish_cams:
    self.publisher = imagezmq.ImageSender("tcp://*:{}".format(settings.publish_cams),
                                          REQ_REP=False)
```

The following is at the bottom of the camera read loop:

```python
if camera.video:
    ret_code, jpg_buffer = cv2.imencode(
        ".jpg", image, [int(cv2.IMWRITE_JPEG_QUALITY), self.jpeg_quality])
    self.publisher.send_jpg(camera.text.split('|')[0], jpg_buffer)
```

...then again, can't help but wonder that with cabling and switches to support Gigabit Ethernet from end-to-end, maybe it would be faster to skip the compression?

This is working well and has been stable. Using the settings in the yaml file above, it will stream at about 45 frames/second. This is of course tightly paired with the throughput of the imagenode pipeline. That includes an image flip, and massaging the images prior to running a MOG foreground/background subtraction on each frame. Oh, and also sending tracked object centroids over the network via the async logger. Eliminating the image flip boosts throughput to over 62 frames/second. Even leaving the image flip in place and increasing the image size to 800x600, it still runs close to 32 frames/second. This increases the size of that small ROI from 32K to 57.4K pixels.

The above was actually the second thing I did. Some of my initial work was to add support for the async logging, a first draft of the camwatcher module, and modifying imagehub to introduce the imagenode log publisher to the camwatcher. That's a separate post. |
Thanks for sharing your work. 62 frames a second is amazing. If you have Gigabit Ethernet, then sending the frames uncompressed may be worth testing. |
Honestly, after playing with this for a while, keeping the frames as individual JPEG files seems the most practical. Easier on network bandwidth, and the disk storage requirements are reasonable. That's not news to you, I know. There's too much overhead in saving to a video format, nor does there seem to be much of anything to gain from doing so. Analysis needs access to individual frames anyway, and the overhead for the compress/decompress of individual frames doesn't seem too onerous. Additionally, any playback might need to be able to optionally select from/merge the results of multiple vision models, including any desired labeling, timestamps, etc. Or alternatively, just presenting the original images as captured, with no labeling.

Good news to report. I have the first draft of the cam watcher functionality fleshed out and working. Still a work in progress with an obvious lack of polish, but solid. I pushed everything I have so far up to my GitHub repository. Took your advice regarding threading to heart... the cam watcher has a lot of I/O requirements, so elected to implement it as a set of coroutines under a single asyncio event loop. This gets most of what's needed cooperating within a single thread. The video capture is based on the design suggested by @philipp-schmidt and forks as a sub-process.

One of the challenges to this design is correlating tracking data back to the captured image stream. An individual frame could contain multiple tracked objects. In the interest of keeping the image capture as lean (fast) as possible, it seemed too cumbersome to attempt to stuff all the tracking data into the message field of each published image. We may only be interested in parts, or none, of it anyway. An example use case might be an imagenode rigged with a USB accelerator which is focused on an entryway and running a facial recognition model directly on the device. Only the unrecognized face(s) require further processing. If every face in the frame is known, no further analysis may be needed.

For the image stream subscriber, there is some lag between the captured event start time and the filesystem timestamp on the first stored frame. Last I looked, the average latency was about 7 milliseconds. So based on the high frame rates I've been testing with, it seems safe to assume that at least the first 3-4 frames are being dropped before the subscriber comes up to speed, though it's likely a bit worse than that. The slight downside here is that for a motion-triggered event, the captured video begins after the event is already, and hopefully still, in progress. The publishing frame rate is tied directly to the length of the imagenode pipeline. My laboratory example is unrealistic, so would expect a velocity well under 32 frames/sec for most real world applications. I'm not running anything slower than a Raspberry Pi 4B, along with wired ethernet everywhere. For cameras permanently attached to the house, I intend to avoid Wi-Fi in favor of PoE over an isolated network segment, where feasible.

Since video playback framerate is likely much higher than the capture rate out of the image pipeline, it helps to estimate a sleep time between each frame to make the playback appear closer to real time. I'm currently dealing with this and all syncing options by estimating the elapsed time within the event to place each captured frame in perspective along with associated tracking data. Should be close to right, will know more after further testing.
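Roughly, the pacing logic looks like the following simplified sketch (hypothetical structure; it assumes each stored frame carries a capture timestamp):

```python
# Sketch: pace playback using per-frame capture timestamps so it
# approximates real time even when capture ran slower than display.
import time
import cv2

def playback(frames):
    """frames: iterable of (capture_timestamp_seconds, image) pairs."""
    start_wall = time.time()
    start_capture = None
    for captured_at, image in frames:
        if start_capture is None:
            start_capture = captured_at
        # how far into the event this frame was captured
        event_elapsed = captured_at - start_capture
        # sleep until the same amount of wall-clock time has passed
        delay = event_elapsed - (time.time() - start_wall)
        if delay > 0:
            time.sleep(delay)
        cv2.imshow("playback", image)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cv2.destroyAllWindows()
```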
I couldn't help but realize that much of the functionality of the cam watcher is already served by your existing imagenode/imagehub design. In some respects, I'm clearly re-inventing the wheel here. It is worth noting that the video publisher should play well with all of the existing functionality of your design; it just adds a final step to the existing pipeline. That's a huge plus, in my view.

I like the way this is coming together; I think it has a lot of potential for practical and fun applications. Live video monitoring for interactive display, and batch processing jobs, can tap into any desired video stream on-demand as well as replay/remodel prior stored events.

I have hopefully pulled your most recent updates to both imagenode and imagehub and merged in my changes. A diff should show the current state of affairs. The "tracker" detector is raw and purely experimental. Code added for the log and image publishing functionality should be independent of the tracker. I didn't think to put those pieces in a separate branch. That would have been smart. Your feedback on any and all of this welcome. Will take a pause to document what I have so far and then dive in to building out the next component: the inference and modeling engine. |
This looks like great work. I think your design is well thought out. I have not used asyncio event loops and it seems like a great tool set for a cam watcher. My first reaction is that your design is a complete rethinking and replacement for my imagenode/imagehub design. I look forward to your updates and your documentation in your GitHub repository. |
Done. Truth be told, I had approached your imagenode/imagehub projects with a specific design already in mind. All my thinking was centered around the concept of analyzing and storing video content not only for analysis, but also to support video playback and monitoring by a human reviewer. I've been reading your posts in the other thread regarding the Librarian design with great interest. Only found them in late December. You already understood what only became obvious to me recently: most vision tasks only require a few frames for analysis.

My goal is not to replace what you've built. I view my project as supplementary to yours, providing video publishing and optional real-time automated analysis of a motion event in progress. Can I teach my house to learn? While brainstorming my design, I had a lot of other computer vision ideas beyond facial recognition, many of which you've either already solved, or are actively pursuing yourself.

I'm going to remove my changes to imagehub. This was just a case of me over-thinking my design, where a camera node could be added dynamically and introduce itself to the larger system. That doesn't really make a lot of sense to me in retrospect. Any new node added to the network will need configuration anyway, obviously. Keep it simple, right?

I laugh now at my comment above about having something working in the next month or two. Not much has gone as planned for 2020. My real job has kept me busy. Thanks again Jeff. |
So, after getting that much of it working and documenting everything, it seemed like a perfect time to tear it all apart and rebuild it. This amounted to a complete re-factoring of the changes made to imagenode and removing all changes to imagehub. Have also moved to a data model based on CSV files for the data capture. Am now in a much better position to move forward with the rest of the project.

I wanted to reduce the blast radius of the changes made to imagenode, so have this boiled down to a single import statement and 3 lines of code. Everything needed is now incorporated into my detector. This is contained in a separate module, along with all related modules, in a sibling folder to the imagenode code. The hook for this can be found in the initialization code for the detectors:

```python
elif detector == 'outpost':
    self.outpost = Outpost(self, detectors[detector], nodename, viewname)
    self.detect_state = self.outpost.object_tracker
```

The initialization signature for the Outpost is shown in the hook above. It works. |
Was just re-reading one of your earlier replies in this thread, where you mentioned having also run some experiments with multiprocessing. Have been working on building out the "Sentinel" module for my project, which is a multi-processing design, so would be very interested in learning more about any successes and/or setbacks you've encountered along the way. Perhaps an early prototype I can review? Thanks! |
I have started a complete refactoring of my librarian. While I am using multiprocessing to start independent agents (such as a backup agent, an imagenode stall-watching agent, etc.), I have put the passing of images to an independent process in memory buffers on hold. I am waiting for Python 3.8 on the Raspberry Pi, so I can use multiprocessing shared_memory. I did a few quick tests on Ubuntu and they were promising. I expect the next version of RPi OS will be released soon and it will include Python 3.8 (replacing Python 3.7 in the current RPi OS version). Sorry, but I don't have any early prototypes for you to review. An imagenode & imagehub user @sbkirby has designed and built a completely different approach to building a librarian using a broad mix of tools in addition to Python including Node-Red, MQTT, MariaDB and OpenCV in Docker containers. He has posted it in this Github repository. I like his approach a lot, but I'm still trying to build a mostly Python approach. Jeff |
Hey Jeff,
I look forward to testing your new version of software.
Stephen
|
I pushed the new version to GitHub. |
Thanks Jeff. Looking forward to setting this up, still excited about ensuring my project fits and works well with everything you've been working on. I'm moving forward with a multiprocessing design for analyzing image frames in real time. The general idea is that an outpost node can employ a spyglass for closer analysis of motion events. A spyglass can employ one or more specialized lenses for different types of events. I'm probably overthinking things, as usual. The idea is to keep the pipeline running at full tilt so that publishing runs at the highest possible framerate, while processing a subset of images in a separate process for triggering events.
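A very rough sketch of that hand-off pattern (all names here are illustrative, not the actual Outpost/SpyGlass code): the capture loop never blocks, and the analysis process only receives a frame when it is idle.

```python
# Sketch: keep the capture/publish loop at full speed while a separate
# process analyzes only the frames it can keep up with.
import multiprocessing as mp
import queue
import time

def analyze(frame):
    time.sleep(0.1)                       # stand-in for heavy per-frame analysis
    return {"frame": frame, "objects": 0}

def lens_worker(frames, results):
    while True:
        frame = frames.get()              # blocks until a frame is handed off
        if frame is None:
            break
        results.put(analyze(frame))

if __name__ == "__main__":
    frames = mp.Queue(maxsize=1)          # at most one frame waiting for analysis
    results = mp.Queue()
    worker = mp.Process(target=lens_worker, args=(frames, results))
    worker.start()

    for frame in range(100):              # stand-in for the capture/publish loop
        # publishing would happen here, at the full pipeline framerate
        try:
            frames.put_nowait(frame)      # hand off only if the analyzer is idle
        except queue.Full:
            pass                          # analyzer busy; skip this frame
        try:
            while True:
                print("analysis result:", results.get_nowait())
        except queue.Empty:
            pass
        time.sleep(0.01)                  # stand-in for per-frame pipeline work

    frames.put(None)                      # shut the worker down
    worker.join()
```
|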
Not sure if an issue is the right place for discussion. However... |
The short answer is yes. Just not quite there yet. My focus has been on designing and building the framework to suit my use cases. You're correct though: this may not be the best venue for discussions of that nature, since they aren't directly related to Jeff's projects. Please feel free to post such questions over on my project. Thanks. |
@vkuehn , for my own projects I am using a Python-only approach. I plan to continue to do that. I am very impressed with @sbkirby's Node-RED approach (link above). I plan on writing a web interface using flask (which is pure Python). I also think @shumwaymark's sentinelcam design is a good one. For my own farm management stuff, the ability to send / receive text messages regarding water and temperature status was my first goal. I haven't looked into linking any home assistants. |
So @jeffbass: an update, lessons learned, and a couple of questions.

First, the update

As promised, built my idea for a framework to support a multi-processing vision pipeline using shared memory. And then just for fun, threw it out onto the imagenode and watched it run. Because, why not? Uses a REQ/REP socket pair from your imageZMQ library for IPC. This provides for an asynchronous pattern by employing a poll on the socket to check for readiness. I really like your imageZMQ library. Convenient and easy to use/extend. Thanks for exposing the underlying socket; very considerate. You are a gentleman and a scholar. Also using this for my datapump connections.

A lesson learned

Do not over-publish image data with ZMQ. Bad things can happen. I ran for months just focused on the camwatcher and fine-tuning data collection. All was well. The imagenode was happy. Then finally added that new multi-processing piece. This has made me take a hard look at imagenode performance. Mostly I've been ignoring that because I was happy with the results I was seeing. I had noticed that the equivalent of 2 cores were tied up in what I was previously running, but never investigated too closely. After adding the new code and taming the beast I had unleashed, it now idles at around 2.6 - 2.8 cores, with an Intel NCS2 which was lately added to the mix. With good results. Admittedly, there is obviously quite a lot of overhead in there. But I think worth it with regard to function, and as long as the results line up with goals, I'm OK with all that. I will eventually set aside some time to measure and chart performance against various deployment scenarios.

Added a throttle to dial back the publishing framerate somewhat. Not perfect, but it saved the day. Logically, this should try to reflect the reality of the source data, without the added cost of moving multiple copies of the exact same image over the network repetitively for no benefit.

A question, or two

My first foray into actually using imagehub as intended was to add a temperature sensor. The imagenode process doesn't always shut down cleanly when stopped; using systemd control commands afterwards to restart or stop/start the service resolves this. Assume it just walks the process tree and kills everything it finds.

The second question involves a sleep of a fixed 1.0 second. I changed this to sleep for self.patience instead of 1.0 - problem disappeared after that.

Impulse purchase: just ordered an OAK-1 camera. Looks like it is going to fit right into / add to this design. If this works as well as I think it will, it should be very cool. Mind-blowingly cool.

From where I'm sitting now, it looks like MQTT and Node-RED will be an important part of the final solution. The integration possibilities with WiFi switches, and other miscellaneous IoT gadgets, open up quite a world of possibility. When someone pushes the button for the antique doorbell, a switch could relay a command to display the front entry camera on a nearby monitor (perhaps you're upstairs in the bedroom). Likewise, vision analysis could be employed to detect whether a garage door is open or closed, and send the command to the connected relay to close it.

Am going to need your Librarian too. Seems like an ideal vehicle for storing knowledge and state.
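For reference, a minimal sketch of the poll-based readiness check mentioned above for the REQ/REP IPC pair (socket names and the ipc endpoint are illustrative):

```python
# Sketch: non-blocking REQ/REP over ipc, polling for the reply so the
# main loop never stalls waiting on the other process.
import zmq

ctx = zmq.Context.instance()
wire = ctx.socket(zmq.REQ)
wire.connect("ipc:///tmp/spyglass_wire")   # placeholder ipc endpoint

poller = zmq.Poller()
poller.register(wire, zmq.POLLIN)

def send_request(payload):
    wire.send(payload)                     # hand work to the other process

def reply_ready(timeout_ms=0):
    events = dict(poller.poll(timeout_ms))
    return bool(events.get(wire, 0) & zmq.POLLIN)

def get_reply():
    return wire.recv()                     # only call after reply_ready() is True
```
|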
Hi @shumwaymark. Thanks for your update, questions and comments.

Regarding your lesson learned: I have run into similar issues. I think a non-threaded camera read may be worth experimenting with, and my current framebuffer design would split image capture from image processing using shared memory.

Regarding your question (1): It is quite challenging to exit a Python program that uses threads and subprocesses.

Impulse purchases are a good thing! I also have an OAK-1 camera kit. I haven't gotten around to playing with it yet, but I want to incorporate it into my own system. I will be watching your progress. |
Thanks @jeffbass. That I got there in a roundabout way, incrementally. Also, thanks for the tips. I will experiment with the non-threaded camera read soon. Currently, the OAK-1 has me distracted.

First test of the OAK-1

Will be implementing support for the OAK-1 as a new camera type, producing 3 outputs:

- object detection on every frame
- the captured image
- the compressed image

The OAK-1 can feed the Raspberry Pi all of the above at 30 frames / second, with no heavy lifting on the Raspberry Pi, leaving it free for further analysis. Or whatever else is needed. Cool stuff. |
Thanks for the update. Keep me posted on your progress. Especially the OAK-1 learnings. |
Here's a quick update @jeffbass. I have this roughed-in, and working well enough so as not to be embarrassing. See the code in the project; there is no documentation beyond this and the Python code itself. Three lines added to the camera type selection:

```python
# --------8<------snip-------8<---------
            self.cam_type = 'PiCamera'
        elif camera[0].lower() == 'o':  # OAK camera
            self.cam = OAKcamera(self.viewname)
            self.cam_type = 'OAKcamera'
        else:  # this is a webcam (not a picam)
# --------8<------snip-------8<---------
```

Everything else is in the new module. It's really just a set of queues that need to be managed, and consumed. Currently you will only see hard-coded attributes specified as a starting point. Once the camera was un-boxed, and time invested in thoroughly scrutinizing the API, my mind exploded.

Consider that the pipeline will run constantly once started so, initially, those queues fill up quickly. By the time the Outpost might decide to begin consuming data, there is quite a lot to chew on. You'll see some logic that attempts to discard some of it just to keep things reasonable. Don't yet understand the best design pattern for this. Much depends on how the DepthAI Pipeline is configured. There are numerous avenues to follow.

Yes, I get that storing those OAK camera queues as a dictionary at the Outpost class level is not ideal, and a somewhat quirky idiosyncratic design pattern, but it works well with the imagenode architecture. Seems that this is clearly pushing the limits well beyond what looks like your original intent behind the camera/detector architecture.

Everything I said in the previous post is true. All of that at 30 FPS. With a blocking read on the image queue, you can count on a 30 FPS rate. Processing the data of course requires time. Can the imagenode keep up with just a read loop? It does. What I've learned though, is there is more to this than meets the eye. What you'll currently find in the implementation is only breaking the ground. Still don't have the final structure fully conceived.

Fair weather is upon me. My attention is soon to be consumed with outdoor projects. So wanted to get this posted to wrap-up the cabin fever season here on the eastern edge of the Great Plains. More later. Thanks again.
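Since so much of this comes down to managing the DepthAI output queues, here's a heavily simplified sketch of that pattern (the pipeline configuration is abbreviated, and the stream name, queue size, and non-blocking draining are illustrative, not the actual OAKcamera code):

```python
# Sketch: drain a DepthAI output queue without blocking, keeping only
# what the consumer is ready to use.
import depthai as dai

pipeline = dai.Pipeline()
cam = pipeline.create(dai.node.ColorCamera)
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("video")
cam.video.link(xout.input)

with dai.Device(pipeline) as device:
    # small queue, non-blocking: old frames are dropped rather than piling up
    q_video = device.getOutputQueue(name="video", maxSize=4, blocking=False)
    while True:
        pkt = q_video.tryGet()        # returns None immediately if nothing is waiting
        if pkt is None:
            continue
        frame = pkt.getCvFrame()      # OpenCV-compatible numpy array
        # hand `frame` to the rest of the imagenode pipeline here
```
|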
Hey @jeffbass. Your thoughts on a design for a ring buffer to streamline image processing struck a chord with me. So worked up my version of that into the design of the sentinel. This just builds upon what is already proven with the signaling between the outpost and spyglass. Since a dedicated IO thread in the main process is primarily focused on JPEG retrieval over the network, image decompression, and ring buffer signaling, it runs fast. Much faster than the slower analytical tasks. So from a task perspective, there is not much risk of starvation, the rings just stay full all the time. Could probably even get by with a shorter length. It might be wise to move ring buffer signaling into a separate thread, and/or call for JPEG files over the network a little less often.

I ran tests with both a single task and two parallel tasks; image sizes here were either 640x480 or 640x360. For the Intel NCS2 tasks, when the total image count processed by the task dips under 200 or a little less, the overall frame rate per second begins reducing. There is some overhead. Sets with a length of less than 100 images seemed to run closer to 10/second. Needs more comprehensive testing, but very encouraged with the results I'm seeing.

I may open an issue for discussion on your imagezmq project. I'm concerned about potentially creating multiple 0MQ contexts within the same process. My datapump access is built on a subclass of your sender:

```python
class DataFeed(imagezmq.ImageSender)
```

I suspect your underlying class creates its own context for each instance. I'm wondering if adaptations to your imagezmq to support shadowing an already existing Context could be helpful, or if this topic had ever previously come up for discussion? When a DataFeed is created in a task engine, its context is reused for the other sockets:

```python
feed = DataFeed(taskpump)  # useful for task-specific datapump access
ringWire = feed.zmq_context.socket(zmq.REQ)  # IPC signaling for ring buffer control
publisher = feed.zmq_context.socket(zmq.PUB)  # job result publication
```

The main process for the sentinel uses a single context for all of the asynchronous sockets and ring buffer signaling:

```python
import zmq
from zmq.asyncio import Context as AsyncContext

ctxAsync = AsyncContext.instance()
ctxBlocking = zmq.Context.shadow(ctxAsync.underlying)
```

However, for my datapump connections, that's a different story. Was wondering if you had any thoughts on this? Hope all is well with you and yours. Sincerely, Mark
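P.S. For anyone following the ring buffer discussion, a bare-bones sketch of the shared-memory ring idea (frame shape, ring length, and class name are all illustrative; the real sentinel code handles synchronization and lifecycle details this sketch omits):

```python
# Sketch: a fixed ring of frame slots in shared memory; an IO thread/process
# writes decompressed frames in, analytical tasks attach and read in place.
import numpy as np
from multiprocessing import shared_memory

FRAME_SHAPE = (480, 640, 3)   # illustrative
RING_LENGTH = 8               # illustrative

class FrameRing:
    def __init__(self, name=None, create=False):
        nbytes = int(np.prod(FRAME_SHAPE)) * RING_LENGTH
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=nbytes)
        self.slots = np.ndarray((RING_LENGTH,) + FRAME_SHAPE,
                                dtype=np.uint8, buffer=self.shm.buf)
        self.next = 0

    def put(self, frame):
        """Writer side: copy a frame into the next slot, return its index."""
        idx = self.next
        self.slots[idx][:] = frame
        self.next = (idx + 1) % RING_LENGTH
        return idx            # the index is what gets passed over the signaling socket

    def get(self, idx):
        """Reader side: zero-copy view of the frame in slot idx."""
        return self.slots[idx]

# writer process:  ring = FrameRing(create=True); idx = ring.put(frame)
# reader process:  ring = FrameRing(name=ring_name_from_writer); frame = ring.get(idx)
```
|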
Hi @shumwaymark, I have not tried using multiple 0MQ contexts in the same process. My own imagenode --> imagehub architecture is less ambitious than your design and hasn't needed it. I find your design intriguing, and will spend some time looking at your code. I have not used zmq.asyncio yet, so I don't have any thoughts on how it might affect your datapump connections. If you get something running using zmq.asyncio, let me know how it works for you. Does it speed things up or change the way the multiple cores are used / loaded? I'm interested in what you learn. I'd also like to hear about your experience using the OAK camera and its hardware image processing. Any learnings to share? My own OAK-D is still sitting in its box, but it is on my ToDo list! Thanks for sharing your project's progress. I'm learning a lot from it. I think your design descriptions / drawings are really well done. |
Thanks @jeffbass. That would be ideal, wouldn't it? And is probably what's missing from the sentinel design. If those sockets created under your imagezmq classes could share an already existing context, that seems worthwhile, but currently have no clue as to what's involved in making that actually work. Perhaps an item for my ToDo list. I need to measure the aggregate latency of the ring buffer signaling from the task engines to reveal how much of an issue that really is. Still having fun with my Covid project, 3 years later. I'll get there. This piece of it is the last major hurdle. Mark |
Hi all, sorry for bothering you, but I really need some suggestions from experienced developers. I have a webcam that can output both mjpeg and h264. I built an RTSP server on it to stream the h264 video to remote clients over WiFi for monitoring, and got almost negligible latency. Now I want to do some object detection (e.g. yolo) over the video stream on the remote client; what is the best practice to recommend? Should I directly use OpenCV to capture the RTSP frames on the client, or use zmq for compressed jpeg transmission between server and client? It seems that OpenCV's implementation of capturing and decoding an RTSP stream over WiFi has considerable latency, but compressed jpeg may be slower than h264. Furthermore, I also need to share image data between Python and C++ applications; the zmq ipc protocol works fairly well and is quite easy to write code for. However, shared memory might be better in terms of speed, and I am not sure how much effort it would take to get shared memory working between these two languages? |
@madjxatw , I hope others will chime in as I don't have much experience with video codecs. My own projects use RPi computers to send a small number of still image frames in OpenCV format. The images are sent in small, relatively infrequent batches. For example, my water meter cam (running imagenode) uses a motion detector class to watch continuously captured images from a pi camera and then sends a few images (over the network using imagezmq) only when the water meter needle starts or stops moving. There is no continuous webcam video stream like mjpeg or h264. Just small batches of still images.

If you are using a webcam streaming mjpeg or h264 then you will need to separate that stream into individual images on the receiving computer that is doing object detection. The load on the wifi network is between the webcam and the computer receiving the images. The network load will be continuous (because of the nature of webcam streaming of mjpeg or h264). The choice of mjpeg or h264 depends on many factors. There is a good comparison of them here. But most webcams cannot be modified to send small batches of individual images.

Processing your video stream on the receiving computer can be done in 2 different ways: 1) using OpenCV or other software to split the video stream into individual OpenCV images and then performing YOLO or other object detection on the individual images, or 2) performing object detection like YOLO5 directly on the video stream (there is a discussion of that here; I'm sure there are other ways as well).

I have not used the ZMQ ipc protocol. My guess is that shared memory would be better in terms of speed. I have been experimenting with Python's multiprocessing.shared_memory class (new in Python 3.8). It works very well for passing OpenCV images (which are just Numpy arrays) between Python processes. I have not used it with C++ but it is likely that some web searching will find code that does it. I also experimented with multiprocessing "shared c-types" Objects documented here. I have found the multiprocessing.shared_memory class easier to use. I don't know which one of these alternatives would be easier to use in sharing images between Python and C++. There is a good tutorial article on sharing Numpy Arrays (which is what OpenCV images are) between Python processes here. It might help you think about your choices.
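As a concrete starting point for option (1) above, a minimal OpenCV sketch for splitting an RTSP stream into individual frames (the URL is a placeholder):

```python
# Sketch: split an RTSP stream into individual OpenCV frames for detection.
import cv2

cap = cv2.VideoCapture("rtsp://camera-host:8554/stream")  # placeholder URL
if not cap.isOpened():
    raise RuntimeError("could not open RTSP stream")

while True:
    ok, frame = cap.read()      # one decoded frame (numpy array, BGR)
    if not ok:
        break                   # stream ended or dropped
    # run YOLO / other object detection on `frame` here
cap.release()
```
|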
@jeffbass, thanks a lot for your enlightening sharing, especially for those useful links! I've finally used GStreamer to implement my own RTSP stream grabber, and ZMQ pub/sub with the ipc protocol for inter-process communication between the Python and C++ applications. The overall speed is satisfactory, although the CPU usage is a bit high (around 12% ~ 15%). The ZMQ ipc protocol on Linux actually uses a UNIX domain socket internally. ZMQ is pretty easy and flexible to use in cases where speed is not extremely critical. Really appreciate the article about sharing numpy arrays between processes; I will read it soon and try out the shared memory solution. |
Glad to learn that you've found a workable solution to your problem. I've been thinking about support for h264 video for my own project. Though for my case, I'm working with the OAK cameras from Luxonis. These can not only stream the video, they can execute configurable ML workloads, such as inference, directly on the camera. The receiving system can then simultaneously collect both video and analysis results. The video format is attractive mainly due to its reduced storage requirements.

My target platform is an embedded system using low voltage single board computers, primarily the Raspberry Pi, and this tends to drive all the design decisions. Have been using shared memory for inter-process access to image data. This includes the use of ZMQ over IPC for signaling purposes. Allows for running parallel analysis tasks on the capture/receiver: collecting as much data as possible, while analyzing a subset of those frames at the same time. Currently, for post processing, individual frames as JPG image data are transported between systems over imageZMQ, then uncompressed directly into shared memory.

I suspect the slow speeds you're experiencing using ZMQ over IPC are due to the volume of data you're sending through the Unix OS socket. For my use case, the latency over this protocol is very low. The following results are summarized across about 20 tasks running MobileNetSSD inference. For ZMQ over IPC, the average latency per frame here is 0.000489 - I'm cheating of course, these are very small payloads using MessagePack for marshalling.

My approach to video analysis would be much the same. Would rely heavily on OpenCV for getting image frames out of the stream, then copy into shared memory. Or ideally, pre-allocating the NumPy array in shared memory and just using that for the OpenCV storage. I believe there should be lots of examples of using shared memory between Python and C/C++ since these are the same underlying libraries. Have never tried that though. Good luck! Your architecture sounds well thought-through.

Honest disclaimer: I do not always get performance figures that high. Those numbers are due to adding an Intel NCS2 stick for the neural net. Have seen varying performance stats, and the frame rates above are probably higher than average. I'm usually happy to see facial detection, manipulation, and embeddings collected at around 10-12 frames per second. |
Hey @jeffbass, Just looking back at that first post. April 2020. Over four and a half years. The genesis of my Covid Lockdown project, and haven't given up yet. The primary use case for just the facial recognition and learning is essentially complete. With the wall console more or less working now, the infrastructure to run this is finally all in place. All I have left is to work out the refinements and overall strategy for the machine learning life cycle. This has been a more challenging problem than I fully realized when I started. I knew it wouldn't be easy. Knew it would take a while.

And now that it's almost finally working, it looks like the OS I've been using (Raspbian Buster) is basically end-of-life. I think I'm inclined to just keep on keeping on for now. Not sure how practical that really is. Looks like you've been keeping your library current. How are you approaching the planning of all of your Yin Yang Ranch deployments? Do you upgrade? Not upgrade? Replace them when and if they finally break?

I'm also looking at the toys I've got. Have a few of the Intel NCS2 sticks, have those working. A couple of Google Coral accelerators that I've been planning to get to, but haven't yet. The Luxonis camera that I picked up is working with what is basically the same software as the NCS2. Also picked up an Arducam PiNSIGHT.

Truth is, I've spent a lot of time just crafting infrastructure. It's felt good that I was actually able to build something that met my expectations. Some of the coolest stuff I've ever written. It's a hobby project that has seen months of inactivity. Now that I finally have it built, it's time to get it fully working: designing the machine learning lifecycle for it, and building and teaching all the clever ideas I have in mind.

A couple of the aforementioned hardware toys are now kinda dated, and will become more difficult to support in the future. A conundrum? In some respects, it seems smart to keep using the software that's working. Sigh. It's even smarter though to keep everything current, to be able to easily take advantage of current technology. What are your thoughts? Some of these libraries are large, and tough to get built and running and working together on a Raspberry Pi. Mark |
Hi @shumwaymark! Great to hear that your projects are moving forward. I've been continuing to build and refine mine as well. The short answer to your questions is that, yes, I've been upgrading all my Yin Yang Ranch deployments. The old versions were working fine, but reaching end of life on all the packages I was dependent on. Many of my project improvements have been related to becoming current with newer versions of Python, Raspberry Pi OS and Picamera software. My Yin Yang Ranch system was working fine with the older versions, but there are lots of reasons to try to stay (mostly) current. Security and bug fixes in the new Raspberry Pi OS versions have been substantial. Picamera2 is worth the upgrade frictions. The RPiOS Bookworm defaults to a 64 bit OS rather than the older 32 bit OS. This is very helpful with memory buffers and with Picamera2 codecs and packages.

The biggest incentive for me was the significant improvements in the Picamera2 camera software versus the older Picamera software. My imagenode Raspberry Pi cameras are a key part of my overall system and were running fine on Raspbian Buster. But Picamera stopped being supported. Buster support was dropped as Raspberry Pi moved on to Bullseye and then to Bookworm. Picamera2, although still in Beta test, is now stable and maintained. Also, the newer Bookworm RPiOS has Python v 3.11 as its system Python. The improvements in all of these made it worth the time and effort to upgrade all my software. Picamera2 is based on the open source libcamera stack.

My hardware toys are also being upgraded. The newer RPi 4 and RPi 5 computers are faster and have lots more memory, which is easier to take advantage of using the new Python versions (e.g. some of the Numpy memory buffer Python modules). Newer Python versions and Picamera2 use the RPiOS 64 bit version. RPiOS 64 bit also enables newer & better versions of many software packages. The new PiCameras are great: Picamera HQ & Picamera Global Shutter cameras even allow different interchangeable lenses. And libcamera / Picamera2 can take advantage of more settings for exposure control, which have been very helpful for my critter cams. I have been using Google Coral accelerators, but haven't tried the Intel NCS2 sticks. I am also playing with a couple of Luxonis cameras, but I haven't got any of the stereo vision software working reliably.

So, for me, the upgrades to the latest software & OS versions have been worth it. Upgrading these packages has also helped me think about streamlining my designs to make future upgrades smoother and easier. I built my prototypes without thinking about that very much. Lesson learned. I have not made any progress on any of my machine learning stuff. I'm glad to hear that you are making some progress there. As always, I look forward to learning from you. Jeff |
Thanks @jeffbass, I knew that was the correct answer, I suppose. Working with and helping to manage a distributed data system, on a much larger scale, is my day job. So, duh. Laziness will only get me in trouble. What got me to this point was a pyimagesearch course purchase which came with a ready-to-run Raspberry Pi SD card fully loaded with software. That's been my base since the start. So have been more than lazy in that regard. It worked great though. I've known all along that the good times would one day be over, and I would be forced to roll my own eventually. No time like the present. If you need a tester for imagenode/imagehub, I'm available.

By the way, finally got around to fixing my ImageSender subclass (DataFeed) to support an async recv() with a timeout, close, and reconnect. I didn't realize there was an existing fork of your library which achieved a similar result. I just used that same polling operation I've been using for IPC for a non-blocking read, then added @philipp-schmidt's trick with the wait on an event and raised timeout from his PUB/SUB example. Glad he shared that. Great idea.

A little surprised to be back looking at this in late summer. This project has been mostly a wintertime pursuit. Building the watchtower was an unexpected summer sideshow that I've made some time for this year. Didn't realize that was the big missing piece, until I built it. Funny, huh. Lack of a plan, and still got there. Mark |
Thanks, @shumwaymark, for your offer to help test the next versions of imagenode & imagehub. Might be a while before I get to those, but I'll let you know when I do. |
Jeff,
For the TL;DR on this, just scroll to the bottom to get to my questions.
A brief introduction for context...
Have been brainstorming a personal side-project for the past few months, and feel like I'm ready to start putting it together. The motivation was just that this is something that seemed interesting and fun, and also possibly a cool learning vehicle for my grandson.
The goal is a small-scale distributed facial recognition and learning pipeline hosted on a network of Raspberry Pi computers. Something that could easily support presence detection within the context of smart home automation. Have bigger/crazier ideas too, but this was a good place to start.
Had learned about imagezmq from the PyImageSearch blog, and that led me here.
Just being completely honest here, my first reaction to your imagenode and imagehub repositories went something like... Awesome! I'm going to steal a bunch of this stuff.
Well done. And after looking at it for a while, I've come to recognize that what you've built is a much closer fit to my design than I had initially realized.
My initial goals here are to be able to recognize people and vehicles (and possibly pets) that are known to the house. Identifying package and mail delivery. Knowing when a strange car has pulled into the driveway.
Significantly, any new/unknown face should automatically be enrolled and subsequently recognized. We can always make a "formal introduction" later by labeling the new face at our leisure. Or deleting any that are not wanted.
I wanted central logging of errors and exceptions rather than keeping them on the SD card of the camera nodes. Using PyZMQ async logging for both that reason and to capture camera event data. A single detector could potentially generate a number of different result values in sequence: there can be multiple objects.
To support this design pattern, the camera startup goes through a sequence of steps, the last of which is: camera initialization completes and the processing loop begins.
This allows cameras to be added and removed dynamically. The cameras can push out a periodic heartbeat over the log as a health check. The cameras just need to know which data sink to connect to. The data sink then introduces the cam watcher.
Most inference runs as a batch job on a separate box(s). Some inference can be moved onto specific cameras that have USB accelerators where real time facial recognition is desired, such as the front door or foyer. All results are stored in a database.
Motion event playback can produce the original video, and support the inclusion of optional annotations/labeling, i.e. show the bounding boxes around each face along with a name.
Does any of this design interest you? I guess what I'm trying to ask in a round about way... Should I just fork your stuff and move on, or would you like any of this for yourself?
PyCon 2020 Questions
It looks like the imagenode camera detectors run single threaded. Was this a design decision, or is there more to that than meets the eye?
What are the pitfalls for adding a live-stream video imagezmq publishing service on the imagenode?
My thinking on that second question, is that it might be desirable to tap into the live camera feed on demand. This would support not only monitoring on a console or handheld, but would also allow a batch job to analyze a motion event while it is in progress.
Most cameras wouldn't have a subscriber, they would just routinely publish on the socket, it would be available for any application that might want it.
Thanks Jeff!
Mark.Shumway@swanriver.dev
https://blog.swanriver.dev