Is there a health check of any kind on logspout?? #207
Comments
Is there no response from `docker logs`?
There was a response from `docker logs` in the cases that I saw.
Also now watching this issue. I'm seeing similar behavior, and I'm available and happy to help debug if you can suggest a place for me to start. I noticed I'm using an old version -- I'll update to the latest release tomorrow and see what happens.
I've been having (probably) the same problem for about 2 weeks. I've been collecting data; otherwise I would have submitted the issue sooner. I will try to leave the broken pod in its current broken state so I can collect more information from it or the node, so please send me any requests for info you have.

When a container breaks, the most noticeable issue is that logs cease to be sent to Papertrail. The container stays running. If I port-forward to the pod so I can connect to the container's HTTP server, curl hangs with no output:

```sh
kubectl port-forward logspout-papertrail-rsjok 9002:80
curl localhost:9002/logs # curl stays running, no output
```
The above commands work fine and output logs when I connect to a working logspout container. I can log into the Kubernetes node and strace the logspout PID; logspout just seems to be polling and not noticing any log events.
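For anyone who wants to repeat that check, here is a minimal sketch of attaching strace to the logspout process from the node; the container name is a placeholder and the flags are just one reasonable choice:

```sh
# Resolve the host PID of the logspout container's main process
# (container name "logspout-papertrail" is a placeholder).
PID=$(docker inspect --format '{{.State.Pid}}' logspout-papertrail)

# Trace only network-related syscalls to see whether logspout is still
# reading from the Docker socket and writing to its destination.
sudo strace -f -p "$PID" -e trace=network -s 200
```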
The Docker image ID of my current broken container is
That's interesting. Do you notice any other issues around the time it stops shipping data? We could/should probably add statsd support so we can pinpoint how much data is being shipped and when things stop. I'd happily review such a PR and merge it for our next release.
@josegonzalez No other issues around the time. This cluster is a low-traffic cluster, so it should be pretty easy to tell if something is spiking or otherwise not working correctly.
In other words, I think this is an issue of basic functionality breaking, not a resource-related edge case.
Are you using the latest master or something else? We may have fixed your problem in #191.
I've been using
@josegonzalez So I've switched to the Docker `master` tag (image id: ). The behavior I'm seeing is that only a small number of containers' logs get through. As I'm writing this, only one container's logs, from one host, are getting into Papertrail. If I connect directly via HTTP to the logspout container I see the same behavior. If I use

I have recently returned to Papertrail at my new job and I'm surprised to be having these issues. I never had a single problem with the now-deprecated
I switched back to
I am still seeing it intermittently, unfortunately. Today I saw a new error: I tried to curl to follow the log output of a container and got this response:
which was new. Restarting the logspout container fixed it, though. We are considering an alternate approach of using Logback to write logs directly to Kafka because of these issues, unfortunately :/
@chrishiestand Interesting. gliderlabs/logspout is essentially progrium/logspout, except further along in development. I'm not personally sure when things "broke" in such a way that we'd no longer be handling those cases, but unfortunately test coverage isn't great. There are a few PRs that should help ameliorate the issue, but they are still in a state of flux. If someone would like to fund the work to investigate/fix any such issues, I'd be happy to talk, but otherwise this will have to take a backseat to other, paid work that the maintainers have.
From struggling with suddenly missing logs for the past several weeks, I'm quite sure at least one of the causes is log rotation when using the

@edude03 give the
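If log rotation of the `json-file` driver is in fact the trigger (an assumption here, since the comment above is truncated), you can at least confirm which log driver and rotation options a container is running with:

```sh
# Show the log driver and its options for a container
# ("logspout" is a placeholder container name).
docker inspect --format '{{.HostConfig.LogConfig.Type}} {{json .HostConfig.LogConfig.Config}}' logspout

# Show the daemon-wide default log driver.
docker info --format '{{.LoggingDriver}}'
```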
I'm hitting this issue when using logspout on Kubernetes. I switched to UDP instead of TCP and the issue seems to have been mitigated somewhat: it's gone from stopping within 24 hours to having been up for a few days. I'm wondering if it's volume-related, since I don't normally hit this on quieter machines, but the Kubernetes setup I'm using is much more active. I had been using the
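For reference, the TCP-to-UDP switch is just a change to the route URI passed to logspout; the destination host and port below are placeholders:

```sh
# Shipping to syslog over TCP
docker run --rm \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout syslog+tcp://logs.example.com:514

# The same route over UDP
docker run --rm \
  --volume=/var/run/docker.sock:/var/run/docker.sock \
  gliderlabs/logspout syslog+udp://logs.example.com:514
```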
Seems like this issue has meandered a bit, and a few different things are being discussed here. I'm going to close this; if you are still having a specific issue, please open a dedicated issue for it. Note: there is a known issue with log flows halting when using Docker's
Related to #216.
I have created this gist to show how I solved this by checking whether logspout is sending outgoing traffic (TCP or UDP, which are my current needs). I'm using this as a Swarm healthcheck.
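The gist itself isn't reproduced in this thread, but a minimal sketch of the same idea, run inside the logspout container as a healthcheck, might look like this; the destination port 514 and the use of busybox `netstat` are assumptions:

```sh
#!/bin/sh
# healthcheck.sh - report unhealthy if no socket toward the log
# destination exists. Port 514 is a placeholder for your syslog target.
if netstat -tun 2>/dev/null | grep -q ':514'; then
  exit 0  # an open TCP/UDP socket toward the destination was found
fi
exit 1
```

In a Swarm service this script can be wired in through the `healthcheck:` section of the compose file so that the container gets replaced when it stops shipping.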
I have logspout set to publish its logs to a port, and typically I can view its output by curling that port; however, I am currently getting no response. If I restart the container, that typically solves the problem, but I'm trying to find the root cause. Is there a way to health check an existing logspout container to see why it is not responding before I restart it?
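A basic check along those lines is sketched below. The published port 8000 and the five-second window are assumptions, and on a quiet host it can report a false negative simply because no log lines arrived in time; it also only tells you that the HTTP endpoint is unresponsive, not why:

```sh
# Read the published log stream for up to 5 seconds.
out=$(curl -s --max-time 5 http://localhost:8000/logs)
if [ -n "$out" ]; then
  echo "logspout is streaming logs"
else
  echo "no output from /logs within 5s - container may be wedged"
fi
```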