Discussion:
Cygwin port of Guile 2.2
(too old to reply)
Mike Gran
2017-04-04 16:45:43 UTC
Permalink
Hello All,

I push a branch that mostly makes Guile 2.2 work on Cygwin.

In the end, I had a few issues.

Cygwin's uselocale is broken and can segrault. I fell back
to the standard nl-langinfo. I understand the issue and may
push a patch upstream to Cygwin sometime.

Also Cygwin's langinfo is incomplete with regards to LC
MONETARY. This also is fixable upstream in Cygwin.

Cygwin requires stability on both ends of a pipe being used
for inter-thread communication. If one tries to read from a
pipe that is closing on the other end, Cygwin may fault. I
worked around this by replacing pipes with a
producer/consumer pattern of semaphores and mutex-protected
variables. A dirty workaround.

Lastly, Cygwin's fork just doesn't play well with Guile's
new GC and threading model. Despite a copy of days in the
guts of it all, I don't understand the specifics of the
issue. So, in the branch, I disabled forking altogether,
and by extension, popen and friends.

While I couldn't get fork to work, the other threading
primitives seem fine.

The branch is wip-cygwin-guile-2.2.

Thanks,

Mike Gran
Eli Zaretskii
2017-04-04 17:03:48 UTC
Permalink
Date: Tue, 04 Apr 2017 09:45:43 -0700
Lastly, Cygwin's fork just doesn't play well with Guile's
new GC and threading model. Despite a copy of days in the
guts of it all, I don't understand the specifics of the
issue. So, in the branch, I disabled forking altogether,
and by extension, popen and friends.
For popen, you could use the MinGW code, I think.
Andy Wingo
2017-04-14 08:35:34 UTC
Permalink
Post by Mike Gran
I push a branch that mostly makes Guile 2.2 work on Cygwin.
Neat. LGTM with some nits, with the exception of the "(throw
'unresolved)" patch -- the corresponding catch is in
test-suite/test-suite/lib.scm:344.
Post by Mike Gran
Cygwin's uselocale is broken and can segrault. I fell back
to the standard nl-langinfo. I understand the issue and may
push a patch upstream to Cygwin sometime.
That would be great. Can you add a comment to this effect to the line
where you force the nl_langinfo path? It would be nice to be able to
remove this in the future.
Post by Mike Gran
Also Cygwin's langinfo is incomplete with regards to LC
MONETARY. This also is fixable upstream in Cygwin.
I didn't recall seeing a patch here but if there was, please note this
situation there.
Post by Mike Gran
Cygwin requires stability on both ends of a pipe being used
for inter-thread communication. If one tries to read from a
pipe that is closing on the other end, Cygwin may fault. I
worked around this by replacing pipes with a
producer/consumer pattern of semaphores and mutex-protected
variables. A dirty workaround.
This was the part I liked least, as you can imagine :) Oh well
though. I think it is fine as it is, if you add a comment as to why
it's there. Of course if you can find a solution that works with pipes,
that would be ideal!
Post by Mike Gran
Lastly, Cygwin's fork just doesn't play well with Guile's
new GC and threading model. Despite a copy of days in the
guts of it all, I don't understand the specifics of the
issue. So, in the branch, I disabled forking altogether,
and by extension, popen and friends.
I guess the new thing being the finalizer thread? In theory before the
fork, Guile will stop the finalizer thread. If that's not working, that
could be it.

Another option would be to disable fork() but only if Guile is built
with threads. Actually this is probably better -- I bet there are Guile
people that expect to be able to fork on Cygwin and would be OK if their
Guile had no threads.

Please feel free to push to master once there are adequate comments.
Someone will come later and want to enable fork() on Cygwin with threads
and will need to know why it's disabled and what they could do to fix it
:)

Thanks!

Andy
Derek Upham
2017-04-14 13:41:33 UTC
Permalink
Post by Andy Wingo
I guess the new thing being the finalizer thread? In theory before the
fork, Guile will stop the finalizer thread. If that's not working, that
could be it.
Another option would be to disable fork() but only if Guile is built
with threads. Actually this is probably better -- I bet there are Guile
people that expect to be able to fork on Cygwin and would be OK if their
Guile had no threads.
Please feel free to push to master once there are adequate comments.
Someone will come later and want to enable fork() on Cygwin with threads
and will need to know why it's disabled and what they could do to fix it
:)
There’s the finalizer thread, but there’s also the signal delivery
thread. The “sigaction” code ensures that signal delivery thread is
running. And note that “primitive-fork” is supposed to display a
warning if you are forking with outstanding threads, but it explicitly
ignores the signal delivery thread during its check.

I have found that what actually hangs after a fork are the mutexes
supporting the threads: they are kernel-level resources, referenced by
ID, and end up being shared between parent and child.

I’m baking a fix that wipes out user signals before calling
“primitve-fork”, and then restores them separately in the parent and
child processes. Doing so gives them different mutexes. It required
some support in scmsigs.c to:

- Have “restore-signals” return a description of the signal handlers,
that we can use to restore those handlers.
- Have “restore-signals” close the signal pipe (which kills the signal
delivery thread).
- When closing the signal pipe, wait for the signal delivery thread to
exit.

I don’t think there’s any safe way to restore the finalizer thread and
support SCSH-style (begin ...) process forms. Shutting down the
finalizer thread is the best we can do.

All of this is on GNU/Linux, of course, not Cygwin.

Derek
--
Derek Upham
***@blarg.net
Andy Wingo
2017-04-17 08:04:42 UTC
Permalink
Post by Derek Upham
There’s the finalizer thread, but there’s also the signal delivery
thread. The “sigaction” code ensures that signal delivery thread is
running. And note that “primitive-fork” is supposed to display a
warning if you are forking with outstanding threads, but it explicitly
ignores the signal delivery thread during its check.
Yep, though the signal delivery thread doesn't run unless you have
installed a signal handler. I think currently the combination of
signals + guile-with-threads + primitive-fork (rather than open-process)
isn't supported. It warns in 2.2 but the situation was the same in
2.0. I am not sure how it can be made to work (though I did try!) and
would be interested to see your fix :)
Post by Derek Upham
I have found that what actually hangs after a fork are the mutexes
supporting the threads: they are kernel-level resources, referenced by
ID, and end up being shared between parent and child.
Which ones, precisely?
Post by Derek Upham
I don’t think there’s any safe way to restore the finalizer thread and
support SCSH-style (begin ...) process forms. Shutting down the
finalizer thread is the best we can do.
The finalizer thread should be restored as needed, the next time GC
calls notify_finalizers_to_run.

I think also that if you are most interested in a system in which
primitive-fork plays a large role, then probably you want a Guile
without threads (including the GC mark threads). Threads + fork is not
a recipe for success :)

Andy
Derek Upham
2017-04-17 15:05:35 UTC
Permalink
Post by Andy Wingo
Post by Derek Upham
I have found that what actually hangs after a fork are the mutexes
supporting the threads: they are kernel-level resources, referenced by
ID, and end up being shared between parent and child.
Which ones, precisely?
GDB shows the hanging child process waiting on a mutex down in the
BDW-GC garbage collection library. The parent has the mutex, and is
blocked waiting to read data from the child.
Post by Andy Wingo
Post by Derek Upham
I don’t think there’s any safe way to restore the finalizer thread and
support SCSH-style (begin ...) process forms. Shutting down the
finalizer thread is the best we can do.
The finalizer thread should be restored as needed, the next time GC
calls notify_finalizers_to_run.
If you have two, initially identical Guile processes running, each of
which depend on the same external resource needing finalization, how do
you control cleanup in a way that’s safe for both processes? If either
the parent or the child finalizes, the other process may still be
depending on the external resource. The correct policy seems to be
“Don’t use finalizers with primitive-fork” in the same way that we say
“Don’t use threads with primitive-fork”. It’s a problem with
(begin ...) process forms, not with situations where the process execs
another program.
Post by Andy Wingo
I think also that if you are most interested in a system in which
primitive-fork plays a large role, then probably you want a Guile
without threads (including the GC mark threads). Threads + fork is not
a recipe for success :)
Understood. However, saying “primitive-fork requires a separate,
threadless Guile installation” would be very limiting. I’d like to be
in a position where Guile will manage any threads that it creates
under the hood; then it’s up to the program owner to guarantee that
there are no other threads running at fork time. Right now the
under-the-hood threads appear to be:

1. The finalizer thread.
2. The signal delivery thread.

(If you can think of any others, let me know.)

For SCSH-style process forms, this comes up when trying to implement
SCSH’s “early” child process auto-reap mechanism (section 3.4.1 in the
SCSH manual). The “early” policy works by setting up a handler for
SIGCHLD. We are guaranteed to have the signal delivery thread in that
situation. (The “late” policy hooks into the garbage collector, so it
avoids this problem; it can have pathological edge cases for
long-running programs, however.)

Derek
--
Derek Upham
***@blarg.net
Derek Upham
2017-05-01 20:48:13 UTC
Permalink
Post by Derek Upham
Post by Andy Wingo
I think also that if you are most interested in a system in which
primitive-fork plays a large role, then probably you want a Guile
without threads (including the GC mark threads). Threads + fork is not
a recipe for success :)
Understood. However, saying “primitive-fork requires a separate,
threadless Guile installation” would be very limiting. I’d like to be
in a position where Guile will manage any threads that it creates
under the hood; then it’s up to the program owner to guarantee that
there are no other threads running at fork time. Right now the
1. The finalizer thread.
2. The signal delivery thread.
(If you can think of any others, let me know.)
For SCSH-style process forms, this comes up when trying to implement
SCSH’s “early” child process auto-reap mechanism (section 3.4.1 in the
SCSH manual). The “early” policy works by setting up a handler for
SIGCHLD. We are guaranteed to have the signal delivery thread in that
situation. (The “late” policy hooks into the garbage collector, so it
avoids this problem; it can have pathological edge cases for
long-running programs, however.)
It turns out that, in the current Guile implementation, the thread cleanup code has inherent race conditions. Running pthread_join() on a thread only guarantees that the thread has returned an exit value. It doesn’t say anything about whether the thread has finally terminnated (which for Guile means running the thread-local variable cleanup using on_thread_exit()). If there is a delay in running the thread-local variable cleanup (which happens occasionally), we end up forking a child that doesn’t run the thread-local variable cleanup. In that case, the child process never updates the global list of threads and doesn’t unregister the thread from the garbage collector.

In these sorts of situations, the standard operating procedure is to extract most of the on_thread_exit() code into its own function, which the parent can call synchronously before the fork. For safety when doing explicit cleanup, we would:

- Use the thread admin mutex to provide cross-thread protection.
- Use the thread struct “exited” field to prevent double-freeing.

(To make things more interesting, on_thread_exit() can end up shutting down the signal delivery thread. We will have to handle or avoid a recursive mutex lock.)

However, the on_thread_exit() function has the comment

/* This handler is executed in non-guile mode. */

at the top. Can someone explain what that means and what the implications are? How can I safely invoke the contained code from other contexts?

GDB also shows that Guile is spawning two threads at startup, and those don’t show up in the list of threads known to the system.

Thanks,

Derek
--
Derek Upham
***@blarg.net
Andy Wingo
2017-05-02 19:35:59 UTC
Permalink
Post by Derek Upham
Running pthread_join() on a thread only guarantees that the thread has
returned an exit value.
Would you mind providing a reference please? It is not that I don't
believe you but I think it's important to know whether this is a bug in
Guile or in the pthreads implementation.
Post by Derek Upham
the on_thread_exit() function has the comment
/* This handler is executed in non-guile mode. */
at the top. Can someone explain what that means and what the
implications are? How can I safely invoke the contained code from
other contexts?
https://www.gnu.org/software/guile/manual/html_node/Initialization.html

Basically this code is not automatically traced by GC, it can't
allocate, and it can't call (most) libguile functions.

Andy
Derek Upham
2017-05-03 03:18:42 UTC
Permalink
Post by Andy Wingo
Post by Derek Upham
Running pthread_join() on a thread only guarantees that the thread has
returned an exit value.
Would you mind providing a reference please? It is not that I don't
believe you but I think it's important to know whether this is a bug in
Guile or in the pthreads implementation.
It’s not explicit, but it’s heavily implied by the pthread_exit(3) man
page:

The pthread_exit() function terminates the calling thread and
returns a value via retval that (if the thread is joinable) is
available to another thread in the same process that calls
pthread_join(3).

Any clean-up handlers established by pthread_cleanup_push(3) that
have not yet been popped, are popped (in the reverse of the order
in which they were pushed) and executed. If the thread has any
thread-specific data, then, after the clean-up handlers have been
executed, the corresponding destructor functions are called, in
an unspecified order.

The retval becomes available to anyone joining. Then the clean-up
handlers run. Then the thread-specific destructors run.

I think the flaw is that Guile is using the thread-specific destructors
to update global values. The thread is still effectively “live” (has
visible external effects) until the destructors finish running, even
though it appears to be dead if you were to ask pthreads about it. If
the destructor were just freeing heap memory (for example), then no one
would notice the delayed actions.

Derek
--
Derek Upham
***@blarg.net
Andy Wingo
2017-05-03 09:24:30 UTC
Permalink
Post by Derek Upham
Post by Andy Wingo
Post by Derek Upham
Running pthread_join() on a thread only guarantees that the thread has
returned an exit value.
Would you mind providing a reference please? It is not that I don't
believe you but I think it's important to know whether this is a bug in
Guile or in the pthreads implementation.
It’s not explicit, but it’s heavily implied by the pthread_exit(3) man
A specific implementation's man page is interesting but not the
specification. The specification is "The Open Group Base Specifications
Issue 7 IEEE Std 1003.1-2008, 2016 Edition" and can be found here:

http://pubs.opengroup.org/onlinepubs/9699919799/

Note that pthread_exit() is not the normal way for a Guile thread to
exit. I thought we removed uses of this but I see that there are a
couple more that need to be removed. Are you using it? It's certainly
not one that we suggest. We don't use pthread_cleanup_push either --
the lifetime of the thread from Guile's POV isn't limited to a dynamic
extent of scm_with_guile, as the thread could be created by the user, as

https://lists.gnu.org/archive/html/guile-devel/2011-04/msg00133.html

Anyway, I digress. After looking at the specification of pthread_join
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_join.html),
things do not seem clear to me. Does the thread "terminate" before or
after the pthread_key destructors run? Given that:

* pthread_key destructors must run in the thread AFAIU; to be able to
call pthread_setspecific in a destructor you need a fresh
or from a fresh new thread, and spawning a new thread to run cleanup
seems unlikely

* pthread_key destructors can call any function

I think there's an argument that a thread doesn't "terminate" until its
thread-local key destructors have finished running, and therefore
pthread_join doesn't return until after the key destructors have run.
This is my understanding of what happens from reading NPTL. Do I
understand correctly that you are on Cygwin? Could it be a cygwin
pthreads incompatibility?

Andy
szgyg
2017-05-03 09:39:28 UTC
Permalink
Post by Andy Wingo
Post by Derek Upham
Post by Andy Wingo
Post by Derek Upham
Running pthread_join() on a thread only guarantees that the thread has
returned an exit value.
Would you mind providing a reference please? It is not that I don't
believe you but I think it's important to know whether this is a bug in
Guile or in the pthreads implementation.
It’s not explicit, but it’s heavily implied by the pthread_exit(3) man
A specific implementation's man page is interesting but not the
specification. The specification is "The Open Group Base Specifications
http://pubs.opengroup.org/onlinepubs/9699919799/
Note that pthread_exit() is not the normal way for a Guile thread to
exit. I thought we removed uses of this but I see that there are a
couple more that need to be removed. Are you using it? It's certainly
not one that we suggest. We don't use pthread_cleanup_push either --
the lifetime of the thread from Guile's POV isn't limited to a dynamic
extent of scm_with_guile, as the thread could be created by the user, as
https://lists.gnu.org/archive/html/guile-devel/2011-04/msg00133.html
Anyway, I digress. After looking at the specification of pthread_join
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_join.html),
things do not seem clear to me. Does the thread "terminate" before or
* pthread_key destructors must run in the thread AFAIU; to be able to
call pthread_setspecific in a destructor you need a fresh
or from a fresh new thread, and spawning a new thread to run cleanup
seems unlikely
* pthread_key destructors can call any function
I think there's an argument that a thread doesn't "terminate" until its
thread-local key destructors have finished running, and therefore
pthread_join doesn't return until after the key destructors have run.
This is my understanding of what happens from reading NPTL. Do I
understand correctly that you are on Cygwin? Could it be a cygwin
pthreads incompatibility?
Andy
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09
2.9.5 Thread Cancellation
Thread Cancellation Cleanup Handlers

#v+
After all cancellation cleanup handlers and thread-specific data
destructors have returned, thread execution is terminated.
#v-

szgyg
Derek Upham
2017-05-03 14:21:32 UTC
Permalink
Post by Andy Wingo
I think there's an argument that a thread doesn't "terminate" until its
thread-local key destructors have finished running, and therefore
pthread_join doesn't return until after the key destructors have run.
This is my understanding of what happens from reading NPTL. Do I
understand correctly that you are on Cygwin? Could it be a cygwin
pthreads incompatibility?
GNU/Linux, Debian unstable, 4.9.x kernel. I can’t get too fancy with the stderr tracing in a multi-threaded environment, but I think I can demonstrate the delay.

Here’s a successful run in GDB:

Catchpoint 1 (load)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 2 at 0x7ffff7b7937c: file posix.c, line 1239.
[New Thread 0x7ffff6197700 (LWP 2225)]
[New Thread 0x7ffff5996700 (LWP 2226)]
[New Thread 0x7ffff5195700 (LWP 2227)]
[New Thread 0x7ffff44ff700 (LWP 2228)]
[New Thread 0x7ffff39fe700 (LWP 2229)]
checking device "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
scm_join_thread start
on_thread_exit start
scm_join_thread end
scm_fork #threads (start) = 2
on_thread_exit end
on_thread_exit start
on_thread_exit end
scm_fork #threads (end) = 1
[Thread 0x7ffff39fe700 (LWP 2229) exited]
[Thread 0x7ffff44ff700 (LWP 2228) exited]
got timestamp string: "1493715602" "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
using device: "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
[Thread 0x7ffff5195700 (LWP 2227) exited]
[Thread 0x7ffff6197700 (LWP 2225) exited]
[Thread 0x7ffff7fcb740 (LWP 2218) exited]
[Inferior 1 (process 2218) exited normally]

The scm_join_thread function starts, then on_thread_exit starts, then scm_join_thread ends. (The underlying join operation completed.) Notice that we have already gotten into scm_fork before the cleanup finishes. Those are all for the signal delivery thread. Between the scm_fork start and end traces, we run the pre-fork finalizer cleanup, which causes the second on_thread_exit pair.

Unsuccessful run:

Catchpoint 1 (load)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 2 at 0x7ffff7b7937c: file posix.c, line 1239.
[New Thread 0x7ffff6197700 (LWP 2239)]
[New Thread 0x7ffff5996700 (LWP 2240)]
[New Thread 0x7ffff5195700 (LWP 2241)]
[New Thread 0x7ffff44ff700 (LWP 2242)]
[New Thread 0x7ffff39fe700 (LWP 2243)]
checking device "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
scm_join_thread start
scm_join_thread end
scm_fork #threads (start) = 3
on_thread_exit start
on_thread_exit end
scm_fork #threads (end) = 2
[Thread 0x7ffff44ff700 (LWP 2242) exited]
on_thread_exit start
on_thread_exit end
[Thread 0x7ffff39fe700 (LWP 2243) exited]

Thread 1 "guile" hit Breakpoint 2, scm_fork () at posix.c:1239
1239 scm_display

We join on the signal delivery thread and then set scm_i_signal_delivery_thread to NULL. But we reach the start of scm_fork without any call to on_thread_exit. The finalizer thread exits and we clean it up. At that point scm_all_threads reports three threads, because (a) nothing has cleaned up the active thread list, and (b) none of the threads in the list matches scm_i_signal_delivery_thread (which is NULL). We reach the scm_fork warning block, where I have a breakpoint. The signal delivery thread cleanup finally happens between the breakpoint detection and the prompt.

Another successful run:

scm_join_thread start
on_thread_exit start
scm_join_thread end
on_thread_exit end
scm_fork #threads (start) = 2
on_thread_exit start
on_thread_exit end
scm_fork #threads (end) = 1
[Thread 0x7ffff39fe700 (LWP 2212) exited]
[Thread 0x7ffff44ff700 (LWP 2211) exited]

We join before on_thread_exit completes, but on_thread_exit completes before we start forking.

Another successful run:

scm_join_thread start
on_thread_exit start
on_thread_exit end
scm_join_thread end
scm_fork #threads (start) = 2
[Thread 0x7ffff39fe700 (LWP 2198) exited]
on_thread_exit start
on_thread_exit end
scm_fork #threads (end) = 1
[Thread 0x7ffff44ff700 (LWP 2197) exited]

on_thread_exit completes before we join. (This is the happy case.)

Two things I just noticed:

1. Even in the successful cases there are delays between on_thread_exit completing and GDB reporting that the thread has exited. Look at the second successful run (third in the overall list). That’s a huge time gap. It’s circumstantial evidence, but it’s suggestive.

2. In the first successful case (first it the overall list), the finalizer cleanup doesn’t start until the signal delivery thread cleanup completes. There’s nothing at the code level enforcing that. I wonder whether there is a single, hidden, background thread that coordinates the cleanup. That’s pure speculation.

Derek
--
Derek Upham
***@blarg.net
Andy Wingo
2017-05-09 19:08:08 UTC
Permalink
Post by Derek Upham
scm_join_thread start
scm_join_thread isn't actually implemented in terms of
scm_i_pthread_join any more. Probably that's what's going wrong here --
and probably that should be fixed to ensure that we actually join the
thread. (Otherwise it would be a memory leak too AFAIU.) Bcc'ing
bug-guile to create a bug for that.

Andy
Derek Upham
2017-05-12 14:13:06 UTC
Permalink
Post by Andy Wingo
scm_join_thread isn't actually implemented in terms of
scm_i_pthread_join any more. Probably that's what's going wrong here --
and probably that should be fixed to ensure that we actually join the
thread. (Otherwise it would be a memory leak too AFAIU.) Bcc'ing
bug-guile to create a bug for that.
I noticed that scm_join_thread was calling back into Scheme-land. Are these statements all correct?

- We are using call-with-new-thread underneath the hood.
- call-with-new-thread is documented to return a Scheme object from a thunk/handler. Any underlying pthreads should be implementation details.
- The spawned thread sends the Scheme object to the condition variable as soon as the user thunk exits. Any number of operations can happen afterwards; the thread is still running in Scheme-land at this point, in call-with-new-thread’s wrapping thunk.
- join-thread waits on the condition variable only.

So at the end of join-thread we need to add a call to scm_i_pthread_join (which we implement in threads.c) to ensure that the pthread is completely gone before that join-thread returns. Is that accurate? We can also update stop_finalization_thread to use the new scm_i_pthread_join.

Unfortunately, I think the GC threads are going to end up being immovable objects in the path to full process-form support.

Derek
--
Derek Upham
***@blarg.net
Andy Wingo
2017-05-15 20:06:27 UTC
Permalink
Greets,
Post by Derek Upham
Post by Andy Wingo
scm_join_thread isn't actually implemented in terms of
scm_i_pthread_join any more. Probably that's what's going wrong here --
and probably that should be fixed to ensure that we actually join the
thread. (Otherwise it would be a memory leak too AFAIU.) Bcc'ing
bug-guile to create a bug for that.
I noticed that scm_join_thread was calling back into Scheme-land. Are these statements all correct?
- We are using call-with-new-thread underneath the hood.
Underneath the hood of what? :)
Post by Derek Upham
- call-with-new-thread is documented to return a Scheme object from a
thunk/handler. Any underlying pthreads should be implementation
details.
Correct. In practice call-with-new-thread will create a pthread but I
can imagine circumstances in which it might (in the future) spawn an
auxiliary pthread for some reason, and I wouldn't want to rule that out.
Post by Derek Upham
- The spawned thread sends the Scheme object to the condition variable
as soon as the user thunk exits. Any number of operations can happen
afterwards; the thread is still running in Scheme-land at this point,
in call-with-new-thread’s wrapping thunk.
- join-thread waits on the condition variable only.
These are implementation details. They are correct but probably the
implementation should change to do the scm_i_pthread_join and we should
guarantee that after the join, the thread is really dead. This is bug
26858.
Post by Derek Upham
So at the end of join-thread we need to add a call to
scm_i_pthread_join (which we implement in threads.c) to ensure that
the pthread is completely gone before that join-thread returns. Is
that accurate?
Well... yes, but we have to ensure that we call scm_i_pthread_join at
most once. I think calling pthread_join twice on a thread is
undefined. So there are some gnarlies here. Need to fix this.
Post by Derek Upham
Unfortunately, I think the GC threads are going to end up being
immovable objects in the path to full process-form support.
You can disable marker threads with the GC_MARKERS environment variable,
and the finalization thread should come and go as needed. Probably this
is not a blocker from your POV. Signal handling is probably the most
serious issue; perhaps we can avoid the thread somehow, since we handle
signals asynchronously anyway..

Andy

zv
2017-05-04 05:21:57 UTC
Permalink
Post by Derek Upham
Post by Andy Wingo
Post by Derek Upham
Running pthread_join() on a thread only guarantees that the thread has
returned an exit value.
Would you mind providing a reference please? It is not that I don't
believe you but I think it's important to know whether this is a bug in
Guile or in the pthreads implementation.
It’s not explicit, but it’s heavily implied by the pthread_exit(3) man
The pthread_exit() function terminates the calling thread and
returns a value via retval that (if the thread is joinable) is
available to another thread in the same process that calls
pthread_join(3).
Any clean-up handlers established by pthread_cleanup_push(3) that
have not yet been popped, are popped (in the reverse of the order
in which they were pushed) and executed. If the thread has any
thread-specific data, then, after the clean-up handlers have been
executed, the corresponding destructor functions are called, in
an unspecified order.
The retval becomes available to anyone joining. Then the clean-up
handlers run. Then the thread-specific destructors run.
I think the flaw is that Guile is using the thread-specific destructors
to update global values. The thread is still effectively “live” (has
visible external effects) until the destructors finish running, even
though it appears to be dead if you were to ask pthreads about it. If
the destructor were just freeing heap memory (for example), then no one
would notice the delayed actions.
Derek
I believe Andy is spot-on here. I'm not familiar with a pthreads implementation
where pthread_join() doesn't wait until the referenced tid is finished (unless
you have reused a tid) (there may be a POSIX specification page dictating these
sort of standards however)
Continue reading on narkive:
Loading...