summary: lilypond, lambda, and local-eval

Discussion:

summary: lilypond, lambda, and local-eval

Andy Wingo

2011-12-15 10:21:18 UTC

Hi,

The "delayed evaluation" thread is a bit long and confusing, so I would
like to try to summarize it.

Lilypond has a way to embed Lilypond code in Scheme, and Scheme code in
Lilypond code. The former uses a reader macro, #{#}. The latter uses
specially-marked variables and expressions, specifically, those preceded
by $ or #.

For example, the following Scheme expression is an embedded lilypond
code fragment:

#{ \displayLilyMusic $p #}

which expands out at read-time to:

(#<procedure embedded-lilypond (parser lily-string filename line closures)>
parser
" \\displayLilyMusic $p "
#f
0
(list (cons 20 (lambda () p))))

Here the procedure is embedded in the output of the reader macro. We
can see that the $p is translated to (cons 20 (lambda () p)). Whenever
$p is evaluated, that lambda is run.

Embedded Scheme (via $ or #) has access to Scheme's environment.
Variables in lilypond are in a separate environment (implemented using
modules), so we don't have to be concerned about lilypond accessing or
defining Scheme lexicals.

David hacks on Lilypond. He notes that the above expression used to
expand out to:

(#<procedure embedded-lilypond (parser lily-string filename line closures)>
parser
" \\displayLilyMusic $p "
#f
0
(the-environment))

in 1.8. Then, whenever lilypond needed to evaluate an embedded Scheme
value, it would use `local-eval' with the captured environment. It is
clearly much more convenient for lilypond hackers than having to scan
for embedded Scheme beforehand and build up closures for each embedded
Scheme expression. David noted that while "the closure solution" works
for him, he wondered if there was something better to use.

It took some time for everyone to understand the problem. In the end,
there were four workable possibilities.

1) Keep using closures.

2) Incorporate local-eval and the-environment into Guile 2.0.

3) Have lilypond use its own evaluator or compiler.

4) Have lilypond make the embedded lilypond code expand out to actual
Scheme. (It was unclear whether the lilypond grammar allowed
this.)

Mark and Noah looked at implementing local-eval, and I recommended
staying with the closure solution. Ludovic noted success with method
(3) in the Skribilo context.

I would like to start a new thread around local-eval, but besides that,
we should probably agree on the summary first. So please do send any
corrections to this summary to the list. Thanks :)

Andy

--
http://wingolog.org/

David Kastrup

2011-12-15 14:46:15 UTC

Post by Andy Wingo
It took some time for everyone to understand the problem. In the end,
there were four workable possibilities.
1) Keep using closures.
2) Incorporate local-eval and the-environment into Guile 2.0.
3) Have lilypond use its own evaluator or compiler.
4) Have lilypond make the embedded lilypond code expand out to actual
Scheme. (It was unclear whether the lilypond grammar allowed
this.)

It is pretty clear that the Lilypond grammar will not allow this. It
has ambiguities that can only be resolved at runtime and impact both the
parse tree as well as the tokenizing process.

Post by Andy Wingo
Mark and Noah looked at implementing local-eval, and I recommended
staying with the closure solution. Ludovic noted success with method
(3) in the Skribilo context.

Mark basically showed a sketch that worked by switching off the compiler
of the current top form when the-environment got used.

With LilyPond, my basic statement was that performance of the Scheme
part was a non-issue when compared against the Lilypond parts. As an
afterthought, this may not be totally correct: if you have code of the
kind

(define-music-function ... (x y z)
(do a complex calculation purely in Scheme)
(return results using #{ ... #}))

then the speed of the "calculation" can conceivably govern the runtime
after all, so switching off all compilations in the enclosing top form
might have runtime effects not masked by the costs of #{ ... #}. It is,
however, untypical to have costly calculations in the front end of
Lilypond. The runtime is usually spent elsewhere, mostly in C++ code.

Post by Andy Wingo
I would like to start a new thread around local-eval, but besides
that, we should probably agree on the summary first. So please do
send any corrections to this summary to the list. Thanks :)

A note to point 4) if LilyPond were a language for which expansion to
Scheme would be feasible, compilation of the whole nested constructs
would have to happen in one humongous piece since there is no other way
to share lexical environments.

And a note to point 1) the Scheme compiler is more likely to complain
about (lambda () arbitrary-garbage-sexp) than the interpreter when the
preliminary closure-building step picks up line noise from inside of
Lilypond comments or strings.

In comparison to the previous code version running under Guile 1.8,
robustness, cleanliness and manageability of the code have taken a step
backwards. It's not a mile, but not trivial either.

--
David Kastrup

Hans Aberg

2011-12-15 16:52:03 UTC

Post by Andy Wingo
The "delayed evaluation" thread is a bit long and confusing, so I would
like to try to summarize it.
Lilypond has a way to embed Lilypond code in Scheme, and Scheme code in
Lilypond code. The former uses a reader macro, #{#}. The latter uses
specially-marked variables and expressions, specifically, those preceded
by $ or #.

...

Post by Andy Wingo
It took some time for everyone to understand the problem. In the end,
there were four workable possibilities.
1) Keep using closures.

When doing a parser on top of Guile, I noticed one must first build an unevaluated closure, and only thereafter call the evaluator. Scheme has some restrictions forcing this, for example, the lambda cannot appear as a free symbol, though it is possible in Guile using scm_sym_lambda.

It might be useful with a variation of scm_eval_string() that only parses and builds the closure, but not calls the evaluator.

Hans

David Kastrup

2011-12-15 17:24:16 UTC

Post by Hans Aberg

Post by Andy Wingo
The "delayed evaluation" thread is a bit long and confusing, so I would
like to try to summarize it.
Lilypond has a way to embed Lilypond code in Scheme, and Scheme code in
Lilypond code. The former uses a reader macro, #{#}. The latter uses
specially-marked variables and expressions, specifically, those preceded
by $ or #.

...

Post by Andy Wingo
It took some time for everyone to understand the problem. In the end,
there were four workable possibilities.
1) Keep using closures.

When doing a parser on top of Guile, I noticed one must first build an
unevaluated closure, and only thereafter call the evaluator. Scheme
has some restrictions forcing this, for example, the lambda cannot
appear as a free symbol, though it is possible in Guile using
scm_sym_lambda.
It might be useful with a variation of scm_eval_string() that only
parses and builds the closure, but not calls the evaluator.

I am not sure what you mean with "closure" here, but just "read" parses
a form.

I was actually surprised playing around with the dirty call/cc hack I
posted about just how many uses of my-eval and my-env appear to do what
you would expect to in Guilev1:

(define (my-eval form env)
(call-with-current-continuation
(lambda (x)
(env (list x form)))))

(define-macro (my-env)
(call-with-current-continuation identity))

(define (xxx)
(let* ((x 2))
(set! x (+ x 3))
(my-env)))

(format #t "~a\n"
(my-eval
'(begin (set! x (+ x 5))
x)
(xxx)))

So far, just fooling around in the expectation that things will break
has not succeeded. One probably should do a more analytic attempt of
breakage. I have little doubt that using a compiler would make it much
less likely to get working results, but haven't tried.

Of course, exporting a half-finished evaluator is much less likely to
work reliably and with tolerable amount of stored information than a
properly exported environment would.

--
David Kastrup

Hans Aberg

2011-12-15 17:52:35 UTC

Post by David Kastrup

Post by Hans Aberg

Post by Andy Wingo
The "delayed evaluation" thread is a bit long and confusing, so I would
like to try to summarize it.
Lilypond has a way to embed Lilypond code in Scheme, and Scheme code in
Lilypond code. The former uses a reader macro, #{#}. The latter uses
specially-marked variables and expressions, specifically, those preceded
by $ or #.

...

Post by Andy Wingo
It took some time for everyone to understand the problem. In the end,
there were four workable possibilities.
1) Keep using closures.

When doing a parser on top of Guile, I noticed one must first build an
unevaluated closure, and only thereafter call the evaluator. Scheme
has some restrictions forcing this, for example, the lambda cannot
appear as a free symbol, though it is possible in Guile using
scm_sym_lambda.
It might be useful with a variation of scm_eval_string() that only
parses and builds the closure, but not calls the evaluator.

I am not sure what you mean with "closure" here, but just "read" parses
a form.

I build them from scratch, using the scm_x calls, with a parser. This way, it is possible to a more than within the Scheme paradigm itself. For example,
scm_list_x(scm_sym_lambda, ...)
gives an unevaluated lambda-expression. Some symbols are not available in Guile, so for example, I have in my init():
SCM list_ = scm_from_locale_symbol("list");

After an expression has been built, the parser calls scm_primitive_eval(). However, suppose one wants to insert some Scheme code in to the expression, then scm_eval_string() will call the evaluator, enforcing that only complete Scheme expressions can be inserted:

Without that restriction it would be possible to have an expression like
f := x |-> scheme "(+ x 1)"
where only the quote is the actual scheme code and the other handled by the external parser, and where the external "x" binds to the one in the Scheme code string "(+ x 1)". However, when scm_eval_string() calls the evaluator, one gets an error, because "x" is not bound.

Hans

Mark H Weaver

2011-12-16 07:35:48 UTC

Hello all,

Although it has not yet been decided whether `local-eval' will be
accepted into Guile 2, I've decided to proceed with a proper
implementation that is fully integrated into the compiler.

I hope to demonstrate that this feature can be implemented easily
without creating any significant maintenance burden.

Here's an outline of my plan:

How to compile (the-environment)
================================

Most passes of the compiler will pretend that (the-environment) is
replaced by tree-il code corresponding to the following standard scheme
code:

(make-lexical-environment ;; a normal procedure (constructor)
(case-lambda ;; general dispatcher to access or mutate any lexical
(() #f)
((name-XXX) ;; name-XXX is a gensym
(case name-XXX
((<var>) <var>) ;; one clause for each lexical
...))
((name-XXX value-XXX) ;; name-XXX and value-XXX are gensyms
(case name-XXX
((<var>) (set! <var> value-XXX)) ;; one clause for each lexical
...)))
(quote <compiler-environment>) ;; simple list structure
(quote <expander-environment>) ;; simple list structure
<module>) ;; XXX not sure how best to represent this

The <expander-environment> is extracted from the tree-il node
corresponding to (the-environment), created by the macro expander.
I've already implemented this part in my previous patch.

The <compiler-environment> would include a list of lexical variable
names (<var> ...), which must exactly correspond to the closure slots of
the `case-lambda', in order to implement `local-eval'. We _might_ also
need to include some information about how those variables are stored,
e.g. a flag telling whether they are boxed. (Usually they will all be
boxed, but

Note that general-purpose dispatcher will not actually be used by
`local-eval'. Its purpose is primarily to force all of the visible
lexicals to be boxed and included within the closure. They are also
there to make any fancy optimizers do the right thing.

KEY POINT: Since the general-purpose dispatcher would be sufficient to
implement `local-eval' in standard scheme, it communicates exactly the
right facts to the code analyzers, no matter how clever they are, and it
does so without the analyzers having to know anything about these new
primitives!

How to implement `local-eval'
=============================

When `local-eval' is called on a lexical environment that was created by
compiled code, it will do the following:

* Macroexpand the local expression within <expander-environment>.

* Compile the expanded expression within <compiler-environment>.
(I'll explain how to do this below)

* Make a copy of the closure from the lexical environment object, but
replace its code (the dispatcher) with the newly compiled code.

* Call the newly created closure.

So, the trick is to make sure that the newly compiled code accesses the
closure variables in exactly the same way as the general-purpose
dispatcher.

I haven't yet dug deeply enough into the compiler internals to know the
best way to do this. Ideally I would create a new procedure to handle
this simply and robustly, making use of <compiler-environment>.

For now, I will describe a method that I suspect would do the right
thing without any new compiler interfaces, though not as efficiently or
robustly: Simply compile the same general-purpose dispatcher as before,
except replace the #f (from the first case-lambda clause) with the
expanded local expression:

(lambda (<var> ...) ;; list of lexicals from <compiler-environment>
(case-lambda
(() <expanded local expression>)
((name-XXX) ;; name-XXX is a gensym
(case name-XXX
((<var>) <var>) ;; one clause for each lexical
...))
((name-XXX value-XXX) ;; name-XXX and value-XXX are gensyms
(case name-XXX
((<var>) (set! <var> value-XXX)) ;; one clause for each lexical
...))))

and then extract the code from the inner `case-lambda'. As described
above, this code would replace the code portion of the captured closure.

NOTE: This assumes that case-lambda creates only one closure for all
cases. I confess that I haven't yet verified this suspicion. If this
is not true, one could replace the case-lambda with the equivalent using
standard scheme.

Again, I don't expect to actually use this hack. I expect to simply
introduce a new internal compiler interface that essentially does the
same thing, but in a less fragile way.

However, I hope that these draft notes will give confidence that it is
possible to implement `the-environment' and `local-eval' without adding
any additional complexity to the compiler. No changes should be needed
to the analysis or optimization passes, aside from adding some clauses
to `record-case' for `the-environment' tree-il nodes, and a new
procedure or two to handle `local-eval'.

I intend to implement this soon. Comments and suggestions solicited.

Best,
Mark

Mark H Weaver

2011-12-16 08:08:35 UTC

Post by Mark H Weaver
The <compiler-environment> would include a list of lexical variable
names (<var> ...), which must exactly correspond to the closure slots of
the `case-lambda', in order to implement `local-eval'. We _might_ also
need to include some information about how those variables are stored,

Sorry, this paragraph should have ended here. The following unfinished

Post by Mark H Weaver
e.g. a flag telling whether they are boxed. (Usually they will all be
boxed, but

I was thinking about an edge case where the result of (the-environment)
never escapes the local block, and thus a clever optimizer might be able
to avoid boxing some of the lexicals. However, I later realized that
this could only happen if the returned environment was never passed to
`local-eval', in which case none of this matters anyway.

Mark

Mark H Weaver

2011-12-16 08:49:09 UTC

Post by Mark H Weaver
For now, I will describe a method that I suspect would do the right
thing without any new compiler interfaces, though not as efficiently or
robustly: Simply compile the same general-purpose dispatcher as before,
except replace the #f (from the first case-lambda clause) with the

Although this should result in the same set of captured lexicals, it
does not necessarily guarantee that the closure slots will be in the
same order. This could be perhaps be solved by always sorting the
captured lexicals by name, but that would slow down the compiler.
Depending on how the compiler works, it might be sufficient to move the
<expanded-local-expression> case to end of the case-lambda, but that's
definitely fragile.

So, I guess this all shows that `local-eval' really shouldn't be
implemented this way, but rather by creating a new internal interface to
the compiler that ensures that the closure slots are exactly the same as
before.

Post by Mark H Weaver
Most passes of the compiler will pretend that (the-environment) is
replaced by tree-il code corresponding to the following standard scheme

I should also mention that perhaps, instead of simply "pretending", it
might make sense to actually replace (the-environment) with the standard
scheme code I gave as early as possible, so that later passes will never
even see `the-environment' tree-il nodes. It need only be late enough
so that the list of visible lexical variable names is known at that
point.

Apologies for sending multiple messages so quickly.
Obviously this is a work-in-progress :)

Mark

David Kastrup

2011-12-16 09:16:43 UTC

Post by Mark H Weaver

Post by Mark H Weaver
For now, I will describe a method that I suspect would do the right
thing without any new compiler interfaces, though not as efficiently or
robustly: Simply compile the same general-purpose dispatcher as before,
except replace the #f (from the first case-lambda clause) with the

Although this should result in the same set of captured lexicals, it
does not necessarily guarantee that the closure slots will be in the
same order. This could be perhaps be solved by always sorting the
captured lexicals by name, but that would slow down the compiler.
Depending on how the compiler works, it might be sufficient to move the
<expanded-local-expression> case to end of the case-lambda, but that's
definitely fragile.
So, I guess this all shows that `local-eval' really shouldn't be
implemented this way, but rather by creating a new internal interface to
the compiler that ensures that the closure slots are exactly the same as
before.

Post by Mark H Weaver
Most passes of the compiler will pretend that (the-environment) is
replaced by tree-il code corresponding to the following standard scheme

I should also mention that perhaps, instead of simply "pretending", it
might make sense to actually replace (the-environment) with the standard
scheme code I gave as early as possible, so that later passes will never
even see `the-environment' tree-il nodes. It need only be late enough
so that the list of visible lexical variable names is known at that
point.
Apologies for sending multiple messages so quickly.
Obviously this is a work-in-progress :)

And I consider this _very_ exciting work in progress. One of the things
that the current development line of GCC markets is "compiler plugins".
Here GUILE has an opportunity to offer similar functionality in a
natural, Scheme-like manner with little complexity exposed to the user
of this feature, and apparently not all that much complexity needed to
get added to the compiler: it is more a matter of factoring the
complexity that has to be there anyway in a proper way. Which actually
might make the compiler code easier to understand und modify even if you
don't end up using local-eval.

Being able to employ this in Lilypond to simplify things would certainly
be a nice side benefit, but this has the potential to simplify and
facilitate quite more complex scenarios with simple tools.

It would be _much_ _much_ simpler to use than GCC plugins. And the
better it integrates with the compiler as a whole, the less reason would
be there _not_ to use it whenever it might be useful.

--
David Kastrup

Mark H Weaver

2011-12-18 07:11:41 UTC

Post by Mark H Weaver

Post by Mark H Weaver
Most passes of the compiler will pretend that (the-environment) is
replaced by tree-il code corresponding to the following standard scheme

I should also mention that perhaps, instead of simply "pretending", it
might make sense to actually replace (the-environment) with the standard
scheme code I gave as early as possible, so that later passes will never
even see `the-environment' tree-il nodes. It need only be late enough
so that the list of visible lexical variable names is known at that
point.

So, it turns out that the best place to transform (the-environment) into
standard scheme code is in the macro expander itself. Indeed, only the
macro expander has enough information to generate an optimal list of
"reachable lexicals", i.e. lexical variables that are accessible using
normal symbols (as opposed to syntax objects) [more on this below].

This is great news, because it means that `the-environment' will no
longer require its own tree-il type, and the compiler will only see the
standard scheme code that it expands into.

However, we will need a robust way to either (A) specify, (B) discover,
or (C) predict the order of the captured lexicals in a closure.

Option A would be ideal, because it would allow `local-eval' to easily
and robustly create a new closure that is "compatible" with the
general-dispatcher closure created by (the-environment), using only
standard compiler interfaces. By "compatible", I mean that their
captured lexicals have the same names and are stored in the same order
in the closure, so that code from the new closure can be spliced into
the existing general-dispatcher closure.

Option B would require `local-eval' to use a special interface to the
compiler to compile new code that is compatible with the existing
general-dispatcher closure.

Option C is similar to option B but probably more fragile, because
future changes to the compiler might invalidate the predictions and thus
break `local-eval'. However, this fragility could perhaps be mitigated
by prominent warnings in the comments of the code that determines
closure slot order.

I have yet to decide which option to take. Suggestions welcome.

* * * * *

There's also another issue that has come to my attention. If we must
support arbitrary syntax-objects to be passed to `local-eval', in many
(most?) cases this will greatly increase the number of lexicals that
must be captured, and thus boxed, by (the-environment).

Consider the following example:

(let* ((t 42)
(t (1+ t))
(t (1+ t))
(t (1+ t)))
(or #f (the-environment))))

If syntax-objects are never passed to `local-eval', then only one
lexical variable is reachable from (the-environment), namely the last
`t' in the `let*', and therefore that's the only variable that must be
boxed.

However, if we must support arbitrary syntax-objects, then not only are
all four of the above `t's reachable, but the hidden `t' from the
expansion of the `or' is reachable as well! (or at least I see no way
for the implementation of `the-environment' to prove that the user does
not possess such syntax-objects, given the way psyntax is implemented).

It would be a shame to force all of these shadowed and hidden variables
to be boxed. If we can minimize the number of captured variables, we
can reduce the performance impact of `the-environment'.

So, I'm thinking that (the-environment) should only capture lexical
variables that are reachable using normal symbols. I've already written
code for psyntax that generates this list of reachable lexicals. This
should suffice in the overwhelming majority of cases.

If needed, we could easily support a variant: (the-full-environment) or
(the-environment #:full), that captures _all_ lexicals reachable using
arbitrary syntax-objects.

What do you think?

Mark

Andy Wingo

2011-12-18 11:27:42 UTC

Post by Mark H Weaver
So, it turns out that the best place to transform (the-environment) into
standard scheme code is in the macro expander itself.

Are you certain that it is desirable to transform it into standard
Scheme code?

The statement that the-environment acts "as if it were the expression,
(case-lambda ....)" doesn't mean that you have to implement it that
way.

Post by Mark H Weaver
Indeed, only the macro expander has enough information to generate an
optimal list of "reachable lexicals", i.e. lexical variables that are
accessible using normal symbols (as opposed to syntax objects) [more
on this below].

Are you certain that you want to restrict the set of identifiers?

To me it sounds like an optimization, perhaps premature.

What do you think about having a <the-environment> tree-il form have a
field for the names and a field for the gensyms of captured lexicals?

Post by Mark H Weaver
This is great news, because it means that `the-environment' will no
longer require its own tree-il type, and the compiler will only see the
standard scheme code that it expands into.
However, we will need a robust way to either (A) specify, (B) discover,
or (C) predict the order of the captured lexicals in a closure.

We could compile `the-environment' so that at runtime it yields a record
containing a vector of syntax objects and a vector of corresponding
variable objects. (When the compiler boxes a variable, it does so in a
variable object, as from make-variable.) Then you don't introduce
cross-cutting assumptions to the compiler and runtime.

Post by Mark H Weaver
I have yet to decide which option to take. Suggestions welcome.

WDYT about mine? :)

Post by Mark H Weaver
There's also another issue that has come to my attention. If we must
support arbitrary syntax-objects to be passed to `local-eval', in many
(most?) cases this will greatly increase the number of lexicals that
must be captured, and thus boxed, by (the-environment).

If a tree-il `the-environment' form takes a list of names and gensyms,
then we can provide the possibility in the future to limit the set of
captured bindings.

Post by Mark H Weaver
So, I'm thinking that (the-environment) should only capture lexical
variables that are reachable using normal symbols.

I think I disagree here. It is strictly less useful to capture a subset
of bindings, and it would only be done for efficiency, and it probably
doesn't matter.

So yeah, I guess my arguments here depend on a tree-il the-environment
form. I wonder if that is the right thing, though; maybe there is a
lower-level concept at work. The only thing that you need that tree-il
doesn't give you right now is the ability to declare a variable as
boxed, and to capture its identity.

Maybe what we need is a <lexical-capture> form that evaluates to the
variable corresponding to a bound lexical. Then `the-environment' could
expand out to

(make-struct/no-tail <lexical-environment>
'(name ...)
(list (capture-lexical name) ...))

You would still need support from the expander to get the set of
currently-bound names, but maybe that is a new primitive that we could
add.

Could we do it all with two new low-level primitives? And then, could
we actually put `the-environment', environment accessors, and everything
else into a module?

Andy

--
http://wingolog.org/

Noah Lavine

2011-12-18 15:32:15 UTC

Hello,

Post by Andy Wingo

Post by Mark H Weaver
Indeed, only the macro expander has enough information to generate an
optimal list of "reachable lexicals", i.e. lexical variables that are
accessible using normal symbols (as opposed to syntax objects) [more
on this below].

Are you certain that you want to restrict the set of identifiers?
To me it sounds like an optimization, perhaps premature.

If I understand correctly, Mark wants to restrict the set of variables
you can access to those you could access through normal Scheme code.
This is an issue because psyntax happens to provide a way to access
more variables than standard Scheme. If this is the case, I think we
should absolutely restrict it to standard Scheme, because letting you
access everything psyntax can access a) is not Scheme and b) restricts
our future implementation choices.

Post by Andy Wingo
What do you think about having a <the-environment> tree-il form have a
field for the names and a field for the gensyms of captured lexicals?

Post by Mark H Weaver
This is great news, because it means that `the-environment' will no
longer require its own tree-il type, and the compiler will only see the
standard scheme code that it expands into.

This actually seems bad to me, although I'm just guessing. Because the
thing is, it's not *really* that Scheme code that you wrote, and so
the compiler is seeing something wrong. It has the same
variable-capture properties as that code, but it's not actually that.
My instinct is that compiling non-accurate code is going to be more
trouble than it's worth, but that's just a guess.

In general, this thread has been very, very impressive. Thanks a lot
to everyone who has been working so hard on this.

Noah

David Kastrup

2011-12-18 16:19:34 UTC

Let me first state that this thread is arguing at a depth where the only
contributions that remain for me to make are syllogisms without an
actual knowledge of what I am talking about.

In order not to appear ungrateful, I will do that, but there will be
little point in expecting me to be of assistance in judging their merit.

Post by Noah Lavine
If I understand correctly, Mark wants to restrict the set of variables
you can access to those you could access through normal Scheme code.
This is an issue because psyntax happens to provide a way to access
more variables than standard Scheme. If this is the case, I think we
should absolutely restrict it to standard Scheme, because letting you
access everything psyntax can access a) is not Scheme and b) restricts
our future implementation choices.

If psyntax accesses more than Scheme can access while doing a task that
is worth doing, what chance would there be in giving Scheme the power to
access what psyntax can?

If exporting enough of the compiler environment to be useful for
implementing multiple compilation units sharing lexical environments is
feasible, what are the implications for splitting a compilation into
separate units when the extent of psyntax is not similarly involved?

In short: where should one draw the line in a way that makes best use of
already done compilations without having to redo too much to be of
practical use?

Post by Noah Lavine
In general, this thread has been very, very impressive. Thanks a lot
to everyone who has been working so hard on this.

Agreed.

--
David Kastrup

Noah Lavine

2011-12-18 21:24:53 UTC

Hello,

Post by David Kastrup

Post by Noah Lavine
If I understand correctly, Mark wants to restrict the set of variables
you can access to those you could access through normal Scheme code.
This is an issue because psyntax happens to provide a way to access
more variables than standard Scheme. If this is the case, I think we
should absolutely restrict it to standard Scheme, because letting you
access everything psyntax can access a) is not Scheme and b) restricts
our future implementation choices.

If psyntax accesses more than Scheme can access while doing a task that
is worth doing, what chance would there be in giving Scheme the power to
access what psyntax can?

Let me explain what I mean more. Here is a code example:

(let ((x 3))
(let ((x (* x 5)))
(the-environment)))

At the point of the-environment, there is one Scheme variable around,
'x', and its value is 15. The old value of x is shadowed by the new
one.

But psyntax works by assigning each of these x values a gensym, and
they get different gensyms. So in theory, you could access the outer
value of x (x = 3) if you had a handle on the right gensym.

Supporting this seems weird to me, because I view psyntax as an
implementation detail of Guile, and not something that should affect
programmers. I also think that not supporting it is fine because if
you want to be able to access both values, you can always give your
variables different names. So this isn't giving you more power, but
just a weird way to access it.

Post by David Kastrup
If exporting enough of the compiler environment to be useful for
implementing multiple compilation units sharing lexical environments is
feasible, what are the implications for splitting a compilation into
separate units when the extent of psyntax is not similarly involved?
In short: where should one draw the line in a way that makes best use of
already done compilations without having to redo too much to be of
practical use?

I do not know enough to respond to the rest of your email.

Noah

Mark H Weaver

2011-12-19 09:13:26 UTC

Post by Andy Wingo
Are you certain that it is desirable to transform it into standard
Scheme code?
The statement that the-environment acts "as if it were the expression,
(case-lambda ....)" doesn't mean that you have to implement it that
way.

Yes of course, but nonetheless it was tempting because it would have
made the compiler implementation of `the-environment' and `local-eval'
much simpler.

However, having thought about it some more, I agree that this is the
wrong approach. It would have worked well with the particular
implementation of closures that we use in the compiler, but I haven't
figured out how to make it work with the evaluator. Furthermore, I can
imagine more advanced implementations of closures in a native compiler
that would break this approach as well.

Post by Andy Wingo

Post by Mark H Weaver
Indeed, only the macro expander has enough information to generate an
optimal list of "reachable lexicals", i.e. lexical variables that are
accessible using normal symbols (as opposed to syntax objects) [more
on this below].

Are you certain that you want to restrict the set of identifiers?
To me it sounds like an optimization, perhaps premature.

I can certainly believe that it might potentially be useful to capture
the entire environment, in case someone wants to do something clever
using a mixture of syntax-case and `the-environment'.

However, I think that for many (most?) usage scenarios, the expression
passed to `local-eval' will never reference these shadowed or hidden
identifiers, and this could make a noticeable difference in what the
optimizer is able to do.

It's easy to support both modes. If you feel strongly that the default
(the-environment) should capture everything, I'm probably okay with
that, but I'd like to support the other mode too.

Ideally I'd like to allow the user to specify a precise list of
variables to capture, and perhaps the other two variants should be
implemented in terms of this one.

Post by Andy Wingo
What do you think about having a <the-environment> tree-il form have a
field for the names and a field for the gensyms of captured lexicals?

Yes, I think this is the right approach. It will need an
<expander-environment> as well, of course.

Post by Andy Wingo
We could compile `the-environment' so that at runtime it yields a record
containing a vector of syntax objects and a vector of corresponding
variable objects. (When the compiler boxes a variable, it does so in a
variable object, as from make-variable.) Then you don't introduce
cross-cutting assumptions to the compiler and runtime.

I like the idea of explicitly including vectors of variable objects and
variable names instead of creating a closure. This decouples the
implementation of `local-eval' from the details of how closures are
represented.

However, I'm not sure about the syntax objects. What would we use them
for? The first thing `local-eval' does is to macroexpand the new
expression within the captured <expander-environment>, and henceforth
the compiler will see only gensyms. So it seems to me that those
gensyms are what we need.

Post by Andy Wingo
Maybe what we need is a <lexical-capture> form that evaluates to the
variable corresponding to a bound lexical. Then `the-environment' could
expand out to
(make-struct/no-tail <lexical-environment>
'(name ...)
(list (capture-lexical name) ...))

This is a very interesting idea. Note that we'd need to capture the
<expander-environment> as well. Also, the quoted list of names would
need to be the gensyms created by the expander, not the source names.
(Maybe that's what you meant, but I wasn't sure).

Post by Andy Wingo
You would still need support from the expander to get the set of
currently-bound names, but maybe that is a new primitive that we could
add.

This is easy enough. I've already written code for psyntax that
generates a list of lexicals that are reachable using normal symbols.
Creating a list of lexicals reachable by arbitrary syntax objects is
even easier.

Post by Andy Wingo
Could we do it all with two new low-level primitives? And then, could
we actually put `the-environment', environment accessors, and everything
else into a module?

This is a very interesting idea. If we used this approach, we would
also need a third primitive to capture the expander-environment, perhaps
called (the-expander-environment).

This again raises the problem that what to do when (the-environment), or
in this case (the-expander-environment), is found within a macro
definition. I think I finally have a good answer: do what datum->syntax
does. Take an extra syntax-object parameter, and use the lexical
context associated with that syntax-object.

(the-expander-environment <for-syntax>) would expand to (quote
<expander-environment>), where <expander-environment> corresponds to the
lexical context of the syntax-object <for-syntax>.

I'm not yet certain, but I think I like the idea of using these
primitives to implement (the-environment).

* * * * *

By the way, two other complications have come to my attention:

First of all, when we compile a file containing (the-environment),
gensym names will be serialized into the .go file as constants. When
`local-eval' later macroexpands a local expression within an
<expander-environment> that had been serialized from an earlier Guile
session, it needs to ensure that new gensyms do not collide with the
older ones. An easy solution would be for the local macroexpander (the
one that resumes from a captured <expander-environment>) to first make
sure the gensym counter is larger than any of the gensyms stored in that
environment.

The second complication is that an <expander-environment> captured
within a `let-syntax' form contains the lexically-bound syntax
transformer _procedures_, which are obviously not serializable and
therefore cannot be stored in a .go file. I guess the simple solution
here would be to document that (the-environment) within `let-syntax',
`letrec-syntax' or equivalent is not fully supported, and currently
cannot be compiled to a .go file.

I'll have to think about whether this limitation could be lifted. In
theory, it should be possible to serialize syntax-rules macros at least,
though even that might be a pain. I'm not hopeful that it can be done
for procedural macros in the general case. However, I don't see this as
a show-stopper in any case. It's no more serious than the existence of
procedures to the feasibility of `read' and `write'. We simply document
the limitation.

What do you think?

Mark

David Kastrup

2012-01-09 14:44:43 UTC

Post by Mark H Weaver
Ideally I'd like to allow the user to specify a precise list of
variables to capture, and perhaps the other two variants should be
implemented in terms of this one.

If you have to specify a precise _list_ of variables, the-environment is
pretty pointless since you can always write down an explicit list of
variables yourself. It would be different if you were able to specify
the _classes_ of variable to capture.

--
David Kastrup

Andy Wingo

2011-12-16 09:28:36 UTC

Hi Mark,

This is an interesting plan. I still have some doubts, but perhaps you
can make it work.

Post by Mark H Weaver
(quote <compiler-environment>) ;; simple list structure
(quote <expander-environment>) ;; simple list structure

Perhaps syntax-objects can give you what you need here.

Post by Mark H Weaver
<module>) ;; XXX not sure how best to represent this

The expander needs to serialize representations of modules in syntax
objects, and for that purpose it uses the module name. "Anonymous"
modules actually have generated names, lazily. Use `module-name'.

I like the "modeling the-environment as a scheme expression" approach;
it gives `the-environment' some semantics. I used to ignore people who
yammered on about semantics and languages, but I have grown to respect
that approach to language development. So thank you there.

Also, note that forms that `set!' all bound lexicals in the
`the-environment' will cause them all to be boxed, so there is no boxed
bitvector needed.

Post by Mark H Weaver
How to implement `local-eval'
=============================
When `local-eval' is called on a lexical environment that was created by
* Macroexpand the local expression within <expander-environment>.
* Compile the expanded expression within <compiler-environment>.
(I'll explain how to do this below)
* Make a copy of the closure from the lexical environment object, but
replace its code (the dispatcher) with the newly compiled code.
* Call the newly created closure.

We are really far from considering efficiency here :) Would you always
use the compiler for this task? (I think I would.) But otherwise, this
sounds sensible, with a caveat:: hygiene, nested definitions, and the
macro expander.

What does local-eval really mean?

What are the meanings of these expressions:

;; Toplevel
(local-eval '(define foo 42) (the-environment))

;; Lexical, tail context
(local-eval '(define foo 42) (let ((x 100)) (the-environment)))

;; Lexical, tail context -- but with a definition
(local-eval '(begin (define foo 42) foo) (let ((x 100)) (the-environment)))

;; Lexical, tail context -- but with a definition, and nested reference
(local-eval '(begin (define foo 42) (bar))
(let ((x 100)) (define (bar) foo) (the-environment)))

;; Lexical, not a definition context
(local-eval '(define foo 42) (let ((x 100)) not-a-definition (the-environment)))

What about this one:

;; Keeping in mind that `or' expands to (let ((t ...)) (if t t ...)),
;; hygienically
(local-eval 't '(let ((t 42)) (or #f (the-environment))))

Can you pass syntax objects into `local-eval'?

Are let-syntax / letrec-syntax / nested define-syntax forms present in a
local environment? Currently these things never leave the expander. I
suppose as long as they are understood to be opaque, it would be OK.

That's all I really wanted to write about local-eval, I think, so no
promised other mail. Thanks for thinking about these things :)

Andy

--
http://wingolog.org/

David Kastrup

2011-12-16 09:59:44 UTC

I found it amusing to see what my definitions using
with-current-continuation will produce here.

Post by Andy Wingo
;; Toplevel
(local-eval '(define foo 42) (the-environment))

guile> (my-eval '(define foo 42) (my-env))
guile> foo
42

Post by Andy Wingo
;; Lexical, tail context
(local-eval '(define foo 42) (let ((x 100)) (the-environment)))

guile> (my-eval '(define foo 42) (let ((x 100)) (my-env)))

Backtrace:
In standard input:
1: 0* [my-eval (define foo 42) ...
1: 1* (let* ((x 100)) (#<continuation 1401 @ 9253970> (define foo 42)))
1: 2 (begin #<continuation 1581 @ 9252000>)
1: 3 [#<continuation 1401 @ 9253970> ...
1: 4* (define foo 42)

standard input:1:11: In procedure memoization in expression (define foo 42):
standard input:1:11: In file "standard input", line 0: Bad define placement (define foo 42).
ABORT: (syntax-error)
guile>

Post by Andy Wingo
;; Lexical, tail context -- but with a definition
(local-eval '(begin (define foo 42) foo) (let ((x 100)) (the-environment)))

guile> (my-eval '(begin (define foo 42) foo) (let ((x 100)) (my-env)))

Backtrace:
In standard input:
6: 0* [my-eval (begin (define foo 42) foo) ...
6: 1* (let* ((x 100)) (#<continuation 1401 @ 9219b38> (begin # foo)))
6: 2 (begin #<continuation 1581 @ 9259a80>)
6: 3 [#<continuation 1401 @ 9219b38> ...
6: 4* (begin (define foo 42) foo)
6: 5* (define foo 42)

standard input:6:18: In procedure memoization in expression (define foo 42):
standard input:6:18: In file "standard input", line 5: Bad define placement (define foo 42).
ABORT: (syntax-error)
guile>

Post by Andy Wingo
;; Lexical, tail context -- but with a definition, and nested reference
(local-eval '(begin (define foo 42) (bar))
(let ((x 100)) (define (bar) foo) (the-environment)))

guile> (my-eval '(begin (define foo 42) (bar)) (let ((x 100)) (define (bar) foo) (my-env)))

Backtrace:
In standard input:
13: 0* [my-eval (begin (define foo 42) (bar)) ...
13: 1* (let* ((x 100)) (letrec (#) (# #)))
In unknown file:
?: 2 (letrec ((bar #)) (#<continuation 1401 @ 925e7c0> (begin # #)))
In standard input:
...
13: 3 [#<continuation 1401 @ 925e7c0> ...
13: 4* (begin (define foo 42) (bar))
13: 5* (define foo 42)

standard input:13:18: In procedure memoization in expression (define foo 42):
standard input:13:18: In file "standard input", line 12: Bad define placement (define foo 42).
ABORT: (syntax-error)

Post by Andy Wingo
;; Lexical, not a definition context
(local-eval '(define foo 42) (let ((x 100)) not-a-definition (the-environment)))

guile> (my-eval '(define foo 42) (let ((x 100)) "hello" (my-env)))

Backtrace:
In standard input:
12: 0* [my-eval (define foo 42) ...
12: 1* (let* ((x 100)) "hello" (#<continuation 1401 @ 9253970> (define foo 42)))
12: 2 (begin #<continuation 1581 @ 9252000>)
12: 3 [#<continuation 1401 @ 9253970> ...
12: 4* (define foo 42)

standard input:12:11: In procedure memoization in expression (define foo 42):
standard input:12:11: In file "standard input", line 11: Bad define placement (define foo 42).
ABORT: (syntax-error)
guile>

Post by Andy Wingo
;; Keeping in mind that `or' expands to (let ((t ...)) (if t t ...)),
;; hygienically
(local-eval 't '(let ((t 42)) (or #f (the-environment))))

Assuming that the second quote mark is a typo.

guile> (my-eval 't (let ((t 42)) (or #f (my-env))))
42
guile>

Now of course, the continuation based approach that just hijacks the
expander and jumps in and out of it is not really a measure of how
things should work. But it makes clear that (the-environment) is a bit
of a chimera: it captures content at a level conceptually relevant for
(define), but returns a value and has to be placed accordingly, like in
a function call or at the providing side of a binding construct.

If those different syntactic aspects prove to be too hard to conciliate,
it might help to look at the kind of interface that some other chimeras
like call-with-values or call-with-current-continuation have taken.

--
David Kastrup

Mark H Weaver

2011-12-16 10:33:09 UTC

Hi Andy, thanks for the quick feedback. I'll respond to the rest of

Post by Andy Wingo
;; Toplevel
(local-eval '(define foo 42) (the-environment))
;; Lexical, tail context
(local-eval '(define foo 42) (let ((x 100)) (the-environment)))
;; Lexical, tail context -- but with a definition
(local-eval '(begin (define foo 42) foo) (let ((x 100)) (the-environment)))
;; Lexical, tail context -- but with a definition, and nested reference
(local-eval '(begin (define foo 42) (bar))
(let ((x 100)) (define (bar) foo) (the-environment)))
;; Lexical, not a definition context
(local-eval '(define foo 42) (let ((x 100)) not-a-definition (the-environment)))

All of these will raise errors, because my current implementation of
`local-eval' (I just posted a new version of the evaluator-only patch)
uses `expand' instead of `expand-top-sequence' to expand the local
expression, therefore definitions are not allowed.

Post by Andy Wingo
;; Keeping in mind that `or' expands to (let ((t ...)) (if t t ...)),
;; hygienically
(local-eval 't '(let ((t 42)) (or #f (the-environment))))

Yes, even my first patch handles this properly (though you must use
`primitive-eval', since the compiler currently barfs if it encounters
`the-environment'):

scheme@(guile-user)> (local-eval 't (primitive-eval '(let ((t 42)) (or #f (the-environment)))))
$1 = 42

Post by Andy Wingo
Can you pass syntax objects into `local-eval'?

I believe so, but I haven't verified that this works as it should.
I don't have much experience using syntax-case.

The expression passed to `local-eval' simply gets passed along to
`expand' within psyntax.scm, using the saved "expander environment":
i.e. the values of the `r', `w', and `mod' in psyntax, at the point
where (the-environment) was expanded. This "expander environment" is
stored in the tree-il representation of (the-environment).

Here's what I currently see:

scheme@(guile-user)> (local-eval #'t (primitive-eval '(let ((t 42)) (or #f (the-environment)))))
ERROR: In procedure memoize-variable-access!:
ERROR: Unbound variable: t

This is the correct behavior, no?

Post by Andy Wingo
Are let-syntax / letrec-syntax / nested define-syntax forms present in a
local environment?

Yes:

scheme@(guile-user)> (define env1 (primitive-eval '(let-syntax ((foo (syntax-rules () ((foo x) (quote x))))) (let ((x 1) (y 2)) (the-environment)))))
scheme@(guile-user)> (local-eval '(foo (1 2)) env1)
$3 = (1 2)
scheme@(guile-user)> (define env2 (local-eval '(let-syntax ((bar (syntax-rules () ((bar x) (foo x))))) (let ((x 1) (z 3)) (the-environment))) env1))
scheme@(guile-user)> (local-eval '(bar (1 2)) env2)
$5 = (1 2)
scheme@(guile-user)> (local-eval '(foo (1 2)) env2)
$6 = (1 2)

Post by Andy Wingo
Currently these things never leave the expander. I suppose as long as
they are understood to be opaque, it would be OK.

Agreed. However, it should be noted that if we compile a form
containing (the-environment) into a .go file, the "expander environment"
will have to be stored within the .go file as an embedded constant.
This means that if the representations of r, w, or mod within psyntax
changes, any .go files containing uses of (the-environment) will have to
be recompiled, or else `local-eval' could fail ungracefully.

This is indeed an unpleasant complication. Automatic recompilation
could be achieved using a versioning scheme, using a separate
expander-environment-version number. With some care, we could perhaps
avoid needless recompilation of .go files that do not use
(the-environment) when the expander environment version changes, though
it might be a bit of a pain to do the necessary bookkeeping to determine
whether a .go file contains any expander environments. I'll have to
think more about this.

Mark

Hans Aberg

2011-12-16 12:13:38 UTC

Post by Mark H Weaver
ERROR: Unbound variable: t
This is the correct behavior, no?

This is what I get when I play around with the following variation of David's code in Guile 2.0.3:
(define (xxx)
(let* ((x 2))
(set! x (+ x 3))
(interaction-environment)))

(eval '(begin (set! x (+ x 5)) x) (xxx))

My guess (correct?) is that one wants some variation of (interaction-environment) that can cause x in the eval expression to bind to the environment returned by (xxx).

Might eval be changed to accommodate for that (without introducing the name local-eval)?

Hans

David Kastrup

2011-12-16 12:43:08 UTC

Post by David Kastrup

Post by Mark H Weaver
(or #f (the-environment)))))
ERROR: Unbound variable: t
This is the correct behavior, no?

(define (xxx)
(let* ((x 2))
(set! x (+ x 3))
(interaction-environment)))
(eval '(begin (set! x (+ x 5)) x) (xxx))
My guess (correct?) is that one wants some variation of
(interaction-environment) that can cause x in the eval expression to
bind to the environment returned by (xxx).
Might eval be changed to accommodate for that (without introducing the name local-eval)?

It would likely help with unasking the question of what to do when
(current-module) is different at the time of local-eval. I don't know,
however, what the _lexical_ effects of switching the current module are
supposed to be. If it is supposed to be a noop, then lexical
environments and modules are presumably orthogonal, and eval should
likely be allowed to take both (currently, local-eval is like taking a
lexical environment and using primitive-eval in it).

--
David Kastrup

Hans Aberg

2011-12-16 14:57:26 UTC

Post by David Kastrup

Post by David Kastrup

Post by Mark H Weaver
(or #f (the-environment)))))
ERROR: Unbound variable: t
This is the correct behavior, no?

(define (xxx)
(let* ((x 2))
(set! x (+ x 3))
(interaction-environment)))
(eval '(begin (set! x (+ x 5)) x) (xxx))
My guess (correct?) is that one wants some variation of
(interaction-environment) that can cause x in the eval expression to
bind to the environment returned by (xxx).
Might eval be changed to accommodate for that (without introducing the name local-eval)?

It would likely help with unasking the question of what to do when
(current-module) is different at the time of local-eval. I don't know,
however, what the _lexical_ effects of switching the current module are
supposed to be. If it is supposed to be a noop, then lexical
environments and modules are presumably orthogonal, and eval should
likely be allowed to take both (currently, local-eval is like taking a
lexical environment and using primitive-eval in it).

Would you not it work as though you inserted code in the place where then environment? - Then the syntactical rules should be captured as well.

In addition, there should be a way to communicate with the surrounding environment, wherefrom the code is inserted. The only truly safe way would be to make that explicit somehow, if not merely returning a value suffices.

Hans

Ian Hulin

2011-12-21 10:32:39 UTC

Hi Andy and Guile developers,

As the original OP about (local-eval) getting withdrawn please accept
my thanks for the excellent work going on in these threads. I still
follow it even though I barely understand a lot of the stuff that's
being discussed.

Thanks a million, guys, and compliments of the season,

Cheers,
Ian Hulin (LilyPond Guile V2 Migration troll)

22 Replies
42 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Andy Wingo 2011-12-15 10:21:18 UTC

David Kastrup 2011-12-15 14:46:15 UTC

Hans Aberg 2011-12-15 16:52:03 UTC

David Kastrup 2011-12-15 17:24:16 UTC

Hans Aberg 2011-12-15 17:52:35 UTC

Mark H Weaver 2011-12-16 07:35:48 UTC

Mark H Weaver 2011-12-16 08:08:35 UTC

Mark H Weaver 2011-12-16 08:49:09 UTC

David Kastrup 2011-12-16 09:16:43 UTC

Mark H Weaver 2011-12-18 07:11:41 UTC

Andy Wingo 2011-12-18 11:27:42 UTC

Noah Lavine 2011-12-18 15:32:15 UTC

David Kastrup 2011-12-18 16:19:34 UTC

Noah Lavine 2011-12-18 21:24:53 UTC

Mark H Weaver 2011-12-19 09:13:26 UTC

David Kastrup 2012-01-09 14:44:43 UTC

Andy Wingo 2011-12-16 09:28:36 UTC

David Kastrup 2011-12-16 09:59:44 UTC

Mark H Weaver 2011-12-16 10:33:09 UTC

Hans Aberg 2011-12-16 12:13:38 UTC

David Kastrup 2011-12-16 12:43:08 UTC

Hans Aberg 2011-12-16 14:57:26 UTC

Ian Hulin 2011-12-21 10:32:39 UTC

about - legalese

Loading...