Thursday, August 4, 2011

Lazy initialization of Shared Variable Bindings

One problem we encounter with package management in Squeak/Pharo, but also in other Smalltalk dialects is the initialisation of shared variables.

Generally a class initialization message is defined like this:
MyClass class>>initialize
    MyShared1 := SomeOtherClass new.
    self snipLotOfMoreCode
And by convention, this initialize method is executed at package load.

But what if the initialization of a package require another package which is not yet loaded (for example containing SomeOtherClass or less trivially, containing a message sent during the initialization process).

One work around often encountered  is to use a message indirection coupled with a lazy initialization to access the shared variable value, like this:
MyClass class>>someSharedInfo
    ^SomeSharedInfo ifNil: [self initializeSomeSharedInfo]

But why putting pressure on coder when the system could do it by itself ?
The idea is to create a new class for holding uninitialized variable bindings or value and associate an initialization method to be executed when first access will be attempted.

Associating an initialization method could be performed through use of pragmas, for example:
MyClass class>>initializeSomeSharedVariables
    #<initialize>
    MyShared1 := 5.
    MyShared2 := #yourself.

In Smalltalk, a shared variable is a binding (an Association) with the variable name as key (generally a Symbol) and a value (the value of the variable). The Smalltalk Compiler arrange so that binding is unique across all method accessing the variable, thus the value can effectively be shared.

Now, what happens if we access MyShared1 ? In Smalltalk, shared variable are generally not accessed by sending value/value: messages to the binding. Instead, the compiler produces a byte code that directly fetch the second inst.var. for reading and another similar byte code for writing the value. 
But this can be controlled in Squeak/Pharo by a compiler hook: just define your own binding class and two methods:
YourOwnSharedVariableBinding>>isSpecialReadBinding
    "Return true if this variable binding is read protected, e.g., should not be accessed primitively but rather by sending #value messages"
    ^true
YourOwnSharedVariableBinding>>isSpecialWriteBinding
    "Return true if this variable binding is write protected, e.g., should not be accessed primitively but rather by sending #value: messages"
    ^true

So we now have many options to implement a lazy initialization...

Solution 1) instead of initializing value with nil, create an UnitializedSharedValue class pointing to the binding and an initialization message to be performed
And implement doesNotUnderstand: like this
ProtoObject subclass: #UnitializedSharedValue
    instanceVariableNames: 'binding initializer'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Utility'
setInitializer: anEvaluator
    "Set the evaluator responsible of initializing variable value.
    anEvaluator should understand the message value."
    initializer :=  anEvaluator
setBinding: aVariableBinding
    binding :=  aVariableBinding
doesNotUnderstand: aMessage
    initializer
        ifNil:
            [binding error: 'This variable binding does not have any known initializer'.
            ^nil].
    initializer value.
    binding value == self
        ifTrue:
            [binding error: 'The initializer failed to initialize this binding'.
            ^nil].
    ^binding value perform: aMessage selector withArguments: aMessage arguments

The problem with this kind of trick is that it is hard to catch all message sends (in Squeak/Pharo, class is not a true message send for example). And if you catch them, you then can't debug easily, because the debugger uses message sends too.

Solution 2) create AutoInitializedSharedVariableBinding that is Special Read Binding
and use an indirection to access the value:
LookupKey subclass: #AutoInitializedSharedVariableBinding
    instanceVariableNames: 'value undefined initializer'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
initialize
    value := undefined := Object new.
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
setInitializer: anEvaluator
    initializer :=  anEvaluator
value
    ^value == undefined
        ifTrue: [value]
        ifFalse: [self initializeValue]
initializeValue
    initializer
        ifNil:
            [self error: 'This variable binding does not have any known initializer'.
            ^nil].
    initializer value.
    value == undefined
        ifTrue:
            [self error: 'The initializer failed to initialize this binding'.
            ^nil].
    ^value

This will cost an indirection (the #value send) and one test at each access. Not worse than manual lazy initialization we saw first.

Solution 3) avoid the value == undefined test by using a become: trick. There will still be a value/value: indirection cost though, so it might not be worth the complications...
LookupKey subclass: #UninitializedVariableBinding
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
key: aKey value: anEvaluator
    key := aKey.
    value := anEvaluator
setInitializer: anEvaluator
    value :=  anEvaluator
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
isSpecialWriteBinding
    ^true
value
    ^value
        ifNil: [UnitializedVariableException signal]
        ifNotNil: [self value: value value]
value: anObject
    self become: (InitializedVariableBinding key: key value: anObject).
    ^anObject

LookupKey subclass: #InitializedVariableBinding
    instanceVariableNames: 'value'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Collections-Support'
key: aKey value: anObject
    key := aKey.
    value := anObject
setInitializer: anEvaluator
    self become: (UninitializedVariableBinding key: key value: anEvaluator).
    ^self
isVariableBinding
    ^true
isSpecialReadBinding
    ^true
isSpecialWriteBinding
    ^true
value
    ^value
value: anObject
    ^value := anObject

Note the return in last method so that we can chain MyShared1 := MyShared2 := someValue...

Solution 4) like in Cincom VisualWorks, let all SharedVariableBinding carry their own initialization code (which is like a clean block). But this does not solve the initialization order problems and this requires tools support (and also source code management support for storing/retrieving/executing the code associated to the shared variable binding).

The pragma handling common to all solutions:
Just create this class
InstructionClient subclass: #SharedInitializationFinder
    instanceVariableNames: 'usedBeforeDefined initializedBindingsByFetchingValue initializedBindingsBySendingValue stack stackPerPC pc'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Utility'`

With these class methods:
initialize
    "SharedInitializationFinder initialize."
    self registerForEvents
registerForEvents
    SystemChangeNotifier uniqueInstance noMoreNotificationsFor: self.
    SystemChangeNotifier uniqueInstance notify: self ofAllSystemChangesUsing: #initializeEvent:.
scanMethod: aCompiledMethod
    "Answer with a list of variable bindings initialized by aCompiledMethod via sending #value: message."
    ^self new scanMethod: aCompiledMethod
initializeEvent: anEvent
    "Check if this system event defines or removes an automatic initialization."
    | aClass aSelector method |
    (anEvent itemKind = SystemChangeNotifier classKind and: [anEvent isRemoved])
        ifTrue: ["We should track initializations bound to this class..."].
    anEvent itemKind = SystemChangeNotifier methodKind ifTrue: [
        aClass := anEvent itemClass.
        aClass isMeta ifFalse: [^self]. "ignore instance methods"
        aClass := aClass theNonMetaClass.
        aSelector := anEvent itemSelector.
        aSelector numArgs > 0 ifTrue: [^self]. "ignore methods with arguments"
        (anEvent isRemoved) ifTrue: [
            "We should track initializations bound to this class and method..."].
        (anEvent isAdded or: [anEvent isModified]) ifTrue: [
            method := anEvent item.
            method pragmas do: [:pragma |
                | shared |
                pragma keyword == #initialize ifTrue: [
                    shared := self scanMethod: method.
                    shared do: [:binding | binding setInitializer: (MessageSend receiver: aClass selector: aSelector)]]]]].
 
On instance side, let the class process the bytecodes decoding (most are omitted), the most important being:
initializeBindings
    usedBeforeDefined := Set new.
    initializedBindingsByFetchingValue := Set new.
    initializedBindingsBySendingValue := Set new.
initializedBindingsByFetchingValue
    "Answer with the list of variable bindings being initialized by target CompiledMethod by directly storing into value."
    ^initializedBindingsByFetchingValue
initializedBindingsBySendingValue
    "Answer with the list of variable bindings being initialized by target CompiledMethod by directly sending value: message."
    ^initializedBindingsBySendingValue
pushLiteralVariable: anAssociation
    "Push Contents Of anAssociation On Top Of Stack bytecode."
    stack addLast: anAssociation
popIntoLiteralVariable: anAssociation
    "Remove Top Of Stack And Store Into Literal Variable bytecode."
    initializedBindingsByFetchingValue add: anAssociation.
    stack removeLast
send: aSelector super: supered numArgs: numberArguments
    | theReceiver |
    1 to: numberArguments do: [:i | stack removeLast].
    theReceiver := stack removeLast.
    theReceiver ifNotNil: [
        aSelector == #value ifTrue: [usedBeforeDefined add: theReceiver].
        aSelector == #value: ifTrue: ["Protect against trivial infinite loops..."
            (usedBeforeDefined includes: theReceiver)
                ifFalse: [initializedBindingsBySendingValue add: theReceiver]]].
    stack addLast: nil
scanMethod: method
    | scanner end |
    self initializePC.
    scanner := InstructionStream on: method.
    end := method endPC.
    [(pc := scanner pc) <= end] whileTrue:
        [stackPerPC at: pc ifPresent: [:restoredStack | stack := restoredStack].
        scanner interpretNextInstructionFor: self].
    ^self initializedBindingsBySendingValue

Last, we need to define a hook, such that any binding could become an uninitialized or auto initialized variable binding, for example, solution 3) would be:
LookUpKey>>setInitializer: anEvaluator
    "Set the evaluator responsible of initializing variable value.
    anEvaluator should understand the message value."
    self become: (UninitializedVariableBinding key: key value: anEvaluator).
    ^self

The nice thing with above hook is that each time you change the initialization method, the variables will be reset by the setInitializer: mechanism. No more manual doIt : self initializeThisOrThat.
Of course, there are more things to handle before getting a fully functioning framework:
1) what if you remove an initializer method/class ?
2) what if you define two or more initializer methods ?
Especially, if you define a second initialization method then, oops, remove it, the variables should switch back to the first initialization...
3) detecting unchanged initialization method and avoiding unecessary re-initialization would be smart.
But these are boring little details... (where the devil is hidden as you may know).

1 comment: