VMS/VMS_Implementations/VMS_impls/VMS__MC_shared_impl: AnimationMaster.c annotate

annotate AnimationMaster.c @ 230:f2a7831352dc

changed SchedulingMaster.c to AnimationMaster.c and cleaned up all comments

author	Some Random Person <seanhalle@yahoo.com>
date	Thu, 15 Mar 2012 20:31:41 -0700
parents
children	88fd85921d7f

rev	line source
seanhalle@230	1 /*
seanhalle@230	2 * Copyright 2010 OpenSourceStewardshipFoundation
seanhalle@230	3 *
seanhalle@230	4 * Licensed under BSD
seanhalle@230	5 */
seanhalle@230	6
seanhalle@230	7
seanhalle@230	8
seanhalle@230	9 #include <stdio.h>
seanhalle@230	10 #include <stddef.h>
seanhalle@230	11
seanhalle@230	12 #include "VMS.h"
seanhalle@230	13
seanhalle@230	14
seanhalle@230	15 //========================= Local Fn Prototypes =============================
seanhalle@230	16 void inline
seanhalle@230	17 stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
seanhalle@230	18 SlaveVP *masterVP );
seanhalle@230	19
seanhalle@230	20 //===========================================================================
seanhalle@230	21
seanhalle@230	22
seanhalle@230	23
seanhalle@230	24 /*The animationMaster embodies most of the animator of the language. The
seanhalle@230	25 * animator is what embodies the behavior of language constructs.
seanhalle@230	26 * As such, it is the animationMaster, in combination with the plugin
seanhalle@230	27 * functions, that makes the language constructs do their behavior.
seanhalle@230	28 *
seanhalle@230	29 *Within the code, this is the top-level-function of the masterVPs, and
seanhalle@230	30 * runs when the coreController has no more slave VPs. Its job is to
seanhalle@230	31 * refill the animation slots with slaves.
seanhalle@230	32 *
seanhalle@230	33 *To do this, it scans the animation slots for just-completed slaves.
seanhalle@230	34 * Each of these has a request in it. So, the master hands each to the
seanhalle@230	35 * plugin's request handler.
seanhalle@230	36 *Each request represents a language construct that has been encountered
seanhalle@230	37 * by the application code in the slave. Passing the request to the
seanhalle@230	38 * request handler is how that language construct's behavior gets invoked.
seanhalle@230	39 * The request handler then performs the actions of the construct's
seanhalle@230	40 * behavior. So, the request handler encodes the behavior of the
seanhalle@230	41 * language's parallelism constructs, and performs that when the master
seanhalle@230	42 * hands it a slave containing a request to perform that construct.
seanhalle@230	43 *
seanhalle@230	44 *On a shared-memory machine, the behavior of parallelism constructs
seanhalle@230	45 * equals control over the order of execution of code. Hence, the behavior
seanhalle@230	46 * of the language constructs performed by the request handler is to
seanhalle@230	47 * choose the order that slaves get animated, and thereby control the
seanhalle@230	48 * order that application code in the slaves executes.
seanhalle@230	49 *
seanhalle@230	50 *To control order of animation of slaves, the request handler has a
seanhalle@230	51 * semantic environment that holds data structures used to hold slaves
seanhalle@230	52 * and choose when they're ready to be animated.
seanhalle@230	53 *
seanhalle@230	54 *Once a slave is marked as ready to be animated by the request handler,
seanhalle@230	55 * it is the second plugin function, the Assigner, which chooses the core
seanhalle@230	56 * the slave gets assigned to for animation. Hence, the Assigner doesn't
seanhalle@230	57 * perform any of the semantic behavior of language constructs, rather
seanhalle@230	58 * it gives the language a chance to improve performance. The performance
seanhalle@230	59 * of application code is strongly related to communication between
seanhalle@230	60 * cores. On shared-memory machines, communication is caused during
seanhalle@230	61 * execution of code, by memory accesses, and how much depends on contents
seanhalle@230	62 * of caches connected to the core executing the code. So, the placement
seanhalle@230	63 * of slaves determines the communication caused during execution of the
seanhalle@230	64 * slave's code.
seanhalle@230	65 *The point of the Assigner, then, is to use application information during
seanhalle@230	66 * execution of the program, to make choices about slave placement onto
seanhalle@230	67 * cores, with the aim to put slaves close to caches containing the data
seanhalle@230	68 * used by the slave's code.
seanhalle@230	69 *
seanhalle@230	70 *==========================================================================
seanhalle@230	71 *In summary, the animationMaster scans the slots, finds slaves
seanhalle@230	72 * just-finished, which hold requests, passes those to the request handler,
seanhalle@230	73 * along with the semantic environment, and the request handler then manages
seanhalle@230	74 * the structures in the semantic env, which controls the order of
seanhalle@230	75 * animation of slaves, and so embodies the behavior of the language
seanhalle@230	76 * constructs.
seanhalle@230	77 *The animationMaster then rescans the slots, offering each empty one to
seanhalle@230	78 * the Assigner, along with the semantic environment. The Assigner chooses
seanhalle@230	79 * among the ready slaves in the semantic Env, finding the one best suited
seanhalle@230	80 * to be animated by that slot's associated core.
seanhalle@230	81 *
seanhalle@230	82 *==========================================================================
seanhalle@230	83 *Implementation Details:
seanhalle@230	84 *
seanhalle@230	85 *There is a separate masterVP for each core, but a single semantic
seanhalle@230	86 * environment shared by all cores. Each core also has its own scheduling
seanhalle@230	87 * slots, which are used to communicate slaves between animationMaster and
seanhalle@230	88 * coreController. There is only one global variable, _VMSMasterEnv, which
seanhalle@230	89 * holds the semantic env and other things shared by the different
seanhalle@230	90 * masterVPs. The request handler and Assigner are registered with
seanhalle@230	91 * the animationMaster by the language's init function, and a pointer to
seanhalle@230	92 * each is in the _VMSMasterEnv. (There are also some pthread related global
seanhalle@230	93 * vars, but they're only used during init of VMS).
seanhalle@230	94 *VMS gains control over the cores by essentially "turning off" the OS's
seanhalle@230	95 * scheduler, using pthread pin-to-core commands.
seanhalle@230	96 *
seanhalle@230	97 *The masterVPs are created during init, with this animationMaster as their
seanhalle@230	98 * top level function. The masterVPs use the same SlaveVP data structure,
seanhalle@230	99 * even though they're not slave VPs.
seanhalle@230	100 *A "seed slave" is also created during init -- this is equivalent to the
seanhalle@230	101 * "main" function in C, and acts as the entry-point to the VMS-language-
seanhalle@230	102 * based application.
seanhalle@230	103 *The masterVPs share a single system-wide master-lock, so only one
seanhalle@230	104 * masterVP may be animated at a time.
seanhalle@230	105 *The core controllers access _VMSMasterEnv to get the masterVP, and when
seanhalle@230	106 * they start, the slots are all empty, so they run their associated core's
seanhalle@230	107 * masterVP. The first of those to get the master lock sees the seed slave
seanhalle@230	108 * in the shared semantic environment, so when it runs the Assigner, that
seanhalle@230	109 * returns the seed slave, which the animationMaster puts into a scheduling
seanhalle@230	110 * slot then switches to the core controller. That then switches the core
seanhalle@230	111 * over to the seed slave, which then proceeds to execute language
seanhalle@230	112 * constructs to create more slaves, and so on. Each of those constructs
seanhalle@230	113 * causes the seed slave to suspend, switching over to the core controller,
seanhalle@230	114 * which eventually switches to the masterVP, which executes the
seanhalle@230	115 * request handler, which uses VMS primitives to carry out the creation of
seanhalle@230	116 * new slave VPs, which are marked as ready for the Assigner, and so on.
seanhalle@230	117 *
seanhalle@230	118 *On animation slots, and system behavior:
seanhalle@230	119 * A request may linger in an animation slot for a long time while
seanhalle@230	120 * the slaves in the other slots are animated. This only becomes a problem
seanhalle@230	121 * when such a request is a choke-point in the constraints, and is needed
seanhalle@230	122 * to free work for other cores. To reduce this occurrence, the number
seanhalle@230	123 * of animation slots should be kept low. In balance, having multiple
seanhalle@230	124 * animation slots amortizes the overhead of switching to the masterVP and
seanhalle@230	125 * executing the animationMaster code, which argues for more than one. In
seanhalle@230	126 * practice, the best balance should be discovered by profiling.
seanhalle@230	127 */
seanhalle@230	128 void animationMaster( void *initData, SlaveVP *masterVP )
seanhalle@230	129 {
seanhalle@230	130 //Used while scanning and filling animation slots
seanhalle@230	131 int32 slotIdx, numSlotsFilled;
seanhalle@230	132 SchedSlot *currSlot, **schedSlots;
seanhalle@230	133 SlaveVP *assignedSlaveVP; //the slave chosen by the assigner
seanhalle@230	134
seanhalle@230	135 //Local copies, for performance
seanhalle@230	136 MasterEnv *masterEnv;
seanhalle@230	137 SlaveAssigner slaveAssigner;
seanhalle@230	138 RequestHandler requestHandler;
seanhalle@230	139 void *semanticEnv;
seanhalle@230	140 int32 thisCoresIdx;
seanhalle@230	141
seanhalle@230	142 //======================== Initializations ========================
seanhalle@230	143 masterEnv = (MasterEnv*)_VMSMasterEnv;
seanhalle@230	144
seanhalle@230	145 thisCoresIdx = masterVP->coreAnimatedBy;
seanhalle@230	146 schedSlots = masterEnv->allSchedSlots[thisCoresIdx];
seanhalle@230	147
seanhalle@230	148 requestHandler = masterEnv->requestHandler;
seanhalle@230	149 slaveAssigner = masterEnv->slaveAssigner;
seanhalle@230	150 semanticEnv = masterEnv->semanticEnv;
seanhalle@230	151
seanhalle@230	152
seanhalle@230	153 //======================== animationMaster ========================
seanhalle@230	154 while(1){
seanhalle@230	155
seanhalle@230	156 MEAS__Capture_Pre_Master_Point
seanhalle@230	157
seanhalle@230	158 //Scan the animation slots
seanhalle@230	159 numSlotsFilled = 0;
seanhalle@230	160 for( slotIdx = 0; slotIdx < NUM_SCHED_SLOTS; slotIdx++)
seanhalle@230	161 {
seanhalle@230	162 currSlot = schedSlots[ slotIdx ];
seanhalle@230	163
seanhalle@230	164 //Check if newly-done slave in slot, which will need request handled
seanhalle@230	165 if( currSlot->workIsDone )
seanhalle@230	166 {
seanhalle@230	167 currSlot->workIsDone = FALSE;
seanhalle@230	168 currSlot->needsSlaveAssigned = TRUE;
seanhalle@230	169
seanhalle@230	170 MEAS__startReqHdlr;
seanhalle@230	171
seanhalle@230	172 //process the requests made by the slave (held inside slave struc)
seanhalle@230	173 (*requestHandler)( currSlot->slaveAssignedToSlot, semanticEnv );
seanhalle@230	174
seanhalle@230	175 MEAS__endReqHdlr;
seanhalle@230	176 }
seanhalle@230	177 //If slot empty, hand to Assigner to fill with a slave
seanhalle@230	178 if( currSlot->needsSlaveAssigned )
seanhalle@230	179 { //Call plugin's Assigner to give slot a new slave
seanhalle@230	180 assignedSlaveVP =
seanhalle@230	181 (*slaveAssigner)( semanticEnv, currSlot );
seanhalle@230	182
seanhalle@230	183 //put the chosen slave into slot, and adjust flags and state
seanhalle@230	184 if( assignedSlaveVP != NULL )
seanhalle@230	185 { currSlot->slaveAssignedToSlot = assignedSlaveVP;
seanhalle@230	186 assignedSlaveVP->schedSlot = currSlot;
seanhalle@230	187 currSlot->needsSlaveAssigned = FALSE;
seanhalle@230	188 numSlotsFilled += 1;
seanhalle@230	189 }
seanhalle@230	190 }
seanhalle@230	191 }
seanhalle@230	192
seanhalle@230	193
seanhalle@230	194 #ifdef SYS__TURN_ON_WORK_STEALING
seanhalle@230	195 /*If no slots filled, means no more work, look for work to steal. */
seanhalle@230	196 if( numSlotsFilled == 0 )
seanhalle@230	197 { gateProtected_stealWorkInto( currSlot, readyToAnimateQ, masterVP );
seanhalle@230	198 }
seanhalle@230	199 #endif
seanhalle@230	200
seanhalle@230	201 MEAS__Capture_Post_Master_Point;
seanhalle@230	202
seanhalle@230	203 masterSwitchToCoreCtlr(animatingSlv);
seanhalle@230	204 flushRegisters();
seanhalle@230	205 }//MasterLoop
seanhalle@230	206
seanhalle@230	207
seanhalle@230	208 }
seanhalle@230	209
seanhalle@230	210
seanhalle@230	211 //=========================== Work Stealing ==============================
seanhalle@230	212
seanhalle@230	213 /*This is first of two work-stealing approaches. It's not used, but left
seanhalle@230	214 * in the code as a simple illustration of the principle. This version
seanhalle@230	215 * has a race condition -- the core controllers are accessing their own
seanhalle@230	216 * animation slots at the same time that this work-stealer on a different
seanhalle@230	217 * core is.
seanhalle@230	218 *Because the core controllers run outside the master lock, this interaction
seanhalle@230	219 * is not protected.
seanhalle@230	220 */
seanhalle@230	221 void inline
seanhalle@230	222 stealWorkInto( SchedSlot *currSlot, VMSQueueStruc *readyToAnimateQ,
seanhalle@230	223 SlaveVP *masterVP )
seanhalle@230	224 {
seanhalle@230	225 SlaveVP *stolenSlv;
seanhalle@230	226 int32 coreIdx, i;
seanhalle@230	227 VMSQueueStruc *currQ;
seanhalle@230	228
seanhalle@230	229 stolenSlv = NULL;
seanhalle@230	230 coreIdx = masterVP->coreAnimatedBy;
seanhalle@230	231 for( i = 0; i < NUM_CORES -1; i++ )
seanhalle@230	232 {
seanhalle@230	233 if( coreIdx >= NUM_CORES -1 )
seanhalle@230	234 { coreIdx = 0;
seanhalle@230	235 }
seanhalle@230	236 else
seanhalle@230	237 { coreIdx++;
seanhalle@230	238 }
seanhalle@230	239 //TODO: fix this for coreCtlr scans slots; until then currQ is unassigned
seanhalle@230	240 // currQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
seanhalle@230	241 if( numInVMSQ( currQ ) > 0 )
seanhalle@230	242 { stolenSlv = readVMSQ (currQ );
seanhalle@230	243 break;
seanhalle@230	244 }
seanhalle@230	245 }
seanhalle@230	246
seanhalle@230	247 if( stolenSlv != NULL )
seanhalle@230	248 { currSlot->slaveAssignedToSlot = stolenSlv;
seanhalle@230	249 stolenSlv->schedSlot = currSlot;
seanhalle@230	250 currSlot->needsSlaveAssigned = FALSE;
seanhalle@230	251
seanhalle@230	252 writeVMSQ( stolenSlv, readyToAnimateQ );
seanhalle@230	253 }
seanhalle@230	254 }
seanhalle@230	255
seanhalle@230	256 /*This algorithm makes the common case fast. Make the coreloop passive,
seanhalle@230	257 * and show its progress. Make the stealer control a gate that coreloop
seanhalle@230	258 * has to pass.
seanhalle@230	259 *To avoid interference, only one stealer at a time. Use a global
seanhalle@230	260 * stealer-lock, so only the stealer is slowed.
seanhalle@230	261 *
seanhalle@230	262 *The pattern is based on a gate -- stealer shuts the gate, then monitors
seanhalle@230	263 * to be sure any already past make it all the way out, before starting.
seanhalle@230	264 *So, have a "progress" measure just before the gate, then have two after it,
seanhalle@230	265 * one is in a "waiting room" outside the gate, the other is at the exit.
seanhalle@230	266 *Then, the stealer first shuts the gate, then checks the progress measure
seanhalle@230	267 * outside it, then looks to see if the progress measure at the exit is the
seanhalle@230	268 * same. If yes, it knows the protected area is empty 'cause no other way
seanhalle@230	269 * to get in and the last to get in also exited.
seanhalle@230	270 *If the progress measure at the exit is not the same, then the stealer goes
seanhalle@230	271 * into a loop checking both the waiting-area and the exit progress-measures
seanhalle@230	272 * until one of them shows the same as the measure outside the gate. Might
seanhalle@230	273 * as well re-read the measure outside the gate each go around, just to be
seanhalle@230	274 * sure. It is guaranteed that one of the two will eventually match the one
seanhalle@230	275 * outside the gate.
seanhalle@230	276 *
seanhalle@230	277 *Here's an informal proof of correctness:
seanhalle@230	278 *The gate can be closed at any point, and there are only four cases:
seanhalle@230	279 * 1) coreloop made it past the gate-closing but not yet past the exit
seanhalle@230	280 * 2) coreloop made it past the pre-gate progress update but not yet past
seanhalle@230	281 * the gate,
seanhalle@230	282 * 3) coreloop is right before the pre-gate update
seanhalle@230	283 * 4) coreloop is past the exit and far from the pre-gate update.
seanhalle@230	284 *
seanhalle@230	285 * Covering the cases in reverse order,
seanhalle@230	286 * 4) is not a problem -- stealer will read pre-gate progress, see that it
seanhalle@230	287 * matches exit progress, and the gate is closed, so stealer can proceed.
seanhalle@230	288 * 3) stealer will read pre-gate progress just after coreloop updates it,
seanhalle@230	289 * so stealer goes into a loop until the coreloop causes wait-progress
seanhalle@230	290 * to match pre-gate progress, so then stealer can proceed
seanhalle@230	291 * 2) same as 3.
seanhalle@230	292 * 1) stealer reads pre-gate progress, sees that it's different than exit,
seanhalle@230	293 * so goes into loop until exit matches pre-gate, now it knows coreloop
seanhalle@230	294 * is not in protected and cannot get back in, so can proceed.
seanhalle@230	295 *
seanhalle@230	296 *Implementation for the stealer:
seanhalle@230	297 *
seanhalle@230	298 *First, acquire the stealer lock -- only cores with no work to do will
seanhalle@230	299 * compete to steal, so not a big performance penalty having only one --
seanhalle@230	300 * will rarely have multiple stealers in a system with plenty of work -- and
seanhalle@230	301 * in a system with little work, it doesn't matter.
seanhalle@230	302 *
seanhalle@230	303 *Note, have single-reader, single-writer pattern for all variables used to
seanhalle@230	304 * communicate between stealer and victims
seanhalle@230	305 *
seanhalle@230	306 *So, scan the queues of the core controllers, until find non-empty. Each core
seanhalle@230	307 * has its own list that it scans. The list goes in order from closest to
seanhalle@230	308 * furthest core, so it steals first from close cores. Later can add
seanhalle@230	309 * taking info from the app about overlapping footprints, and scan all the
seanhalle@230	310 * others then choose work with the most footprint overlap with the contents
seanhalle@230	311 * of this core's cache.
seanhalle@230	312 *
seanhalle@230	313 *Now, have a victim want to take work from. So, shut the gate in that
seanhalle@230	314 * coreloop, by setting the "gate closed" var on its stack to TRUE.
seanhalle@230	315 *Then, read the core's pre-gate progress and compare to the core's exit
seanhalle@230	316 * progress.
seanhalle@230	317 *If same, can proceed to take work from the coreloop's queue. When done,
seanhalle@230	318 * write FALSE to gate closed var.
seanhalle@230	319 *If different, then enter a loop that reads the pre-gate progress, then
seanhalle@230	320 * compares to exit progress then to wait progress. When one of two
seanhalle@230	321 * matches, proceed. Take work from the coreloop's queue. When done,
seanhalle@230	322 * write FALSE to the gate closed var.
seanhalle@230	323 *
seanhalle@230	324 */
seanhalle@230	325 void inline
seanhalle@230	326 gateProtected_stealWorkInto( SchedSlot *currSlot,
seanhalle@230	327 VMSQueueStruc *myReadyToAnimateQ,
seanhalle@230	328 SlaveVP *masterVP )
seanhalle@230	329 {
seanhalle@230	330 SlaveVP *stolenSlv;
seanhalle@230	331 int32 coreIdx, i, haveAVictim, gotLock;
seanhalle@230	332 VMSQueueStruc *victimsQ;
seanhalle@230	333
seanhalle@230	334 volatile GateStruc *vicGate;
seanhalle@230	335 int32 coreMightBeInProtected;
seanhalle@230	336
seanhalle@230	337
seanhalle@230	338
seanhalle@230	339 //see if any other cores have work available to steal
seanhalle@230	340 haveAVictim = FALSE;
seanhalle@230	341 coreIdx = masterVP->coreAnimatedBy;
seanhalle@230	342 for( i = 0; i < NUM_CORES -1; i++ )
seanhalle@230	343 {
seanhalle@230	344 if( coreIdx >= NUM_CORES -1 )
seanhalle@230	345 { coreIdx = 0;
seanhalle@230	346 }
seanhalle@230	347 else
seanhalle@230	348 { coreIdx++;
seanhalle@230	349 }
seanhalle@230	350 //TODO: fix this for coreCtlr scans slots; until then victimsQ is unassigned
seanhalle@230	351 // victimsQ = _VMSMasterEnv->readyToAnimateQs[coreIdx];
seanhalle@230	352 if( numInVMSQ( victimsQ ) > 0 )
seanhalle@230	353 { haveAVictim = TRUE;
seanhalle@230	354 vicGate = _VMSMasterEnv->workStealingGates[ coreIdx ];
seanhalle@230	355 break;
seanhalle@230	356 }
seanhalle@230	357 }
seanhalle@230	358 if( !haveAVictim ) return; //no work to steal, exit
seanhalle@230	359
seanhalle@230	360 //have a victim core, now get the stealer-lock
seanhalle@230	361 gotLock =__sync_bool_compare_and_swap( &(_VMSMasterEnv->workStealingLock),
seanhalle@230	362 UNLOCKED, LOCKED );
seanhalle@230	363 if( !gotLock ) return; //go back to core controller, which will re-start master
seanhalle@230	364
seanhalle@230	365
seanhalle@230	366 //====== Start Gate-protection =======
seanhalle@230	367 vicGate->gateClosed = TRUE;
seanhalle@230	368 coreMightBeInProtected= vicGate->preGateProgress != vicGate->exitProgress;
seanhalle@230	369 while( coreMightBeInProtected )
seanhalle@230	370 { //wait until sure
seanhalle@230	371 if( vicGate->preGateProgress == vicGate->waitProgress )
seanhalle@230	372 coreMightBeInProtected = FALSE;
seanhalle@230	373 if( vicGate->preGateProgress == vicGate->exitProgress )
seanhalle@230	374 coreMightBeInProtected = FALSE;
seanhalle@230	375 }
seanhalle@230	376
seanhalle@230	377 stolenSlv = readVMSQ ( victimsQ );
seanhalle@230	378
seanhalle@230	379 vicGate->gateClosed = FALSE;
seanhalle@230	380 //======= End Gate-protection =======
seanhalle@230	381
seanhalle@230	382
seanhalle@230	383 if( stolenSlv != NULL ) //victim could have been in protected and taken
seanhalle@230	384 { currSlot->slaveAssignedToSlot = stolenSlv;
seanhalle@230	385 stolenSlv->schedSlot = currSlot;
seanhalle@230	386 currSlot->needsSlaveAssigned = FALSE;
seanhalle@230	387
seanhalle@230	388 writeVMSQ( stolenSlv, myReadyToAnimateQ );
seanhalle@230	389 }
seanhalle@230	390
seanhalle@230	391 //unlock the work stealing lock
seanhalle@230	392 _VMSMasterEnv->workStealingLock = UNLOCKED;
seanhalle@230	393 }
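
The listing above shows only the stealer's half of the gate protocol. Below is a minimal sketch of how the two halves might fit together, written as small step functions so each case in the informal proof can be exercised on its own. Assumptions: the `Gate` type and all `gate_*` helper names are hypothetical; the victim-side steps are inferred from the comment block, not taken from the actual VMS core loop; and C11 atomics are swapped in for the `volatile` fields used in the listing. Only the four field names come from the source.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for GateStruc. Field names follow the listing;
 * the atomic types are an assumption of this sketch. */
typedef struct {
    atomic_int gateClosed;       /* written by stealer, read by victim   */
    atomic_int preGateProgress;  /* bumped by victim just before gate    */
    atomic_int waitProgress;     /* set by victim when it finds gate shut */
    atomic_int exitProgress;     /* set by victim when it leaves         */
} Gate;

void gate_init(Gate *g) {
    atomic_init(&g->gateClosed, 0);
    atomic_init(&g->preGateProgress, 0);
    atomic_init(&g->waitProgress, 0);
    atomic_init(&g->exitProgress, 0);
}

/* Victim side: one attempt to pass the gate. Publishes progress first,
 * then checks the gate. Returns true if the victim is now inside the
 * protected area; false if it was turned away into the waiting room
 * (the caller would spin until the gate opens, then call this again). */
bool gate_try_enter(Gate *g) {
    int ticket = atomic_fetch_add(&g->preGateProgress, 1) + 1;
    if (atomic_load(&g->gateClosed)) {
        atomic_store(&g->waitProgress, ticket);
        return false;
    }
    return true;
}

/* Victim side: leaving the protected area. */
void gate_exit(Gate *g) {
    atomic_store(&g->exitProgress, atomic_load(&g->preGateProgress));
}

/* Stealer side: shut and reopen the gate. */
void gate_close(Gate *g) { atomic_store(&g->gateClosed, 1); }
void gate_open(Gate *g)  { atomic_store(&g->gateClosed, 0); }

/* Stealer side: after closing the gate, true once the victim is known
 * to be outside the protected area, i.e. its latest pre-gate progress
 * matches either the exit or the waiting-room progress. */
bool gate_victim_is_out(Gate *g) {
    int pre = atomic_load(&g->preGateProgress);
    return pre == atomic_load(&g->exitProgress)
        || pre == atomic_load(&g->waitProgress);
}
```

A stealer built on these helpers would call `gate_close`, poll `gate_victim_is_out` until it returns true, take work from the victim, then call `gate_open`. The sketch uses the default sequentially consistent atomic operations because the protocol depends on each side seeing the other's store before acting on its own load; whether plain `volatile` gives that guarantee depends on the target machine.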
