SIM_ASYNCH_IO

Theory of operation.

Features.
   - Optional Use. Simulators build and run correctly whether or not
     SIM_ASYNCH_IO is defined. Additionally, a simulator built with
     SIM_ASYNCH_IO defined can dynamically disable and re-enable
     asynchronous operation with the scp commands SET NOASYNCH and
     SET ASYNCH respectively.
   - Consistent Save/Restore state. The state of a simulator saved
     on a simulator with (or without) Asynch support can be restored
     on any simulator of the same version with or without Asynch
     support.
   - Optimal behavior/performance whether the simulator is running
     with or without CPU idling enabled.
   - Consistent minimum instruction scheduling delays when operating
     with or without SIM_ASYNCH_IO. When SIM_ASYNCH_IO is enabled,
     any operation which would have been scheduled to occur in 'n'
     instructions will still occur (from the simulated computer's
     point of view) at least 'n' instructions after it was initiated.
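
The minimum-delay guarantee in the last item can be modeled as a single
comparison. The sketch below is illustrative only: aio_delivery_time() is a
hypothetical helper, not part of simh, and it treats time purely as a
simulated instruction count.

```c
#include <stdint.h>

/* Hypothetical model of the minimum-delay rule (not simh code).
   An operation initiated at simulated instruction count 'start' with a
   nominal delay of 'n' instructions finishes on the host at instruction
   count 'host_done'. Its completion is delivered no earlier than
   start + n (so the simulated system never sees it early) and no earlier
   than the moment the host actually finished. */
static uint64_t aio_delivery_time (uint64_t start, uint64_t n, uint64_t host_done)
{
    uint64_t earliest = start + n;              /* minimum simulated delay */
    return (host_done > earliest) ? host_done : earliest;
}
```

With a nominal delay of 50 instructions, an I/O that the host finishes after
only 10 instructions is still delivered at instruction 1050, while one that
takes longer is delivered when the host finishes.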

Benefits.
   - Allows a simulator to execute simulated instructions concurrently
     with I/O operations which may take many milliseconds to perform.
   - Allows a simulated device to potentially avoid polling for the
     arrival of data. Polling consumes host processor CPU cycles which
     may be better spent executing simulated instructions or letting
     other host processes run. Measurements of available instruction
     execution capacity clearly demonstrate the benefit of overlapping
     instruction execution with I/O activity: a VAX simulator running
     a disk intensive application in one process was able to run (in
     another process) 11 times as many Dhrystone operations with Asynch
     I/O enabled as with it disabled.
   - Allows simulator clock ticks to track wall clock time as precisely
     as possible under varying I/O load and activities.

SimH libraries which provide Asynch I/O support:
   sim_disk
   sim_tape
   sim_ether
   sim_console
   sim_tmxr

Requirements to use:
The simulator's instruction loop needs to be modified to include a single
line which checks for asynchronously arrived events. The vax_cpu.c
module added the following line, indicated by >>>:

        /* Main instruction loop */

        for ( ;; ) {

            [...]
    >>>     AIO_CHECK_EVENT;
            if (sim_interval <= 0) {                /* chk clock queue */
                temp = sim_process_event ();
                if (temp)
                    ABORT (temp);
                SET_IRQL;                           /* update interrupts */
                }
            [...]
            }
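
To show why a single line in the instruction loop is cheap, here is a
minimal single-threaded model of what a check like AIO_CHECK_EVENT does,
assuming it counts instructions down and only looks for completed I/O every
inst_latency instructions. All names are hypothetical stand-ins; the real
macro is defined by simh and also handles cross-thread synchronization,
which this sketch omits.

```c
/* Hypothetical stand-ins for simh's latency variables and AIO queue. */
static int inst_latency = 20;      /* like sim_asynch_inst_latency */
static int aio_countdown = 20;     /* instructions left before the next check */
static int aio_completed;          /* completions waiting; in simh these are set
                                      by I/O threads under proper synchronization */
static int checks_performed;       /* instrumentation for this sketch only */
static int events_delivered;

/* Called once per simulated instruction, in the spirit of AIO_CHECK_EVENT. */
static void aio_check_event (void)
{
    if (--aio_countdown > 0)
        return;                    /* fast path: no check on this instruction */
    aio_countdown = inst_latency;  /* re-arm the countdown */
    ++checks_performed;
    while (aio_completed > 0) {    /* move completions onto the event queue */
        --aio_completed;
        ++events_delivered;
    }
}
```

Over 100 instructions with inst_latency at 20, the completion queue is
examined only 5 times, yet every completion that arrived is still delivered.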

A global variable (sim_asynch_latency) indicates the "interrupt dispatch
latency": the number of nanoseconds between checks for completed
asynchronous I/O. The default value is 4000 (4 usec), which corresponds
reasonably well to simulated hardware. This variable controls the
computation of sim_asynch_inst_latency, which is the number of simulated
instructions executed in the sim_asynch_latency interval. Checking for
completed asynchronous I/O after every instruction is avoided because
performing that check on every instruction can slow down execution.
Periodic checks provide a balance which allows response similar to real
hardware while having minimal impact on instruction execution. If maximal
response is desired, the value of sim_asynch_latency can be set low enough
to ensure that sim_asynch_inst_latency computes to 1.
sim_asynch_inst_latency is dynamically updated once per second in the
sim_rtcn_calb routine, where the relationship between the clock and the
instruction execution rate is determined. A simulator would usually add
register definitions to enable viewing and setting these variables via
scp:

    #if defined (SIM_ASYNCH_IO)
        { DRDATA (LATENCY, sim_asynch_latency, 32), PV_LEFT },
        { DRDATA (INST_LATENCY, sim_asynch_inst_latency, 32), PV_LEFT },
    #endif
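
The relationship between sim_asynch_latency and sim_asynch_inst_latency can
be sketched as follows, assuming the calibration step yields a rate in
instructions per second. compute_inst_latency() is a hypothetical name; the
actual computation performed in sim_rtcn_calb may differ in rounding and
other details.

```c
#include <stdint.h>

/* Hypothetical helper: number of instructions executed during latency_ns
   nanoseconds at a calibrated rate of inst_per_sec, clamped to at least 1
   so that a check always happens eventually. */
static uint32_t compute_inst_latency (uint32_t latency_ns, uint64_t inst_per_sec)
{
    uint64_t n = (inst_per_sec * latency_ns) / 1000000000ULL;
    return (n < 1) ? 1 : (uint32_t) n;
}
```

At a calibrated 50 million instructions per second, the default 4000 ns
latency yields a check every 200 instructions; lowering the latency to 10 ns
at that rate clamps the result to 1, i.e. a check on every instruction.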

Programming Disk and Tape devices to leverage Asynch I/O
Asynch disk and tape I/O is provided through a callback model. The callback
is invoked when the requested I/O operation has completed.

Naming conventions:
All of the routines implemented in sim_disk and sim_tape have been kept
in place. Every routine which performs I/O has a variant routine available
with "_a" appended to the routine name and a single additional parameter
which specifies the asynch completion callback routine. For example, the
following routines now exist:

    t_stat sim_tape_rdrecf (UNIT *uptr, uint8 *buf, t_mtrlnt *bc, t_mtrlnt max);
    t_stat sim_tape_rdrecf_a (UNIT *uptr, uint8 *buf, t_mtrlnt *bc, t_mtrlnt max, TAPE_PCALLBACK callback);

The purpose of the callback function is to record the I/O completion status
and then to schedule the activation of the unit.

Considerations:
Avoiding multiple concurrent users of the unit structure. While asynch
I/O is pending on a unit, the unit should not otherwise be on the event
queue. The I/O completion will cause the unit to be scheduled to run
immediately, which in turn dispatches control flow to the callback routine.
The callback routine is always called in the same thread which is
executing instructions. Since all simulator device data structures are
only referenced from this thread, there are no host multi-processor cache
coherency issues to be concerned about.

Arguments to the callback routine:
    UNIT *, and I/O status

Requirements of the Callback routine.
The callback routine must save the I/O completion status in a place
which the next invocation of the unit service routine will reference
and act on. This allows device code to return error conditions
back to scp in a consistent way, regardless of how the callback
routine (and the actual I/O) may have been executed. When the callback
routine is called, the unit will already be on the simulator event queue
with an event time which was specified when the unit was attached or via
a call to sim_disk_set_async. If no value has been specified, then it
will have been scheduled with a delay time of 0. If a different event
firing time is desired, then the callback completion routine should
call sim_activate_abs to schedule the event at the appropriate time.
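
A callback meeting these requirements does very little. The sketch below
uses hypothetical stand-in types (UNIT_SKETCH, t_stat_sketch) in place of
simh's UNIT and t_stat so that it compiles on its own; in a real device the
saved fields would live in the unit structure or device-specific state.

```c
#include <stdbool.h>

typedef int t_stat_sketch;          /* hypothetical stand-in for t_stat */

typedef struct {
    t_stat_sketch io_status;        /* completion status saved by the callback */
    bool io_complete;               /* set by the callback, cleared by the bottom half */
} UNIT_SKETCH;                      /* hypothetical stand-in for UNIT */

/* Hypothetical callback with the (unit, status) argument list described
   above. It only records the outcome: it has no way to return a status
   to scp, so the unit service routine must act on it later. */
static void xx_io_complete (UNIT_SKETCH *uptr, t_stat_sketch status)
{
    uptr->io_status = status;
    uptr->io_complete = true;
    /* The unit is already on the event queue at this point; a device that
       wants a different firing time would call sim_activate_abs here. */
}
```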

Required change in device coding.
Devices which wish to leverage the benefits of asynch I/O must rearrange
the code which implements the unit service routine. This rearrangement
usually entails breaking the activities into two phases. The first phase
(called here the top half) performs whatever is needed to initiate an
I/O operation with a callback argument. Control is then immediately
returned to the scp event dispatcher. The callback routine needs to be
coded to stash away the I/O completion status and some indicator that
an I/O has completed.
The unit service routine then examines the I/O completion indicator to
choose between the two halves, invoking the bottom half code once the
I/O has completed. The bottom half code should clear the I/O completion
indicator and then perform any activities which normally need to occur
after the I/O completes. Care should be taken in both halves to return
to the scp event dispatcher with either SCPE_OK or an appropriate error
code when needed. The need to return error indications to the scp event
dispatcher is why the bottom half activities can't simply be performed
in the callback routine (the callback routine does not return a status).
Also note that local variables in the unit service routine will not
survive between the separate top half and bottom half calls to the unit
service routine. Any information which must be referenced in both the
top and bottom half code paths must either be recomputed prior to the
top/bottom half check or be stored somewhere other than local variables
of the unit service routine (such as in the unit structure).
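
The top/bottom half pattern can be sketched as a self-contained model.
disk_read_a() below is a hypothetical stand-in for an asynchronous library
routine such as those in sim_disk; it merely records the pending callback
so the example can deliver the completion by hand, where simh would instead
reschedule the unit and invoke the callback from the event queue. The
status codes are likewise hypothetical stand-ins for SCPE_OK and an error.

```c
#include <stdbool.h>
#include <stddef.h>

#define SCPE_OK_SK     0            /* hypothetical stand-in for SCPE_OK */
#define SCPE_IOERR_SK  1            /* hypothetical stand-in for an error code */

typedef struct unit_sk {
    bool io_complete;               /* set by the callback */
    int  io_status;                 /* saved by the callback */
    unsigned long lba;              /* request parameter kept in the unit, not a local */
    int  bottom_half_runs;          /* instrumentation for this sketch */
} UNIT_SK;

typedef void (*CALLBACK_SK) (UNIT_SK *uptr, int status);

/* Hypothetical async layer: remembers who to call back when "I/O" finishes. */
static UNIT_SK *pending_unit;
static CALLBACK_SK pending_cb;

static void disk_read_a (UNIT_SK *uptr, unsigned long lba, CALLBACK_SK cb)
{
    (void) lba;                     /* a real routine would start the transfer */
    pending_unit = uptr;
    pending_cb = cb;
}

static void xx_callback (UNIT_SK *uptr, int status)
{
    uptr->io_status = status;       /* stash the status ... */
    uptr->io_complete = true;       /* ... and the completion indicator */
}

static int xx_svc (UNIT_SK *uptr)
{
    if (uptr->io_complete) {        /* bottom half */
        uptr->io_complete = false;  /* clear the completion indicator */
        uptr->bottom_half_runs++;
        /* ... post-I/O work, using uptr->lba rather than a local ... */
        return (uptr->io_status == 0) ? SCPE_OK_SK : SCPE_IOERR_SK;
    }
    /* top half: start the I/O and return to the dispatcher immediately */
    uptr->lba = 42;                 /* hypothetical request parameter */
    disk_read_a (uptr, uptr->lba, xx_callback);
    return SCPE_OK_SK;
}
```

Note that lba is kept in the unit rather than in a local variable, because
the bottom half runs in a separate invocation of xx_svc(), and that an I/O
error surfaces as the bottom half's return value, not from the callback.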

Sample Asynch I/O device implementations.
The pdp11_rq.c module has been refactored to leverage the asynch I/O
features of the sim_disk library. The changes needed in this code to adopt
the asynch I/O paradigm were quite minimal.
The pdp11_rp.c module has also been refactored to leverage the asynch I/O
features of the sim_disk library. The changes needed here were also quite
minimal. After conversion, the more realistic delays in performing I/O
operations exposed a latent bug in the VAX Massbus adapter implementation.
The pdp11_tq.c module has been refactored to leverage the asynch I/O
features of the sim_tape library. The changes needed in this code to adopt
the asynch I/O paradigm were very significant, for two reasons:
1) there are many different operations which can be requested of tape
devices, and 2) some of the TMSCP operations required many separate
operations on the physical device layer to perform a single TMSCP request.
This issue was addressed by adding routines to the physical device layer
(in sim_tape.c) which combine these multiple operations. This approach will
dovetail well with a potential future addition of operations on physical
tapes as yet another supported tape format.

Programming Console and Multiplexer devices to leverage Asynch I/O to
minimize 'unproductive' polling.
There are two goals for asynchronous multiplexer I/O: 1) polling should
happen only when data is available, not arbitrarily on every clock tick,
and 2) polling should happen as soon as data may be available.
In most cases no effort is required to add Asynch I/O support to a
multiplexer device emulation. If a device emulation follows the normal
model of polling for arriving data on every simulated clock tick, then
when Asynch I/O is enabled, the device will operate asynchronously and
behave well. There is one restriction in this model: the device emulation
logic can't expect any particular number of invocations of a unit service
routine to perform polls (such as one per clock tick) in any interval of
time (this is exactly what we're trying to change). Therefore, presumptions
about measuring time by counting polls are not valid. If a device needs to
manage time-related activities, then the device should create a separate
unit which is dedicated to the timing activities and which explicitly
schedules a different unit service routine for those activities as needed.
Such scheduled polling should only be enabled when actual timing is
required.
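
A minimal sketch of the alternative to counting polls, assuming a dedicated
timer whose service routine runs once a wall-clock deadline passes (in simh
this scheduling would typically be done with sim_activate_after(), described
later in this document). TIMER_SK, timer_arm() and timer_fire() are
hypothetical names; the point is that the outcome depends only on elapsed
time, not on how many input polls happened meanwhile.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state owned by a dedicated timer unit, separate from the
   input polling unit. Times are wall-clock microseconds. */
typedef struct {
    bool     armed;
    uint64_t deadline_usec;
} TIMER_SK;

/* Arm the timer only when timing is actually needed. */
static void timer_arm (TIMER_SK *t, uint64_t now_usec, uint64_t delay_usec)
{
    t->armed = true;
    t->deadline_usec = now_usec + delay_usec;
}

/* Timer unit's service routine: fires once the deadline has passed,
   then disarms so no further scheduled polling occurs. */
static bool timer_fire (TIMER_SK *t, uint64_t now_usec)
{
    if (!t->armed || now_usec < t->deadline_usec)
        return false;
    t->armed = false;
    return true;
}
```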

A device which is unprepared to operate asynchronously can disable
multiplexer Asynch I/O for that device by explicitly defining
NO_ASYNCH_MUX at compile time. This can be defined at the top of a
particular device emulation which isn't capable of asynch operation, or
it can be defined globally on the compile command line for the simulator.
Alternatively, if a specific multiplexer device doesn't function correctly
in the multiplexer asynchronous environment and will never be revised to
operate correctly, it may statically set the TMUF_NOASYNCH bit in its
unit flags field.
Some devices will need a small amount of extra coding to leverage the
multiplexer Asynch I/O capabilities. Devices which require extra coding
have one or more of the following characteristics:
   - they poll for input data on a different unit (or units) than the unit
     which was provided when tmxr_attach was called.
   - they poll for connections on a different unit than the unit which was
     provided when tmxr_attach was called.
The extra coding required for proper operation is to call
tmxr_set_line_unit() to associate the appropriate input polling unit with
the respective multiplexer line (ONLY if input polling is done by a unit
different than the unit specified when the MUX was attached). If output
polling is done on a different unit, then tmxr_set_line_output_unit()
should be called to describe that fact.
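
The reason for this association can be illustrated with a small model: when
data arrives on a line, the unit that must be woken is the one that actually
polls that line. The types and functions below (MUX_SK, mux_set_line_unit(),
mux_data_arrived()) are hypothetical and only mimic the effect of
tmxr_set_line_unit(); they are not the sim_tmxr implementation.

```c
#include <stddef.h>

#define LINES_SK 4

typedef struct { int activations; } UNIT_SK;   /* hypothetical unit */

/* Hypothetical per-line table of the unit to wake when input arrives.
   A NULL entry means "use the unit given at attach time". */
typedef struct {
    UNIT_SK *attach_unit;
    UNIT_SK *line_poll_unit[LINES_SK];
} MUX_SK;

static void mux_set_line_unit (MUX_SK *mp, int line, UNIT_SK *uptr)
{
    mp->line_poll_unit[line] = uptr;
}

/* Data arrived on 'line': activate whichever unit polls that line. */
static void mux_data_arrived (MUX_SK *mp, int line)
{
    UNIT_SK *uptr = mp->line_poll_unit[line] ? mp->line_poll_unit[line]
                                             : mp->attach_unit;
    uptr->activations++;
}
```

Without the per-line association, input on line 1 would wake the attach
unit, which in such a device never looks at that line.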

Console I/O can operate asynchronously if the simulator notifies the
tmxr/console subsystem which device units it uses to poll for console
input and output. This is done by including sim_tmxr.h in the source
module which contains the console input device definition and calling
tmxr_set_console_units(). tmxr_set_console_units() would usually be
called in a device reset routine.

sim_tmxr consumers:

   Simulator    Device  Devices  Units    Lines  Flagbits  Asynch
   -----------  ------  -------  -------  -----  --------  --------
   Altair Z80   SIO     1        1        4      8         Untested
   HP2100       BACI    1        1        1      3         Untested
   HP2100       MPX     1        10       8      2         Untested
   HP2100       MUX     3        1/16/1   16     4         Untested
   I7094        COM     2        4/33     33     4         Untested
   Interdata    PAS     2        1/32     32     3         Untested
   Nova         QTY     1        1        64     1         Untested
   Nova         TT1     2        1/1      1      1         Untested
   PDP-1        DCS     2        1/32     32     0         Untested
   PDP-8        TTX     2        1/4      4      0         Untested
   PDP-11       DC      2        1/16     16     5         Untested
   PDP-11       DL      2        1/16     16     3         Untested
   PDP-11       DZ      1        1/1      32     0         Good
   PDP-11       VH      1        4        32     4         Good
   PDP-18b      TT1     2        1/16     16     0         Untested
   SDS          MUX     2        1/32     32     0         Untested
   sim_console  -       -        -        -      -         Good

Programming Clock Devices to leverage Asynch I/O
simh's concept of time is calibrated by counting the number of
instructions which the simulator can execute in a given amount of wall
clock time. Once this is determined, the appropriate value is continually
recalibrated and used throughout a simulator to schedule device time
related delays as needed. Historically, this worked well until modern
processors started having dynamically variable processor clock rates.
On such host systems, the simulator's concept of time passing can vary
drastically. This dynamic adjustment of the host system's execution rate
may cause dramatic drifting of the simulated operating system's concept
of time. Once all devices are disconnected from the calibrated clock's
instruction count, the only concern for time in the simulated system is
that its clock tick be as accurate as possible. This has worked well
in the past; however, each simulator was burdened with providing code
to manage the relationship between the number of instructions executed
and the passage of wall clock time.
To accommodate the needs of activities or events which should be measured
against wall clock time (vs a specific number of instructions executed),
the simulator framework has been extended to specifically provide event
scheduling based on elapsed wall time. A new API can be used by devices
to schedule unit event delivery after the passage of a specific amount
of wall clock time: sim_activate_after() provides this capability. This
capability is not limited to being available ONLY when compiling with
SIM_ASYNCH_IO defined. When SIM_ASYNCH_IO is defined, this facility is
implemented by a thread which drives the delivery of these events from
the host system's clock ticks (interpolated as needed to accommodate
hosts with relatively large clock ticks). When SIM_ASYNCH_IO is not
defined, this facility is implemented using the traditional simh
calibrated clock approach. This new approach has been measured to provide
clocks which drift far less than the drift realized in prior simh versions.
Using the released simh v3.9-0 VAX simulator with idling enabled, the clock
drifted some 4 minutes in 35 minutes' time (approximately 11%). The same OS
disk, also running with idling enabled and booted for 4 hours, had less
than 5 seconds of clock drift (approximately 0.03%).
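
When SIM_ASYNCH_IO is not defined, a wall-clock delay must ultimately be
turned into an instruction count using the calibrated rate. The hypothetical
helper below shows that conversion and, by implication, why it drifts: the
result is only correct while the host keeps executing at the calibrated rate.

```c
#include <stdint.h>

/* Hypothetical helper: convert a wall-clock delay in microseconds into an
   instruction count using the most recent calibration (instructions per
   second), saturating at INT32_MAX. */
static int32_t usecs_to_instructions (uint32_t usecs, uint64_t inst_per_sec)
{
    uint64_t n = (inst_per_sec * usecs) / 1000000ULL;
    if (n > (uint64_t) INT32_MAX)
        n = INT32_MAX;
    return (int32_t) n;
}
```

At a calibrated 20 million instructions per second, a 1000 usec delay
becomes 20000 instructions; if the host CPU then drops to half speed, those
20000 instructions take about 2 ms of wall clock time.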

Co-Scheduling Clock and Multiplexer (or other devices)
Many simulator devices need to execute periodically, with timing on the
order of the simulated system's clock ticks. There are numerous reasons for
this type of execution. Meanwhile, many of these events aren't particular
about exactly when they execute, as long as they execute frequently enough.
Events which execute frequently have the potential to interfere with a
simulator's attempts to idle when the simulated system isn't actually doing
useful work.
Interactions with attempts to 'co-schedule' multiplexer polling with clock
ticks can cause strange simulator behaviors. These strange behaviors only
happen under a combination of conditions:
   1) a multiplexer device is defined in the simulator configuration,
   2) the multiplexer device is NOT attached, and thus is not being managed
      by the asynchronous multiplexer support, and
   3) the multiplexer device schedules polling (co-scheduled) when not
      attached (such polling will never produce any input, so this is
      probably a bug).
In prior simh versions, support for clock co-scheduling was implemented
separately by each simulator, and was usually expressed by code of the form:

    sim_activate (uptr, clk_cosched (tmxr_poll));

As a part of asynchronous timer support, the simulator framework has been
extended to provide clock co-scheduling support generically. Using this new
capability requires an initial call (usually in the clock device reset
routine) of the form:

    sim_register_clock_unit (&clk_unit);

Once the clock unit has been registered, co-scheduling is achieved by
replacing the earlier sim_activate with the following:

    sim_clock_coschedule (&dz_unit, tmxr_poll);
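
The effect of sim_register_clock_unit() followed by sim_clock_coschedule()
can be modeled as a list of units released on the clock's next tick. This is
a simplified sketch with hypothetical names: it omits the interval argument
that sim_clock_coschedule() takes, and it signals "no clock registered" with
-1, where a real implementation would presumably fall back to ordinary
scheduling.

```c
#include <stddef.h>

#define MAX_COSCHED 8

typedef struct { int fired; } UNIT_SK;          /* hypothetical unit */

static UNIT_SK *clock_unit;                     /* set by registration */
static UNIT_SK *cosched[MAX_COSCHED];           /* units waiting for the next tick */
static int ncosched;

static void register_clock_unit (UNIT_SK *uptr)
{
    clock_unit = uptr;
}

static int clock_coschedule (UNIT_SK *uptr)
{
    if (clock_unit == NULL || ncosched == MAX_COSCHED)
        return -1;                              /* caller must schedule it some other way */
    cosched[ncosched++] = uptr;
    return 0;
}

/* One clock tick: service the clock, then every unit co-scheduled with it.
   Each co-scheduled unit runs once per request, not on every tick. */
static void clock_tick (void)
{
    clock_unit->fired++;
    for (int i = 0; i < ncosched; i++)
        cosched[i]->fired++;
    ncosched = 0;
}
```

Because co-scheduled units piggyback on a tick that happens anyway, they add
no extra wakeups of their own, which is what keeps them from defeating idling.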

Run time requirements to use SIM_ASYNCH_IO.
The POSIX threads API (pthreads) is required for asynchronous execution.
Most *nix platforms provide these APIs, and simh is typically built with
them on those platforms anyway, since pthreads is also required for simh
networking support. Windows can also use the pthreads APIs if the compile
and run time support for the win32Pthreads package has been installed on
the build system.
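
The hand-off between an I/O thread and the instruction thread can be sketched
with plain pthreads calls. This is a minimal sketch assuming a single
outstanding request protected by a mutex; IO_REQ_SK, io_worker() and
io_poll() are hypothetical, and simh's own queueing is more elaborate. Only
the instruction thread acts on the result, consistent with the rule that
callbacks run in the thread executing instructions.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion mailbox shared by an I/O thread and the
   instruction thread; every access goes through the mutex. */
typedef struct {
    pthread_mutex_t lock;
    bool done;
    int  status;
} IO_REQ_SK;

/* I/O thread: a real worker would perform blocking host I/O here;
   this sketch just reports success. */
static void *io_worker (void *arg)
{
    IO_REQ_SK *req = arg;
    int status = 0;
    pthread_mutex_lock (&req->lock);
    req->status = status;
    req->done = true;
    pthread_mutex_unlock (&req->lock);
    return NULL;
}

/* Instruction thread: a cheap periodic check that consumes a completion. */
static bool io_poll (IO_REQ_SK *req, int *status)
{
    bool done;
    pthread_mutex_lock (&req->lock);
    done = req->done;
    if (done) {
        *status = req->status;
        req->done = false;
    }
    pthread_mutex_unlock (&req->lock);
    return done;
}
```

On most *nix systems this builds with the compiler's -pthread option.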