GRM Grid Application Monitor v1.0
User's Manual

1. What is GRM?
   1.1 History of GRM
   1.2 Why is GRM an off-line, one-process monitoring tool?
   1.3 Future of GRM
2. Instrumentation with GRM
   2.1 Instrumentation API
       2.1.1 Basic instrumentation functions
       2.1.2 User defined events
       2.1.3 Trace format
       2.1.4 Remarks
   2.2 Compilation of the code
3. Trace collection with GRM
   3.1 Job submission
   3.2 Trace file
   3.3 Visualisation of the trace file


1. What is GRM?
===============

GRM v1.0 is an
 - off-line monitoring tool
 - for a one-process application
 - executed in the grid environment.

1.1 History of GRM
------------------

GRM has been developed for the P-GRADE graphical parallel programming
environment and it has been used for "semi-on-line" monitoring of P-GRADE
applications in a cluster environment. Semi-on-line monitoring means that
generated trace data is kept where it is generated until the user (or a
tool) requests it. When requested, the monitoring tool collects trace data
from all machines where the application is running and writes it into a
global trace file, which is then visualised by the PROVE visualisation tool
of P-GRADE. With frequent collection requests the visualisation behaves
like an on-line visualisation tool. Basically, the difference between
on-line monitoring and semi-on-line monitoring is that the latter uses a
'pull data model' for the collection instead of the 'push model'.

This tool is the basis for the development of a grid application monitor
that will support semi-on-line monitoring of parallel/distributed
applications in the grid. Its development is closely coordinated with the
development of the PROVE visualisation tool. The two tools will be
developed in parallel and will support the same features in subsequent
versions.

1.2 Why is GRM an off-line, one-process monitoring tool?
--------------------------------------------------------

Start-up and semi-on-line trace collection are different in the grid and
much more difficult to achieve, and GRM is not yet in a state where it can
support that final goal. The first version therefore provides the
instrumentation library, but the generated trace data is only written into
a local file.

1.3 Future of GRM
-----------------

The next version of GRM will be a semi-on-line monitoring tool and will
support the monitoring of parallel/distributed applications in the grid.


2. Instrumentation with GRM
===========================

For trace generation, the user should first instrument the application
with trace generation functions. GRM provides an instrumentation API and
library for tracing. The instrumentation API and library are available
in C.

2.1 Instrumentation API
-----------------------

For the instrumentation, you should include the "grm_instr.h" header file.

   #include "grm_instr.h"

2.1.1 Basic instrumentation functions
-------------------------------------

The following trace generation constructs are supported in GRM v1.0.

   GMI_Start        Start of the process; initializes the monitoring library
   GMI_Exit         End of the process; no more trace events afterwards
   GMI_BeginBlock   Beginning of a code block
   GMI_EndBlock     End of a code block

   GMI_Start (char * process_name, int id);

The user should give a process name and id. The id will be printed into
each event record in the trace.

   GMI_Exit (void);

This call generates an exit event and closes the trace file. The process
cannot generate any more trace events afterwards.

   GMI_BeginBlock(int blocktype, int blockID );
   GMI_EndBlock(int blocktype, int blockID );

Block events are used to indicate the entry and exit of code blocks in the
code. 'blocktype' can be used to differentiate among code blocks that serve
different purposes, while 'blockID' can be used to unambiguously identify
the code block. In the PROVE visualisation tool, different colors can be
assigned to different 'blocktypes'.
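
The four basic calls above can be combined roughly as in the following
sketch. It is an illustration under stated assumptions, not code taken from
the GRM distribution. The USE_GRM macro, the stand-in functions in the #else
branch, the 'void' return types, the block type and block ID values and the
process name "sumsq" are all hypothetical; they exist only so that the
example compiles without 'libgrmon.a'. The stand-ins merely count and log
the calls and do not reproduce the GRM trace format.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_GRM               /* hypothetical switch: build against real GRM */
#include "grm_instr.h"
#else
/* Stand-ins (NOT part of GRM): count each call and log it to stderr.
   The void return types are an assumption; the manual does not state them. */
static int gmi_stub_calls = 0;

static void GMI_Start(char *process_name, int id)
{
    gmi_stub_calls++;
    fprintf(stderr, "[stub] GMI_Start(%s, %d)\n", process_name, id);
}

static void GMI_Exit(void)
{
    gmi_stub_calls++;
    fprintf(stderr, "[stub] GMI_Exit()\n");
}

static void GMI_BeginBlock(int blocktype, int blockID)
{
    gmi_stub_calls++;
    fprintf(stderr, "[stub] GMI_BeginBlock(%d, %d)\n", blocktype, blockID);
}

static void GMI_EndBlock(int blocktype, int blockID)
{
    gmi_stub_calls++;
    fprintf(stderr, "[stub] GMI_EndBlock(%d, %d)\n", blocktype, blockID);
}
#endif

/* Hypothetical values chosen for this example only. */
enum { BLOCKTYPE_COMPUTE = 1 };   /* one block type, e.g. one color in PROVE */
enum { BLOCK_SUM_SQUARES = 10 };  /* identifies this particular code block */

/* Sum of the squares 1..n; the loop is bracketed by a pair of block events
   that carry the same blocktype and blockID. */
long sum_squares(int n)
{
    long total = 0;
    int i;

    assert(n >= 0);
    GMI_BeginBlock(BLOCKTYPE_COMPUTE, BLOCK_SUM_SQUARES);
    for (i = 1; i <= n; i++)
        total += (long)i * i;
    GMI_EndBlock(BLOCKTYPE_COMPUTE, BLOCK_SUM_SQUARES);
    return total;
}

/* Body of the process: GMI_Start comes first, GMI_Exit comes last. */
long run_process(int n)
{
    char name[] = "sumsq";        /* hypothetical process name */
    long result;

    GMI_Start(name, 1);           /* id 1 is printed into each event record */
    result = sum_squares(n);
    GMI_Exit();                   /* no trace events may follow this call */
    return result;
}
```

Calling run_process() from the application's main() brackets the whole run
with the start and exit events. Keeping each GMI_BeginBlock/GMI_EndBlock
pair inside one function makes it harder to leave a block open by accident.
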
2.1.2 User defined events
-------------------------

More general tracing is possible with user defined events. For this
purpose, the format string of the user event should be defined first
(similar to C printf format strings). The previously defined event format
can then be used for trace event generation.

Event format definition:

   int GMI_Define(
       int  eid,       /* event identifier */
       char *format,   /* event format string */
       char *descr     /* description of the event */
   );

The 'eid' identifier is chosen by the user and is later used for trace
event generation in the code. It must be greater than GMI_NUM_PREDEFINED
and less than or equal to GMI_MAX_EVENTFORMAT_ID.

Trace events can be generated by the following function call:

   int GMI_Event( int eid, ...);

The 'eid' identifies the format string. The user should pass only the
variables to be printed. The format string defined with GMI_Define is used
to print the event string (using the standard C function vsprintf). There
is no need to print the following values, since the header of the event
already contains them: the 'eid' user event id, the time of the event and
the process id.

2.1.3 Trace format
------------------

The trace format originates from the Tape/PVM tool that was used as the
very first monitoring tool for P-GRADE.

The trace is event record oriented. One record (line) in the trace file
represents one trace event of the application. Each record starts with a
header containing information about the type of the event, the generation
time and the id of the generating process. The remainder of the record
contains the values for that given type of event.

The generated trace event record for the 'start' event looks like the
following string:

   |------ event header ------|   |---- event content ----|
   1   999   -1 -1   0 0          999   Long   originf

   Header fields:
      1         event id
      999       process id
      -1 -1     file ids (not used)
      0 0       time (sec, microsec)

   Content fields:
      999       process id
      Long      process name
      originf   machine name

User defined events contain the header too.
The format string defined by the user is used for printing the content of
the event.

2.1.4 Remarks
-------------

The original GRM in P-GRADE naturally provided more functions for trace
generation, e.g., communication tracing (Send, Receive) and support for
process groups and communication templates. This functionality is not
supported in GRM v1.0, since it supports only sequential processes. The
next version of GRM will be a semi-on-line monitor for parallel
applications and the instrumentation will also support the tracing of
communication.

The PROVE visualisation tool supports the visualisation of block events
generated by the GMI_BeginBlock and GMI_EndBlock calls. The 'blocktype'
variable of the call defines the color of the block in PROVE.

A user defined event is presented as a small colored block. No statistics
are presented about user defined events. Clicking on one shows the content
of the event as it is printed into the trace file.

The timestamps are generated automatically in the instrumentation
functions. The library uses the standard 'gettimeofday' function to get the
current time as standard UNIX time. Time values are printed into the trace
file in the "sec microsec" format.

2.2 Compilation of the code
---------------------------

The compilation is very simple. The provided 'libgrmon.a' library should be
linked to the application process code.

All internal functions of the library are named "_grm_..." to avoid name
conflicts. There are some public functions that are kept for compatibility
with P-GRADE. The names of these functions start with the "GM_" prefix.
They can be found in the "grmonitor.h" header file.


3. Trace collection with GRM
============================

The trace collection and visualisation consist of the following steps:
 - submit the instrumented application into the grid
 - start the visualisation tool with the trace file

3.1 Job submission
------------------

The path and name of the trace file on the host where the process will run
should be defined by an environment variable that is submitted together
with the job by Globus. The name of the environment variable is
"GRM_TRACENAME".

3.2 Trace file
--------------

The generated trace file resides locally on the host where the job is
executed. It should be transferred back to the user's host. The easiest way
to do this is to let the job submission middleware bring it back. This can
be done by Globus.

???

3.3 Visualisation of the trace file
-----------------------------------

When the trace file has been transferred, PROVE can be started to visualise
it.

   $ prove tracefile

END OF MANUAL