Clubmask Resource Manager

Clubmask is a 'glue' package that combines the outstanding management and speed of the Bproc distributed process layer with the power and configurability of the Maui HPC Scheduler. It uses the Supermon resource monitor to gather node information. This node information is combined with job submission data and supplied to Maui. Maui issues job start and termination commands, which Clubmask handles via the Bproc layer. Clubmask also supplies a 'supermon2ganglia' translator that allows Supermon data to be displayed in a Ganglia Web frontend.

Tags Clustering/Distributed Networks
Licenses GPL


Recent releases

Changes: Support has been added for a runtime configuration option that uses Ganglia to gather node data. This is in addition to the existing support for Supermon. Ganglia is now the preferred subsystem, as it is much more stable.

  •  17 Nov 2003 13:05

Changes: CPU speed gathering was fixed; previously, only the last node's speed was used for all of the nodes. cmdbrestore has been fixed to restore a singleton tuple. The Supermon recv and revive_nodes methods have been cleaned up. There are many smaller fixes.

Changes: The job names (JOBID) have been changed from absolute timestamps to a more normal "string.number" format, where "string" is an arbitrary job name that defaults to the username, and "number" is the number in the sequence of that particular job name. Many options have been added to cmsumbit. A supermon_state daemon that handles node state in supermon has been added, which separates this logic out of resource_manager. There are many more changes.
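The "string.number" naming scheme described above can be illustrated with a minimal Python sketch. This is not Clubmask's actual code: the JobNamer class, its next_id method, and the choice to start each sequence at 1 are all assumptions made for illustration.

```python
from collections import defaultdict


class JobNamer:
    """Hypothetical helper that hands out "string.number" job IDs."""

    def __init__(self):
        # One independent counter per job name (assumed to start at 1).
        self._counters = defaultdict(int)

    def next_id(self, username, name=None):
        # The job name defaults to the username when none is given.
        base = name or username
        self._counters[base] += 1
        return f"{base}.{self._counters[base]}"


namer = JobNamer()
first = namer.next_id("alice")             # default name: the username
second = namer.next_id("alice")            # next number in the same sequence
named = namer.next_id("alice", "render")   # a separate sequence per job name
```

Keeping a counter per name, rather than one global counter, is what makes "number" the position within that particular job name's sequence.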

  •  29 Jul 2003 10:29

Changes: The main fixes rework the SupermonInterface class by adding a few new classes and splitting up the error handling in a sane fashion. This should make the supermon data retrieval much more stable. Also added is the ability to use bpsh, ssh, or both on each node to make sure a job is really killed.

  •  16 Jul 2003 05:24

Changes: The code in the mauichksummodule now makes sure that the checksum is null-terminated for Py_BuildValue. In ResourceManager::Machine, BprocSupermon is now allowed to find nodes. In BprocSupermon, the logic in findNodes was fixed to make sure that supermon sees all of the nodes that bproc does. This also solves a problem where too much data was returned by each 'findNodes' call. supermon is now only contacted if there are nodes that need to be added. In IdResolv, a bug was fixed where a node with a leading 0 would not match the nodeid that supermon would assign.
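The leading-zero mismatch mentioned for IdResolv is the kind of problem that arises when node IDs are compared as strings. A minimal sketch of one way to avoid it, assuming node IDs are plain decimal numbers; the normalize_node_id function is hypothetical and is not taken from Clubmask:

```python
def normalize_node_id(raw):
    """Return a node ID as an int so that "07" and "7" compare equal.

    Hypothetical helper: assumes node IDs are non-negative decimal numbers.
    """
    # int() with an explicit base 10 ignores leading zeros.
    return int(str(raw).strip(), 10)


# Compared as strings these differ; compared after normalizing they match.
strings_match = "07" == "7"
ids_match = normalize_node_id("07") == normalize_node_id("7")
```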
