  • Implement device to instance mapping cache · 3f78eef2
    Iustin Pop authored
    Currently, troubleshooting DRBD problems involves a manual process of going
    backwards from the DRBD device to the instance that owns it.
    
    This patch adds a weak (i.e. not guaranteed to be correct or up-to-date)
    cache mapping devices to instances. In normal operation the cache should
    hold correct information, since devices only change paths when they are
    started or stopped, and the code in backend.py adds cache updates to
    exactly these operations.
    
    The only drawback of this implementation is that we don't fully update
    the cache on renames of devices (we clean the old entries but we don't
    add new ones). Since the rename changes the path only for LVs (and not
    drbd and md), this is less of a problem as the target of this code is
    debugging DRBD and MD issues.
    
    The patch writes files named bdev_drbd<N> (or bdev_md<N>,
    bdev_xenvg_...) in /var/run/ganeti (more exactly, LOCALSTATEDIR/ganeti).
    Each file name consists of 'bdev_' followed by the path of the device
    with the /dev/ prefix stripped. Each file contains the following values,
    space separated:
      - instance name
      - primary or secondary (depending on whether the device is on the
        instance's primary or secondary node)
      - instance visible name: sda, sdb or not_visible; the last is used
        when the device is not the top-level device (e.g. remote_raid1
        templates will have sd[ab] for the md, but not_visible for drbd and
        logical volumes)
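
    The naming and content format described above can be sketched as
    follows. This is an illustrative sketch only, not the code from the
    patch: the function names are hypothetical, LOCALSTATEDIR is assumed to
    be /var/run, and the mapping of remaining slashes to underscores is an
    assumption inferred from the bdev_xenvg_... example.

```python
import os

# Assumption: LOCALSTATEDIR is /var/run, giving /var/run/ganeti.
CACHE_DIR = "/var/run/ganeti"
DEV_PREFIX = "/dev/"


def cache_file_name(dev_path):
    """Return the cache file name for a device path such as /dev/drbd0."""
    if dev_path.startswith(DEV_PREFIX):
        dev_path = dev_path[len(DEV_PREFIX):]
    # Assumption: slashes left in the path (e.g. xenvg/<lv>) become "_".
    return "bdev_" + dev_path.replace("/", "_")


def cache_file_path(dev_path, cache_dir=CACHE_DIR):
    """Return the full path of the cache file for a device."""
    return os.path.join(cache_dir, cache_file_name(dev_path))


def format_entry(instance, on_primary, iv_name=None):
    """Build the space-separated file content for one device."""
    role = "primary" if on_primary else "secondary"
    # Devices that are not the top-level device have no visible name.
    visible = iv_name if iv_name else "not_visible"
    return "%s %s %s\n" % (instance, role, visible)


def parse_entry(text):
    """Split file content back into (instance, role, visible name)."""
    instance, role, visible = text.split()
    return instance, role, visible
```

    With these helpers, a debugging session could go from a device path to
    its owner by reading the file returned by cache_file_path("/dev/drbd0")
    and passing its content to parse_entry().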
    
    The cache is designed to not raise any errors; if there is an I/O error,
    it will only be logged in the node daemon log file. This reduces the
    possible impact of the cache on the block device activation and
    shutdown code.
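
    A minimal sketch of this log-and-continue behaviour is shown below. The
    helper names and log messages are hypothetical and the standard logging
    module stands in for the node daemon's logging, so this should not be
    read as the code the patch adds to backend.py.

```python
import logging
import os


def update_cache_file(path, data):
    """Write data to path; log I/O errors instead of raising them.

    Returns True on success and False if the write failed.
    """
    try:
        with open(path, "w") as fd:
            fd.write(data)
        return True
    except OSError as err:
        # The caller (device activation) must not fail because of the cache.
        logging.error("Can't update bdev cache file %s: %s", path, err)
        return False


def remove_cache_file(path):
    """Remove a cache file; log I/O errors instead of raising them."""
    try:
        os.remove(path)
        return True
    except OSError as err:
        # The caller (device shutdown) must not fail because of the cache.
        logging.error("Can't remove bdev cache file %s: %s", path, err)
        return False
```

    Returning a boolean rather than raising keeps the activation and
    shutdown paths free of extra error handling, at the cost of the cache
    silently going stale when a write fails.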
    
    Reviewed-by: imsnah