
5.4 Kernel address space manager

The kernel uses a simple virtual memory manager with an architecture similar to those used by user processes. There is a central object which intercepts virtual memory requests and delegates them among a set of memory segments or regions.

The kernel virtual memory manager (KVM) is used mostly for block allocator paging, kernel stacks for shuttles, and dynamic memory allocation. It is responsible for administering the kernel address space.

It also maintains a DTLB for kernel virtual memory when DTLB support is enabled, and uses physical memory directly otherwise. It is the entry point for kernel page faults and for every other kernel (virtual) memory management issue.

<Off kernel virtual memory manager. >= (U->)
class off_KVM {
public: 
  <Other public methods of off_KVM. >
private:
  <Other private members of off_KVM. >
  <Other private methods of off_KVM. >
};

Defines off_KVM.

There is a single kvm object instantiated on its own (all its state is static).

<Off KVM instance. >= (U->)
off_KVM kvm;

<Off KVM exported variables. >= (U->)
extern off_KVM kvm;

<Other public methods of off_KVM. >= (<-U) [D->]
// Creates the KVM instance.
off_KVM(void);

<off_KVM::off_KVM implementation. >= (U->)
// Creates the KVM instance.
off_KVM::off_KVM(void)
{
  <Initialize other private members of off_KVM. >
}

After physical memory banks have been started, the KVM object is notified of both the support for DTLBs in the kernel and the first unallocated address (and page) after the kernel image. Such information is used to arrange different KVM regions into existing space.

<Other public methods of off_KVM. >+= (<-U) [<-D->]
// Start the KVM.
void start(vm_offset_t end, boolean_t dtlb_support);

<Other private members of off_KVM. >= (<-U) [D->]
static boolean_t   k_hasdtlb;          // Do we have dtlb support?
static vm_offset_t k_end;              // End of kernel memory.

<off_KVM static members. >= (U->) [D->]
boolean_t   off_KVM::k_hasdtlb=FALSE; // Do we have dtlb support?
vm_offset_t off_KVM::k_end=0;   // End of kernel memory.

<off_KVM::start implementation. >= (U->)
// Start the KVM.
void off_KVM::start(vm_offset_t end, boolean_t dtlb_support)
{
  k_end=end;
  k_hasdtlb=dtlb_support;
  <Initialize KVM regions starting at end with dtlb_support. >
}

There are several kinds of KVM regions to consider:

Unmapped regions
Like the first virtual page. Page faults into this kind of region are due to programming errors.
Fixed size read-write regions
Like the kernel text or static data. Page faults into this kind of region are due to programming errors.
Growing read-write regions
Like regions used in growing block allocators or those for shuttle kernel stacks. Page faults into this kind of region should extend it automatically and resume kernel operation (or fault when there is no more space for growing).
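
The fault behaviour of a growing region can be sketched as follows. This is only an illustration under stated assumptions: the names GrowingSketch, mapped_end and limit are hypothetical, the region is assumed to grow upwards in whole pages, limit is assumed to be page aligned, and a fault on an already mapped page is treated as a programming error.

```cpp
#include <cassert>

// Hypothetical model of a growing region; not the Off implementation.
enum FaultResult { FAULT_FATAL, FAULT_RESUME };

struct GrowingSketch {
  unsigned long mapped_end;  // first address not yet backed by a page
  unsigned long limit;       // first address past the space reserved for growth
  unsigned long pgsize;      // page size (assumed to divide limit - mapped_end)

  // Handles a fault at addr: grows the region or reports a fatal fault.
  FaultResult fault(unsigned long addr) {
    // Already mapped (programming error) or no more space for growing.
    if (addr < mapped_end || addr >= limit)
      return FAULT_FATAL;
    // Extend page by page until the faulting address is covered.
    while (mapped_end <= addr)
      mapped_end += pgsize;
    return FAULT_RESUME;
  }
};
```

Unmapped and fixed size regions would simply return the fatal result for every fault.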

The kernel virtual memory has initially an identity mapping for the whole physical memory. However, for growing regions we take advantage of kernel virtual memory and map non-contiguous page frames into contiguous kernel virtual addresses.
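
The difference between both kinds of translation can be sketched as follows. The types and names (addr_t, kPageSize, identity_translate, GrowingMap) are hypothetical and the page size is assumed; the sketch only shows how a contiguous virtual range can be backed by scattered page frames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; not part of the Off sources.
typedef std::uintptr_t addr_t;
const addr_t kPageSize = 4096;   // assumed page size

// Identity mapping: a kernel virtual address is its own physical address.
inline addr_t identity_translate(addr_t kva) { return kva; }

// A growing region: contiguous virtual pages backed by arbitrary frames.
struct GrowingMap {
  addr_t start;                  // first virtual address of the region
  std::vector<addr_t> frames;    // frames[i] backs virtual page i

  // Translates kva into pa; returns false when kva is not mapped.
  bool translate(addr_t kva, addr_t &pa) const {
    if (kva < start)
      return false;
    std::size_t page = (kva - start) / kPageSize;
    if (page >= frames.size())
      return false;
    pa = frames[page] + (kva - start) % kPageSize;
    return true;
  }
};
```

With frames 0x7000 and 0x3000 backing a region that starts at 0x100000, virtual address 0x101008 falls in the second page and translates to 0x3008, although the two frames are not adjacent in physical memory.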

In short, memory allocation and (kernel) virtual-to-physical address translation depend on the address being considered: different regions have different semantics.

The KVM maintains an array of kernel regions. For each region it keeps an object with the region's starting address, its size, and the methods defining the region semantics.

<Other private members of off_KVM. >+= (<-U) [<-D->]
static off_KVMRegion *k_regions[OFF_NREGS_MAX]; // Kernel regions.
static natural_t      k_nregions; // Number of existing kernel regions. 

<Off kernel virtual memory manager dependencies. >= (U->) [D->]
#include <dmm/KVMRegion.h>      // for off_KVMRegion

<off_KVM static members. >+= (U->) [<-D->]
off_KVMRegion *off_KVM::k_regions[OFF_NREGS_MAX]; // Kernel regions.
natural_t      off_KVM::k_nregions=0; // Number of existing kernel regions. 

<Initialize other private members of off_KVM. >= (<-U) [D->]
for(natural_t r=0; r<OFF_NREGS_MAX; r++)
      k_regions[r]=NULL;

where the maximum number of regions is a system limit which depends on the maximum number of kernel stacks (i.e. shuttles) supported. There is also a limit on kernel stack size.

<Off limits. >= [D->]
const int OFF_MDEP_KSTK_MAX = 0x4000; // 16 kbytes. -- must be power of 2.
const int OFF_MDEP_KSTK_MSK = 0x3fff; // 16*KBYTE-1
const int OFF_NKSTKS_MAX    = 512; // Max # of kernel stacks.
const int OFF_NKBLKS_MAX    = 6; // Max # of regions not counting kern. stacks.
const int OFF_NREGS_MAX = OFF_NKSTKS_MAX+OFF_NKBLKS_MAX; // Max. # of regions
Defines OFF_NKBLKS_MAX, OFF_NKSTKS_MAX, OFF_NREGS_MAX.

<Off kernel virtual memory manager dependencies. >+= (U->) [<-D->]
#include <klib/limits.h>        // for KVM related system limits.

Here we counted one null region, one region per existing kernel stack (OFF_NKSTKS_MAX at worst case) and five extra regions: one for kernel dynamic memory, and four for the block allocators used by shuttles, portals, DTLBs and relocation table entries.
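
The arithmetic behind these limits can be checked at compile time. The sketch below repeats the constants and states the relations as static_assert declarations (which need C++11, unlike the rest of the code here). The helper kstk_base is hypothetical: it assumes the power-of-2 requirement exists so that the base of a size-aligned kernel stack can be found by masking an address.

```cpp
#include <cassert>

const int OFF_MDEP_KSTK_MAX = 0x4000; // 16 kbytes, must be a power of 2.
const int OFF_MDEP_KSTK_MSK = 0x3fff; // 16*KBYTE-1
const int OFF_NKSTKS_MAX    = 512;    // Max # of kernel stacks.
const int OFF_NKBLKS_MAX    = 6;      // Max # of regions not counting stacks.
const int OFF_NREGS_MAX     = OFF_NKSTKS_MAX + OFF_NKBLKS_MAX;

static_assert((OFF_MDEP_KSTK_MAX & (OFF_MDEP_KSTK_MAX - 1)) == 0,
              "stack size must be a power of 2");
static_assert(OFF_MDEP_KSTK_MSK == OFF_MDEP_KSTK_MAX - 1,
              "mask must select the offset within a stack");
// One null region, one dynamic memory region, four block allocators.
static_assert(OFF_NKBLKS_MAX == 1 + 1 + 4, "non-stack regions");
static_assert(OFF_NREGS_MAX == 518, "total number of regions");

// Hypothetical helper: base of the stack holding addr, assuming that
// kernel stacks are aligned to OFF_MDEP_KSTK_MAX.
inline unsigned long kstk_base(unsigned long addr)
{
  return addr & ~(unsigned long)OFF_MDEP_KSTK_MSK;
}
```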

There are names for some of them.

<Off KVM region numbers. >= (U->)
const natural_t OFF_KVM_RNULL =0; // Null catch all region.
const natural_t OFF_KVM_RALLOC=1; // dynamic memory region.

Only entries in range 0...k_nregions-1 are considered. They are always sorted by starting address so that a binary search could find the region involved with a particular kernel virtual address.

This method can be used to find the region involved given a KVM address.

<Other private methods of off_KVM. >= (<-U) [D->]
// Finds the region handling kva
// or returns a pointer to the given default one.
off_KVMRegion *find_region(vm_offset_t kva, natural_t dflt);

The KVM object simply accepts any of the following calls and forwards them to the region involved.

<Other public methods of off_KVM. >+= (<-U) [<-D->]
// Allocates a kernel page.
// Chooses the location in the kern. dynamic memory region if at is 0.
vm_offset_t pg_alloc(vm_offset_t at);
// Deallocates a kernel page.
void pg_free(vm_offset_t p);
// Allocates n kernel pages.
// Chooses the location in the kern. dynamic memory region if at is 0.
vm_offset_t pg_alloc(vm_offset_t at, natural_t n);
// Deallocates n kernel pages.
void pg_free(vm_offset_t p, natural_t n);

// Handles page faults 
err_t pg_fault(off_PgFltReq *pgf);

<Off kernel virtual memory manager dependencies. >+= (U->) [<-D->]
#include <prtl/ex.h>            // for off_PgFltReq

Of course, no page fault should occur when no DTLB support is built into the kernel.

Their implementations use find_region and forward the call.

<off_KVM::pg_alloc implementation. >= (U->) [D->]
// Allocates a kernel page.
// Chooses the location in the kern. dynamic memory region if at is 0.
vm_offset_t off_KVM::pg_alloc(vm_offset_t at)
{
    return find_region(at,OFF_KVM_RALLOC)->pg_alloc(at);
}

<off_KVM::pg_free implementation. >= (U->) [D->]
// Deallocates a kernel page.
void off_KVM::pg_free(vm_offset_t p)
{
  find_region(p,OFF_KVM_RNULL)->pg_free(p);
}

<off_KVM::pg_alloc implementation. >+= (U->) [<-D]
// Allocates n kernel pages.
// Chooses the location in the kern. dynamic memory region if at is 0.
vm_offset_t off_KVM::pg_alloc(vm_offset_t at, natural_t n)
{
  return find_region(at,OFF_KVM_RALLOC)->pg_alloc(at,n);
}

<off_KVM::pg_free implementation. >+= (U->) [<-D]
// Deallocates n kernel pages.
void off_KVM::pg_free(vm_offset_t p, natural_t n)
{
  find_region(p,OFF_KVM_RNULL)->pg_free(p,n);
}

<off_KVM::pg_fault implementation. >= (U->)
// Handles page faults 
err_t off_KVM::pg_fault(off_PgFltReq *pgf)
{
  assert(pgf);
  return find_region(pgf->get_vaddr(),OFF_KVM_RNULL)->pg_fault(pgf);
}

<Off kernel virtual memory manager implementation dependencies. >= (U->) [D->]
#include <dmm/KVMRegion.h>      // for pg_fault

You can see how by default the dynamic memory region is chosen for memory allocation and the null region for deallocation and page faulting.
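
The effect of the two defaults can be seen in a small model. Everything in it is hypothetical (Region, Manager, RNULL, RALLOC and the counters are made up for the illustration); it only mimics the forwarding pattern, with slot 0 as the catch-all region and slot 1 as the dynamic memory region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long addr_t;

// Hypothetical region: the range [start, end) plus call counters.
struct Region {
  addr_t start, end;
  int allocs, frees;
  Region(addr_t s, addr_t e) : start(s), end(e), allocs(0), frees(0) {}
};

const std::size_t RNULL = 0;   // catch-all slot
const std::size_t RALLOC = 1;  // dynamic memory slot

struct Manager {
  std::vector<Region> regions;

  // Finds the region holding kva, or falls back to slot dflt.
  Region &find(addr_t kva, std::size_t dflt) {
    if (kva != 0)
      for (std::size_t r = 1; r < regions.size(); r++)
        if (regions[r].start <= kva && kva < regions[r].end)
          return regions[r];
    return regions[dflt];
  }
  // Allocation defaults to the dynamic memory region.
  void pg_alloc(addr_t at) { find(at, RALLOC).allocs++; }
  // Deallocation defaults to the null region.
  void pg_free(addr_t p) { find(p, RNULL).frees++; }
};
```

An allocation at address 0, or at an address no region claims, lands in the dynamic memory region; freeing an address no region claims lands in the null region, where it has no effect.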

The creation of KVM regions and the implementation of find_region will be seen later. Before that, we will show how physical page frames are allocated.

Allocating kernel pages

The kernel virtual memory manager includes a generic page allocator which is used to choose the appropriate memory bank to allocate and deallocate kernel pages.

<Other private members of off_KVM. >+= (<-U) [<-D->]
static off_KPgAllocator k_palloc; // Kernel page allocator.

<Off kernel virtual memory manager static members. >= (U->)
off_KPgAllocator off_KVM::k_palloc; // Kernel page allocator.

<Off kernel virtual memory manager dependencies. >+= (U->) [<-D->]
#include <flux/types.h>         // for boolean_t natural_t et al.
#include <dmm/KPgAllocator.h>   // for off_KPgAllocator et al

The page allocator provides simple page allocation methods.

<Off kernel page allocator. >= (U->)
class off_KPgAllocator {
public:
  // Creates a kernel page allocator.
  off_KPgAllocator(void);

  // Allocates a kernel page.
  // Chooses the location if at is 0.
  vm_offset_t alloc(vm_offset_t at);
  // Deallocates a kernel page.
  void free(vm_offset_t p);
  
  <Other public methods of off_KPgAllocator. >
private:
  <Other private members of off_KPgAllocator. >
  <Other private methods of off_KPgAllocator. >
};

Defines off_KPgAllocator.

<Off kernel page allocator dependencies. >= (U->) [D->]
#include <flux/types.h>         // for boolean_t natural_t et al.

These methods try to use dumb memory first, to save precious (e.g. DMA-capable) memory. To do so, the KPgAllocator keeps track of the preferred (local) memory bank from which to allocate.

<Other private members of off_KPgAllocator. >= (<-U) [D->]
off_MBank *k_current;           // Current preferred bank. 

<Off kernel page allocator dependencies. >+= (U->) [<-D->]
#include <hw/MBank.h>           // for off_MBank 
#include <hw/PFrame.h>          // for off_PFrame

It is initially set to null by the constructor.

<off_KPgAllocator::off_KPgAllocator implementation. >= (U->)
off_KPgAllocator::off_KPgAllocator(void) : 
  k_current(NULL)
{;}

When the preferred bank is unknown, or the current one is exhausted, reconsider chooses a new preferred bank.

<Other private methods of off_KPgAllocator. >= (<-U)
// Reconsiders the memory bank preferences. 
err_t reconsider(void);

Its implementation scans the existing memory banks for a valid one. It assumes that the node instance lists the best banks first, so the scan starts at the last bank and moves towards the first.

<off_KPgAllocator::reconsider implementation. >= (U->)
// Reconsiders the memory bank preferences. 
err_t off_KPgAllocator::reconsider(void)
{
  if (k_current==NULL) {
    k_current=&nd.get_mbank(nd.get_num_mbanks()-1);
    assert(k_current->valid());
    do_debug(kcout << "kernel using " << fmt("%d",nd.get_num_mbanks()-1));
    do_debug(kcout << " mbank" << nl);
    return EOK;
  }
  else {
    natural_t mbank = k_current - &nd.get_mbank(0);
    if (mbank > 0){
      mbank--;
      k_current=&nd.get_mbank(mbank);
      assert(k_current->valid());
      do_debug(kcout << "kernel using " << fmt("%d",mbank) << " mbank" << nl);
      return EOK;
    }
  }
  k_current=NULL;
  return ENOMEM;

}
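
The order in which banks are tried can be followed in this reduced model. BankPicker is a hypothetical stand-in that keeps bank indices instead of off_MBank pointers; the guard for a node without banks is an addition of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the bank preference walk.
struct BankPicker {
  std::size_t nbanks;  // number of banks; index 0 is assumed to be the best
  long current;        // current preferred bank, -1 when there is none

  explicit BankPicker(std::size_t n) : nbanks(n), current(-1) {}

  // Picks the next candidate bank; returns false when all were tried.
  bool reconsider() {
    if (current < 0) {           // no preference yet: start at the last bank
      if (nbanks == 0)
        return false;
      current = (long)nbanks - 1;
      return true;
    }
    if (current > 0) {           // current bank exhausted: try a better one
      current--;
      return true;
    }
    current = -1;                // every bank tried: forget the preference
    return false;
  }
};
```

Since the preference is forgotten once the walk fails, a later call starts again from the last bank, which matters if memory was released in the meantime.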

<Off kernel page allocator implementation dependencies. >= (U->)
#include <klib/str.h>
#include <flux/debug.h>         // for do_debug et al.

<Off kernel page allocator dependencies. >+= (U->) [<-D]
#include <node/Node.h>          // for nd
#include <klib/err.h>           // for err_t and error numbers.

Page allocation is simply forwarded to the current preferred memory bank.

<off_KPgAllocator::alloc implementation. >= (U->) [D->]
// Allocates a page frame for kernel usage.
vm_offset_t off_KPgAllocator::alloc(vm_offset_t at) 
{
  off_PFrame *p;
  if (!k_current && reconsider())
    return 0;

  do {
    if (at) {
      off_MBank *bank=(off_MBank*)((*k_current)+at)->get_container();
      p = new(bank,off_id_t(bank->get_id(),at)) 
          off_PFrame(nd.get_protection(), nd.get_domain());
    }
    else
      p = new(k_current) off_PFrame(nd.get_protection(),
                                    nd.get_domain()     );
  } while (!at && !p && !reconsider());
  // Convert the frame to its address, as the n-page alloc does; 0 on failure.
  return p ? (vm_offset_t)*p : 0;

}

<off_KPgAllocator::free implementation. >= (U->) [D->]
// Deallocates a page frame used by the kernel.
void off_KPgAllocator::free(vm_offset_t p)
{
  delete ((*k_current)+p);
}

When many contiguous physical pages are needed (e.g. when initializing a big allocator at system boot time), these routines can be used instead. Their use is strongly discouraged because it can lead to service failures even when there is enough free memory. However, it is a reasonable thing to do during system boot.

<Other public methods of off_KPgAllocator. >= (<-U)
// Allocates n kernel page frames.
vm_offset_t alloc(vm_offset_t at, natural_t n);
// Deallocates n kernel page frames.
void free(vm_offset_t p, natural_t n);

Their implementation is similar to that of the previous routines. Only alloc gets more complicated as it must scan for a range of contiguous available page frames.

<off_KPgAllocator::alloc implementation. >+= (U->) [<-D]
// Allocates n kernel page frames.
vm_offset_t off_KPgAllocator::alloc(vm_offset_t at, natural_t n)
{
  if (!k_current && reconsider())
    return 0;

   do {
     off_PFrame *p=(!at) ? find_range(n) : (*k_current)+at;
     off_pg_id_t pg;
     natural_t count;
     if (p){
       if (at && k_current != p->get_container())
         k_current=(off_MBank*)p->get_container();
       for(pg=p->get_id(),count=0; 
           count<n && new(k_current,pg) off_PFrame(nd.get_protection(),
                                                   nd.get_domain() ) ;
           ++count,pg+=off_MBank::get_pgsize()){
         do_debug(kcout << "kern allocating " << fmt("%x",(natural_t)pg) <<nl);
       }
       if (count == n)
         return *p;
       else {
         for( --pg ; count ; count--,pg-=off_MBank::get_pgsize())
           delete ((*k_current)+pg);
       }
     }
   } while (!at && !reconsider());
   return 0;
}

Note how we used find_range to find a contiguous range of memory. If a page in the range is allocated by another process while we are allocating it, we release the pages already allocated and, when no particular address was requested, try again with the next preferred bank.
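
The claim-and-roll-back step can be isolated in a few lines. In this sketch a std::vector<bool> stands in for the page frames of a bank (true means in use) and claim_range is a hypothetical name; it is not the allocator code, only the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Claims n consecutive frames starting at first. If one of them is
// already in use, releases the frames claimed so far and fails.
inline bool claim_range(std::vector<bool> &used, std::size_t first,
                        std::size_t n)
{
  std::size_t count = 0;
  while (count < n && first + count < used.size() && !used[first + count]) {
    used[first + count] = true;
    count++;
  }
  if (count == n)
    return true;
  while (count > 0) {            // roll back the partial allocation
    count--;
    used[first + count] = false;
  }
  return false;
}
```

After a failed call the frame map is exactly as it was before the call, so the caller is free to try a different range or a different bank.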

<Other private members of off_KPgAllocator. >+= (<-U) [<-D]
// Finds a range of n contiguous free page frames. 
// Returns their physical address. 
off_PFrame *find_range(natural_t n);

<off_KPgAllocator::free implementation. >+= (U->) [<-D]
// Deallocates n kernel page frames. 
void off_KPgAllocator::free(vm_offset_t p, natural_t n)
{
  for(vm_offset_t pg=p; n ;n-- ,pg+=off_MBank::get_pgsize())
    delete ((*k_current)+pg);
    
}

To find a free range of memory we use a brute force approach.

<off_KPgAllocator::find_range implementation. >= (U->)
// Finds a range of n contiguous free page frames. 
// Returns their physical address. 
off_PFrame *off_KPgAllocator::find_range(natural_t n)
{
  vm_offset_t attempt,current,last;

  assert(k_current->valid());

  for (attempt=k_current->get_first(),
         last=k_current->get_last()-k_current->p2a(n); 
       attempt < last;
       attempt=current+k_current->get_pgsize()){
    natural_t count;
    for(count=0,current=attempt; 
        count<n && (*k_current+current)->is_free();
        count++,current+=k_current->get_pgsize())
      ;
    if (count==n){
      assert(((*k_current)+attempt)->get_container() == k_current);
      return (*k_current)+attempt;
    }
  }
  do_debug(kcout << "no range for " << fmt("%d",n) << " pages" << nl);
  return NULL;
}
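
The same brute force scan, reduced to frame indices, looks like this. It is a sketch under assumptions: a std::vector<bool> replaces the bank (true means in use), find_free_run is a hypothetical name, and the last possible starting position is included in the search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first run of n free frames, or -1 if none.
inline long find_free_run(const std::vector<bool> &used, std::size_t n)
{
  if (n == 0 || n > used.size())
    return -1;
  std::size_t attempt = 0;
  while (attempt + n <= used.size()) {
    std::size_t count = 0;
    while (count < n && !used[attempt + count])
      count++;
    if (count == n)
      return (long)attempt;
    attempt += count + 1;        // restart just past the frame in use
  }
  return -1;
}
```

Restarting just past the frame that was found in use avoids rescanning frames that cannot start a valid run, which is what the outer loop of find_range does with current.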

Kernel virtual memory regions

Every region must implement the methods that the KVM can call. It also has a starting address and a length (stored as an end address). All regions share the same KVM page allocator.

<Off kernel virtual memory region. >= (U->)
// A kernel virtual memory region.
//
class off_KVMRegion {
public:
  // Get the starting address.
  inline vm_offset_t get_start(void) const;
  // Get the region end (first invalid address).
  inline vm_offset_t get_end(void) const;

  // Allocates a kernel page.
  // Chooses the location if at is 0.
  virtual vm_offset_t pg_alloc(vm_offset_t at)=0;
  // Deallocates a kernel page.
  virtual void pg_free(vm_offset_t p)=0;
  // Allocates n kernel pages.
  // Chooses the location if at is 0.
  virtual vm_offset_t pg_alloc(vm_offset_t at, natural_t n)=0;
  // Deallocates n kernel pages.
  virtual void pg_free(vm_offset_t p, natural_t n)=0;

  // Handles page faults 
  virtual err_t pg_fault(off_PgFltReq *pgf)=0;

  // Returns the page size.
  virtual vm_size_t get_pgsz(void);

protected:
  vm_offset_t r_start;          // Starting address.
  vm_offset_t r_end;            // first invalid address. 
                                // length == r_end - r_start
  <Other protected methods of off_KVMRegion. >
  friend class off_KVM;
  static off_KPgAllocator *r_palloc;
  
};

Defines off_KVMRegion.

<Off kernel virtual memory region dependencies. >= (U->) [D->]
#include <prtl/ex.h>            // for off_PgFltReq

<off_KVMRegion::get_pgsize implementation. >= (U->)
// Returns the page size.
vm_size_t off_KVMRegion::get_pgsz(void) 
{
 return off_MBank::get_pgsize(); 
}

<Off kernel virtual memory region dependencies. >+= (U->) [<-D->]
#include <klib/err.h>           // for err_t and error numbers.
#include <klib/limits.h>        // for KVM related system limits.
#include <flux/types.h>         // for boolean_t natural_t et al.
class off_KPgAllocator;

<Off kernel virtual memory region implementation dependencies. >= (U->)
#include <dmm/KPgAllocator.h>   // for off_KPgAllocator et al
#include <hw/MBank.h>

<off_KVMRegion static members. >= (U->)
off_KPgAllocator *off_KVMRegion::r_palloc=NULL;

The allocator is set up by the KVM when it is constructed.

<Initialize other private members of off_KVM. >+= (<-U) [<-D->]
off_KVMRegion::r_palloc=&k_palloc;

Note how each concrete region may handle allocation, deallocation and page faults differently.

The easy ones are get_start and get_end.

<off_KVMRegion::get_start and get_end implementation. >= (U->)
// Get the starting address.
inline vm_offset_t off_KVMRegion::get_start(void) const { return r_start; }
// Get the region end (first invalid address).
inline vm_offset_t off_KVMRegion::get_end(void) const { return r_end; }

We can now implement off_KVM::find_region using them. To find a region we skip the catch-all region OFF_KVM_RNULL and use a linear search (which should be replaced by a binary search in the future, as regions are sorted).

<off_KVM::find_region implementation. >= (U->)
// Finds the region handling kva
// or returns a pointer to the given default one.
off_KVMRegion *off_KVM::find_region(vm_offset_t kva, natural_t dflt)
{
  natural_t reg;
  if (!kva)
    return k_regions[dflt];
  for(reg=1; reg < k_nregions; reg++){
    assert(k_regions[reg]);
    if (k_regions[reg]->get_start() <= kva && kva < k_regions[reg]->get_end())
      return k_regions[reg];
  }
  return k_regions[dflt];
  
}
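
A binary search version could look like the following sketch. It is not part of the kernel sources: Range and find_sorted are hypothetical names, indices are returned instead of region pointers, and the regions in slots 1 and up are assumed to be sorted by starting address and not to overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region bounds: the range [start, end).
struct Range {
  unsigned long start, end;
};

// Returns the index of the range holding kva, skipping slot 0
// (the catch-all), or dflt when no range holds it.
inline std::size_t find_sorted(const std::vector<Range> &regs,
                               unsigned long kva, std::size_t dflt)
{
  if (kva == 0)
    return dflt;
  std::size_t lo = 1, hi = regs.size();   // candidates are in [lo, hi)
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (kva < regs[mid].start)
      hi = mid;                           // kva lies below this range
    else if (kva >= regs[mid].end)
      lo = mid + 1;                       // kva lies above this range
    else
      return mid;
  }
  return dflt;
}
```

With up to OFF_NREGS_MAX regions this needs about ten comparisons in the worst case, instead of one per region.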

No KVMRegion can be instantiated. Only concrete subclasses can.

<Other protected methods of off_KVMRegion. >= (<-U)
// Creates a KVM region.
off_KVMRegion(vm_offset_t start, vm_offset_t end);

They must specify initial values for r_start and r_end. The page allocator to be used is the shared r_palloc.

<off_KVMRegion::off_KVMRegion implementation. >= (U->)
off_KVMRegion::off_KVMRegion(vm_offset_t start, vm_offset_t end) :
  r_start(start), r_end(end)
{;}

Null faulting regions

An unmapped, always-faulting catch-all region is easy to implement. It does not allocate memory and it does not free memory; on a page fault it only reports the fault and fails.

<Off fixed and faulting kernel virtual memory region. >= (U->)
// An unmapped kernel virtual memory region. 
//
class off_NullFaultingRegion : public off_KVMRegion {
public:
  <Other public methods of off_NullFaultingRegion. >

  // Allocates a kernel page.
  // Chooses the location if at is 0.
  virtual vm_offset_t pg_alloc(vm_offset_t at) {
    (void)at; return 0;
  }
  // Deallocates a kernel page.
  virtual void pg_free(vm_offset_t p) { (void)p; }
  // Allocates n kernel pages.
  // Chooses the location if at is 0.
  virtual vm_offset_t pg_alloc(vm_offset_t at, natural_t n) {
    (void) at; (void) n; return 0;
  }
  // Deallocates n kernel pages.
  virtual void pg_free(vm_offset_t p, natural_t n) { (void)p; (void)n; }
  
  // Handles page faults 
  virtual err_t pg_fault(off_PgFltReq *pgf);

};

Defines off_NullFaultingRegion.

<off_NullFaultingRegion::pg_fault implementation. >= (U->)
// Handles page faults 
err_t off_NullFaultingRegion::pg_fault(off_PgFltReq *pgf) 
{
  assert(pgf);
  kcout << "kernel page fault on unmapped region at ";
  kcout << fmt("%08x",pgf->get_vaddr());
  kcout << " reason " << fmt("%08x",pgf->t_error) << nl; 
  return EINVAL;
}

<Off kernel virtual memory region dependencies. >+= (U->) [<-D->]
#include <klib/str.h>           // for kcout.

The start and end addresses of a NullFaultingRegion must be known in advance.

<Other public methods of off_NullFaultingRegion. >= (<-U)
// Creates a null faulting region.
off_NullFaultingRegion(vm_offset_t start,vm_offset_t end);

<off_NullFaultingRegion::off_NullFaultingRegion implementation. >= (U->)
// Creates a null faulting region.
off_NullFaultingRegion::off_NullFaultingRegion(vm_offset_t start, 
                                               vm_offset_t end) :
  off_KVMRegion(start,end)
{;}

The KVM itself instantiates a null faulting region at system boot time. It is used as a catch-all when no other region is in charge of the address involved.

<Other private members of off_KVM. >+= (<-U) [<-D->]
static off_NullFaultingRegion k_null; // A null catch all region.

<off_KVM static members. >+= (U->) [<-D]
off_NullFaultingRegion off_KVM::k_null(0,(vm_offset_t)-1); // A null region.

It is set up in its slot at KVM instantiation time.

<Initialize other private members of off_KVM. >+= (<-U) [<-D]
k_regions[OFF_KVM_RNULL]= &k_null;
k_nregions++;



 
Francisco J. Ballesteros
1998-05-25