Next: 5.4 Kernel address space Up: 5.3 Distributed Memory Managers Previous: 5.3.3 DTLB allocation

5.3.4 Address translation allocation

Allocation of address translations is somewhat special. On architectures with hardware page tables, most translations can be kept in the machine-dependent data structures, so we do not store them twice. Only remote translations therefore need space in the address translation allocator. When no hardware page table is used, the machine-dependent DTLB should still handle every local translation.

<Off address translation allocator. >=
// An address translation value allocator.
// (Relies heavily on the mdep translation allocator).
//
class off_AddrTrValAllocator : public off_BKAllocator {
public:
  // Allows declarations of uninitialized address translation allocators.
  off_AddrTrValAllocator(void){;}

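  // Member-wise copy of the bookkeeping state and both sub-allocators.
  const off_AddrTrValAllocator &operator=(const off_AddrTrValAllocator &other){
    if (this != &other) {
      off_BKAllocator::operator=(other);
      a_lalloc = other.a_lalloc;
      a_ralloc = other.a_ralloc;
    }
    return *this;
  }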
  <Other public methods of off_AddrTrValAllocator. >
private:
  <Other private members of off_AddrTrValAllocator. >
  off_mdepAddrTrValAllocator a_lalloc; // Local address translations.
  off_RemAddrTrValAllocator  a_ralloc; // Remote address translations. 
};

Defines off_AddrTrValAllocator.

Hence, machine-independent translation allocation is an issue only for remote translations, which are kept in an off_RemAddrTrValAllocator. We describe it now, before proceeding with the address translation allocator.

<Off remote address translation allocator. >=
// A remote address translation allocator.
//
class off_RemAddrTrValAllocator : off_TBlockAllocator<off_LRemAddrTrVal> {
public:
  // Member-wise copy; the indexers are static and need no copying.
  const off_RemAddrTrValAllocator&operator=(
                      const off_RemAddrTrValAllocator &other){ 
    if (this != &other)
      off_TBlockAllocator<off_LRemAddrTrVal>::operator=(other);
    return *this;
  }
  <Other public methods of off_RemAddrTrValAllocator. >
private:
  <Other private members of off_RemAddrTrValAllocator. >
};

Defines off_RemAddrTrValAllocator.

<Off address translation allocator dependencies. >=
#include <klib/BlockAllocator.h> // for off_BlockAllocator
#include <dmm/AddrTr.h>         // for off_AddrTrVal

Because the block allocator uses a linked free list, the array nodes must subclass DLinkedNode.
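The sketch below shows why the nodes must be linkable. It is a fixed-size block allocator that threads its free list through the array nodes themselves, so free entries need no extra memory. `LinkedNode`, `RemTr`, `LRemTr` and `BlockAllocator` are hypothetical stand-ins for DLinkedNode, off_RemAddrTrVal, off_LRemAddrTrVal and off_TBlockAllocator, whose real interfaces are not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for DLinkedNode.
struct LinkedNode {
  LinkedNode *next = nullptr;
  LinkedNode *prev = nullptr;
};

// Hypothetical remote translation payload.
struct RemTr {
  unsigned long pa = 0;
};

// Array node: derives from LinkedNode so free entries can be chained.
struct LRemTr : LinkedNode {
  RemTr at;
};

// Fixed-size block allocator; the free list lives inside the nodes.
template <class Node, std::size_t N>
class BlockAllocator {
 public:
  BlockAllocator() {
    for (std::size_t i = 0; i < N; ++i) push_free(&nodes_[i]);
  }
  Node *allocate() {
    if (!free_) return nullptr;                 // exhausted
    LinkedNode *n = free_;
    free_ = n->next;
    if (free_) free_->prev = nullptr;
    n->next = n->prev = nullptr;
    --nfree_;
    return static_cast<Node *>(n);
  }
  void deallocate(Node *n) {
    assert(n >= nodes_ && n < nodes_ + N);      // must belong to this array
    push_free(n);
  }
  // Position of a node in the array (an index a PFN field could hold).
  std::size_t index(const Node *n) const {
    return static_cast<std::size_t>(n - nodes_);
  }
  Node *operator+(std::size_t i) { return i < N ? &nodes_[i] : nullptr; }
  std::size_t nfree() const { return nfree_; }

 private:
  void push_free(LinkedNode *n) {
    n->prev = nullptr;
    n->next = free_;
    if (free_) free_->prev = n;
    free_ = n;
    ++nfree_;
  }
  Node nodes_[N];
  LinkedNode *free_ = nullptr;
  std::size_t nfree_ = 0;
};
```

Because allocation only unlinks the head of the list, both allocate and deallocate run in constant time. An index from `index()` can later be turned back into a node with `operator+`.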

<Off linkable remote address translation. >=
class off_LRemAddrTrVal : public DLinkedNode {
public:
  off_RemAddrTrVal a_at;        // The translation target.

  // To allow uninitialized off_LRemAddrTrVals.
  off_LRemAddrTrVal(void) {;}
  <Other public methods of off_LRemAddrTrVal. >
};
Defines off_LRemAddrTrVal.

An indexer is maintained by the allocator.

<Other private members of off_RemAddrTrValAllocator. >= (<-U) [D->]
static off_Indexer<off_LRemAddrTrVal> a_idx; 

<Off remote address translation allocator static members. >=
off_Indexer<off_LRemAddrTrVal> off_RemAddrTrValAllocator::a_idx;

The allocator is initially created uninitialized.

<Other public methods of off_RemAddrTrValAllocator. >= (<-U) [D->]
off_RemAddrTrValAllocator(void) {;}

It is initialized when the DTLB is created.

<Other public methods of off_RemAddrTrValAllocator. >+= (<-U) [<-D]
// Creates a remote address translation allocator.
off_RemAddrTrValAllocator(off_Exhausted *r, off_KVMRegion *reg);

The region used by the block allocator lies in the first part of user virtual memory, just past the kernel space. Note that this region can grow and might be paged.

<Initialize a_ralloc in reg. >=
a_ralloc = off_RemAddrTrValAllocator((off_Exhausted *)this, reg);

<off_RemAddrTrAllocator::off_RemAddrTrAllocator implementation. >=
// Creates a remote address translation allocator.
off_RemAddrTrValAllocator::off_RemAddrTrValAllocator(off_Exhausted *r, 
                                                     off_KVMRegion *reg):
    off_TBlockAllocator<off_LRemAddrTrVal>(r,(const off_Indexable*)&r_idx,reg),
  <Initialize other aggregate members of off_RemAddrTrValAllocator. >
{
  assert(reg);
}
  

An indexer is used for LRemAddrTrVals.

<Other private members of off_RemAddrTrValAllocator. >+= (<-U) [<-D]
static off_Indexer<off_LRemAddrTrVal> r_idx;

Not every translation is held in the base allocator, that is, the machine-dependent allocator. Therefore, the address translation allocator must redefine the usual kernel allocator methods.
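The following is a minimal sketch of this redefinition pattern. The wrapper keeps the values in a machine-dependent table and uses its bookkeeping base class only for counting, in the same way that off_AddrTrValAllocator::allocate calls off_BKAllocator::allocate only when the local allocation succeeds. `BookKeeper`, `MdepTable` and `TrAllocator` are hypothetical names, and the table is modelled with a `std::map` purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for off_BKAllocator: it only keeps counts.
class BookKeeper {
 public:
  void allocate() { ++used_; }
  void deallocate() { assert(used_ > 0); --used_; }
  std::size_t used() const { return used_; }
 private:
  std::size_t used_ = 0;
};

// Hypothetical machine-dependent table: virtual address -> translation value.
struct MdepTable {
  std::map<unsigned long, unsigned long> entries;
  unsigned long *allocate(unsigned long va) {
    auto r = entries.emplace(va, 0);
    return r.second ? &r.first->second : nullptr;  // null if already in use
  }
  void deallocate(unsigned long va) { entries.erase(va); }
};

// Values live in the mdep table; the base class only tracks usage.
class TrAllocator : public BookKeeper {
 public:
  unsigned long *allocate(unsigned long va) {
    unsigned long *v = local_.allocate(va);
    if (v) BookKeeper::allocate();   // count successful allocations only
    return v;
  }
  void deallocate(unsigned long va) {
    BookKeeper::deallocate();
    local_.deallocate(va);
  }
 private:
  MdepTable local_;
};
```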

<Other public methods of off_AddrTrValAllocator. >= (<-U)
// Allocates an address translation value.
off_AddrTrValRef allocate(vm_offset_t at);
// Returns a reference to an address translation value.
inline off_AddrTrValRef operator+(vm_offset_t at);
// Returns the position of an address translation value.
// (might be expensive!)
vm_offset_t      pos(off_AddrTrValRef at);
// Deallocates an address translation value.
void deallocate(off_AddrTrValRef at);
void deallocate(vm_offset_t at);
// Returns non-zero if the address translation value is not in use. 
boolean_t  is_free(vm_offset_t at);

Note that its interface uses references to translation values (off_AddrTrValRef) to avoid copying translation contents.

<off_AddrTrValAllocator::allocate implementation. >=
// Allocates an address translation value.
off_AddrTrValRef off_AddrTrValAllocator::allocate(vm_offset_t at) 
{
  off_mdepAddrTrVal *mval=a_lalloc.allocate(at);
  off_AddrTrValRef res(mval);
  if (mval)
    off_BKAllocator::allocate();
  return res;
}

<off_AddrTrValAllocator::operator+ implementation. >=
// Returns a reference to an address translation value.
inline off_AddrTrValRef off_AddrTrValAllocator::operator+(vm_offset_t at)
{
  return a_lalloc+at;
}

<off_AddrTrValAllocator::pos implementation. >=
// Returns the position of an address translation value.
// (might be expensive!)
vm_offset_t      off_AddrTrValAllocator::pos(off_AddrTrValRef at)
{
  return a_lalloc.pos(at);
}

<off_AddrTrValAllocator::deallocate implementation. >=
// Deallocates an address translation value.
void off_AddrTrValAllocator::deallocate(off_AddrTrValRef at)
{
  off_BKAllocator::deallocate();
  a_lalloc.deallocate(at);
}

inline void off_AddrTrValAllocator::deallocate(vm_offset_t at)
{
  off_BKAllocator::deallocate();
  a_lalloc.deallocate(a_lalloc+at);
}

Address translations are always obtained by indexing the machine dependent \dtlb{} data structure with a virtual address. For local translations all needed information is found there; for remote translations, the page frame number (PFN) points to an entry in the remote translation allocator, which must be indexed to obtain the translation information.
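The two-step lookup can be sketched as follows. A local entry is complete in itself. A remote entry carries a marker bit, and its PFN field is used as an index into the remote translation array. The bit layout is loosely modelled on a 32-bit ix86 PTE. The choice of bit 9 as the remote marker and the `node` field of `RemoteTr` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: bits 12..31 hold the frame number, and bit 9
// (a software-available bit) marks the entry as remote.
constexpr std::uint32_t kPfnShift = 12;
constexpr std::uint32_t kRemoteBit = 1u << 9;

// Hypothetical remote translation: where the page really lives.
struct RemoteTr {
  std::uint32_t node;   // node holding the page (assumption)
  std::uint32_t pa;     // physical address on that node
};

struct Translation {
  bool remote;
  std::uint32_t node;
  std::uint32_t pa;
};

// Local entries are complete; remote ones index the remote array by PFN.
Translation lookup(std::uint32_t pte, const std::vector<RemoteTr> &remote,
                   std::uint32_t self_node) {
  std::uint32_t pfn = pte >> kPfnShift;
  if (!(pte & kRemoteBit))
    return {false, self_node, pfn << kPfnShift};
  const RemoteTr &r = remote.at(pfn);
  return {true, r.node, r.pa};
}
```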

To simplify address translation handling, we use off_AddrTrValRefs, which look up the translation tables as described above.

<Off address translation reference. >=
class off_AddrTrValRef {
private:
  off_mdepAddrTrVal *mdep;         // Reference to the mdep. translation entry.
public:
  // Is it a remote translation?
  inline boolean_t is_remote(void) const {
    return mdep->is_remote();
  }
  // Creates an AddrTrValRef
  off_AddrTrValRef(off_mdepAddrTrVal *m) : mdep(m) {;}
  // To test wrt NULL
  operator boolean_t(void) const { return mdep != 0; }
  // To extract machine dependent information. 
  operator off_mdepAddrTrVal*(void) const { return mdep; }

  <Other public methods of off_AddrTrValRef. >
};

Defines off_AddrTrValRef.

<Other public methods of off_AddrTrValRef. >= (<-U <-U) [D->]
  inline const off_pg_id_t get_pa(void) const { 
    if (!mdep->is_remote())
      return off_pg_id_t(mdep->get_ma()); 
    else
      return mdep->get_rat()->get_pa(); // XXX
  }
  inline vm_offset_t get_ma(void) const { 
    if (!mdep->is_remote())
      return mdep->get_ma(); 
    else
      return mdep->get_rat()->get_ma();
  }
  inline off_mod_t   get_mod(void) const { 
    if (!mdep->is_remote())
      return mdep->get_mod(); 
    else
      return mdep->get_rat()->get_mod();
  }
  
  inline void set_pa(const off_pg_id_t &pg);
  inline void set_ma(vm_offset_t ma);
  inline void set_mod(off_mod_t mod);


The DTLB access operator is thus as follows:

<off_DTLB::operator+ implementation. >=
// Gets an address translation from its virtual address
// (or a null reference if there is none).
inline off_AddrTrValRef off_DTLB::operator+(vm_offset_t s) 
{
  return mdep + s;
}

Page fault handling

Page faults are first handled by the DMM. The DMM delegates kernel page faults to the KVM and user page faults to the DTLB currently in use.

<Other public methods of off_DMM. >+= (<-U) [<-D]
// Handles a page fault. 
err_t pg_fault(off_TrapReq *m, vm_size_t s);

<off_DMM::pg_fault implementation. >=
// Handles a page fault. 
err_t off_DMM::pg_fault(off_TrapReq *m, vm_size_t s)
{
  off_PgFltReq pf(*m,get_vaddr_from_user_state());
  if (pf.is_from_kernel())
    return kvm.pg_fault(&pf);
  else 
    return get_current()->pg_fault(&pf) ;
}

The information carried by a page fault is represented by an off_PgFltReq. This is a simple trap message that knows both the faulting address and how to decode the page fault error code. get_current returns a reference to the DTLB in use on the current processor.
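The sketch below shows how such a message decodes the ix86 page fault error code and routes the fault. Bit 0 of the error code is clear for a not-present page and set for a protection violation. Bit 1 is set for a write, and bit 2 is set when the fault was raised in user mode. The predicates follow the same conventions as off_PgFltReq, where write and read qualify only protection faults. `PgFault`, `Handler` and `dispatch` are hypothetical names, and the constants are written out locally instead of being taken from flux/machine/trap.h.

```cpp
#include <cassert>
#include <cstdint>

// ix86 page fault error code bits (defined locally for this sketch).
constexpr std::uint32_t PF_PROT = 0x1;   // 0: not present, 1: protection
constexpr std::uint32_t PF_WRITE = 0x2;  // write access
constexpr std::uint32_t PF_USER = 0x4;   // raised in user mode

struct PgFault {
  std::uint32_t error;   // error code pushed by the CPU
  std::uint32_t vaddr;   // faulting address (read from CR2)

  bool from_kernel() const { return (error & PF_USER) == 0; }
  bool absent() const { return (error & PF_PROT) == 0; }
  bool write() const { return (error & PF_WRITE) && !absent(); }
  bool read() const { return !write() && !absent(); }
};

enum class Handler { Kernel, User };

// DMM-style routing: kernel faults go to the KVM, user faults to the DTLB.
Handler dispatch(const PgFault &pf) {
  return pf.from_kernel() ? Handler::Kernel : Handler::User;
}
```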

<Off user-kernel messages. >=
// Page fault message.
// XXX this is an mdep msg and should live in mdep/ex.h instead.
//
struct off_PgFltReq : public off_TrapReq {
  vm_offset_t p_vaddr;
  off_PgFltReq(const off_TrapReq &t, vm_offset_t va): 
    off_TrapReq(t),
    p_vaddr(va)
  {;}

  // This is the faulting address.
  inline vm_offset_t get_vaddr(void) const { return p_vaddr; }

  // Page fault was from kernel
  inline boolean_t is_from_kernel(void) const { 
    return (t_error&T_PF_USER)==0; 
  }
  // It was an invalid entry
  inline boolean_t is_absent(void) const { 
    return (t_error&T_PF_PROT)==0;
  }
  // It was a write attempt
  inline boolean_t is_write(void)  const { 
    return (t_error&T_PF_WRITE)&&!is_absent(); 
  }
  // It was a read attempt
  inline boolean_t is_read(void)  const { 
    return !is_write()&&!is_absent(); 
  }
  // It was an exec attempt. The ix86 error code used here has no
  // instruction-fetch bit, so this cannot be told apart from is_read.
  inline boolean_t is_exec(void)  const { 
    return is_read(); 
  }

};

Defines off_PgFltReq.

<Off user-kernel messages dependencies. >=
#include <flux/machine/trap.h>  // for mach.dep. error codes.

Once a page fault reaches the DTLB, it can be serviced in one of two ways: by turning a remote translation into a local one, or by delivering the page fault trap to the user.

<Other public methods of off_DTLB. >+= (<-U) [<-D]
// Handles a page fault on this dtlb. 
err_t pg_fault(off_PgFltReq *pgf);

pg_fault first checks whether a translation is installed for the faulting address. If one is installed, the address translation services the page fault. If no translation is installed, or the translation cannot resolve the fault, the page fault is delegated to the user.
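The decision made by off_DTLB::pg_fault reduces to a small predicate. The kernel attempts a fault only when it is an absent fault on an installed remote translation. Every other combination goes to the user. `TrInfo`, `Action` and `classify` are hypothetical names used only for this sketch.

```cpp
#include <cassert>

// Hypothetical summary of what the DTLB knows about the faulting address.
struct TrInfo {
  bool installed;  // a translation exists for the address
  bool remote;     // it is a remote translation
};

enum class Action { DeliverToUser, ResolveRemote };

// Same test as in off_DTLB::pg_fault.
Action classify(const TrInfo &tr, bool absent) {
  if (!tr.installed || !tr.remote || !absent)
    return Action::DeliverToUser;
  return Action::ResolveRemote;
}
```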

<off_DTLB::pg_fault implementation. >=
// Handles a page fault on this dtlb. 
err_t off_DTLB::pg_fault(off_PgFltReq *pgf)
{
  assert(pgf);
  off_AddrTrValRef tr=(*this)+pgf->get_vaddr();
  if (!tr || !tr.is_remote() || !pgf->is_absent())
    return usr_fault(pgf);
  else {
    <Try to resolve pgf in an absent remote translation tr. >
  }
}

<Try to resolve pgf in an absent remote translation tr. >= (<-U)
off_LRemAddrTrVal *rent=a_alloc+pgf->get_vaddr();
make_available(rent->a_at.a_pa);
// XXX To do: install the (now local) translation and return 0.

Here we used a new off_AddrTrValRef method to handle page faults, as well as an internal DTLB method to raise page faults to the user.

<Other public methods of off_AddrTrValRef. >+= (<-U <-U) [<-D]
// Handles a page fault on this address translation.
err_t pg_fault(off_PgFltReq *pgf);

<Other protected methods of off_DTLB. >= (<-U)
// Raises a page fault to the user.
// An error code would lead to a double page fault. 
err_t usr_fault(off_PgFltReq *pgf);

When there is no translation, the page fault is delivered to the DTLB exception portal.

<off_DTLB::usr_fault implementation. >=
// Raises a page fault to the user.
// An error code would lead to a double page fault. 
err_t off_DTLB::usr_fault(off_PgFltReq *pgf)
{
  if (get_domain() == OFF_PRTL_NULL)
    return ENOPRTL;
#if 0
  return prtl.kpct(get_domain(),off_Shtl::self(),sizeof(*pgf),0,pgf,NULL,0);
#else
  return ENOSYS;
#endif
  
}

<Off DTLB implementation dependencies. >=
#include <prtl/ex.h>            // for off_Frozen{Req|Rep} XXX include it
//#include <prtl/PrtlSrv.h>       // for prtl.
#include <shtl/Shtl.h>          // for Shtl and self.

Address translations resolve page faults automatically only for remote translations which have local (caching) page frames. Every other fault is simply delivered to the user.
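As an illustration, automatic resolution could rewrite the faulting PTE in place so that it names the local caching frame, clear the remote marker, and mark the entry present. When no caching frame exists, the fault is left for the user. `RemoteEntry`, `resolve`, the bit positions and the use of `std::optional` (C++17) are all assumptions for this sketch, not the kernel's actual data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical remote entry: it may or may not have a local caching frame.
struct RemoteEntry {
  std::optional<std::uint32_t> cache_pfn;
};

constexpr std::uint32_t kPfnShift = 12;
constexpr std::uint32_t kRemoteBit = 1u << 9;   // assumed remote marker
constexpr std::uint32_t kValid = 0x1;           // ix86 present bit

// Returns true if the fault was resolved locally; false means
// "deliver the fault to the user".
bool resolve(std::uint32_t &pte, const std::vector<RemoteEntry> &remote) {
  if (!(pte & kRemoteBit)) return false;        // not a remote translation
  std::uint32_t idx = pte >> kPfnShift;         // index into remote entries
  if (idx >= remote.size() || !remote[idx].cache_pfn) return false;
  std::uint32_t flags = pte & ((1u << kPfnShift) - 1);
  pte = (*remote[idx].cache_pfn << kPfnShift) |
        ((flags & ~kRemoteBit) | kValid);
  return true;
}
```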

<off_AddrTrValRef::pg_fault implementation. >=
// Handles a page fault on this address translation.
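err_t off_AddrTrValRef::pg_fault(off_PgFltReq *pgf)
{
  if (!mdep->is_remote())
    return ENOENT;
  // Remote translation: the PFN indexes the remote translation allocator.
  vm_offset_t idx = mdep->get_pfn();
  // XXX To do: make a local (caching) frame available for entry idx,
  // install it for pgf->get_vaddr(), and return 0.
  (void)idx; (void)pgf;
  return ENOSYS;
}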

The kernel does not support guaranteed translations. Therefore, if the current shuttle takes a page fault while it is servicing a pending page fault (that is, a double page fault), the user-level virtual memory code itself is page faulting. Such a double fault is treated specially. It is serviced by raising a double page fault virtual trap to the DMM exception portal, which should be handled by a per-node pager. This trap is meant as a last resort for paging in code or data of the application pager.
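One way to detect this situation is a per-shuttle flag that is set while a fault is being serviced. A second fault on the same shuttle during that window is then reported as a double fault instead of being delivered again. `Shuttle`, `FaultResult` and `on_fault` are hypothetical names. The sketch ignores exceptions and concurrency.

```cpp
#include <cassert>

enum class FaultResult { Handled, DoubleFault };

// Hypothetical per-shuttle state: set while a page fault is being serviced.
struct Shuttle {
  bool in_pgfault = false;
};

// A fault taken while the same shuttle is already servicing one is
// reported as a double page fault (OFF_EX_DPGFAULT in the text).
template <class Service>
FaultResult on_fault(Shuttle &s, Service service) {
  if (s.in_pgfault) return FaultResult::DoubleFault;
  s.in_pgfault = true;
  service();                 // e.g. deliver the fault to the user pager
  s.in_pgfault = false;
  return FaultResult::Handled;
}
```

In the usage below, the inner fault models the user pager faulting while it services the outer one.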

<Other virtual traps. >=
OFF_EX_DPGFAULT                 // Double page fault.

DTLB switching

Because DTLBs are valid shuttle property values, the DMM also implements the signature of a property server, namely the off_ShtlPropSrv interface.

<off_DMM shuttle property methods. >= (<-U)
// Property interface routines (off_ShtlPropSrv signature).

// Switches property values. 
// Returns either 0 or an error code.
err_t pswitch(const off_dtlb_id_t &to, const off_shtl_id_t &s);
  
// Set or clear the property at a given shuttle. 
err_t pset(const off_dtlb_id_t &pval,const off_shtl_id_t &s);
void  pclr(const off_dtlb_id_t &pval, const off_shtl_id_t &s);

// Is pswitch used?
boolean_t needs_switch(void);

To find the current DTLB, we would have to ask for the current shuttle and then for its STHLP_DTLB property value. To save time, the DMM keeps an array of DTLB references indexed by processor identifier. A single array lookup with the current processor identifier then yields the DTLB in use.
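A minimal sketch of this per-processor cache follows. The entry for a processor is updated whenever the DTLB property of the shuttle running there is switched (the role pswitch plays), and reading it is one array access. The processor identifier is passed explicitly here instead of being obtained from off_mdepProcessor::get_proc_id(). `Dmm`, `Dtlb` and `kMaxProc` are hypothetical stand-ins.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxProc = 4;   // stand-in for OFF_NPROC_MAX

struct Dtlb { int id; };

// Per-processor cache of the DTLB in use.
class Dmm {
 public:
  // Called when the running shuttle's DTLB property changes.
  void pswitch(std::size_t proc, Dtlb *to) {
    assert(proc < kMaxProc);
    current_[proc] = to;
  }
  Dtlb *get_current(std::size_t proc) const { return current_.at(proc); }
 private:
  std::array<Dtlb *, kMaxProc> current_{};   // all null initially
};
```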

<Other private members of off_DMM. >+= [<-D]
off_DTLB *d_current[OFF_NPROC_MAX]; // References to current dtlbs

<Other protected methods of off_DMM. >+= (<-U) [<-D]
// Returns a reference to the dtlb for the current processor.
inline off_DTLB *get_current(void);

<off_DMM::get_current implementation. >=
// Returns a reference to the dtlb for the current processor.
extern inline off_DTLB *off_DMM::get_current(void) {
  return d_current[off_mdepProcessor::get_proc_id()];
}

DTLBs for plain users

Users can manage their DTLBs through the off_uDTLB and off_uDMM wrappers:

<Off DMM for users. >=
ENTRY class off_uDMM : public off_uAbsCompResource {
public:
  // Allocates a DTLB.
  off_uDTLB *alloc(const off_Protection &prot,
                   natural_t n=1, off_dtlb_id_t at=OFF_DTLB_NULL);
  // Deallocates a DTLB.
  void free(off_uDTLB *dtlb, const off_Rights &r, natural_t n=1);

  // Gets a DTLB from its number.
  // (operator[] accepts a single argument, hence a named method.)
  off_uDTLB *get(off_dtlb_id_t id, const off_Rights &r);

  // Gets the size of pages being translated.
  vm_size_t get_pgsize(const off_Rights &r);

};
Defines off_uDMM.

<Off DTLB for users. >=
ENTRY class off_uDTLB : public off_uAbsResUnit {
public:
  // Installs a set of (contiguous and w/ the same access rights)
  // address translations.
  void install(vm_offset_t va, off_pg_id_t pa, off_mode_t access_mode,
               const off_Rights &dtlb_r, const off_Rights &pa_r,
               natural_t n=1);
  // Deinstalls a set of (contiguous) address translations.
  void invalidate(vm_offset_t va, const off_Rights &r, natural_t n=1);
  // Changes the access mode bits for the given translations.
  void set_mode(vm_offset_t va, off_mode_t access_mode,
                const off_Rights &dtlb_r, const off_Rights &pa_r,
                natural_t n=1);
};
Defines off_uDTLB.

Distributed Memory Managers for ix86-based architectures

On Intel-based architectures, a DTLB maintains a page table and uses the corresponding cr3 value (the page directory base register) as its physical name.
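To make concrete what a page table rooted at cr3 means, the sketch below performs the classic 32-bit (non-PAE) ix86 two-level walk. Bits 31..22 of a virtual address select a page directory entry, bits 21..12 select a page table entry, and bits 11..0 are the offset. Physical memory is modelled with a `std::map`, and `split` and `translate` are hypothetical helpers. Large pages and permission checks are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

struct VaParts {
  std::uint32_t pdi, pti, off;
};

// Split a 32-bit virtual address into directory index, table index, offset.
constexpr VaParts split(std::uint32_t va) {
  return {va >> 22, (va >> 12) & 0x3ff, va & 0xfff};
}

// Toy physical memory: physical address -> 32-bit word.
using PhysMem = std::map<std::uint32_t, std::uint32_t>;

// Walk both levels starting at cr3; false if an entry is not present.
bool translate(const PhysMem &mem, std::uint32_t cr3, std::uint32_t va,
               std::uint32_t &pa) {
  VaParts p = split(va);
  auto pde = mem.find((cr3 & ~0xfffu) + p.pdi * 4);
  if (pde == mem.end() || !(pde->second & 1)) return false;
  auto pte = mem.find((pde->second & ~0xfffu) + p.pti * 4);
  if (pte == mem.end() || !(pte->second & 1)) return false;
  pa = (pte->second & ~0xfffu) | p.off;
  return true;
}
```

Switching address spaces therefore amounts to loading a different cr3 value, which is why that value can serve as the DTLB's physical name.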

<Off machine dependent DTLB. >=
// A DTLB for ix86s.
//
class off_mdepDTLB {
public:
private:
};
Defines off_mdepDTLB.

Here we are only concerned with local address translations. They must implement is_remote as a service to the machine-independent address translation machinery. This method must return non-zero when the translation is a remote one, in which case the page frame number is an index into the DTLB remote address translation allocator.
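The field handling of off_mdepAddrTrVal can be sketched on a bare 32-bit word. The frame number occupies bits 31..12 and the mode bits occupy bits 11..0, so each setter must replace one field while preserving the other. Using bit 9 (one of the software-available bits) as the remote marker is an assumption, as are the names `Pte` and `make_remote`.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t PTE_PFN = 0xfffff000u;   // frame address bits
constexpr std::uint32_t OFFMASK = 0x00000fffu;   // mode bits
constexpr std::uint32_t PTE_REMOTE = 0x00000200u; // assumed remote marker

class Pte {
 public:
  std::uint32_t pfn() const { return (v_ & PTE_PFN) >> 12; }
  std::uint32_t mod() const { return v_ & OFFMASK; }
  bool is_remote() const { return (v_ & PTE_REMOTE) != 0; }

  // Each setter replaces one field and keeps the other.
  void set_pfn(std::uint32_t pfn) { v_ = (pfn << 12) | (v_ & OFFMASK); }
  void set_mod(std::uint32_t mod) { v_ = (v_ & PTE_PFN) | (mod & OFFMASK); }

  // A remote entry stores a remote-allocator index in the PFN field and
  // leaves the present bit clear, so any access faults.
  void make_remote(std::uint32_t index) {
    set_pfn(index);
    set_mod(PTE_REMOTE);
  }
  std::uint32_t raw() const { return v_; }
 private:
  std::uint32_t v_ = 0;
};
```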

<Off machine dependent address translation. >=
// An address translation value for ix86s
//
class off_mdepAddrTrVal {
private:
  volatile pt_entry_t a_pte;   // ix86 Page table entry.
public:
  inline boolean_t is_remote(void) const;

  // To test wrt NULL
  operator boolean_t(void) const {return pte_to_pa(a_pte);}
  inline const vm_offset_t get_ma(void) const {return pte_to_pa(a_pte);}
  inline const vm_offset_t get_pfn(void) const{return atop(pte_to_pa(a_pte));}
  inline off_mod_t   get_mod(void) const { return a_pte&INTEL_OFFMASK; }

  inline void set_pfn(vm_offset_t pfn) {
    a_pte = ptoa(pfn)|(a_pte & INTEL_OFFMASK);
  }
  inline void set_ma(vm_offset_t ma) {
    a_pte = (ma&INTEL_PTE_PFN)|(a_pte&INTEL_OFFMASK);
  }
  inline void set_mod(off_mod_t mod) {
    a_pte = (a_pte&INTEL_PTE_PFN)|(mod&INTEL_OFFMASK);
  }
};

Defines off_mdepAddrTrVal.

<Off machine dependent address translation dependencies. >= [D->]
#include <flux/machine/paging.h> // for pt_entry_t

Mode bits are encoded as an off_mdep_mod_t.

<Off machine dependent mode bits. >=
// Translation mode bits for ix86 machines.
//
struct off_mdep_mod_t {
  natural_t m_bits;
  off_mdep_mod_t(natural_t bits): m_bits(bits) {;}
  operator natural_t(void) const { return m_bits; }

  // Check bits
  boolean_t has_rd(void) const { return (m_bits&INTEL_PTE_VALID); }
  boolean_t has_wr(void) const { return (m_bits&INTEL_PTE_WRITE); }
  boolean_t has_ex(void) const { return has_rd(); }
  boolean_t has_ref(void)const { return m_bits&INTEL_PTE_REF; }
  boolean_t has_mod(void)const { return m_bits&INTEL_PTE_MOD; }
  boolean_t has_ncache(void)const{return (m_bits&INTEL_PTE_NCACHE); }
  boolean_t has_valid(void)const{return (m_bits&INTEL_PTE_AVAIL); }
  boolean_t has_usr(void)const{return (m_bits&INTEL_PTE_USER); }
  boolean_t has_rem(void)const{return (m_bits&INTEL_PTE_USR1); }
  // Set bits
  void set_rd(void) {m_bits|=INTEL_PTE_VALID; }
  void set_wr(void) {m_bits|=INTEL_PTE_WRITE; }
  void set_ex(void) {m_bits|=INTEL_PTE_VALID; } // ix86: exec implies read
  void set_mod(void) {m_bits|=INTEL_PTE_MOD; }
  void set_ncache(void) {m_bits|=INTEL_PTE_NCACHE; }
  void set_valid(void) {m_bits|=INTEL_PTE_AVAIL; }
  void set_usr(void) {m_bits|=INTEL_PTE_USER; }
  void set_rem(void) {m_bits|=INTEL_PTE_USR1; }
  
  // Clear bits
  void clr_rd(void) {m_bits&=~INTEL_PTE_VALID; }
  void clr_wr(void) {m_bits&=~INTEL_PTE_WRITE; }
  void clr_ex(void) {m_bits&=~INTEL_PTE_VALID; } // ix86: also clears read
  void clr_mod(void) {m_bits&=~INTEL_PTE_MOD; }
  void clr_ncache(void) {m_bits&=~INTEL_PTE_NCACHE; }
  void clr_valid(void) {m_bits&=~INTEL_PTE_AVAIL; }
  void clr_usr(void) {m_bits&=~INTEL_PTE_USER; }
  void clr_rem(void) {m_bits&=~INTEL_PTE_USR1; }
  
};

const natural_t OFF_MOD_R=INTEL_PTE_VALID;
const natural_t OFF_MOD_W=INTEL_PTE_WRITE;
const natural_t OFF_MOD_X=INTEL_PTE_VALID;
const natural_t OFF_MOD_V=(INTEL_PTE_VALID|INTEL_PTE_AVAIL);
const natural_t OFF_MOD_U=INTEL_PTE_USER;
const natural_t OFF_MOD_RW=INTEL_PTE_VALID|INTEL_PTE_WRITE;
const natural_t OFF_MOD_RX=INTEL_PTE_VALID;
const natural_t OFF_MOD_RWX=INTEL_PTE_VALID|INTEL_PTE_WRITE;
const natural_t OFF_MOD_REM=INTEL_PTE_USR1;
Defines off_mdep_mod_t.

<Off machine dependent address translation dependencies. >+= [<-D]
#include <flux/machine/paging.h> // for the INTEL_PTE_* mode bits

A machine-dependent translation is guaranteed to have a local (valid) frame when the OFF_MOD_REM bit is clear in its mode bits.

<off_mdepAddrTrVal::is_remote implementation. >=
extern inline boolean_t off_mdepAddrTrVal::is_remote(void) const {
  return (a_pte&OFF_MOD_REM);
}



Francisco J. Ballesteros
1998-05-25