| Introduction |
| |
| hwloc provides command line tools and a C API to obtain the hierarchical map of |
| key computing elements, such as: NUMA memory nodes, shared caches, processor |
| sockets, processor cores, and processing units (logical processors or |
| "threads"). hwloc also gathers various attributes such as cache and memory |
| information, and is portable across a variety of different operating systems |
| and platforms. |
| |
| hwloc primarily aims at helping high-performance computing (HPC) applications, |
| but is also applicable to any project seeking to exploit code and/or data |
| locality on modern computing platforms. |
| |
| Note that the hwloc project represents the merger of the libtopology project |
| from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from |
| Open MPI. Both of these prior projects are now deprecated. The first hwloc |
| release was essentially a "re-branding" of the libtopology code base, but with |
| both a few genuinely new features and a few PLPA-like features added in. Prior |
| releases of hwloc included documentation about switching from PLPA to hwloc; |
| this documentation has been dropped on the assumption that everyone who was |
| using PLPA has already switched to hwloc. |
| |
| hwloc supports the following operating systems: |
| |
| * Linux (including old kernels not having sysfs topology information, with |
| knowledge of cpusets, offline CPUs, ScaleMP vSMP, and Kerrighed support) |
| * Solaris |
| * AIX |
| * Darwin / OS X |
| * FreeBSD and its variants, such as kFreeBSD/GNU |
| * OSF/1 (a.k.a., Tru64) |
| * HP-UX |
| * Microsoft Windows |
| |
| hwloc only reports the number of processors on unsupported operating systems; |
| no topology information is available. |
| |
| For development and debugging purposes, hwloc also offers the ability to work |
| on "fake" topologies: |
| |
| * Symmetrical tree of resources generated from a list of level arities |
| * Remote machine simulation through the gathering of Linux sysfs topology |
| files |
| |
| hwloc can display the topology in a human-readable format, either in graphical |
| mode (X11), or by exporting in one of several different formats, including: |
| plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the |
| export formats require additional support libraries. |
| |
| hwloc offers a programming interface for manipulating topologies and objects. |
| It also brings a powerful CPU bitmap API that is used to describe topology |
| objects location on physical/logical processors. See the Programming Interface |
| below. It may also be used to binding applications onto certain cores or memory |
| nodes. Several utility programs are also provided to ease command-line |
| manipulation of topology objects, binding of processes, and so on. |
| |
| Installation |
| |
| hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD |
| license. It is hosted as a sub-project of the overall Open MPI project (http:// |
| www.open-mpi.org/). Note that hwloc does not require any functionality from |
| Open MPI -- it is a wholly separate (and much smaller!) project and code base. |
| It just happens to be hosted as part of the overall Open MPI project. |
| |
| Nightly development snapshots are available on the web site. Additionally, the |
| code can be directly checked out of Subversion: |
| |
| shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk |
| shell$ cd hwloc-trunk |
| shell$ ./autogen.sh |
| |
| Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required |
| when building from a Subversion checkout. |
| |
| Installation by itself is the fairly common GNU-based process: |
| |
| shell$ ./configure --prefix=... |
| shell$ make |
| shell$ make install |
| |
| The hwloc command-line tool "lstopo" produces human-readable topology maps, as |
| mentioned above. It can also export maps to the "fig" file format. Support for |
| PDF, Postscript, and PNG exporting is provided if the "Cairo" development |
| package can be found when hwloc is configured and build. Similarly, lstopo's |
| XML support requires the libxml2 development package. |
| |
| CLI Examples |
| |
| On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show the |
| following graphical output: |
| |
| dudley.png |
| |
| Here's the equivalent output in textual form: |
| |
| Machine (16GB) |
| Socket L#0 + L3 L#0 (4096KB) |
| L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0 |
| PU L#0 (P#0) |
| PU L#1 (P#8) |
| L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1 |
| PU L#2 (P#4) |
| PU L#3 (P#12) |
| Socket L#1 + L3 L#1 (4096KB) |
| L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2 |
| PU L#4 (P#1) |
| PU L#5 (P#9) |
| L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3 |
| PU L#6 (P#5) |
| PU L#7 (P#13) |
| Socket L#2 + L3 L#2 (4096KB) |
| L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4 |
| PU L#8 (P#2) |
| PU L#9 (P#10) |
| L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5 |
| PU L#10 (P#6) |
| PU L#11 (P#14) |
| Socket L#3 + L3 L#3 (4096KB) |
| L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6 |
| PU L#12 (P#3) |
| PU L#13 (P#11) |
| L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7 |
| PU L#14 (P#7) |
| PU L#15 (P#15) |
| |
| Finally, here's the equivalent output in XML. Long lines were artificially |
| broken for document clarity (in the real output, each XML tag is on a single |
| line), and only socket #0 is shown for brevity: |
| |
| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE topology SYSTEM "hwloc.dtd"> |
| <topology> |
| <object type="Machine" os_level="-1" os_index="0" cpuset="0x0000ffff" |
| complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff" |
| allowed_cpuset="0x0000ffff" |
| dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318" |
| local_memory="16648183808"> |
| <page_type size="4096" count="4064498"/> |
| <page_type size="2097152" count="0"/> |
| <object type="Socket" os_level="-1" os_index="0" cpuset="0x00001111" |
| complete_cpuset="0x00001111" online_cpuset="0x00001111" |
| allowed_cpuset="0x00001111"> |
| <object type="Cache" os_level="-1" cpuset="0x00001111" |
| complete_cpuset="0x00001111" online_cpuset="0x00001111" |
| allowed_cpuset="0x00001111" cache_size="4194304" depth="3" |
| cache_linesize="64"> |
| <object type="Cache" os_level="-1" cpuset="0x00000101" |
| complete_cpuset="0x00000101" online_cpuset="0x00000101" |
| allowed_cpuset="0x00000101" cache_size="1048576" depth="2" |
| cache_linesize="64"> |
| <object type="Cache" os_level="-1" cpuset="0x00000101" |
| complete_cpuset="0x00000101" online_cpuset="0x00000101" |
| allowed_cpuset="0x00000101" cache_size="16384" depth="1" |
| cache_linesize="64"> |
| <object type="Core" os_level="-1" os_index="0" cpuset="0x00000101" |
| complete_cpuset="0x00000101" online_cpuset="0x00000101" |
| allowed_cpuset="0x00000101"> |
| <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001"/> |
| <object type="PU" os_level="-1" os_index="8" cpuset="0x00000100" |
| complete_cpuset="0x00000100" online_cpuset="0x00000100" |
| allowed_cpuset="0x00000100"/> |
| </object> |
| </object> |
| </object> |
| <object type="Cache" os_level="-1" cpuset="0x00001010" |
| complete_cpuset="0x00001010" online_cpuset="0x00001010" |
| allowed_cpuset="0x00001010" cache_size="1048576" depth="2" |
| cache_linesize="64"> |
| <object type="Cache" os_level="-1" cpuset="0x00001010" |
| complete_cpuset="0x00001010" online_cpuset="0x00001010" |
| allowed_cpuset="0x00001010" cache_size="16384" depth="1" |
| cache_linesize="64"> |
| <object type="Core" os_level="-1" os_index="1" cpuset="0x00001010" |
| complete_cpuset="0x00001010" online_cpuset="0x00001010" |
| allowed_cpuset="0x00001010"> |
| <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010" |
| complete_cpuset="0x00000010" online_cpuset="0x00000010" |
| allowed_cpuset="0x00000010"/> |
| <object type="PU" os_level="-1" os_index="12" cpuset="0x00001000" |
| complete_cpuset="0x00001000" online_cpuset="0x00001000" |
| allowed_cpuset="0x00001000"/> |
| </object> |
| </object> |
| </object> |
| </object> |
| </object> |
| <!-- ...other sockets listed here ... --> |
| </object> |
| </topology> |
| |
| On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the |
| following graphical output: |
| |
| hagrid.png |
| |
| Here's the equivalent output in textual form: |
| |
| Machine (32GB) |
| NUMANode L#0 (P#0 8190MB) + Socket L#0 |
| L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0) |
| L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1) |
| NUMANode L#1 (P#1 8192MB) + Socket L#1 |
| L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2) |
| L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3) |
| NUMANode L#2 (P#2 8192MB) + Socket L#2 |
| L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4) |
| L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5) |
| NUMANode L#3 (P#3 8192MB) + Socket L#3 |
| L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6) |
| L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7) |
| |
| And here's the equivalent output in XML. Similar to above, line breaks were |
| added and only PU #0 is shown for brevity: |
| |
| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE topology SYSTEM "hwloc.dtd"> |
| <topology> |
| <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff" |
| complete_cpuset="0x000000ff" online_cpuset="0x000000ff" |
| allowed_cpuset="0x000000ff" nodeset="0x000000ff" |
| complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff" |
| dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 "> |
| <page_type size="4096" count="0"/> |
| <page_type size="2097152" count="0"/> |
| <object type="NUMANode" os_level="-1" os_index="0" cpuset="0x00000003" |
| complete_cpuset="0x00000003" online_cpuset="0x00000003" |
| allowed_cpuset="0x00000003" nodeset="0x00000001" |
| complete_nodeset="0x00000001" allowed_nodeset="0x00000001" |
| local_memory="7514177536"> |
| <page_type size="4096" count="1834516"/> |
| <page_type size="2097152" count="0"/> |
| <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000003" |
| complete_cpuset="0x00000003" online_cpuset="0x00000003" |
| allowed_cpuset="0x00000003" nodeset="0x00000001" |
| complete_nodeset="0x00000001" allowed_nodeset="0x00000001"> |
| <object type="Cache" os_level="-1" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001" nodeset="0x00000001" |
| complete_nodeset="0x00000001" allowed_nodeset="0x00000001" |
| cache_size="1048576" depth="2" cache_linesize="64"> |
| <object type="Cache" os_level="-1" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001" nodeset="0x00000001" |
| complete_nodeset="0x00000001" allowed_nodeset="0x00000001" |
| cache_size="65536" depth="1" cache_linesize="64"> |
| <object type="Core" os_level="-1" os_index="0" |
| cpuset="0x00000001" complete_cpuset="0x00000001" |
| online_cpuset="0x00000001" allowed_cpuset="0x00000001" |
| nodeset="0x00000001" complete_nodeset="0x00000001" |
| allowed_nodeset="0x00000001"> |
| <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001" nodeset="0x00000001" |
| complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/> |
| </object> |
| </object> |
| </object> |
| <!-- ...more objects listed here ... --> |
| </topology> |
| |
| On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies into each |
| socket): |
| |
| emmett.png |
| |
| Here's the same output in textual form: |
| |
| Machine (16GB) |
| Socket L#0 |
| L2 L#0 (4096KB) |
| L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0) |
| L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4) |
| L2 L#1 (4096KB) |
| L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2) |
| L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6) |
| Socket L#1 |
| L2 L#2 (4096KB) |
| L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1) |
| L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5) |
| L2 L#3 (4096KB) |
| L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3) |
| L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7) |
| |
| And the same output in XML (line breaks added, only PU #0 shown): |
| |
| <?xml version="1.0" encoding="UTF-8"?> |
| <!DOCTYPE topology SYSTEM "hwloc.dtd"> |
| <topology> |
| <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff" |
| complete_cpuset="0x000000ff" online_cpuset="0x000000ff" |
| allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc." |
| dmi_board_name="0NR282" local_memory="16865292288"> |
| <page_type size="4096" count="4117503"/> |
| <page_type size="2097152" count="0"/> |
| <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000055" |
| complete_cpuset="0x00000055" online_cpuset="0x00000055" |
| allowed_cpuset="0x00000055"> |
| <object type="Cache" os_level="-1" cpuset="0x00000011" |
| complete_cpuset="0x00000011" online_cpuset="0x00000011" |
| allowed_cpuset="0x00000011" cache_size="4194304" depth="2" |
| cache_linesize="64"> |
| <object type="Cache" os_level="-1" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001" cache_size="32768" depth="1" |
| cache_linesize="64"> |
| <object type="Core" os_level="-1" os_index="0" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001"> |
| <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001" |
| complete_cpuset="0x00000001" online_cpuset="0x00000001" |
| allowed_cpuset="0x00000001"/> |
| </object> |
| </object> |
| <object type="Cache" os_level="-1" cpuset="0x00000010" |
| complete_cpuset="0x00000010" online_cpuset="0x00000010" |
| allowed_cpuset="0x00000010" cache_size="32768" depth="1" |
| cache_linesize="64"> |
| <object type="Core" os_level="-1" os_index="1" cpuset="0x00000010" |
| complete_cpuset="0x00000010" online_cpuset="0x00000010" |
| allowed_cpuset="0x00000010"> |
| <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010" |
| complete_cpuset="0x00000010" online_cpuset="0x00000010" |
| allowed_cpuset="0x00000010"/> |
| </object> |
| </object> |
| </object> |
| <!-- ...more objects listed here ... --> |
| </topology> |
| |
| Programming Interface |
| |
| The basic interface is available in hwloc.h. It essentially offers low-level |
| routines for advanced programmers that want to manually manipulate objects and |
| follow links between them. Documentation for everything in hwloc.h are provided |
| later in this document. Developers should also look at hwloc/helper.h (and also |
| in this document, which provides good higher-level topology traversal examples. |
| |
| To precisely define the vocabulary used by hwloc, a Terms and Definitions |
| section is available and should probably be read first. |
| |
| Each hwloc object contains a cpuset describing the list of processing units |
| that it contains. These bitmaps may be used for CPU binding and Memory binding. |
| hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h. |
| |
| Moreover, hwloc also comes with additional helpers for interoperability with |
| several commonly used environments. See the Interoperability With Other |
| Software section for details. |
| |
| The complete API documentation is available in a full set of HTML pages, man |
| pages, and self-contained PDF files (formatted for both both US letter and A4 |
| formats) in the source tarball in doc/doxygen-doc/. |
| |
| NOTE: If you are building the documentation from a Subversion checkout, you |
| will need to have Doxygen and pdflatex installed -- the documentation will be |
| built during the normal "make" process. The documentation is installed during |
| "make install" to $prefix/share/doc/hwloc/ and your systems default man page |
| tree (under $prefix, of course). |
| |
| Portability |
| |
| As shown in CLI Examples, hwloc can obtain information on a wide variety of |
| hardware topologies. However, some platforms and/or operating system versions |
| will only report a subset of this information. For example, on an PPC64-based |
| system with 32 cores (each with 2 hardware threads) running a default |
| 2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information |
| about NUMA nodes and processor units (PUs). No information about caches, |
| sockets, or cores is available. |
| |
| Similarly, Operating System have varying support for CPU and memory binding, |
| e.g. while some Operating Systems provide interfaces for all kinds of CPU and |
| memory bindings, some others provide only interfaces for a limited number of |
| kinds of CPU and memory binding, and some do not provide any binding interface |
| at all. Hwloc's binding functions would then simply return the ENOSYS error |
| (Function not implemented), meaning that the underlying Operating System does |
| not provide any interface for them. CPU binding and Memory binding provide more |
| information on which hwloc binding functions should be preferred because |
| interfaces for them are usually available on the supported Operating Systems. |
| |
| Here's the graphical output from lstopo on this platform when Simultaneous |
| Multi-Threading (SMT) is enabled: |
| |
| ppc64-with-smt.png |
| |
| And here's the graphical output from lstopo on this platform when SMT is |
| disabled: |
| |
| ppc64-without-smt.png |
| |
| Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for |
| example, seems to change location from NUMA node #0 to #1. In reality, no PUs |
| "moved" -- they were simply re-numbered when hwloc only saw half as many. |
| Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the |
| SMT-enabled picture. |
| |
| This same "PUs have disappeared" effect can be seen on other platforms -- even |
| platforms / OSs that provide much more information than the above PPC64 system. |
| This is an unfortunate side-effect of how operating systems report information |
| to hwloc. |
| |
| Note that upgrading the Linux kernel on the same PPC64 system mentioned above |
| to 2.6.34, hwloc is able to discover all the topology information. The |
| following picture shows the entire topology layout when SMT is enabled: |
| |
| ppc64-full-with-smt.png |
| |
| Developers using the hwloc API or XML output for portable applications should |
| therefore be extremely careful to not make any assumptions about the structure |
| of data that is returned. For example, per the above reported PPC topology, it |
| is not safe to assume that PUs will always be descendants of cores. |
| |
| Additionally, future hardware may insert new topology elements that are not |
| available in this version of hwloc. Long-lived applications that are meant to |
| span multiple different hardware platforms should also be careful about making |
| structure assumptions. For example, there may someday be an element "lower" |
| than a PU, or perhaps a new element may exist between a core and a PU. |
| |
| API Example |
| |
| The following small C example (named ``hwloc-hello.c'') prints the topology of |
| the machine and bring the process to the first logical processor of the second |
| core of the machine. |
| |
| /* Example hwloc API program. |
| * |
| * Copyright ? 2009-2010 INRIA. All rights reserved. |
| * Copyright ? 2009-2011 Universit? Bordeaux 1 |
| * Copyright ? 2009-2010 Cisco Systems, Inc. All rights reserved. |
| * See COPYING in top-level directory. |
| * |
| * hwloc-hello.c |
| */ |
| |
| #include <hwloc.h> |
| #include <errno.h> |
| #include <stdio.h> |
| #include <string.h> |
| |
| static void print_children(hwloc_topology_t topology, hwloc_obj_t obj, |
| int depth) |
| { |
| char string[128]; |
| unsigned i; |
| |
| hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0); |
| printf("%*s%s\n", 2*depth, "", string); |
| for (i = 0; i < obj->arity; i++) { |
| print_children(topology, obj->children[i], depth + 1); |
| } |
| } |
| |
| int main(void) |
| { |
| int depth; |
| unsigned i, n; |
| unsigned long size; |
| int levels; |
| char string[128]; |
| int topodepth; |
| hwloc_topology_t topology; |
| hwloc_cpuset_t cpuset; |
| hwloc_obj_t obj; |
| |
| /* Allocate and initialize topology object. */ |
| hwloc_topology_init(&topology); |
| |
| /* ... Optionally, put detection configuration here to ignore |
| some objects types, define a synthetic topology, etc.... |
| |
| The default is to detect all the objects of the machine that |
| the caller is allowed to access. See Configure Topology |
| Detection. */ |
| |
| /* Perform the topology detection. */ |
| hwloc_topology_load(topology); |
| |
| /* Optionally, get some additional topology information |
| in case we need the topology depth later. */ |
| topodepth = hwloc_topology_get_depth(topology); |
| |
| /***************************************************************** |
| * First example: |
| * Walk the topology with an array style, from level 0 (always |
| * the system level) to the lowest level (always the proc level). |
| *****************************************************************/ |
| for (depth = 0; depth < topodepth; depth++) { |
| printf("*** Objects at level %d\n", depth); |
| for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth); |
| i++) { |
| hwloc_obj_snprintf(string, sizeof(string), topology, |
| hwloc_get_obj_by_depth(topology, depth, i), |
| "#", 0); |
| printf("Index %u: %s\n", i, string); |
| } |
| } |
| |
| /***************************************************************** |
| * Second example: |
| * Walk the topology with a tree style. |
| *****************************************************************/ |
| printf("*** Printing overall tree\n"); |
| print_children(topology, hwloc_get_root_obj(topology), 0); |
| |
| /***************************************************************** |
| * Third example: |
| * Print the number of sockets. |
| *****************************************************************/ |
| depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET); |
| if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) { |
| printf("*** The number of sockets is unknown\n"); |
| } else { |
| printf("*** %u socket(s)\n", |
| hwloc_get_nbobjs_by_depth(topology, depth)); |
| } |
| |
| /***************************************************************** |
| * Fourth example: |
| * Compute the amount of cache that the first logical processor |
| * has above it. |
| *****************************************************************/ |
| levels = 0; |
| size = 0; |
| for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0); |
| obj; |
| obj = obj->parent) |
| if (obj->type == HWLOC_OBJ_CACHE) { |
| levels++; |
| size += obj->attr->cache.size; |
| } |
| printf("*** Logical processor 0 has %d caches totaling %luKB\n", |
| levels, size / 1024); |
| |
| /***************************************************************** |
| * Fifth example: |
| * Bind to only one thread of the last core of the machine. |
| * |
| * First find out where cores are, or else smaller sets of CPUs if |
| * the OS doesn't have the notion of a "core". |
| *****************************************************************/ |
| depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE); |
| |
| /* Get last core. */ |
| obj = hwloc_get_obj_by_depth(topology, depth, |
| hwloc_get_nbobjs_by_depth(topology, depth) - 1); |
| if (obj) { |
| /* Get a copy of its cpuset that we may modify. */ |
| cpuset = hwloc_bitmap_dup(obj->cpuset); |
| |
| /* Get only one logical processor (in case the core is |
| SMT/hyperthreaded). */ |
| hwloc_bitmap_singlify(cpuset); |
| |
| /* And try to bind ourself there. */ |
| if (hwloc_set_cpubind(topology, cpuset, 0)) { |
| char *str; |
| int error = errno; |
| hwloc_bitmap_asprintf(&str, obj->cpuset); |
| printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error)); |
| free(str); |
| } |
| |
| /* Free our cpuset copy */ |
| hwloc_bitmap_free(cpuset); |
| } |
| |
| /***************************************************************** |
| * Sixth example: |
| * Allocate some memory on the last NUMA node, bind some existing |
| * memory to the last NUMA node. |
| *****************************************************************/ |
| /* Get last node. */ |
| n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE); |
| if (n) { |
| void *m; |
| size = 1024*1024; |
| |
| obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1); |
| m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset, |
| HWLOC_MEMBIND_DEFAULT, 0); |
| hwloc_free(topology, m, size); |
| |
| m = malloc(size); |
| hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset, |
| HWLOC_MEMBIND_DEFAULT, 0); |
| free(m); |
| } |
| |
| /* Destroy topology object. */ |
| hwloc_topology_destroy(topology); |
| |
| return 0; |
| } |
| |
| hwloc provides a pkg-config executable to obtain relevant compiler and linker |
| flags. For example, it can be used thusly to compile applications that utilize |
| the hwloc library (assuming GNU Make): |
| |
| CFLAGS += $(pkg-config --cflags hwloc) |
| LDLIBS += $(pkg-config --libs hwloc) |
| cc hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS) |
| |
| On a machine with 4GB of RAM and 2 processor sockets -- each socket of which |
| has two processing cores -- the output from running hwloc-hello could be |
| something like the following: |
| |
| shell$ ./hwloc-hello |
| *** Objects at level 0 |
| Index 0: Machine(3938MB) |
| *** Objects at level 1 |
| Index 0: Socket#0 |
| Index 1: Socket#1 |
| *** Objects at level 2 |
| Index 0: Core#0 |
| Index 1: Core#1 |
| Index 2: Core#3 |
| Index 3: Core#2 |
| *** Objects at level 3 |
| Index 0: PU#0 |
| Index 1: PU#1 |
| Index 2: PU#2 |
| Index 3: PU#3 |
| *** Printing overall tree |
| Machine(3938MB) |
| Socket#0 |
| Core#0 |
| PU#0 |
| Core#1 |
| PU#1 |
| Socket#1 |
| Core#3 |
| PU#2 |
| Core#2 |
| PU#3 |
| *** 2 socket(s) |
| shell$ |
| |
| Questions and Bugs |
| |
| Questions should be sent to the devel mailing list (http://www.open-mpi.org/ |
| community/lists/hwloc.php). Bug reports should be reported in the tracker ( |
| https://svn.open-mpi.org/trac/hwloc/). |
| |
| If hwloc discovers an incorrect topology for your machine, the very first thing |
| you should check is to ensure that you have the most recent updates installed |
| for your operating system. Indeed, most of hwloc topology discovery relies on |
| hardware information retrieved through the operation system (e.g., via the /sys |
| virtual filesystem of the Linux kernel). If upgrading your OS or Linux kernel |
| does not solve your problem, you may also want to ensure that you are running |
| the most recent version of the BIOS for your machine. |
| |
| If those things fail, contact us on the mailing list for additional help. |
| Please attach the output of lstopo after having given the --enable-debug option |
| to ./configure and rebuilt completely, to get debugging output. |
| |
| History / Credits |
| |
| hwloc is the evolution and merger of the libtopology (http:// |
| runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux |
| Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project. |
| Because of functional and ideological overlap, these two code bases and ideas |
| were merged and released under the name "hwloc" as an Open MPI sub-project. |
| |
| libtopology was initially developed by the INRIA Runtime Team-Project (http:// |
| runtime.bordeaux.inria.fr/) (headed by Raymond Namyst (http:// |
| dept-info.labri.fr/~namyst/). PLPA was initially developed by the Open MPI |
| development team as a sub-project. Both are now deprecated in favor of hwloc, |
| which is distributed as an Open MPI sub-project. |
| |
| Further Reading |
| |
| The documentation chapters include |
| |
| * Terms and Definitions |
| * Command-Line Tools |
| * Environment Variables |
| * CPU and Memory Binding Overview |
| * Interoperability With Other Software |
| * Thread Safety |
| * Embedding hwloc in Other Software |
| * Frequently Asked Questions |
| |
| Make sure to have had a look at those too! |
| |
| ------------------------------------------------------------------------------- |
| |
| Generated on Tue Aug 16 2011 19:37:04 for Hardware Locality (hwloc) by doxygen |
| 1.7.4 |