Please describe the new features with any relevant use cases.
I'm trying to scale SST to massive numbers of components and links. At these scales memory usage becomes a concern, and this issue proposes one thing we could do to save memory.
Specifically, we could rearrange member data to avoid extraneous padding in data structures. I wouldn't expect this to save a dramatic amount of memory, but it does seem like a relatively easy step to take.
Describe the proposed solution you plan to implement
I propose manually rearranging the member data in these classes (for example, ordering members from largest to smallest) and then remeasuring the amount of padding. If doing so makes an appreciable difference, I would merge these changes.
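To illustrate the effect of reordering, here is a minimal sketch using a toy struct. The struct and field names are hypothetical and only loosely modeled on the kinds of members SST has; they are not SST classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical toy struct (not an SST class), declared in an order that
// forces the compiler to insert padding between members.
struct Unordered {
    bool          visited;  // 1 byte, usually followed by padding so `weight` is aligned
    double        weight;
    std::uint8_t  level;    // 1 byte, usually followed by padding so `next_id` is aligned
    std::uint16_t next_id;
};

// The same fields, ordered from largest alignment to smallest.
struct Ordered {
    double        weight;
    std::uint16_t next_id;
    std::uint8_t  level;
    bool          visited;
};

// Print both sizes so the effect can be checked on a given compiler and platform.
void print_sizes()
{
    std::printf("Unordered: %zu bytes\n", sizeof(Unordered));
    std::printf("Ordered:   %zu bytes\n", sizeof(Ordered));
}
```

On a typical x86-64 ABI I would expect this to report 24 and 16 bytes, but the exact values depend on the platform and compiler.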
Testing plan
Adding robust testing for this enhancement may be somewhat difficult.
We could add a mechanism for SST to report the sizeof of each of these classes and then hard-code the sizes we expect on the particular platform(s) SST uses for AutoTesting; however, this would tie the test to those platforms, which I'm guessing would not be desirable.
Such a test would also have to be updated any time a new field is added, although that may be desirable, since it would remind the developer to consider how the placement of the new field affects padding.
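One possible way to avoid hard-coding platform-specific sizes is to check a bound instead: if members are ordered by non-increasing alignment, a class with no base classes or virtual functions should have only tail padding, which is always smaller than alignof of the class. The sketch below assumes C++17 and a hand-maintained list of field types; `sum_of_sizes`, `padding_of`, and `ExampleConfig` are hypothetical names, not existing SST code.

```cpp
#include <cstddef>
#include <cstdint>

// Sum of sizeof over a hand-maintained list of field types (C++17 fold expression).
template <typename... Fields>
constexpr std::size_t sum_of_sizes()
{
    return (std::size_t{0} + ... + sizeof(Fields));
}

// Padding = sizeof(T) minus the sum of the listed field sizes.
// Assumes the list names every non-static data member of T exactly once.
template <typename T, typename... Fields>
constexpr std::size_t padding_of()
{
    return sizeof(T) - sum_of_sizes<Fields...>();
}

// Hypothetical stand-in for an SST class, members in non-increasing alignment order.
struct ExampleConfig {
    double        weight;
    int           slot_num;
    std::uint16_t next_id;
    bool          visited;
};

// Fails to compile if a new or misplaced member introduces avoidable padding.
// No absolute size is hard-coded, so the check is not tied to one platform.
static_assert(padding_of<ExampleConfig, double, int, std::uint16_t, bool>() < alignof(ExampleConfig),
              "ExampleConfig has more padding than its alignment allows");
```

The field list would still need updating whenever a member is added, which keeps the "reminder" property. Note that the bound is only guaranteed for an alignment-sorted order; sorting purely by size can leave interior padding when a large member has a small alignment.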
I'd like more feedback about what testing should be added for this enhancement, if any.
Additional context
To give a sense of how much padding there is:
On my system, with SST built using Cray Clang 19.0, I estimate the following padding in these data structures:
Class             sizeof(Class)   Sum of field sizes   Padding (difference)
---------------------------------------------------------------------------
Params                      112                  105                      7
ConfigStatistic             168                  146                     15
ConfigComponent             592                  538                     25
ConfigLink                  216                  209                      7
ComponentInfo               280                  262                     18
Link                         64                   64                      0
I computed this by summing the sizeof of each individual member in a class (the sum of field sizes) and comparing that against the reported sizeof of the object itself.
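The sizeof difference says how much padding there is, but not where it sits. As a complementary sketch (not what I ran for the table above), offsetof can locate the gaps. This assumes a standard-layout type and a hand-maintained field list in declaration order; `Sample`, `FieldInfo`, `FIELD`, `total_padding`, and `print_gaps` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical toy struct standing in for an SST class.
struct Sample {
    bool          visited;
    double        weight;
    std::uint8_t  level;
    std::uint16_t next_id;
};

struct FieldInfo {
    const char* name;
    std::size_t offset;
    std::size_t size;
};

// Hand-maintained list, in declaration order. offsetof is only guaranteed
// to work for standard-layout types, which this sketch assumes.
#define FIELD(T, f) FieldInfo{ #f, offsetof(T, f), sizeof(T::f) }
constexpr FieldInfo kSampleFields[] = {
    FIELD(Sample, visited), FIELD(Sample, weight),
    FIELD(Sample, level),   FIELD(Sample, next_id),
};

// Adds up the gap before each field plus the tail padding after the last one.
constexpr std::size_t total_padding(const FieldInfo* fields, std::size_t count, std::size_t object_size)
{
    std::size_t cursor  = 0;
    std::size_t padding = 0;
    for ( std::size_t i = 0; i < count; ++i ) {
        padding += fields[i].offset - cursor;
        cursor = fields[i].offset + fields[i].size;
    }
    return padding + (object_size - cursor);
}

// Prints each interior gap; tail padding is not listed here.
void print_gaps(const FieldInfo* fields, std::size_t count)
{
    std::size_t cursor = 0;
    for ( std::size_t i = 0; i < count; ++i ) {
        if ( fields[i].offset > cursor )
            std::printf("%zu byte(s) of padding before %s\n", fields[i].offset - cursor, fields[i].name);
        cursor = fields[i].offset + fields[i].size;
    }
}
```

Calling `total_padding(kSampleFields, 4, sizeof(Sample))` should agree with the sizeof-minus-sum figure for the same struct.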
As a quick experiment, I rearranged the data members in ConfigComponent as shown below. This reduced sizeof(ConfigComponent) from 592 bytes to 568 bytes, a savings of 24 bytes per config component.
ConfigStatistic allStatConfig;
Params params; /*!< Set of Parameters */
RankInfo rank; /*!< Parallel Rank for this component */
std::map<std::string, StatisticId_t> enabledStatNames;
std::vector<ConfigComponent*> subComponents; /*!< List of subcomponents */
std::vector<LinkId_t> links; /*!< List of links connected */
// std::vector<ConfigStatistic> enabledStatistics; /*!< List of subcomponents */
std::vector<double> coords;
std::string name; /*!< Name of this component, or slot name for subcomp */
std::string type; /*!< Type of this component */
ComponentId_t id; /*!< Unique ID of this component */
ConfigGraph* graph; /*!< Graph that this component belongs to */
float weight; /*!< Partitioning weight for this component */
int slot_num; /*!< Slot number. Only valid for subcomponents */
uint16_t nextSubID; /*!< Next subID to use for children, if component, if subcomponent, subid of parent */
uint16_t nextStatID; /*!< Next statID to use for children */
uint8_t statLoadLevel; /*!< Statistic load level for this component */
bool visited; /*!< Used when traversing graph to indicate component was visited already */
bool enabledAllStats;
An alternative solution we could consider: various compilers have directives to indicate that data should be packed so that there is no padding between members. For example, GCC has __attribute__((packed)). This may lead to some additional memory savings, but at the cost of performance. Furthermore, these attributes are compiler-specific. We could add code to detect the compiler being used and adjust accordingly, but this adds complexity, and the enhancement would only apply to the set of compilers we support it for.
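For reference, a minimal sketch of what the packed alternative with compiler detection might look like. `SKETCH_PACKED`, `Natural`, and `Packed` are hypothetical names; on compilers other than GCC and Clang the macro here expands to nothing, so the struct is simply left unpacked.

```cpp
#include <cstdint>

// Hypothetical portability macro: only GCC-compatible compilers are handled.
// Other compilers (for example MSVC, which uses #pragma pack instead) fall
// back to the natural layout.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PACKED __attribute__((packed))
#else
#define SKETCH_PACKED
#endif

// Natural layout: the compiler may insert padding after `tag`.
struct Natural {
    std::uint8_t  tag;
    std::uint32_t value;
};

// Packed layout: no padding between members where the attribute is supported,
// so `value` may be misaligned and slower (or unsafe via pointer) to access.
struct SKETCH_PACKED Packed {
    std::uint8_t  tag;
    std::uint32_t value;
};
```

With GCC or Clang I would expect sizeof(Packed) to be 5 versus 8 for Natural on common platforms. The misaligned `value` member is where the performance cost mentioned above comes from.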
I've heard that there have been some recent efforts that may change which fields are in SST's core data structures. As such, I'll defer working on this until I hear that things have stabilized. In the meantime, I wanted to get this feature documented as an issue; I'd be happy to pick it up whenever the time is right.