
Note: Updated Lurker's Guide available (but not this page!)

This page belongs to the old 1990s SGI Lurker's Guide. As of 2008, several of the Lurker's Guide pages have been updated for HDTV and for modern OS platforms like Windows and Mac. This particular page is not one of those, but you can see what new stuff is available here. Thanks!
(src % 2) == (dest % 2) - probably won't help
(src % 4) == (dest % 4) - definitely will help
(src % 8) == (dest % 8) - often helps even more
even more well aligned - usually same as 8

Referring to the 4-byte alignment case on some Indigo2-class machines, Micheal Minakami tells us that "when copying between buffers with the same alignment, throughput is approximately 400% greater than when copying between buffers with mismatched alignments." He also notes that "the memory allocation routines such as malloc return double-word (64-bit aligned) buffers." Entries in VLBuffers and DMbufferpools are also always at least 8 byte aligned.
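As a rough illustration of the list above, here is a minimal sketch that classifies how well two buffers are aligned relative to each other. The function name relative_alignment is made up for this example (it is not part of any SGI library), and it only reports which of the cases above applies; it says nothing about how much speedup you will actually see on a given machine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an SGI API.
 * Returns 8 if (src % 8) == (dest % 8),
 * else 4 if (src % 4) == (dest % 4),
 * else 2 if (src % 2) == (dest % 2),
 * else 1 (no shared alignment at all). */
static unsigned relative_alignment(const void *src, const void *dest)
{
    /* The low bits of the XOR are zero exactly where the two
     * addresses agree, so this compares src % N with dest % N. */
    uintptr_t diff = (uintptr_t)src ^ (uintptr_t)dest;

    if ((diff & 7) == 0) return 8;
    if ((diff & 3) == 0) return 4;
    if ((diff & 1) == 0) return 2;
    return 1;
}
```

A copy routine could call something like this once up front and fall back to a byte-at-a-time loop when the result is less than 4.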
SGI primary caches are virtually indexed, meaning that for a cache_size byte cache whose lines are each cache_line_size bytes long, a reference to a virtual address V will map to cache line number (V%cache_size)/cache_line_size. Say you're copying from a buffer A to a buffer B, and both buffers are 32k-aligned. Say your computer has a cache_size==32k primary data cache. Say your cache is direct-mapped, meaning each cache line can store data for only one address at a time. If the inner loop of your copy looks like "B[i]=A[i]", then the math above tells us that &A[i] and &B[i] will always map to the same cache line. Thus each reference to A[i] and B[i] in the inner loop above will miss in the primary cache, severely hurting performance. To solve this, you need to reduce the number of cache misses by reducing the number of times the same cache line gets reused for a different address. If you can control the addresses of buffers A and B, then allocating A and B such that:
(A%cache_size)/cache_line_size != (B%cache_size)/cache_line_size

will minimize cache thrashing. If you can't control the address of A and B (for example, if you're copying between VLBuffers or DMbuffers), then you should try to read a whole cache line at a time from buffer A into registers, and then write a whole cache line at a time from registers into buffer B. For cache_line_size=32 bytes, this might look something like:
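{
    __uint64_t *A, *B;
    int i;
    ...
    /* max is the element count; it must be a whole number of
       32-byte cache lines (4 x 64-bit words each) */
    assert((max%4) == 0);
    for(i=0; i < max; i+=4)
    {
        __uint64_t tmp0, tmp1, tmp2, tmp3;
        /* read one full cache line from A into registers */
        tmp0 = A[i+0];
        tmp1 = A[i+1];
        tmp2 = A[i+2];
        tmp3 = A[i+3];
        /* then write one full cache line from registers to B */
        B[i+0] = tmp0;
        B[i+1] = tmp1;
        B[i+2] = tmp2;
        B[i+3] = tmp3;
    }
}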
You should disassemble any C code written like this to make sure the compiler did the right thing, or write assembly yourself.

If your primary cache is 2-way set-associative instead of direct-mapped, then a copy operation ("B[i]=A[i]") will not thrash in the cache, but an operation involving three pointers ("C[i]=A[i]+B[i]") will, and the solution is the same.
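For the case where you do control the buffer addresses, the cache line arithmetic from earlier can be sketched in C as below. This is only an illustration under stated assumptions: the names cache_line_index, buffers_collide and stagger_start are invented for this example, the cache size and line size are passed in by the caller (32k and 32 bytes in the example above), and stagger_start assumes you allocated B with at least one spare cache line of slack at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of any SGI API. */

/* Cache line number for virtual address v:
 * (V % cache_size) / cache_line_size */
static size_t cache_line_index(uintptr_t v, size_t cache_size,
                               size_t cache_line_size)
{
    return (size_t)((v % cache_size) / cache_line_size);
}

/* Nonzero if the starts of a and b map to the same cache line,
 * which is the situation the text says to avoid. */
static int buffers_collide(const void *a, const void *b,
                           size_t cache_size, size_t cache_line_size)
{
    return cache_line_index((uintptr_t)a, cache_size, cache_line_size)
        == cache_line_index((uintptr_t)b, cache_size, cache_line_size);
}

/* If b collides with a, return b advanced by one cache line;
 * otherwise return b unchanged. ASSUMPTION: the allocation behind b
 * has at least cache_line_size bytes of unused slack at its end. */
static void *stagger_start(void *b, const void *a,
                           size_t cache_size, size_t cache_line_size)
{
    if (buffers_collide(a, b, cache_size, cache_line_size))
        return (char *)b + cache_line_size;
    return b;
}
```

Advancing by exactly one cache line keeps B 8-byte aligned as long as the line size is a multiple of 8, so the alignment advice at the top of this page still holds.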
SGI secondary caches are indexed by physical address, and it is possible for them to thrash in the same way. Because secondary caches are so much larger, and because a given range of virtual memory (say, our buffers A and B) sometimes maps to discontiguous physical pages, secondary cache thrashing is not as common. On O2 (mvp), it happens that video buffers are made up of 64k-aligned physical memory chunks of 64k in size, and so O2 customers notice this problem more often. Since a normal user-mode app cannot control the physical address of any memory it allocates, the only workaround is the second one above: try to read a whole secondary cache line into registers, and then write from registers into a whole secondary cache line.
All text and images copyright 1999-2008 Chris Pirazzi unless otherwise indicated.