This article describes a fast and portable memcpy implementation that can replace the standard library version of memcpy when higher performance is needed. This implementation has been used successfully in several projects where performance needed a boost, including the iPod Linux port, the xHarbour Compiler, the pymat python-Matlab interface, the Inspire IRCd client, and various PSP games.

The story began when a co-worker of mine made an implementation of memcpy that he was very proud of. His implementation was faster than many standardized C library routines found in the embedded market. When looking at his code, I found several places where improvements could be made. I made an implementation that was quite a lot faster than my co-worker's, and this started a friendly competition to make the fastest portable C implementation of memcpy(). Both our implementations got better and better and looked more alike, and finally we had an implementation that was very fast and that beats the native library routines in both Windows and Linux, especially when the memory to be copied is not aligned on a 32-bit boundary.

The following paragraphs describe some of the techniques used in the final implementation.

In most systems, the CPU clock runs at a much higher frequency than the memory bus.
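Because memory bus transactions are the bottleneck, a fast copy tries to move a full 32-bit word per access even when the source is not aligned on a 32-bit boundary. One common way to do this is to read only aligned source words and stitch each output word together from two neighbouring loads with shifts. The sketch below illustrates that general idea only; it is not necessarily how the article's implementation works. The function name `copy_shifted_le`, its parameters, and the little-endian assumption are all hypothetical. It also assumes the caller has already aligned the destination and split the source pointer into an aligned word pointer plus a byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): fill nwords aligned destination
 * words with data that starts `offset` bytes (0..3) into the aligned source
 * word src[0]. Assumes a little-endian CPU. For offset != 0 it reads
 * nwords + 1 source words, so src must have that many valid words.
 */
void copy_shifted_le(uint32_t *dst, const uint32_t *src,
                     unsigned offset, size_t nwords)
{
    assert(offset < 4);

    if (offset == 0) {
        /* Source and destination share alignment: plain word copy. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
        return;
    }

    const unsigned lo = 8u * offset;   /* bits to drop from the current word */
    const unsigned hi = 32u - lo;      /* bits to take from the next word */
    uint32_t cur = src[0];

    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src[i + 1];
        /* One aligned store built from two aligned loads. */
        dst[i] = (cur >> lo) | (next << hi);
        cur = next;
    }
}
```

With source words `0x44332211, 0x88776655, 0xCCBBAA99` (bytes `11 22 ... CC` in little-endian memory) and `offset = 1`, the result is `0x55443322, 0x99887766`, i.e. bytes `22 33 44 55 66 77 88 99`. On a big-endian CPU the shift directions swap to `(cur << lo) | (next >> hi)`. The point of the design is that every memory access stays aligned and word-sized, trading a few cheap register shifts for fewer bus transactions.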