"""

=============================
 Byteswapping and byte order
=============================

Introduction to byte ordering and ndarrays
==========================================

The ``ndarray`` is an object that provides a Python array interface to data
in memory.

It often happens that the memory that you want to view with an array is
not of the same byte ordering as the computer on which you are running
Python.

For example, I might be working on a computer with a little-endian CPU,
such as an Intel Pentium, but I have loaded some data from a file
written by a computer that is big-endian. Let's say I have loaded 4
bytes from a file written by a Sun (big-endian) computer. I know that
these 4 bytes represent two 16-bit integers. On a big-endian machine, a
two-byte integer is stored with the Most Significant Byte (MSB) first,
and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:

#. MSB integer 1
#. LSB integer 1
#. MSB integer 2
#. LSB integer 2

Let's say the two integers were in fact 1 and 770. Because 1 = 256 * 0 + 1
and 770 = 256 * 3 + 2, the 4 bytes in memory would contain respectively:
0, 1, 3, 2. The bytes I have loaded from the file would have these contents:

>>> big_end_str = bytes(bytearray([0, 1, 3, 2]))
>>> big_end_str
b'\\x00\\x01\\x03\\x02'
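
The byte layout above can be checked without NumPy. What follows is a minimal sketch using the standard-library ``struct`` module; the variable names are illustrative and are not used elsewhere in this document.

```python
import struct

# '>' requests big-endian packing, '<' little-endian;
# each 'h' is one signed 16-bit integer.
big_end_bytes = struct.pack('>hh', 1, 770)
little_end_bytes = struct.pack('<hh', 1, 770)

# Convert to lists of integer byte values for easy inspection.
big_end_values = list(bytearray(big_end_bytes))        # MSB first
little_end_values = list(bytearray(little_end_bytes))  # LSB first
```

The big-endian list reproduces the memory order described above, and the little-endian list has each pair of bytes reversed.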

We might want to use an ``ndarray`` to access these integers. In that
case, we can create an array around this memory, and tell numpy that
there are two integers, and that they are 16-bit and big-endian:

>>> import numpy as np
>>> big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_str)
>>> big_end_arr[0]
1
>>> big_end_arr[1]
770

Note the array ``dtype`` above of ``>i2``. The ``>`` means 'big-endian'
(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'. For
example, if our data represented a single unsigned 4-byte little-endian
integer, the dtype string would be ``<u4``.

In fact, why don't we try that?

>>> little_end_u4 = np.ndarray(shape=(1,), dtype='<u4', buffer=big_end_str)
>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
True
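
To inspect the byte order that a dtype string ends up with, you can read the ``byteorder`` attribute of the dtype. The sketch below wraps that in a helper; ``is_big_endian`` is a hypothetical name introduced only for illustration. Note that NumPy reports the machine's own order as ``=``, so the helper consults ``sys.byteorder`` in that case.

```python
import sys
import numpy as np

def is_big_endian(dtype_like):
    # Hypothetical helper: True or False for multi-byte dtypes,
    # None when byte order does not apply to the dtype.
    code = np.dtype(dtype_like).byteorder
    if code == '|':      # e.g. single-byte integers have no byte order
        return None
    if code == '=':      # native order: ask Python what the machine uses
        return sys.byteorder == 'big'
    return code == '>'

i2_is_big = is_big_endian('>i2')
u4_is_big = is_big_endian('<u4')
u1_is_big = is_big_endian('u1')
```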

Returning to our ``big_end_arr``: in this case our underlying data is
big-endian (data endianness) and we've set the dtype to match (the dtype
is also big-endian). However, sometimes you need to flip these around.

.. warning::

    Scalars currently do not include byte order information, so extracting
    a scalar from an array will return an integer in native byte order.
    Hence:

    >>> big_end_arr[0].dtype.byteorder == little_end_u4[0].dtype.byteorder
    True

Changing byte ordering
======================

As you can imagine from the introduction, there are two ways you can
affect the relationship between the byte ordering of the array and the
underlying memory it is looking at:

* Change the byte-ordering information in the array dtype so that it
  interprets the underlying data as being in a different byte order.
  This is the role of ``arr.newbyteorder()`` (removed in NumPy 2.0, where
  ``arr.view(arr.dtype.newbyteorder())`` does the same job).
* Change the byte-ordering of the underlying data, leaving the dtype
  interpretation as it was. This is what ``arr.byteswap()`` does.

The common situations in which you need to change byte ordering are:

#. Your data and dtype endianness don't match, and you want to change
   the dtype so that it matches the data.
#. Your data and dtype endianness don't match, and you want to swap the
   data so that they match the dtype.
#. Your data and dtype endianness match, but you want the data swapped
   and the dtype to reflect this.
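
The difference between the two operations is easiest to see side by side. The following is a small sketch, not part of the original examples, with illustrative variable names. It relabels the dtype through a view, which I believe works on both older and newer NumPy versions.

```python
import numpy as np

raw = bytes(bytearray([0, 1, 3, 2]))     # big-endian encoding of 1 and 770
arr = np.frombuffer(raw, dtype='>i2')    # dtype matches the data

# Way 1: relabel the dtype. The bytes stay put; the values read out change.
relabelled = arr.view(arr.dtype.newbyteorder())

# Way 2: swap the bytes of each element. The dtype stays put;
# byteswap() returns a copy by default.
swapped = arr.byteswap()

relabelled_bytes_unchanged = relabelled.tobytes() == raw
swapped_bytes_unchanged = swapped.tobytes() == raw
```

Applied on its own to a correctly labelled array, either operation makes the values come out wrong (here both read 256 and 515). Which one you want depends on whether the dtype or the memory is the thing that needs fixing.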

Data and dtype endianness don't match, change dtype to match data
-----------------------------------------------------------------

We make something where they don't match:

>>> wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype='<i2', buffer=big_end_str)
>>> wrong_end_dtype_arr[0]
256

The obvious fix for this situation is to change the dtype so it gives
the correct endianness. Viewing the array through a byte-swapped dtype
does this (before NumPy 2.0, ``wrong_end_dtype_arr.newbyteorder()`` was a
shorthand for the same operation):

>>> fixed_end_dtype_arr = wrong_end_dtype_arr.view(np.dtype('<i2').newbyteorder())
>>> fixed_end_dtype_arr[0]
1

Note the array has not changed in memory:

>>> fixed_end_dtype_arr.tobytes() == big_end_str
True
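
Because only the dtype changes here, no data is copied. A minimal sketch that checks this with ``np.shares_memory`` (the variable names are illustrative):

```python
import numpy as np

raw = bytes(bytearray([0, 1, 3, 2]))              # big-endian bytes for 1 and 770
wrong = np.frombuffer(raw, dtype='<i2')           # mislabelled as little-endian
fixed = wrong.view(wrong.dtype.newbyteorder())    # relabelled as big-endian

# Both arrays look at the same buffer; only the interpretation differs.
same_memory = np.shares_memory(wrong, fixed)
```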

Data and dtype endianness don't match, change data to match dtype
-----------------------------------------------------------------

You might want to do this if you need the data in memory to be in a certain
byte order. For example, you might be writing the memory out to a file
that needs a certain byte ordering.

>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
>>> fixed_end_mem_arr[0]
1

Now the array *has* changed in memory:

>>> fixed_end_mem_arr.tobytes() == big_end_str
False
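
As a sketch of the file-writing case, suppose a hypothetical output format wants little-endian 16-bit integers. An ``io.BytesIO`` object stands in for the real file, and all names are illustrative.

```python
import io
import numpy as np

raw = bytes(bytearray([0, 1, 3, 2]))             # big-endian bytes for 1 and 770
mislabelled = np.frombuffer(raw, dtype='<i2')    # dtype claims little-endian

# Swap the bytes so the memory really is little-endian.
# byteswap() returns a new array; the dtype label is still '<i2'.
fixed = mislabelled.byteswap()

out = io.BytesIO()                               # stand-in for a file opened in 'wb' mode
out.write(fixed.tobytes())
written = out.getvalue()
```

The original buffer is left untouched, since ``byteswap()`` only modifies the array in place when called with ``inplace=True``.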

Data and dtype endianness match, swap data and dtype
----------------------------------------------------

You may have a correctly specified array dtype, but you need the array
to have the opposite byte order in memory, and you want the dtype to
match so the array values make sense. In this case you just do both of
the previous operations (before NumPy 2.0 this could be chained as
``big_end_arr.byteswap().newbyteorder()``):

>>> swapped_end_arr = big_end_arr.byteswap()
>>> swapped_end_arr = swapped_end_arr.view(swapped_end_arr.dtype.newbyteorder())
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_str
False
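
The two steps can be bundled into one function. This is only a sketch, and ``swap_byte_order`` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def swap_byte_order(arr):
    # Hypothetical helper: return a copy whose bytes are in the opposite
    # order and whose dtype is relabelled to match, so values are unchanged.
    swapped = arr.byteswap()                         # copy with swapped bytes
    return swapped.view(arr.dtype.newbyteorder())    # relabel the dtype

big = np.array([1, 770], dtype='>i2')
little = swap_byte_order(big)
```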

An easier way of casting the data to a specific dtype and byte ordering
is the ndarray ``astype`` method:

>>> swapped_end_arr = big_end_arr.astype('<i2')
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_str
False
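
If the item type should stay the same and only the byte order should change, the target dtype can be derived from the array's own dtype instead of being spelled out. A minimal sketch, assuming a hypothetical helper named ``with_byte_order``:

```python
import numpy as np

def with_byte_order(arr, order):
    # Hypothetical helper. order is '<' (little-endian) or '>' (big-endian).
    # astype converts the data as needed, so the values are preserved.
    return arr.astype(arr.dtype.newbyteorder(order))

big = np.array([1, 770], dtype='>i2')
little = with_byte_order(big, '<')
still_big = with_byte_order(big, '>')
```

Unlike ``byteswap()``, this states the byte order you want to end up with, so it does not matter what order the input is in.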

"""
from __future__ import division, absolute_import, print_function