October 11, 2018 · 数据结构与算法 本文字数: 1.1k 阅读时长:4 min 全站字数:344.2k

敏而好学之WHAT——数组的精髓

  1. 数组的特性
    1. 随机访问复杂度为O(1)
    2. 删除和插入需要乾坤大挪移数据
  2. 使用数组的注意事项
  3. 题外话:为什么数组下标从0开始

本文是数据结构与算法之美专栏的学习笔记。

数据结构可以分为线性表和非线性表。线性表(linear list)如其名称,是一种一维的数据结构,只有前后两个方向。数组,队列,栈,链表都是线性表(linear list)。非线性表十多维的数据结构,如二叉树,堆,图等。

数组就是使用一段连续的内存空间,来存储一组相同数据类型的数据。

数组的特性

随机访问复杂度为O(1)

由于数组占用连续的内存空间,而且存储相同的数据类型,所以数组的随机访问的复杂度为O(1)。但其代价就是数据删除和插入时需要移动大量的数组元素,从而保证数组中数据的连续特性。

对于一个数组:int[] a = new int[10];。数组中任意元素的寻址公式就是:a[i] = a[0] + i * sizeof(int)

删除和插入需要乾坤大挪移数据

为了保证数组之所以为数组,即保证数组的性质不变,插入和删除时必须移动数据。

对于插入数据,如果将数据插入到位置j。针对数组是有序还是无序具有不同的时间复杂度。

如果数组需要保持有序,那么数组中原来的数据(j, n-1)的数据必须向后移动一位。插入的平均复杂度是O(n)

如果数组无序,则可以将原来j位置的数据移动到数组最后一位即可,插入的复杂度为O(1)

对于删除操作,与插入类似,只是执行相反的操作即可。

删除操作的优化:没有必要时刻保持数组的性质不变,可以只是标识数据为已经删除,到了恰当的时间(如数组空间不足时),再批量处理数组元素的删除,恢复数组的性质。

使用数组的注意事项

  1. 数组越界问题。如果你是Java程序员,你是幸运的。但如果你是C程序员请务必注意,缓冲区溢出攻击,数组越界导致程序崩溃等等。

  2. Java中的ArrayList将数组的操作细节进行封装,并进行优化,而且支持动态扩容。但是内存申请和数据移动是比较耗时的,如果可以预估数据的大小,可以指定大小创建,从而避免自动扩容导致的开销。

  3. Java中容器只支持对象,基本类型则需要自动装箱和拆箱有一定的消耗,如果是简单场景使用数组更直观。

  4. 多维数组使用数组更直观。

题外话:为什么数组下标从0开始

参考:为什么数组下标从0开始

历史原因:

If a BCPL variable represents a pointer, it points to one or more consecutive words of memory. These words are the same size as BCPL variables. Just as machine code allows address arithmetic so does BCPL, so if p is a pointer p+1 is a pointer to the next word after the one p points to. Naturally p+0 has the same value as p. The monodic indirection operator ! takes a pointer as it’s argument and returns the contents of the word pointed to. If v is a pointer !(v+I) will access the word pointed to by v+I.

更高效:

So: the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

更简洁:

Using 0-based indexing, half-open intervals, and suitable defaults (as Python ended up having), they are beautiful: a[:n] and a[i:i+n]; the former is long for a[0:n]. Using 1-based indexing, if you want a[:n] to mean the first n elements, you either have to use closed intervals or you can use a slice notation that uses start and length as the slice parameters. Using half-open intervals just isn’t very elegant when combined with 1-based indexing. Using closed intervals, you’d have to write a[i:i+n-1] for the n items starting at i. So perhaps using the slice length would be more elegant with 1-based indexing? Then you could write a[i:n]. And this is in fact what ABC did – it used a different notation so you could write a@i|n.(See http://homepages.cwi.nl/~steven/abc/qr.html#EXPRESSIONS.)