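Version 0.16.0 (March 22, 2015)

This is a major release from 0.15.2 and includes a small number of API changes, several new features, enhancements, and performance improvements, along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

DataFrame.assign method
Series.to_coo/from_coo methods to interact with scipy.sparse
Backwards incompatible change to Timedelta to conform the .seconds attribute with datetime.timedelta
Changes to the .loc slicing API to conform with the behavior of .ix
Changes to the default for ordering in the Categorical constructor
Enhancements to the .str accessor to make string operations easier
The pandas.tools.rplot, pandas.sandbox.qtpandas and pandas.rpy modules are deprecated. We refer users to external packages like seaborn, pandas-qt and rpy2 for similar or equivalent functionality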

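New features

DataFrame assign

Inspired by dplyr's mutate verb, DataFrame has a new assign() method. The function signature for assign is simply **kwargs. The keys are the column names for the new fields, and the values are either a value to be inserted (for example, a Series or NumPy array), or a function of one argument to be called on the DataFrame. The new values are inserted, and the entire DataFrame (with all original and new columns) is returned.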

In [1]: iris = pd.read_csv('data/iris.data')

In [2]: iris.head()
Out[2]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

[5 rows x 5 columns]

In [3]: iris.assign(sepal_ratio=iris['SepalWidth'] / iris['SepalLength']).head()
Out[3]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

The above was an example of inserting a precomputed value. We can also pass in a function to be evaluated.

In [4]: iris.assign(sepal_ratio=lambda x: (x['SepalWidth']
   ...:                                    / x['SepalLength'])).head()
   ...: 
Out[4]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  sepal_ratio
0          5.1         3.5          1.4         0.2  Iris-setosa     0.686275
1          4.9         3.0          1.4         0.2  Iris-setosa     0.612245
2          4.7         3.2          1.3         0.2  Iris-setosa     0.680851
3          4.6         3.1          1.5         0.2  Iris-setosa     0.673913
4          5.0         3.6          1.4         0.2  Iris-setosa     0.720000

[5 rows x 6 columns]

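The power of assign comes when it is used in chains of operations. For example, we can limit the DataFrame to just those rows with a sepal length greater than 5, calculate the ratios, and plot: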

In [5]: iris = pd.read_csv('data/iris.data')

In [6]: (iris.query('SepalLength > 5')
   ...:      .assign(SepalRatio=lambda x: x.SepalWidth / x.SepalLength,
   ...:              PetalRatio=lambda x: x.PetalWidth / x.PetalLength)
   ...:      .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
   ...: 
Out[6]: <Axes: xlabel='SepalRatio', ylabel='PetalRatio'>

See the documentation for more. (GH 9229)
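The examples above read data/iris.data from disk. As a minimal self-contained sketch, the same two calling styles can be tried on a small hand-made frame; the three rows below are hypothetical values, not the real iris data.

```python
import pandas as pd

# Hypothetical measurements standing in for data/iris.data.
df = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 6.0],
    "SepalWidth": [3.5, 3.0, 3.0],
})

# A precomputed value and a one-argument callable produce the same column.
by_value = df.assign(sepal_ratio=df["SepalWidth"] / df["SepalLength"])
by_callable = df.assign(sepal_ratio=lambda x: x["SepalWidth"] / x["SepalLength"])

print(by_value.equals(by_callable))  # True

# assign returns a new DataFrame; the original keeps its two columns.
print(list(df.columns))  # ['SepalLength', 'SepalWidth']
```

The callable form is what makes chaining possible, because it receives the intermediate frame rather than a variable that has to exist beforehand.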

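Interaction with scipy.sparse

Added SparseSeries.to_coo() and SparseSeries.from_coo() methods (GH 8048) for converting to and from scipy.sparse.coo_matrix instances. For example, given a SparseSeries with a MultiIndex, we can convert it to a scipy.sparse.coo_matrix by specifying the row and column labels as index levels: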

s = pd.Series([3.0, np.nan, 1.0, 3.0, np.nan, np.nan])
s.index = pd.MultiIndex.from_tuples([(1, 2, 'a', 0),
                                     (1, 2, 'a', 1),
                                     (1, 1, 'b', 0),
                                     (1, 1, 'b', 1),
                                     (2, 1, 'b', 0),
                                     (2, 1, 'b', 1)],
                                    names=['A', 'B', 'C', 'D'])

s

# SparseSeries
ss = s.to_sparse()
ss

A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
                             column_levels=['C', 'D'],
                             sort_labels=False)

A
A.todense()
rows
columns

The from_coo method is a convenience method for creating a SparseSeries from a scipy.sparse.coo_matrix:

from scipy import sparse
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
                      shape=(3, 4))
A
A.todense()

ss = pd.SparseSeries.from_coo(A)
ss

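String methods enhancements

The following new methods are accessible via the .str accessor and apply the function to each value. This is intended to make the accessor more consistent with the standard methods on strings. (GH 9282, GH 9352, GH 9386, GH 9387, GH 9439)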

Methods
isalnum()	isalpha()	isdigit()	isspace()	islower()
isupper()	istitle()	isnumeric()	isdecimal()
find()	rfind()	ljust()	rjust()	zfill()

```
In [7]: s = pd.Series(['abcd', '3456', 'EFGH'])

In [8]: s.str.isalpha()
Out[8]: 
0     True
1    False
2     True
Length: 3, dtype: bool

In [9]: s.str.find('ab')
Out[9]: 
0    0
1   -1
2   -1
Length: 3, dtype: int64
```


Series.str.pad() and Series.str.center() now accept a fillchar option to specify the filling character (GH 9352)

In [10]: s = pd.Series(['12', '300', '25'])

In [11]: s.str.pad(5, fillchar='_')
Out[11]: 
0    ___12
1    __300
2    ___25
Length: 3, dtype: object

Added Series.str.slice_replace(), which previously raised NotImplementedError (GH 8888)

In [12]: s = pd.Series(['ABCD', 'EFGH', 'IJK'])

In [13]: s.str.slice_replace(1, 3, 'X')
Out[13]: 
0    AXD
1    EXH
2     IX
Length: 3, dtype: object

# replaced with empty char
In [14]: s.str.slice_replace(0, 1)
Out[14]: 
0    BCD
1    FGH
2     JK
Length: 3, dtype: object
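Several of the new .str methods listed in this section are not exercised by the examples above. The following is a small sketch on a made-up Series showing isdigit, zfill, rjust and center with fillchar; the input values are chosen only for illustration.

```python
import pandas as pd

s = pd.Series(["7", "42", "abc"])

is_digit = s.str.isdigit()                # element-wise str.isdigit
zero_filled = s.str.zfill(4)              # pad on the left with "0" to width 4
right_aligned = s.str.rjust(4)            # pad on the left with spaces to width 4
centered = s.str.center(5, fillchar="*")  # pad both sides with a custom character

print(is_digit.tolist())       # [True, True, False]
print(zero_filled.tolist())    # ['0007', '0042', '0abc']
print(right_aligned.tolist())  # ['   7', '  42', ' abc']
print(centered.tolist())
```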

Other enhancements

Reindex now supports method='nearest' for frames or series with a monotonic increasing or decreasing index (GH 9258):

In [15]: df = pd.DataFrame({'x': range(5)})

In [16]: df.reindex([0.2, 1.8, 3.5], method='nearest')
Out[16]: 
     x
0.2  0
1.8  2
3.5  4

[3 rows x 1 columns]

This method is also exposed by the lower level Index.get_indexer and Index.get_loc methods.
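As a rough sketch of the lower-level call, Index.get_indexer can be asked directly for the positions of the nearest labels. The target values here are chosen to avoid ties, so the result does not depend on how ties are broken.

```python
import pandas as pd

index = pd.Index([0, 1, 2, 3, 4])

# Positions of the index labels closest to each target value.
positions = index.get_indexer([0.2, 1.8, 3.6], method="nearest")

print(positions.tolist())  # [0, 2, 4]
```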

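The read_excel() function's sheetname argument now accepts a list and None, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (GH 9450)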
```
# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls', sheetname=['Sheet1', 3])
```
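Allow Stata files to be read incrementally with an iterator; support for long strings in Stata files. See the documentation (GH 9493).
Paths beginning with ~ will now be expanded to begin with the user's home directory (GH 9066)
Added time interval selection in get_data_yahoo (GH 9071)
Added Timestamp.to_datetime64() to complement Timedelta.to_timedelta64() (GH 9255)
tseries.frequencies.to_offset() now accepts Timedelta as input (GH 9064)
A lag parameter was added to the autocorrelation method of Series; it defaults to lag-1 autocorrelation (GH 9192)
Timedelta will now accept a nanoseconds keyword in its constructor (GH 9273)
SQL code now safely escapes table and column names (GH 8986)
Added auto-complete for Series.str.<tab>, Series.dt.<tab> and Series.cat.<tab> (GH 9322)
Index.get_indexer now supports method='pad' and method='backfill' for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (GH 9258).
Index.asof now works on all index types (GH 9258).
A verbose argument has been added to io.read_excel(); it defaults to False. Set it to True to print sheet names as they are parsed. (GH 9450)
Added a days_in_month (compatibility alias daysinmonth) property to Timestamp, DatetimeIndex, Period, PeriodIndex, and Series.dt (GH 9572)
Added a decimal option to to_csv to provide formatting for non-'.' decimal separators (GH 781)
Added a normalize option for Timestamp to normalize to midnight (GH 8794)
Added an example for DataFrame import to R using an HDF5 file and the rhdf5 library. See the documentation for more (GH 9636).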
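A few of the smaller enhancements listed above can be tried in isolation. This is a sketch with made-up inputs covering days_in_month, the Timedelta nanoseconds keyword, Timestamp.to_datetime64(), the autocorr lag argument, and the to_csv decimal option.

```python
import numpy as np
import pandas as pd

# days_in_month on a Timestamp (February 2015 has 28 days).
feb_days = pd.Timestamp("2015-02-10").days_in_month
print(feb_days)  # 28

# Timedelta accepts a nanoseconds keyword: 1500 ns is 1 microsecond plus 500 ns.
td = pd.Timedelta(nanoseconds=1500)
print(td.microseconds, td.nanoseconds)  # 1 500

# Timestamp.to_datetime64 returns a NumPy datetime64 scalar.
as_np = pd.Timestamp("2015-03-22").to_datetime64()
print(isinstance(as_np, np.datetime64))  # True

# autocorr takes a lag argument; a linear series is perfectly
# correlated with any shifted copy of itself.
lag2 = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0]).autocorr(lag=2)

# to_csv can write a decimal separator other than '.'.
csv_text = pd.DataFrame({"x": [1.5, 2.25]}).to_csv(sep=";", decimal=",", index=False)
print(csv_text)
```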

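Backwards incompatible API changes

Changes in timedelta

In v0.15.0 a new scalar type Timedelta was introduced, which is a subclass of datetime.timedelta. That release included a notice of an API change with respect to the .seconds accessor. The intent was to provide a user-friendly set of accessors that give the "natural" value for each unit; for example, if you had a Timedelta('1 day, 10:11:12'), then .seconds would return 12. However, this is at odds with the definition of datetime.timedelta, which defines .seconds as 10 * 3600 + 11 * 60 + 12 == 36672.

So in v0.16.0, we are restoring the API to match that of datetime.timedelta. The component values are still available through the .components accessor. This affects the .seconds and .microseconds accessors, and removes the .hours, .minutes, and .milliseconds accessors. These changes affect TimedeltaIndex and the Series .dt accessor as well. (GH 9185, GH 9139)

Previous behavior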

In [2]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [3]: t.days
Out[3]: 1

In [4]: t.seconds
Out[4]: 12

In [5]: t.microseconds
Out[5]: 123

New behavior

In [17]: t = pd.Timedelta('1 day, 10:11:12.100123')

In [18]: t.days
Out[18]: 1

In [19]: t.seconds
Out[19]: 36672

In [20]: t.microseconds
Out[20]: 100123

Using .components allows full access to the components

In [21]: t.components
Out[21]: Components(days=1, hours=10, minutes=11, seconds=12, milliseconds=100, microseconds=123, nanoseconds=0)

In [22]: t.components.seconds
Out[22]: 12
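To make the alignment with the standard library concrete, the sketch below builds the same duration as a pandas Timedelta and as a datetime.timedelta and compares the accessors. Keyword construction is used here instead of the string form shown above.

```python
import datetime

import pandas as pd

# The same duration as a pandas Timedelta and a standard-library timedelta.
kwargs = dict(days=1, hours=10, minutes=11, seconds=12, microseconds=100123)
t = pd.Timedelta(**kwargs)
std = datetime.timedelta(**kwargs)

# .seconds and .microseconds follow the datetime.timedelta definitions.
print(t.seconds, std.seconds)            # 36672 36672
print(t.microseconds, std.microseconds)  # 100123 100123

# The per-unit values remain available through .components.
print(t.components.seconds)       # 12
print(t.components.milliseconds)  # 100
```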

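Indexing changes

The behavior of a small subset of edge cases when using .loc has changed (GH 8613). Furthermore, we have improved the content of the error messages that are raised:

Slicing with .loc where the start and/or stop bound is not found in the index is now allowed; this previously raised a KeyError. This makes the behavior the same as .ix in this case. This change applies only to slicing, not to indexing with a single label.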

In [23]: df = pd.DataFrame(np.random.randn(5, 4),
   ....:                   columns=list('ABCD'),
   ....:                   index=pd.date_range('20130101', periods=5))
   ....: 

In [24]: df
Out[24]: 
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[5 rows x 4 columns]

In [25]: s = pd.Series(range(5), [-2, -1, 1, 2, 3])

In [26]: s
Out[26]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64

Previous behavior

In [4]: df.loc['2013-01-02':'2013-01-10']
KeyError: 'stop bound [2013-01-10] is not in the [index]'

In [6]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

New behavior

In [27]: df.loc['2013-01-02':'2013-01-10']
Out[27]: 
                   A         B         C         D
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401

[4 rows x 4 columns]

In [28]: s.loc[-10:3]
Out[28]: 
-2    0
-1    1
 1    2
 2    3
 3    4
Length: 5, dtype: int64

Allow slicing with float-like values on an integer index for .ix. Previously this was only enabled for .loc:

Previous behavior

In [8]: s.ix[-1.0:2]
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)

New behavior

In [2]: s.ix[-1.0:2]
Out[2]:
-1    1
 1    2
 2    3
dtype: int64

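Provide a useful exception when indexing with a type that is invalid for that index when using .loc. For example, trying to use .loc with an integer (or a float) on an index of type DatetimeIndex, PeriodIndex, or TimedeltaIndex.

Previous behavior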

In [4]: df.loc[2:3]
KeyError: 'start bound [2] is not the [index]'

New behavior

In [4]: df.loc[2:3]
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys
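The .loc changes in this section can be summarized in a short sketch: slicing tolerates missing bounds, a single missing label still raises KeyError, and an integer slice on a DatetimeIndex raises TypeError. The series below are small made-up examples, and the exact error messages are not reproduced because they vary between versions.

```python
import pandas as pd

s = pd.Series(range(5), index=[-2, -1, 1, 2, 3])

# Neither -10 nor 10 is a label in the index, but slicing is allowed.
whole = s.loc[-10:10]
print(whole.tolist())  # [0, 1, 2, 3, 4]

# A single missing label still raises KeyError.
try:
    s.loc[-10]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
print(missing_label_raises)  # True

# Integer slice bounds are an invalid type for a DatetimeIndex.
dated = pd.Series(range(3), index=pd.date_range("2013-01-01", periods=3))
try:
    dated.loc[1:2]
    int_slice_raises = False
except TypeError:
    int_slice_raises = True
print(int_slice_raises)  # True
```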

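Categorical changes

In prior versions, Categoricals that had an unspecified ordering (meaning no ordered keyword was passed) defaulted to ordered Categoricals. Going forward, the ordered keyword in the Categorical constructor will default to False. Ordering must now be explicit.

Furthermore, previously you could change the ordered attribute of a Categorical by just setting the attribute, e.g. cat.ordered=True. This is now deprecated, and you should use cat.as_ordered() or cat.as_unordered() instead. By default these return a new object and do not modify the existing object. (GH 9347, GH 9190)

Previous behavior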

In [3]: s = pd.Series([0, 1, 2], dtype='category')

In [4]: s
Out[4]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0 < 1 < 2]

In [5]: s.cat.ordered
Out[5]: True

In [6]: s.cat.ordered = False

In [7]: s
Out[7]:
0    0
1    1
2    2
dtype: category
Categories (3, int64): [0, 1, 2]

New behavior

In [29]: s = pd.Series([0, 1, 2], dtype='category')

In [30]: s
Out[30]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0, 1, 2]

In [31]: s.cat.ordered
Out[31]: False

In [32]: s = s.cat.as_ordered()

In [33]: s
Out[33]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [34]: s.cat.ordered
Out[34]: True

# you can set in the constructor of the Categorical
In [35]: s = pd.Series(pd.Categorical([0, 1, 2], ordered=True))

In [36]: s
Out[36]: 
0    0
1    1
2    2
Length: 3, dtype: category
Categories (3, int64): [0 < 1 < 2]

In [37]: s.cat.ordered
Out[37]: True

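For ease of creating a series of categorical data, we have added the ability to pass keywords when calling .astype(). These are passed directly to the constructor.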

In [54]: s = pd.Series(["a", "b", "c", "a"]).astype('category', ordered=True)

In [55]: s
Out[55]:
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]

In [56]: s = (pd.Series(["a", "b", "c", "a"])
   ....:        .astype('category', categories=list('abcdef'), ordered=False))

In [57]: s
Out[57]:
0    a
1    b
2    c
3    a
dtype: category
Categories (6, object): [a, b, c, d, e, f]
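The practical effect of explicit ordering can be sketched with a Categorical whose category order differs from alphabetical order. The "low"/"high" labels are made up for illustration.

```python
import pandas as pd

# Without an explicit ordered keyword, the Categorical is unordered.
c = pd.Categorical(["low", "high", "low"], categories=["low", "high"])
print(c.ordered)  # False

# as_ordered returns a new object and leaves the original unchanged.
c_ordered = c.as_ordered()
print(c_ordered.ordered, c.ordered)  # True False

# Once ordered, order-dependent operations follow the category order
# ("low" < "high"), not the alphabetical order of the labels.
print(c_ordered.max())  # high
```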

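Other API changes

Index.duplicated now returns np.array(dtype=bool) rather than Index(dtype=object) containing bool values. (GH 8875)
DataFrame.to_json now returns accurate type serialisation for each column for frames of mixed dtype (GH 9037)

Previously data was coerced to a common dtype before serialisation, which for example resulted in integers being serialised to floats: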
```
In [2]: pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1.0,"1":2.0}}'
```
Now each column is serialised using its correct dtype:
```
In [2]:  pd.DataFrame({'i': [1,2], 'f': [3.0, 4.2]}).to_json()
Out[2]: '{"f":{"0":3.0,"1":4.2},"i":{"0":1,"1":2}}'
```
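One way to check the per-column serialisation is to decode the JSON again and inspect the Python types. This is a sketch using the standard-library json module on the same frame as above.

```python
import json

import pandas as pd

df = pd.DataFrame({"i": [1, 2], "f": [3.0, 4.2]})
decoded = json.loads(df.to_json())

# The integer column decodes to Python ints, the float column to floats.
print(decoded["i"])  # {'0': 1, '1': 2}
print([type(v).__name__ for v in decoded["i"].values()])  # ['int', 'int']
print([type(v).__name__ for v in decoded["f"].values()])  # ['float', 'float']
```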
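DatetimeIndex, PeriodIndex and TimedeltaIndex.summary now output the same format. (GH 9116)
TimedeltaIndex.freqstr now outputs the same string format as DatetimeIndex. (GH 9116)
Bar and horizontal bar plots no longer add a dashed line along the info axis. The prior style can be achieved with matplotlib's axhline or axvline methods (GH 9088).
Series accessors .dt, .cat and .str now raise AttributeError instead of TypeError if the series does not contain the appropriate type of data (GH 9617). This follows Python's built-in exception hierarchy more closely and ensures that tests like hasattr(s, 'cat') are consistent on both Python 2 and 3.

Series now supports bitwise operations for integral types (GH 9016). Previously, even if the input dtypes were integral, the output dtype was coerced to bool.

Previous behavior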

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    True
b    True
c    True
d    True
dtype: bool

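New behavior. If the input dtypes are integral, the output dtype is also integral and the output values are the result of the bitwise operation.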

In [2]: pd.Series([0, 1, 2, 3], list('abcd')) | pd.Series([4, 4, 4, 4], list('abcd'))
Out[2]:
a    4
b    5
c    6
d    7
dtype: int64
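The same holds for the other bitwise operators. A brief sketch with `&` and `^` on made-up integer Series:

```python
import pandas as pd

left = pd.Series([0, 1, 2, 3], index=list("abcd"))
ones = pd.Series([1, 1, 1, 1], index=list("abcd"))

# Integer inputs give integer results for OR, AND and XOR.
print((left | ones).tolist())  # [1, 1, 3, 3]
print((left & ones).tolist())  # [0, 1, 0, 1]
print((left ^ ones).tolist())  # [1, 0, 3, 2]
print((left & ones).dtype.kind)  # i
```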

During division involving a Series or DataFrame, 0/0 and 0//0 now give np.nan instead of np.inf. (GH 9144, GH 8445)

Previous behavior

In [2]: p = pd.Series([0, 1])

In [3]: p / 0
Out[3]:
0    inf
1    inf
dtype: float64

In [4]: p // 0
Out[4]:
0    inf
1    inf
dtype: float64

New behavior

In [38]: p = pd.Series([0, 1])

In [39]: p / 0
Out[39]: 
0    NaN
1    inf
Length: 2, dtype: float64

In [40]: p // 0
Out[40]: 
0    NaN
1    inf
Length: 2, dtype: float64
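The example above uses a Series. As a sketch of the DataFrame case mentioned in the text, using a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1], "b": [2, 0]})

true_div = df / 0
floor_div = df // 0

# 0/0 and 0//0 become NaN; a non-zero numerator still gives inf.
print(np.isnan(true_div.loc[0, "a"]), np.isinf(true_div.loc[1, "a"]))    # True True
print(np.isnan(floor_div.loc[1, "b"]), np.isinf(floor_div.loc[0, "b"]))  # True True
```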

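Series.value_counts and Series.describe for categorical data will now put NaN entries at the end. (GH 9443)
Series.describe for categorical data will now give counts and frequencies of 0, not NaN, for unused categories (GH 9443)
Due to a bug fix, looking up a partial string label with DatetimeIndex.asof now includes values that match the string, even if they are after the start of the partial string label (GH 9258).

Old behavior: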
```
In [4]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[4]: Timestamp('2000-01-31 00:00:00')
```
Fixed behavior:
```
In [41]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
Out[41]: Timestamp('2000-02-28 00:00:00')
```
To reproduce the old behavior, simply add more precision to the label (e.g., use 2000-02-01 instead of 2000-02).
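Two of the API changes listed earlier in this section, the accessor exceptions and the return type of Index.duplicated, are easy to verify directly. A minimal sketch with made-up data:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])
labels = pd.Series(["a", "b", "a"], dtype="category")

# Because the accessor raises AttributeError, hasattr works as a probe.
print(hasattr(numbers, "cat"))  # False
print(hasattr(labels, "cat"))   # True

# Index.duplicated returns a boolean NumPy array.
dup = pd.Index(["x", "y", "x"]).duplicated()
print(type(dup).__name__, dup.dtype)  # ndarray bool
print(dup.tolist())                   # [False, False, True]
```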

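Deprecations

The rplot trellis plotting interface is deprecated and will be removed in a future version. We refer to external packages like seaborn for similar but more refined functionality (GH 3445). The documentation includes some examples of how to convert your existing code from rplot to seaborn.
The pandas.sandbox.qtpandas interface is deprecated and will be removed in a future version. We refer users to the external package pandas-qt. (GH 9615)
The pandas.rpy interface is deprecated and will be removed in a future version. Similar functionality can be accessed through the rpy2 project (GH 9602)
Adding a DatetimeIndex/PeriodIndex to another DatetimeIndex/PeriodIndex is being deprecated as a set operation. This will be changed to a TypeError in a future version. .union() should be used for the union set operation. (GH 9094)
Subtracting a DatetimeIndex/PeriodIndex from another DatetimeIndex/PeriodIndex is being deprecated as a set operation. This will be changed to an actual numeric subtraction yielding a TimedeltaIndex in a future version. .difference() should be used for the differencing set operation. (GH 9094)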
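The recommended replacements for the deprecated `+` and `-` set operations look roughly like this; the two date ranges are made up so that they overlap on a single day.

```python
import pandas as pd

a = pd.date_range("2015-01-01", periods=3)  # Jan 1 to Jan 3
b = pd.date_range("2015-01-03", periods=3)  # Jan 3 to Jan 5

both = a.union(b)         # all five distinct dates
only_a = a.difference(b)  # dates in a that are not in b

print(len(both))  # 5
print([d.strftime("%Y-%m-%d") for d in only_a])  # ['2015-01-01', '2015-01-02']
```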

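Removal of prior version deprecations/changes

The rows and cols keyword arguments of DataFrame.pivot_table and crosstab were removed in favor of index and columns (GH 6581)
The cols keyword argument of DataFrame.to_excel and DataFrame.to_csv was removed in favor of columns (GH 6581)
Removed convert_dummies in favor of get_dummies (GH 6581)
Removed value_range in favor of describe (GH 6581)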
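Code that still used the removed spellings needs the replacement keywords and functions. A sketch of the current forms on a small hypothetical sales table:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "year": [2014, 2015, 2014, 2015],
    "sales": [10, 20, 30, 40],
})

# index/columns replace the removed rows/cols keywords.
table = df.pivot_table(values="sales", index="city", columns="year", aggfunc="sum")
print(table.loc["A", 2014], table.loc["B", 2015])  # 10 40

# get_dummies replaces the removed convert_dummies.
dummies = pd.get_dummies(df["city"])
print(list(dummies.columns))  # ['A', 'B']

# to_csv takes columns (not cols) to select what is written.
print(df.to_csv(columns=["city", "sales"], index=False))
```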

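Performance improvements

Fixed a performance regression for .loc indexing with an array or list-like (GH 9126).
DataFrame.to_json 30x performance improvement for mixed dtype frames. (GH 9037)
Performance improvements in MultiIndex.duplicated by working with labels instead of values (GH 9125)
Improved the speed of nunique by calling unique instead of value_counts (GH 9129, GH 7771)
Performance improvement of up to 10x in DataFrame.count and DataFrame.dropna by taking advantage of homogeneous/heterogeneous dtypes appropriately (GH 9136)
Performance improvement of up to 20x in DataFrame.count when using a MultiIndex and the level keyword argument (GH 9163)
Performance and memory usage improvements in merge when the key space exceeds int64 bounds (GH 9151)
Performance improvements in multi-key groupby (GH 9429)
Performance improvements in MultiIndex.sortlevel (GH 9445)
Performance and memory usage improvements in DataFrame.duplicated (GH 9398)
Cythonized Period (GH 9440)
Decreased memory usage on to_hdf (GH 9648)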

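Bug fixes

Changed .to_html to remove leading/trailing spaces in table body (GH 4987)
Fixed issue using read_csv on s3 with Python 3 (GH 9452)
Fixed compatibility issue in DatetimeIndex affecting architectures where numpy.int_ defaults to numpy.int32 (GH 8943)
Bug in Panel indexing with an object-like (GH 9140)
Bug where the index of the returned Series.dt.components was reset to the default index (GH 9247)
Bug in Categorical.__getitem__/__setitem__ with list-like input getting incorrect results from indexer coercion (GH 9469)
Bug in partial setting with a DatetimeIndex (GH 9478)
Bug in groupby for integer and datetime64 columns when applying an aggregator that caused the value to be changed when the number was sufficiently large (GH 9311, GH 6620)
Fixed bug in to_sql when mapping a Timestamp object column (datetime column with timezone info) to the appropriate sqlalchemy type (GH 9085).
Fixed bug in to_sql where the dtype argument did not accept an instantiated SQLAlchemy type (GH 9083).
Bug in .loc partial setting with a np.datetime64 (GH 9516)
Incorrect dtypes inferred on datetimelike-looking Series and on .xs slices (GH 9477)
Items in Categorical.unique() (and s.unique() if s is of dtype category) now appear in the order in which they are originally found, not in sorted order (GH 9331). This is now consistent with the behavior for other dtypes in pandas.
Fixed bug on big endian platforms which produced incorrect results in StataReader (GH 8688).
Bug in MultiIndex.has_duplicates where having many levels caused an indexer overflow (GH 9075, GH 5873)
Bug in pivot and unstack where nan values would break index alignment (GH 4862, GH 7401, GH 7403, GH 7405, GH 7466, GH 9497)
Bug in left join on MultiIndex with sort=True or null values (GH 9210).
Bug in MultiIndex where inserting new keys would fail (GH 9250).
Bug in groupby when the key space exceeds int64 bounds (GH 9096).
Bug in unstack with TimedeltaIndex or DatetimeIndex and nulls (GH 9491).
Bug in rank where comparing floats with tolerance caused inconsistent behaviour (GH 8365).
Fixed character encoding bug in read_stata and StataReader when loading data from a URL (GH 9231).
Bug where adding offsets.Nano to other offsets raised a TypeError (GH 9284)
Bug in DatetimeIndex iteration, related to (GH 8890), fixed in (GH 9100)
Bugs in resample around DST transitions. This required fixing offset classes so they behave correctly on DST transitions. (GH 5172, GH 8744, GH 8653, GH 9173, GH 9468).
Bug in binary operator method (e.g. .mul()) alignment with integer levels (GH 9463).
Bug where boxplot, scatter and hexbin plots may show an unnecessary warning (GH 8877)
Bug where subplots with the layout keyword may show an unnecessary warning (GH 9464)
Bug in using grouper functions that need passed-through arguments (e.g. axis) when using a wrapped function (e.g. fillna) (GH 9221)
DataFrame now properly supports simultaneous copy and dtype arguments in the constructor (GH 9099)
Bug in read_csv when using skiprows on a file with CR line endings with the c engine. (GH 9079)
isnull now detects NaT in PeriodIndex (GH 9129)
Bug in groupby .nth() with a multiple column groupby (GH 8979)
Bug where DataFrame.where and Series.where coerced numerics to strings incorrectly (GH 9280)
Bug where DataFrame.where and Series.where raised ValueError when a string list-like was passed. (GH 9280)
Accessing Series.str methods with non-string values now raises TypeError instead of producing incorrect results (GH 9184)
Bug in DatetimeIndex.__contains__ when the index has duplicates and is not monotonic increasing (GH 9512)
Fixed division by zero error for Series.kurt() when all values are equal (GH 9197)
Fixed issue in the xlsxwriter engine where it added a default 'General' format to cells if no other format was applied. This prevented other row or column formatting from being applied. (GH 9167)
Fixed issue with index_col=False when usecols is also specified in read_csv. (GH 9082)
Bug where wide_to_long would modify the input stub names list (GH 9204)
Bug where to_sql did not store float64 values using double precision. (GH 9009)
SparseSeries and SparsePanel now accept zero-argument constructors (same as their non-sparse counterparts) (GH 9272).
Regression in merging Categorical and object dtypes (GH 9426)
Bug in read_csv with buffer overflows on certain malformed input files (GH 9205)
Bug in groupby MultiIndex with a missing pair (GH 9049, GH 9344)
Fixed bug in Series.groupby where grouping on MultiIndex levels would ignore the sort argument (GH 9444)
Fixed bug in DataFrame.groupby where sort=False was ignored in the case of Categorical columns. (GH 8868)
Fixed bug where reading CSV files from Amazon S3 on Python 3 raised a TypeError (GH 9452)
Bug in the Google BigQuery reader where the 'jobComplete' key may be present but False in the query results (GH 8728)
Bug in Series.value_counts with excluding NaN for categorical type Series with dropna=True (GH 9443)
Fixed missing numeric_only option for DataFrame.std/var/sem (GH 9201)
Support constructing Panel or Panel4D with scalar data (GH 8285)
Series text representation was disconnected from max_rows/max_columns (GH 7508).

Series number formatting was inconsistent when truncated (GH 8532).

Previous behavior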

In [2]: pd.options.display.max_rows = 10
In [3]: s = pd.Series([1,1,1,1,1,1,1,1,1,1,0.9999,1,1]*10)
In [4]: s
Out[4]:
0    1
1    1
2    1
...
127    0.9999
128    1.0000
129    1.0000
Length: 130, dtype: float64

New behavior

0      1.0000
1      1.0000
2      1.0000
3      1.0000
4      1.0000
        ...
125    1.0000
126    1.0000
127    0.9999
128    1.0000
129    1.0000
dtype: float64

A spurious SettingWithCopy warning was generated when setting a new item in a frame in some cases (GH 8730)

The following would previously report a SettingWithCopy warning.

In [42]: df1 = pd.DataFrame({'x': pd.Series(['a', 'b', 'c']),
   ....:                     'y': pd.Series(['d', 'e', 'f'])})
   ....: 

In [43]: df2 = df1[['x']]

In [44]: df2['y'] = ['g', 'h', 'i']

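Contributors

A total of 60 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Aaron Toth +
Alan Du +
Alessandro Amici +
Artemy Kolchinsky
Ashwini Chaudhary +
Ben Schiller
Bill Letson
Brandon Bradley +
Chau Hoang +
Chris Reynolds
Chris Whelan +
Christer van der Meeren +
David Cottrell +
David Stephens
Ehsan Azarnasab +
Garrett-R +
Guillaume Gay
Jake Torcasso +
Jason Sexauer
Jeff Reback
John McNamara
Joris Van den Bossche
Joschka zur Jacobsmühlen +
Juarez Bochi +
Junya Hayashi +
K.-Michael Aye
Kerby Shedden +
Kevin Sheppard
Kieran O'Mahony
Kodi Arfer +
Matti Airas +
Min RK +
Mortada Mehyar
Robert +
Scott E Lasley
Scott Lasley +
Sergio Pascual +
Skipper Seabold
Stephan Hoyer
Thomas Grainger
Tom Augspurger
TomAugspurger
Vladimir Filimonov +
Vyomkesh Tripathi +
Will Holmgren
Yulong Yang +
behzad nouri
bertrandhaut +
bjonen
cel4 +
clham
hsperr +
ischwabacher
jnmclarty
josham +
jreback
omtinez +
roch +
sinhrks
unutbu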