Time deltas

Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.

Timedelta is a subclass of datetime.timedelta and behaves in a similar manner, but it also allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.

The examples below assume that NumPy and pandas have been imported as np and pd.

Parsing

You can construct a Timedelta scalar through various arguments, including ISO 8601 Duration strings.

    In [1]: import datetime

    # strings
    In [2]: pd.Timedelta("1 days")
    Out[2]: Timedelta('1 days 00:00:00')

    In [3]: pd.Timedelta("1 days 00:00:00")
    Out[3]: Timedelta('1 days 00:00:00')

    In [4]: pd.Timedelta("1 days 2 hours")
    Out[4]: Timedelta('1 days 02:00:00')

    In [5]: pd.Timedelta("-1 days 2 min 3us")
    Out[5]: Timedelta('-2 days +23:57:59.999997')

    # like datetime.timedelta
    # note: these MUST be specified as keyword arguments
    In [6]: pd.Timedelta(days=1, seconds=1)
    Out[6]: Timedelta('1 days 00:00:01')

    # integers with a unit
    In [7]: pd.Timedelta(1, unit="d")
    Out[7]: Timedelta('1 days 00:00:00')

    # from a datetime.timedelta/np.timedelta64
    In [8]: pd.Timedelta(datetime.timedelta(days=1, seconds=1))
    Out[8]: Timedelta('1 days 00:00:01')

    In [9]: pd.Timedelta(np.timedelta64(1, "ms"))
    Out[9]: Timedelta('0 days 00:00:00.001000')

    # negative Timedeltas have this string repr
    # to be more consistent with datetime.timedelta conventions
    In [10]: pd.Timedelta("-1us")
    Out[10]: Timedelta('-1 days +23:59:59.999999')

    # a NaT
    In [11]: pd.Timedelta("nan")
    Out[11]: NaT

    In [12]: pd.Timedelta("nat")
    Out[12]: NaT

    # ISO 8601 Duration strings
    In [13]: pd.Timedelta("P0DT0H1M0S")
    Out[13]: Timedelta('0 days 00:01:00')

    In [14]: pd.Timedelta("P0DT0H0M0.000000123S")
    Out[14]: Timedelta('0 days 00:00:00.000000123')

DateOffsets (Day, Hour, Minute, Second, Milli, Micro, Nano) can also be used in construction.

    In [15]: pd.Timedelta(pd.offsets.Second(2))
    Out[15]: Timedelta('0 days 00:00:02')

Further, operations among the scalars yield another scalar Timedelta.

    In [16]: pd.Timedelta(pd.offsets.Day(2)) + pd.Timedelta(pd.offsets.Second(2)) + pd.Timedelta(
       ....:     "00:00:00.000123"
       ....: )
       ....: 
    Out[16]: Timedelta('2 days 00:00:02.000123')

to_timedelta

Using the top-level pd.to_timedelta, you can convert a scalar, array, list, or Series from a recognized timedelta format or value into a Timedelta type. It constructs a Series if the input is a Series, a scalar if the input is scalar-like, and a TimedeltaIndex otherwise.

You can parse a single string to a Timedelta:

    In [17]: pd.to_timedelta("1 days 06:05:01.00003")
    Out[17]: Timedelta('1 days 06:05:01.000030')

    In [18]: pd.to_timedelta("15.5us")
    Out[18]: Timedelta('0 days 00:00:00.000015500')

or a list/array of strings:

    In [19]: pd.to_timedelta(["1 days 06:05:01.00003", "15.5us", "nan"])
    Out[19]: TimedeltaIndex(['1 days 06:05:01.000030', '0 days 00:00:00.000015500', NaT], dtype='timedelta64[ns]', freq=None)

The unit keyword argument specifies the unit of the Timedelta if the input is numeric:

    In [20]: pd.to_timedelta(np.arange(5), unit="s")
    Out[20]: 
    TimedeltaIndex(['0 days 00:00:00', '0 days 00:00:01', '0 days 00:00:02',
                    '0 days 00:00:03', '0 days 00:00:04'],
                   dtype='timedelta64[ns]', freq=None)

    In [21]: pd.to_timedelta(np.arange(5), unit="d")
    Out[21]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)

Warning

If a string or array of strings is passed as input, the unit keyword argument is ignored. If a string without units is passed, the default unit of nanoseconds is assumed.

Timedelta limitations

pandas represents Timedeltas in nanosecond resolution using 64-bit integers. As such, the 64-bit integer limits determine the Timedelta limits.

    In [22]: pd.Timedelta.min
    Out[22]: Timedelta('-106752 days +00:12:43.145224193')

    In [23]: pd.Timedelta.max
    Out[23]: Timedelta('106751 days 23:47:16.854775807')

Operations

You can operate on Series/DataFrames and construct timedelta64[ns] Series through subtraction operations on datetime64[ns] Series or Timestamps.

    In [24]: s = pd.Series(pd.date_range("2012-1-1", periods=3, freq="D"))

    In [25]: td = pd.Series([pd.Timedelta(days=i) for i in range(3)])

    In [26]: df = pd.DataFrame({"A": s, "B": td})

    In [27]: df
    Out[27]: 
               A      B
    0 2012-01-01 0 days
    1 2012-01-02 1 days
    2 2012-01-03 2 days

    In [28]: df["C"] = df["A"] + df["B"]

    In [29]: df
    Out[29]: 
               A      B          C
    0 2012-01-01 0 days 2012-01-01
    1 2012-01-02 1 days 2012-01-03
    2 2012-01-03 2 days 2012-01-05

    In [30]: df.dtypes
    Out[30]: 
    A     datetime64[ns]
    B    timedelta64[ns]
    C     datetime64[ns]
    dtype: object

    In [31]: s - s.max()
    Out[31]: 
    0   -2 days
    1   -1 days
    2    0 days
    dtype: timedelta64[ns]

    In [32]: s - datetime.datetime(2011, 1, 1, 3, 5)
    Out[32]: 
    0   364 days 20:55:00
    1   365 days 20:55:00
    2   366 days 20:55:00
    dtype: timedelta64[ns]

    In [33]: s + datetime.timedelta(minutes=5)
    Out[33]: 
    0   2012-01-01 00:05:00
    1   2012-01-02 00:05:00
    2   2012-01-03 00:05:00
    dtype: datetime64[ns]

    In [34]: s + pd.offsets.Minute(5)
    Out[34]: 
    0   2012-01-01 00:05:00
    1   2012-01-02 00:05:00
    2   2012-01-03 00:05:00
    dtype: datetime64[ns]

    In [35]: s + pd.offsets.Minute(5) + pd.offsets.Milli(5)
    Out[35]: 
    0   2012-01-01 00:05:00.005
    1   2012-01-02 00:05:00.005
    2   2012-01-03 00:05:00.005
    dtype: datetime64[ns]

Operations with scalars from a timedelta64[ns] Series:

    In [36]: y = s - s[0]

    In [37]: y
    Out[37]: 
    0   0 days
    1   1 days
    2   2 days
    dtype: timedelta64[ns]

Series of timedeltas with NaT values are supported:

    In [38]: y = s - s.shift()

    In [39]: y
    Out[39]: 
    0      NaT
    1   1 days
    2   1 days
    dtype: timedelta64[ns]

Elements can be set to NaT using np.nan, analogously to datetimes:

    In [40]: y[1] = np.nan

    In [41]: y
    Out[41]: 
    0      NaT
    1      NaT
    2   1 days
    dtype: timedelta64[ns]

Operands can also appear in reversed order (a single object operated on with a Series):

    In [42]: s.max() - s
    Out[42]: 
    0   2 days
    1   1 days
    2   0 days
    dtype: timedelta64[ns]

    In [43]: datetime.datetime(2011, 1, 1, 3, 5) - s
    Out[43]: 
    0   -365 days +03:05:00
    1   -366 days +03:05:00
    2   -367 days +03:05:00
    dtype: timedelta64[ns]

    In [44]: datetime.timedelta(minutes=5) + s
    Out[44]: 
    0   2012-01-01 00:05:00
    1   2012-01-02 00:05:00
    2   2012-01-03 00:05:00
    dtype: datetime64[ns]

min, max and the corresponding idxmin, idxmax operations are supported on frames:

    In [45]: A = s - pd.Timestamp("20120101") - pd.Timedelta("00:05:05")

    In [46]: B = s - pd.Series(pd.date_range("2012-1-2", periods=3, freq="D"))

    In [47]: df = pd.DataFrame({"A": A, "B": B})

    In [48]: df
    Out[48]: 
                      A       B
    0 -1 days +23:54:55 -1 days
    1   0 days 23:54:55 -1 days
    2   1 days 23:54:55 -1 days

    In [49]: df.min()
    Out[49]: 
    A   -1 days +23:54:55
    B   -1 days +00:00:00
    dtype: timedelta64[ns]

    In [50]: df.min(axis=1)
    Out[50]: 
    0   -1 days
    1   -1 days
    2   -1 days
    dtype: timedelta64[ns]

    In [51]: df.idxmin()
    Out[51]: 
    A    0
    B    0
    dtype: int64

    In [52]: df.idxmax()
    Out[52]: 
    A    2
    B    0
    dtype: int64

min, max, idxmin, idxmax operations are supported on Series as well. A scalar result will be a Timedelta.

    In [53]: df.min().max()
    Out[53]: Timedelta('-1 days +23:54:55')

    In [54]: df.min(axis=1).min()
    Out[54]: Timedelta('-1 days +00:00:00')

    In [55]: df.min().idxmax()
    Out[55]: 'A'

    In [56]: df.min(axis=1).idxmin()
    Out[56]: 0

You can use fillna on timedeltas, passing a timedelta to get a particular fill value.

    In [57]: y.fillna(pd.Timedelta(0))
    Out[57]: 
    0   0 days
    1   0 days
    2   1 days
    dtype: timedelta64[ns]

    In [58]: y.fillna(pd.Timedelta(10, unit="s"))
    Out[58]: 
    0   0 days 00:00:10
    1   0 days 00:00:10
    2   1 days 00:00:00
    dtype: timedelta64[ns]

    In [59]: y.fillna(pd.Timedelta("-1 days, 00:00:05"))
    Out[59]: 
    0   -1 days +00:00:05
    1   -1 days +00:00:05
    2     1 days 00:00:00
    dtype: timedelta64[ns]

You can also negate, multiply and use abs on Timedeltas:

    In [60]: td1 = pd.Timedelta("-1 days 2 hours 3 seconds")

    In [61]: td1
    Out[61]: Timedelta('-2 days +21:59:57')

    In [62]: -1 * td1
    Out[62]: Timedelta('1 days 02:00:03')

    In [63]: -td1
    Out[63]: Timedelta('1 days 02:00:03')

    In [64]: abs(td1)
    Out[64]: Timedelta('1 days 02:00:03')

Reductions

Numeric reduction operations on timedelta64[ns] return Timedelta objects. As usual, NaT values are skipped during evaluation.

    In [65]: y2 = pd.Series(
       ....:     pd.to_timedelta(["-1 days +00:00:05", "nat", "-1 days +00:00:05", "1 days"])
       ....: )
       ....: 

    In [66]: y2
    Out[66]: 
    0   -1 days +00:00:05
    1                 NaT
    2   -1 days +00:00:05
    3     1 days 00:00:00
    dtype: timedelta64[ns]

    In [67]: y2.mean()
    Out[67]: Timedelta('-1 days +16:00:03.333333334')

    In [68]: y2.median()
    Out[68]: Timedelta('-1 days +00:00:05')

    In [69]: y2.quantile(0.1)
    Out[69]: Timedelta('-1 days +00:00:05')

    In [70]: y2.sum()
    Out[70]: Timedelta('-1 days +00:00:10')

Frequency conversion

Timedelta Series, TimedeltaIndex, and Timedelta can be converted to other frequencies by astyping to a specific timedelta dtype.

    In [71]: december = pd.Series(pd.date_range("20121201", periods=4))

    In [72]: january = pd.Series(pd.date_range("20130101", periods=4))

    In [73]: td = january - december

    In [74]: td[2] += datetime.timedelta(minutes=5, seconds=3)

    In [75]: td[3] = np.nan

    In [76]: td
    Out[76]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3                NaT
    dtype: timedelta64[ns]

    # to seconds
    In [77]: td.astype("timedelta64[s]")
    Out[77]: 
    0   31 days 00:00:00
    1   31 days 00:00:00
    2   31 days 00:05:03
    3                NaT
    dtype: timedelta64[s]

For timedelta64 resolutions other than the supported "s", "ms", "us", "ns", an alternative is to divide by another timedelta object. Note that division by the NumPy scalar is true division, while astyping is equivalent to floor division.

    # to days
    In [78]: td / np.timedelta64(1, "D")
    Out[78]: 
    0    31.000000
    1    31.000000
    2    31.003507
    3          NaN
    dtype: float64

Dividing or multiplying a timedelta64[ns] Series by an integer or integer Series yields another timedelta64[ns] Series.

    In [79]: td * -1
    Out[79]: 
    0   -31 days +00:00:00
    1   -31 days +00:00:00
    2   -32 days +23:54:57
    3                  NaT
    dtype: timedelta64[ns]

    In [80]: td * pd.Series([1, 2, 3, 4])
    Out[80]: 
    0   31 days 00:00:00
    1   62 days 00:00:00
    2   93 days 00:15:09
    3                NaT
    dtype: timedelta64[ns]

Rounded division (floor division) of a timedelta64[ns] Series by a scalar Timedelta gives a Series of whole numbers. In the examples below the result has dtype float64 because the NaT entry becomes NaN.

    In [81]: td // pd.Timedelta(days=3, hours=4)
    Out[81]: 
    0    9.0
    1    9.0
    2    9.0
    3    NaN
    dtype: float64

    In [82]: pd.Timedelta(days=3, hours=4) // td
    Out[82]: 
    0    0.0
    1    0.0
    2    0.0
    3    NaN
    dtype: float64

The mod (%) and divmod operations are defined for Timedelta when operating with another timedelta-like or with a numeric argument.

    In [83]: pd.Timedelta(hours=37) % datetime.timedelta(hours=2)
    Out[83]: Timedelta('0 days 01:00:00')

    # divmod against a timedelta-like returns a pair (int, Timedelta)
    In [84]: divmod(datetime.timedelta(hours=2), pd.Timedelta(minutes=11))
    Out[84]: (10, Timedelta('0 days 00:10:00'))

    # divmod against a numeric returns a pair (Timedelta, Timedelta)
    In [85]: divmod(pd.Timedelta(hours=25), 86400000000000)
    Out[85]: (Timedelta('0 days 00:00:00.000000001'), Timedelta('0 days 01:00:00'))

Attributes

You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days, seconds, microseconds, nanoseconds. These are identical to the values returned by datetime.timedelta, in that, for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. These are signed according to whether the Timedelta is signed.

These operations can also be directly accessed via the .dt property of the Series.

Note

Note that the attributes are NOT the displayed values of the Timedelta. Use .components to retrieve the displayed values.

For a Series:

    In [86]: td.dt.days
    Out[86]: 
    0    31.0
    1    31.0
    2    31.0
    3     NaN
    dtype: float64

    In [87]: td.dt.seconds
    Out[87]: 
    0      0.0
    1      0.0
    2    303.0
    3      NaN
    dtype: float64

You can access the value of the fields for a scalar Timedelta directly.

    In [88]: tds = pd.Timedelta("31 days 5 min 3 sec")

    In [89]: tds.days
    Out[89]: 31

    In [90]: tds.seconds
    Out[90]: 303

    In [91]: (-tds).seconds
    Out[91]: 86097

You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series. These are the displayed values of the Timedelta.

    In [92]: td.dt.components
    Out[92]: 
       days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
    0  31.0    0.0      0.0      0.0           0.0           0.0          0.0
    1  31.0    0.0      0.0      0.0           0.0           0.0          0.0
    2  31.0    0.0      5.0      3.0           0.0           0.0          0.0
    3   NaN    NaN      NaN      NaN           NaN           NaN          NaN

    In [93]: td.dt.components.seconds
    Out[93]: 
    0    0.0
    1    0.0
    2    3.0
    3    NaN
    Name: seconds, dtype: float64

You can convert a Timedelta to an ISO 8601 Duration string with the .isoformat method:

    In [94]: pd.Timedelta(
       ....:     days=6, minutes=50, seconds=3, milliseconds=10, microseconds=10, nanoseconds=12
       ....: ).isoformat()
       ....: 
    Out[94]: 'P6DT0H50M3.010010012S'

TimedeltaIndex

To generate an index with time deltas, you can use either the TimedeltaIndex or the timedelta_range() constructor.

Using TimedeltaIndex you can pass string-like, Timedelta, timedelta, or np.timedelta64 objects. Passing np.nan/pd.NaT/nat will represent missing values.

    In [95]: pd.TimedeltaIndex(
       ....:     [
       ....:         "1 days",
       ....:         "1 days, 00:00:05",
       ....:         np.timedelta64(2, "D"),
       ....:         datetime.timedelta(days=2, seconds=2),
       ....:     ]
       ....: )
       ....: 
    Out[95]: 
    TimedeltaIndex(['1 days 00:00:00', '1 days 00:00:05', '2 days 00:00:00',
                    '2 days 00:00:02'],
                   dtype='timedelta64[ns]', freq=None)

The string 'infer' can be passed in order to set the frequency of the index to the frequency inferred upon creation:

    In [96]: pd.TimedeltaIndex(["0 days", "10 days", "20 days"], freq="infer")
    Out[96]: TimedeltaIndex(['0 days', '10 days', '20 days'], dtype='timedelta64[ns]', freq='10D')

Generating ranges of time deltas

Similar to date_range(), you can construct regular ranges of a TimedeltaIndex using timedelta_range(). The default frequency for timedelta_range is calendar day:

    In [97]: pd.timedelta_range(start="1 days", periods=5)
    Out[97]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')

Various combinations of start, end, and periods can be used with timedelta_range:

    In [98]: pd.timedelta_range(start="1 days", end="5 days")
    Out[98]: TimedeltaIndex(['1 days', '2 days', '3 days', '4 days', '5 days'], dtype='timedelta64[ns]', freq='D')

    In [99]: pd.timedelta_range(end="10 days", periods=4)
    Out[99]: TimedeltaIndex(['7 days', '8 days', '9 days', '10 days'], dtype='timedelta64[ns]', freq='D')

The freq parameter can be passed a variety of frequency aliases:

    In [100]: pd.timedelta_range(start="1 days", end="2 days", freq="30min")
    Out[100]: 
    TimedeltaIndex(['1 days 00:00:00', '1 days 00:30:00', '1 days 01:00:00',
                    '1 days 01:30:00', '1 days 02:00:00', '1 days 02:30:00',
                    '1 days 03:00:00', '1 days 03:30:00', '1 days 04:00:00',
                    '1 days 04:30:00', '1 days 05:00:00', '1 days 05:30:00',
                    '1 days 06:00:00', '1 days 06:30:00', '1 days 07:00:00',
                    '1 days 07:30:00', '1 days 08:00:00', '1 days 08:30:00',
                    '1 days 09:00:00', '1 days 09:30:00', '1 days 10:00:00',
                    '1 days 10:30:00', '1 days 11:00:00', '1 days 11:30:00',
                    '1 days 12:00:00', '1 days 12:30:00', '1 days 13:00:00',
                    '1 days 13:30:00', '1 days 14:00:00', '1 days 14:30:00',
                    '1 days 15:00:00', '1 days 15:30:00', '1 days 16:00:00',
                    '1 days 16:30:00', '1 days 17:00:00', '1 days 17:30:00',
                    '1 days 18:00:00', '1 days 18:30:00', '1 days 19:00:00',
                    '1 days 19:30:00', '1 days 20:00:00', '1 days 20:30:00',
                    '1 days 21:00:00', '1 days 21:30:00', '1 days 22:00:00',
                    '1 days 22:30:00', '1 days 23:00:00', '1 days 23:30:00',
                    '2 days 00:00:00'],
                   dtype='timedelta64[ns]', freq='30min')

    In [101]: pd.timedelta_range(start="1 days", periods=5, freq="2D5h")
    Out[101]: 
    TimedeltaIndex(['1 days 00:00:00', '3 days 05:00:00', '5 days 10:00:00',
                    '7 days 15:00:00', '9 days 20:00:00'],
                   dtype='timedelta64[ns]', freq='53h')

Specifying start, end, and periods will generate a range of evenly spaced timedeltas from start to end inclusive, with periods elements in the resulting TimedeltaIndex:

    In [102]: pd.timedelta_range("0 days", "4 days", periods=5)
    Out[102]: TimedeltaIndex(['0 days', '1 days', '2 days', '3 days', '4 days'], dtype='timedelta64[ns]', freq=None)

    In [103]: pd.timedelta_range("0 days", "4 days", periods=10)
    Out[103]: 
    TimedeltaIndex(['0 days 00:00:00', '0 days 10:40:00', '0 days 21:20:00',
                    '1 days 08:00:00', '1 days 18:40:00', '2 days 05:20:00',
                    '2 days 16:00:00', '3 days 02:40:00', '3 days 13:20:00',
                    '4 days 00:00:00'],
                   dtype='timedelta64[ns]', freq=None)

Using the TimedeltaIndex

As with the other datetime-like indices, DatetimeIndex and PeriodIndex, you can use TimedeltaIndex as the index of pandas objects.

    In [104]: s = pd.Series(
       .....:     np.arange(100),
       .....:     index=pd.timedelta_range("1 days", periods=100, freq="h"),
       .....: )
       .....: 

    In [105]: s
    Out[105]: 
    1 days 00:00:00     0
    1 days 01:00:00     1
    1 days 02:00:00     2
    1 days 03:00:00     3
    1 days 04:00:00     4
                       ..
    4 days 23:00:00    95
    5 days 00:00:00    96
    5 days 01:00:00    97
    5 days 02:00:00    98
    5 days 03:00:00    99
    Freq: h, Length: 100, dtype: int64

Selections work similarly, with coercion on string-likes and slices:

    In [106]: s["1 day":"2 day"]
    Out[106]: 
    1 days 00:00:00     0
    1 days 01:00:00     1
    1 days 02:00:00     2
    1 days 03:00:00     3
    1 days 04:00:00     4
                       ..
    2 days 19:00:00    43
    2 days 20:00:00    44
    2 days 21:00:00    45
    2 days 22:00:00    46
    2 days 23:00:00    47
    Freq: h, Length: 48, dtype: int64

    In [107]: s["1 day 01:00:00"]
    Out[107]: 1

    In [108]: s[pd.Timedelta("1 day 1h")]
    Out[108]: 1

Furthermore, you can use partial string selection and the range will be inferred:

    In [109]: s["1 day":"1 day 5 hours"]
    Out[109]: 
    1 days 00:00:00    0
    1 days 01:00:00    1
    1 days 02:00:00    2
    1 days 03:00:00    3
    1 days 04:00:00    4
    1 days 05:00:00    5
    Freq: h, dtype: int64

Operations

Finally, combining a TimedeltaIndex with a DatetimeIndex allows certain combination operations that are NaT preserving:

    In [110]: tdi = pd.TimedeltaIndex(["1 days", pd.NaT, "2 days"])

    In [111]: tdi.to_list()
    Out[111]: [Timedelta('1 days 00:00:00'), NaT, Timedelta('2 days 00:00:00')]

    In [112]: dti = pd.date_range("20130101", periods=3)

    In [113]: dti.to_list()
    Out[113]: 
    [Timestamp('2013-01-01 00:00:00'),
     Timestamp('2013-01-02 00:00:00'),
     Timestamp('2013-01-03 00:00:00')]

    In [114]: (dti + tdi).to_list()
    Out[114]: [Timestamp('2013-01-02 00:00:00'), NaT, Timestamp('2013-01-05 00:00:00')]

    In [115]: (dti - tdi).to_list()
    Out[115]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2013-01-01 00:00:00')]

Conversions

Similarly to the frequency conversion on a Series above, you can convert these indices to yield another Index.

    In [116]: tdi / np.timedelta64(1, "s")
    Out[116]: Index([86400.0, nan, 172800.0], dtype='float64')

    In [117]: tdi.astype("timedelta64[s]")
    Out[117]: TimedeltaIndex(['1 days', NaT, '2 days'], dtype='timedelta64[s]', freq=None)

Scalar-type operations work as well. These can potentially return a different type of index.

    # adding a timedelta and a date -> datelike
    In [118]: tdi + pd.Timestamp("20130101")
    Out[118]: DatetimeIndex(['2013-01-02', 'NaT', '2013-01-03'], dtype='datetime64[ns]', freq=None)

    # subtraction of a date and a timedelta -> datelike
    # note that trying to subtract a date from a Timedelta will raise an exception
    In [119]: (pd.Timestamp("20130101") - tdi).to_list()
    Out[119]: [Timestamp('2012-12-31 00:00:00'), NaT, Timestamp('2012-12-30 00:00:00')]

    # timedelta + timedelta -> timedelta
    In [120]: tdi + pd.Timedelta("10 days")
    Out[120]: TimedeltaIndex(['11 days', NaT, '12 days'], dtype='timedelta64[ns]', freq=None)

    # division can result in a Timedelta if the divisor is an integer
    In [121]: tdi / 2
    Out[121]: TimedeltaIndex(['0 days 12:00:00', NaT, '1 days 00:00:00'], dtype='timedelta64[ns]', freq=None)

    # or a float64 Index if the divisor is a Timedelta
    In [122]: tdi / tdi[0]
    Out[122]: Index([1.0, nan, 2.0], dtype='float64')

Resampling

Similar to timeseries resampling, we can resample with a TimedeltaIndex.

    In [123]: s.resample("D").mean()
    Out[123]: 
    1 days    11.5
    2 days    35.5
    3 days    59.5
    4 days    83.5
    5 days    97.5
    Freq: D, dtype: float64
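
As a rough cross-check of the resampling result, the sketch below rebuilds the same hourly series as a standalone script and compares the daily means with a plain groupby on the whole-day component of the index. The variable names (index, daily_mean, by_day) are illustrative only, and passing pd.Timedelta objects for freq and for the resampling rule is one assumed way to avoid relying on a particular frequency-alias spelling; it is not the only way to write this.

```python
import numpy as np
import pandas as pd

# Same shape of data as the resampling example above:
# 100 hourly observations starting at "1 days".
index = pd.timedelta_range("1 days", periods=100, freq=pd.Timedelta(hours=1))
s = pd.Series(np.arange(100), index=index)

# Downsample to one mean value per day; a Timedelta is used as the rule here.
daily_mean = s.resample(pd.Timedelta(days=1)).mean()

# Cross-check: bucket each observation by the whole-day part of its index entry.
by_day = s.groupby(s.index.days).mean()

print(daily_mean.tolist())
print(by_day.tolist())
```

Both approaches put hours 0 to 23 of each day into the same bucket, so the two lists of means should agree. The resampled result keeps a TimedeltaIndex, while the groupby result is indexed by plain integers.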