Pandas Basics | Merging and Sorting User Visit Log Time Ranges

osc_ ds5ni1ur 2021-01-19 01:38:12


Author: Xiaoming, a pandas data-processing expert committed to helping data practitioners solve data-processing problems.

Requirements

Suppose we have user visit log data in the following format (copy the table shown below, then run the following code to get the same result):

import pandas as pd
df = pd.read_clipboard()
df

Result:

uid start end
0 A 1 2
1 A 4 7
2 A 3 6
3 A 8 9
4 B 2 3
5 B 4 7
6 B 10 11
7 B 6 8
8 B 12 15
9 C 14 15
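If copying the table to the clipboard is inconvenient (for example, when running a script without clipboard access), the same DataFrame can be built directly. This is a minimal sketch that simply retypes the values from the table above; it is not part of the original article's code.

```python
import pandas as pd

# Same ten rows as the table above, typed in by hand instead of read from the clipboard.
df = pd.DataFrame(
    {
        "uid": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "C"],
        "start": [1, 4, 3, 8, 2, 4, 10, 6, 12, 14],
        "end": [2, 7, 6, 9, 3, 7, 11, 8, 15, 15],
    }
)
print(df)
```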

Here uid identifies the user, start is the start time of a visit, and end is its end time. As the table shows, some visit times overlap. For example, user A's visits 3-6 and 4-7 overlap, so they can be treated as a single visit spanning 3-7.

What we need to do is merge each user's overlapping visit times and display the result in chronological order.

Note: 3-4 and 4-6 also count as overlapping and can be merged into 3-6.
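The overlap rule can be stated as a small predicate. The sketch below assumes the two ranges are already ordered by start time; the function name `can_merge` is hypothetical and only used for illustration.

```python
def can_merge(prev, cur):
    """Return True if cur overlaps or touches prev.

    Both arguments are (start, end) pairs, and prev is assumed
    to start no later than cur.
    """
    return cur[0] <= prev[1]


print(can_merge((3, 6), (4, 7)))  # True: 4 falls inside 3-6
print(can_merge((3, 4), (4, 6)))  # True: touching endpoints count as overlap
print(can_merge((1, 2), (3, 6)))  # False: there is a gap between 2 and 3
```

Using `<=` rather than `<` is what makes touching ranges such as 3-4 and 4-6 mergeable.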

First, merge and sort a single user's records

Take one user's data to experiment with:

tmp = df.groupby("uid").get_group('B')
tmp

Result:

uid start end
4 B 2 3
5 B 4 7
6 B 10 11
7 B 6 8
8 B 12 15

On inspection, the first step in solving this problem is to sort the data by start time.


After sorting:

tmp = tmp.sort_values('start')
tmp

Result:

uid start end
4 B 2 3
5 B 4 7
7 B 6 8
6 B 10 11
8 B 12 15

Looking at the sorted data, the merge rule is easy to spot:

Merge the current visit record into the previous one when its start time is less than or equal to the previous record's end time. The implementation is simple:

result = []
for uid, start, end in tmp.values:
    # If the result list is empty, or the current record starts after the
    # previous record ends, append the current record as a new entry.
    if not result or start > result[-1][2]:
        result.append([uid, start, end])
    else:
        # Otherwise the current record can be merged into the previous one:
        # if the current record ends later than the previous record,
        # extend the previous record's end time to the current end time.
        result[-1][2] = max(result[-1][2], end)
tmp = pd.DataFrame(result, columns=["uid", "start", "end"])
tmp

Result:

uid start end
0 B 2 3
1 B 4 8
2 B 10 11
3 B 12 15
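The same loop can be sketched without pandas, which makes the rule easy to test in isolation. The function name `merge_intervals` is an assumption introduced here for illustration; the input is user B's records from the table above.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) pairs and return them sorted."""
    merged = []
    for start, end in sorted(intervals):
        if not merged or start > merged[-1][1]:
            # Gap before this range: it begins a new merged range.
            merged.append([start, end])
        else:
            # Overlap: keep whichever end time is later.
            merged[-1][1] = max(merged[-1][1], end)
    return merged


user_b = [(2, 3), (4, 7), (10, 11), (6, 8), (12, 15)]
print(merge_intervals(user_b))
# [[2, 3], [4, 8], [10, 11], [12, 15]]
```

The `max` call matters when one range lies entirely inside another: merging (1, 10) with (2, 3) should keep the end time 10 rather than shrink it to 3.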

Complete code

Now let's put the whole process together:

result = []
for uid, tmp in df.groupby("uid"):
    # Sort this user's records by start time.
    tmp = tmp[["start", "end"]].sort_values('start')
    rows = []
    for start, end in tmp.values:
        # Each row is [uid, start, end], so rows[-1][2] is the previous end time.
        if not rows or start > rows[-1][2]:
            rows.append([uid, start, end])
        else:
            rows[-1][2] = max(rows[-1][2], end)
    tmp = pd.DataFrame(rows, columns=["uid", "start", "end"])
    result.append(tmp)
# Stack the per-user results; each user's index restarts at 0.
result = pd.concat(result)
result

Result:

uid start end
0 A 1 2
1 A 3 7
2 A 8 9
0 B 2 3
1 B 4 8
2 B 10 11
3 B 12 15
0 C 14 15
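For larger logs, the per-row Python loop can be replaced by a vectorized variant. The sketch below is not the author's method; it swaps in a different technique based on `cummax` and `cumsum`, and the helper column name `block` is hypothetical. The DataFrame is rebuilt inline so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "uid": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "C"],
        "start": [1, 4, 3, 8, 2, 4, 10, 6, 12, 14],
        "end": [2, 7, 6, 9, 3, 7, 11, 8, 15, 15],
    }
)

s = df.sort_values(["uid", "start"]).reset_index(drop=True)

# Latest end time among the earlier rows of the same user (NaN for a user's first row).
prev_max_end = s.groupby("uid")["end"].transform(lambda x: x.cummax().shift())

# A row begins a new merged range if it is the user's first row
# or it starts after every earlier range of that user has ended.
new_block = prev_max_end.isna() | (s["start"] > prev_max_end)

# Running count of range starts gives each merged range its own id.
s["block"] = new_block.cumsum()

merged = (
    s.groupby(["uid", "block"], as_index=False)
    .agg(start=("start", "min"), end=("end", "max"))
    [["uid", "start", "end"]]
)
print(merged)
```

Unlike the `pd.concat` result above, this version ends with a single continuous index instead of one that restarts at 0 for each user.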

And that's it, all done!

Copyright notice
This article was written by [osc_ ds5ni1ur]. Please include a link to the original when reposting. Thank you.
https://pythonmana.com/2021/01/20210112114856324l.html
