asuerhao's Blog

If I have fallen short anywhere, please leave a comment and I will improve : )

Processing a data file of taxi passenger pick-ups and drop-offs

    The original file looks like this:

name,time,jd,wd,status,v,angle,
粤B13K97,2011/04/18 00:00:26,114.044151,22.531418,1,75,5,
粤B13K97,2011/04/18 00:00:56,114.038452,22.530817,0,78,5,
粤B13K97,2011/04/18 00:01:56,114.026199,22.531134,0,84,6,
粤B13K97,2011/04/18 00:02:26,114.020035,22.532049,0,80,5,
粤B13K97,2011/04/18 00:02:56,114.013885,22.530767,1,82,5,
粤B13K97,2011/04/18 00:03:04,114.012283,22.530399,1,80,5,
粤B13K97,2011/04/18 00:03:26,114.007767,22.529682,1,79,5,
粤B13K97,2011/04/18 00:03:56,114.001984,22.530184,1,75,6,
粤B13K97,2011/04/18 00:04:26,113.996498,22.528917,0,75,5,
粤B13K97,2011/04/18 00:04:56,113.991653,22.526068,0,73,5,
粤B13K97,2011/04/18 00:05:26,113.986450,22.523933,0,74,5,
粤B13K97,2011/04/18 00:06:26,113.975067,22.522949,1,78,5,
粤B13K97,2011/04/18 00:06:34,113.973465,22.522949,1,80,5,
粤B13K97,2011/04/18 00:06:56,113.968849,22.522932,1,83,5,
粤B13K97,2011/04/18 00:07:56,113.956467,22.523268,1,69,6,
粤B13K97,2011/04/18 00:08:26,113.951736,22.523832,1,54,6,
粤B13K97,2011/04/18 00:08:56,113.949387,22.525534,1,49,0,
粤B13K97,2011/04/18 00:09:26,113.949799,22.529217,0,47,0,

......

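The fifth field records whether the taxi is carrying a passenger: 1 means occupied, 0 means empty. The task is to pick out every record where a passenger gets on or off, that is, where the field changes from 1 to 0 or from 0 to 1. For each change, both the record before it and the record after it are kept.

Straight to the shell script: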

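# Drop the header line, then compare each record with the one before it.
sed -e 1d FileName |
    awk -F, '{
        now_f5 = int($5)
        # Status changed: print the previous record and the current one.
        if ((now_f5 != pro_f5) && (pro != ""))
            print pro "\n" $0
        # Remember the current record for the next comparison.
        pro_f5 = now_f5; pro = $0
    }' |
    uniq    # back-to-back changes print the shared record twice; keep one copy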
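The pipeline above compares each record only with the line before it, so in a file that holds several taxis, the boundary between two plates would also be reported whenever their statuses differ. Below is a minimal sketch of a variant that only compares records with the same plate. It assumes the file has a header line and that records are grouped by plate and ordered by time; the excerpt shows a single vehicle, so this is an assumption. The function name `status_changes` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the records on either side of each status
# change, comparing only records that share the same plate (field 1).
# Assumes a header line and input grouped by plate, ordered by time.
status_changes() {
    awk -F, '
        NR == 1 { next }                          # skip the header line
        $1 == prev_name && $5 + 0 != prev_status {
            if (prev != last) print prev          # do not repeat a shared record
            print
            last = $0
        }
        { prev_name = $1; prev_status = $5 + 0; prev = $0 }
    ' "$@"
}
```

Called as `status_changes FileName`, it replaces the `sed`, `awk` and `uniq` stages with a single `awk` process; the `last` variable does the job that `uniq` did in the pipeline.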

The output after processing:

粤B13K97,2011/04/18 00:00:26,114.044151,22.531418,1,75,5,
粤B13K97,2011/04/18 00:00:56,114.038452,22.530817,0,78,5,
粤B13K97,2011/04/18 00:02:26,114.020035,22.532049,0,80,5,
粤B13K97,2011/04/18 00:02:56,114.013885,22.530767,1,82,5,
粤B13K97,2011/04/18 00:03:56,114.001984,22.530184,1,75,6,
粤B13K97,2011/04/18 00:04:26,113.996498,22.528917,0,75,5,
粤B13K97,2011/04/18 00:05:26,113.986450,22.523933,0,74,5,
粤B13K97,2011/04/18 00:06:26,113.975067,22.522949,1,78,5,
粤B13K97,2011/04/18 00:08:56,113.949387,22.525534,1,49,0,
粤B13K97,2011/04/18 00:09:26,113.949799,22.529217,0,47,0,

......
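Each pair of lines in this output brackets one event: a change from 1 to 0 means the passenger got off, and a change from 0 to 1 means a passenger got on. As a rough sketch, the same comparison can emit one labelled line per event instead of a pair of records. The function name `label_events` and the labels `pickup` and `dropoff` are made up here, and the sketch assumes a single vehicle per file, as in the excerpt.

```shell
#!/bin/bash
# Hypothetical helper: print one line per status change as
#   plate,time,event
# where event is "pickup" (0 -> 1) or "dropoff" (1 -> 0).
# The time is taken from the first record that carries the new status;
# the real event happened somewhere between it and the previous record.
label_events() {
    awk -F, '
        NR == 1 { next }                          # skip the header line
        seen && $5 + 0 != prev_status {
            print $1 "," $2 "," ($5 + 0 == 1 ? "pickup" : "dropoff")
        }
        { prev_status = $5 + 0; seen = 1 }
    ' "$@"
}
```

Run as `label_events FileName` on the sample above, the first two events would be a `dropoff` at 00:00:56 and a `pickup` at 00:02:56.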

 

A preamble for safer Bash scripts (for reference only)

#! /bin/bash --
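# '--' marks the end of options and disables any further option processing.
# Any arguments after '--' are treated as filenames and arguments.
# The argument '-' is equivalent to '--'. This guards against some spoofing attacks.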

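# IFS holds the input field separators; it affects how the shell splits the input it reads from here on.
# Some shells import this variable from the environment, so to rule out an outside setting,
# reset IFS to its standard value (space, tab and newline) at the start of the script: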
IFS=$' \t\n'

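# To be sure we run the commands we intend:
# first make sure unalias has not been redefined as a function. (In POSIX, unset is a
# special built-in, which is looked up before functions and regular built-ins, so it
# cannot be shadowed by a function there. Bash outside POSIX mode does allow it, puzzlingly.)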
unset -f unalias
# Remove all aliases; the leading '\' keeps unalias itself from being expanded as an alias:
\unalias -a
# Make sure command is not a function; command itself is a regular built-in:
unset -f command
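# Set a PATH that can be trusted:
# 'command -p' runs the command that follows using the default value of PATH, bypassing shell functions.
# 'getconf' prints system configuration values; 'getconf PATH' yields the default search path.
SYSPATH="$(command -p getconf PATH 2>/dev/null)"
if [[ -z "$SYSPATH" ]]; then
    SYSPATH="/usr/bin:/bin"
fi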
PATH="$SYSPATH:$PATH"
# Make sure every child process inherits our safe search path:
export PATH
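
Two of the hazards this preamble guards against can be reproduced in a few lines. The following is only an illustrative sketch; `count_words`, the sample string and the fake `ls` function are made up for the demonstration.

```shell
#!/bin/bash
# 1. A changed IFS alters how unquoted expansions are split into words.
count_words() {
    # $1 is deliberately left unquoted so that word splitting applies.
    set -- $1
    echo "$#"
}

data="a:b c d"
IFS=':'
split_with_colon=$(count_words "$data")      # splits on ":" only
IFS=$' \t\n'
split_with_default=$(count_words "$data")    # splits on whitespace only
echo "IFS=':' gives $split_with_colon words, default IFS gives $split_with_default words"

# 2. A function can shadow a command until it is removed with unset -f.
ls() { echo "fake ls"; }
shadowed=$(ls /)                             # runs the function
bypassed=$(command ls / >/dev/null && echo "real ls ran")
unset -f ls                                  # the real ls is visible again
```

The script prints `IFS=':' gives 2 words, default IFS gives 3 words`. In the second part, `shadowed` holds `fake ls`, while `command` skips the function and runs the real `ls`, which is why the preamble first makes sure `command` itself has not been replaced by a function.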