ERA-40/Interim数据批量下载及Arcmap批处理

本文整理分享ERA-Interim系列数据的批量下载和ArcGIS里栅格数据的Arcpy批处理代码。批量下载的py脚本在官网中有很详细的介绍，本文的重点是数据下载后的一系列栅格处理。首先用matlab把.nc数据转为栅格，然后借助Arcpy整理了栅格数据处理代码，包括tif转shp、插值(Kriging Interpolation)、裁剪(Raster clip)和提取特定流域数据(Zonal Statistics)等，相当于是ERA再分析数据集从下载到处理，在进行数据分析前一步的所有批处理。
虽然Arcmap里各类操作代码在帮助文档里都有说明，但还是需要稍作调整，并且整合ERA系列数据的处理步骤全网仅此一处，对正在下数据、处理数据的同学来说还是有帮助的，同时这也是自己近期工作的一个整理。では，始めよう( • ̀ω•́ )✧

00x01 数据简介

ECMWF(European Centre for Medium-Range Weather Forecasts, 欧洲中期天气预报中心)主要的再分析数据资料为ERA系列，ERA-Interim是既ERA-15和ERA-40之后现持续更新的资料，下载地址here。该数据集有日值、月值和年值，分有再分析资料和预报资料，两年前我曾在Melunaya的简书里粗略地分析了一下，这里就不再细讲。对于中长尺度的水文研究，我们选用Monthly Means of Daily Means（月均值）。

00x02 批量下载

在官网数据下载的页面中，有进行批处理的教程，对于选择多年数据下载需要写个循环。example里有介绍，这里就直接放代码吧。

ecmwf_era_interim.py

#!/usr/bin/env python

import calendar
from ecmwfapi import ECMWFDataServer
server = ECMWFDataServer()
   
def retrieve_era_interim():
    '''    
       A function to demonstrate how to retrieve ERA-Interim monthly means data.   
       Change the variables below to adapt to your needs.
  
       ERA-Interim monthly data is timestamped to the first of the month, hence dates
       have to be specified as a list in this format:
       '19950101/19950201/19950301/.../20051101/20051201'.
  
       Data is stored on one tape per decade, so for efficiency we split the date range into
       decades, hence we get multiple date lists by decade:
       '19950101/19950201/19950301/.../19991101/19991201'
       '20000101/20000201/20000301/.../20051101/20051201'
       '20000101/20000201/20000301/.../20051101/20051201'
       In the example below the output data are organised as one file per decade:
       'era_interim_moda_1990'
       'era_interim_moda_2000'
       'era_interim_moda_2010'
       Please note that at the moment only decade 2010 is available.
    '''
    yearStart = 2010                        # adjust to your requirements - as of 2017-07 only 2010-01-01 onwards is available
    yearEnd = 2018                          # adjust to your requirements
    months = [1,2,3,4,5,6,7,8,9,10,11,12]   # adjust to your requirements
  
    years = range(yearStart, yearEnd+1)
    print 'Years: ',years
    decades = list(set([divmod(i, 10)[0] for i in years]))
    decades = [x * 10 for x in decades]
    decades.sort()
    print 'Decades:', decades
  
# loop through decades and create a month list
    for d in decades:
        requestDates=''
        for y in years:
            if ((divmod(y,10)[0])*10) == d:
                for m in months:
                    requestDates = requestDates+str(y)+(str(m)).zfill(2)+'01/'
        requestDates = requestDates[:-1]
        print 'Requesting dates: ', requestDates
        target = 'era_interim_sw_%d'% (d)    # specifies the output file name #soilwater
        print 'Output file: ', target
        era_interim_request(requestDates, d, target)

# the actual data request
def era_interim_request(requestDates, decade, target):
    '''
        Change the keywords below to adapt to your needs.
        The easiest way to do this is:
        1. go to http://apps.ecmwf.int/datasets/data/interim-full-moda/levtype=sfc/
        2. click through to the parameters you want, for any date
        3. click 'View the MARS request'
    '''
    server.retrieve({
        "class": "ei",              # do not change
        "dataset": "interim",       # do not change
        'expver': '1',              # do not change
        'stream': 'moda',           # monthly means of daily means
        'type': 'an',               # analysis (versus forecast, fc)
        'levtype': 'sfc',           # surface data (versus pressure level, pl, and model level, ml)
        'param': '39.128/40.128/41.128/42.128',         # here: parameter(Volumetric soil water layer 1,Volumetric soil water layer 2,
                                                        # Volumetric soil water layer 3, Volumetric soil water layer 4)
        'grid': '0.125/0.125',        # horizontal resolution of output in degrees lat/lon
        'format': 'netcdf',         # get output in netcdf; only works with regular grids; for GRIB remove this line
        'date': requestDates,       # dates, set automatically from above
        'decade': decade,           # decade set automatically from above
        'target': target,            # output file name, set automatically from above
    })
 
if __name__ == '__main__':
    retrieve_era_interim()

此处下载的数据是土壤水分数据，不同的数据和数据集参数不一样，要根据下载页面里的”MARS request”进行修改。ERA-40和ERA-Interim的不同点在于class、dataset，其他都一致。选择下载的数据格式为NetCDF文件，后缀为.nc。

ecmwf_era_40.py
1
2
"class": "e4",
"dataset": "era40",

00x03 批处理

nc文件可以在panoply直接可视化，在处理前查看头文件信息和数据的大致情况，检查一下是否下错。

03.1 nc转tif

Arcmap不能直接处理nc数据，这里用Matlab将nc数据转为tif栅格数据。
Matlab有许多nc文件处理的相关函数，用ncread函数获取nc文件里的变量数据，用geotiffwrite函数将所需变量写入tif并输出。nc文件“按层存储”，读取需要两个循环，逐年与逐月。这段代码也是在网上拼拼凑凑出来的，参考作者过多不便列出，代码不属于我。

ncread_tif.m

% ERA-Interim-soilwater数据读取
clc  

InPath = 'I:\Soil_Moisture_Data\ERA_Interim_Soilwater\';  
InFile = strcat(InPath,'era_interim_sw_2000.nc');  % 获取nc文件的基本信息  
ncdisp(InFile)  
  
Lon = ncread(InFile,'longitude'); %读取经度数据  
Lat = ncread(InFile,'latitude');  
Sw1 = ncread(InFile,'swvl1');

for i=1:10
    for j=1:12
        k=12*(i-1)+j;
        Sw1_i=Sw1(:,:,k);
        ii=1999+i;
        
        Sw1_i_Tif = strcat(InPath,'sw1_',num2str(ii),'_',num2str(j),'.tif');
        GeoRef = georasterref('Rastersize',[1441,2880],'Latlim',[-90,90],'Lonlim',[0,360]);
        geotiffwrite(Sw1_i_Tif,rot90(Sw1_i,1),GeoRef)
    end   
end
disp('finished')

需要注意的就是写地理坐标的时候注意栅格和经纬度的大小与范围，以及栅格的投影，是否需要旋转。
把.nc数据下载下来转为月尺度/日尺度的.tif之后，一般就可以给要求“下载卫星数据并初步处理留档备用”的导师交差了。

03.2 栅格处理

接下来是我前段时间需要获取研究区域的均值时，根据arcmap里的帮助文档，写的python批处理脚本。毕竟几十年的数据太多了，而且当需要对不同卫星资料进行对比的时候，处理操作大致一样，不写点代码自己手动batch的话就很累了。

有的卫星数据分辨率不够高，需要插值。因为我处理的都是气象数据，插值的话用克里金插值。
栅格插值先将栅格数据转为点矢量，再进行克里金插值，得到栅格数据。需要用到的工具有Conversion tools –> From raster –> Raster to point 和 Spatial analyst tools –> Interpolation –> Kriging.

RasterToPoint.py

# Name: RasterToPoint_Ex.py
# Description: Converts a raster dataset to point features.
# Requirements: None

# Import system modules
import arcpy
from arcpy import env

# Set environment settings
arcpy.env.workspace = "H:\\***_year\\"

# Set local variables
rasters = arcpy.ListRasters("*","tif")
outpath = "H:\\***_year_ip\\"
field = "Value"

# Execute RasterToPoint
for raster in rasters:
    print(raster)
    outpoint = outpath + "ret_" + raster[5:9] + ".shp"
    arcpy.RasterToPoint_conversion(raster, outpoint, field)
    print("ret_" + raster[5:9] + ".shp" + " has done")
print("All done")

KigingInterpolation.py

# Name: Kriging_Interpolation_Ex.py
# coding=utf-8
# Description: Interpolates a surface from points using kriging.
# Requirements: Spatial Analyst Extension
# Import system modules

import arcpy
from arcpy import env
from arcpy.sa import *

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")

# Set environment settings
env.workspace = "H:\\***_year_ip\\"

# Set local variables
features = arcpy.ListFeatureClasses() #ListData function
outpath = "H:\\***_year_ip\\"

field = "grid_code"
cellSize = 0.01 #插值后的栅格大小
# lagSize = default
# majorRange = default
# partialSill = dedfault
# nugget = default

# Set complex variables
kModelOrdinary = KrigingModelOrdinary("SPHERICAL") #大小写很关键
kRadius = RadiusVariable(12)

# Execute Kriging
for feature in features:
    print(feature)
    outKriging = Kriging(feature, field, kModelOrdinary, cellSize, kRadius, "#") #“#”表示不设置，为空
    
    # Save the output
    outKriging.save(outpath + "ret_ip" + feature[4:8] + '.tif')
    print("ret_ip" + feature[4:8] + '.tif has done')

print"All done!"

插值得到高分辨率后裁剪至自己所需的研究区域。用栅格clip就ok，Data management tools –> Raster –> Raster processing –> clip.

RasterClip.py

import arcpy
arcpy.CheckOutExtension("spatial")

arcpy.env.workspace = "I:\\***_tif\\"

rasters = arcpy.ListRasters("*","tif")

OutPath = "I:\\***\\"

mask = "H:\\***.shp" #裁剪用的边界

for raster in rasters:
    print(raster)
    OutData = OutPath + "r" + raster[0:]
    arcpy.Clip_management(raster, "#", OutData, feature,"0", "ClippingGeometry","MAINTAIN_EXTENT")
                                                      #“0”表示nodata值设为0，默认为“#”
    print("r" + raster[0:] + " has done")
print("All done")

为了得到研究区域里的面均值，用zonal工具，Spatial analyst tools –> Zonal –> Zonal statistics as table。得到dbf文件，可以直接用excel打开。

ZonalStatisticsAsTables.py

# Name: ZonalStatisticsAsTable_Ex.py
# Description: Summarizes values of a raster within the zones of 
#              another dataset and reports the results to a table.
# Requirements: Spatial Analyst Extension

# Import system modules
import arcpy
from arcpy import env
from arcpy.sa import *

# Set environment settings
env.workspace = "H:\\***_year\\"

# Set local variables
inZoneData = "H:\\***_watersh.shp" 
zoneField = "RiverName" #按shp边界文件里的rivername进行数值提取
rasters = arcpy.ListRasters("*","tif")
outpath = "H:\\***_year\\"

# Check out the ArcGIS Spatial Analyst extension license
arcpy.CheckOutExtension("Spatial")

# Execute ZonalStatisticsAsTable
for raster in rasters:
    print(raster)
    outTable = outpath + "zonal_" + raster[7:11] + ".dbf"
    outZSaT = ZonalStatisticsAsTable(inZoneData, zoneField, raster, outTable, "NODATA", "ALL") 
                                   #“ALL”即表示所有参数都提取，包括有均值、最大值、最小值、总数等。
    print("zonal_" + raster[7:11] + ".dbf" + " has done")
print("All done")

03.3 DBF文件批量写入

zonal出来后的表格是每个月一个表，需要统计月、年数据的话，得把每个表里各流域的数据汇总到一个表里。由于dbf格式实为数据框（dataframe），在R语言里可以很方便的读写，遂用R语言写了一段批量读写dbf数据的代码。

WriteDBF.R

# Name:读写dbf数据文件。
# dBase 是 Ashton-Tate 写的一个DOS程序，随后被Borland拥有。它有一种很流行的二进制无格式文件，以.dbf作为扩展名。
# dBase文件含有一个信息头，随后是一系列字段，因此和 R 的数据框非常类似。
# 数据本身以文本格式保存，可以包括字符，逻辑和数值字段，和以后版本的其它类型（见http://clicketyclick.dk/docs/data_types.html）

# 目的：批量把Arcmap里zonal后的tables数据整合到一个表里。
# zonal后的tables类型为.dbf，按 rivername分了A河和B河，需要每流域的mean。
# 每个月一个表，1980-2015共36年，共432个表，
# 需要表头保留，还要增加一列时间。————增加时间列没有写出。增加写入列数据、从文件名中提取时间还不会。

# 编写思路：
# 遍历文件于集合myfiles中，逐个读取至数据框X，按行提取数据暂存数据框Y后写入新表格中。
# 需要先定义中间过程中的数据框；
# 逐个读取再存入需要借助中间变量，循环前为空，写入新表后重新回空。

# 我觉得R里面肯定有更简单些的函数，可以直接追加之类的……目前没找到。写这个时间用太久了，先就用这个循环处理吧。
# 以下以一个流域即一个表为例。

library(foreign)

myfiles <- Sys.glob("H:\\***_dbf\\*.dbf")
#返回的myfiles是一个装满满足条件的文件名的集合，而没有读取这些文件
n <- length(myfiles)
#先要定义空的数据框，保留数据框的表头，但数据为空
dbf_ex <- read.dbf(myfiles[1])
empty_dbf = dbf_ex[FALSE,]

Ahe = empty_dbf
zonal_A = empty_dbf

i=1
while(i<=n){
  print("*************************************")
  print (i)
  #读取dbf数据文件
  empty_dbf <- read.dbf(myfiles[i])
  print (empty_dbf)
  #以RiverName为行名，以便检索
  rownames(empty_dbf)=empty_dbf$RiverName
  #按行RiverName，列'RiverName','MEAN'提取数据
  Ahe <- empty_dbf['Ahe',c('RiverName','MEAN')]
  #行追加写入
  zonal_A <-rbind(zonal_A,Ahe)
  #print (zonal_A)
  #中间数据回空
  empty_dbf = dbf_ex[FALSE,]
  Ahe= empty_dbf
  i=i+1
}
write.dbf(zonal_A,file = "H:\\***zonalA.dbf")

其实我从来没有学过R语言，以前用过的都是现有的包下载，处理数据+作图。第一次处理dbf的时候，用python鼓捣了很久，后来才明白应该从数据格式入手，然后才知道R语言的dataframe和dbf数据非常契合，立马解决读取的问题。然后写入的时候，又搞不清楚逻辑关系，想了很久。找不到“神奇的一键函数”，就只能用朴素的循环来写了。

👆👆至此完成ERA-Interim再分析资料数据的批量下载、初步处理、获取流域面均值的批量操作。
以上栅格数据的处理好在是依次处理，于是可以整合到一个脚本里，只要把每次输入和输出的文件夹及文件命名好，当需要这些或者类似处理的卫星资料转为栅格文件后，就可以让他自己一口气跑了，开心( • ̀ω•́ )✧

第一次写技术内容的博客，内容与编排还有些生疏。
希望能帮助到一些同学，借此为自己积攒些人品，只求之后的科研工作能顺利些。

❀❀❀