C#: Removing duplicate data from a DataTable
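Suppose a DataTable has four columns (A, B, C, D) and 100,000 rows. Among rows that share the same value in column C, only one row should be kept and the rest deleted. Please give an algorithm that finishes within 10 seconds.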
2 answers
2013-05-31
You can use a HashSet<T> to record the column C values that have already been seen; lookups are much faster that way. The code is below. I don't know your machine's specs, but on mine, in Debug mode, 100,000 rows took 232 ms.
using System;
using System.Data;
using System.Collections.Generic;
using System.Diagnostics;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Build a test table with four columns (the default column type is string).
            DataTable dataTable = new DataTable();
            dataTable.Columns.AddRange(new DataColumn[] {
                new DataColumn("A"),
                new DataColumn("B"),
                new DataColumn("C"),
                new DataColumn("D")
            });
            // Add 100,000 rows containing 50,001 distinct values in column C.
            for (int i = 0; i < 50000; i++)
            {
                dataTable.Rows.Add(new object[] { 0, 0, "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + i.ToString(), 0 });
                dataTable.Rows.Add(new object[] { 0, 0, "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + (50000 - i).ToString(), 0 });
            }
            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            Console.WriteLine("DataTable rows: {0}", dataTable.Rows.Count);
            HashSet<string> hash = new HashSet<string>(); // requires .NET 3.5 or later
            //Dictionary<string, int> dic = new Dictionary<string, int>(); // alternative for .NET 2.0
            for (int i = 0; i < dataTable.Rows.Count; i++)
            {
                string key = dataTable.Rows[i][2] as string;
                //if (!dic.ContainsKey(key)) // .NET 2.0: first time this value is seen
                //{
                //    dic.Add(key, 0);
                //}
                if (!hash.Contains(key))
                {
                    // First occurrence of this C value: remember it and keep the row.
                    hash.Add(key);
                }
                else
                {
                    // Duplicate C value: remove the row, then step back so the
                    // row that shifted into index i is not skipped.
                    dataTable.Rows.RemoveAt(i);
                    i--;
                }
            }
            stopwatch.Stop();
            Console.WriteLine("Elapsed: {0} ms", stopwatch.ElapsedMilliseconds);
            Console.WriteLine("DataTable rows: {0}", dataTable.Rows.Count);
            Console.ReadKey();
        }
    }
}
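For comparison, the same HashSet idea can also be applied without deleting rows in place, by copying the first row for each C value into a new table. This is only a sketch, not part of the original answer: the class name DataTableDedup and the method DistinctBy are hypothetical, and it assumes the named column exists in the table.

```csharp
using System.Collections.Generic;
using System.Data;

public static class DataTableDedup
{
    // Hypothetical helper (not from the original answer): returns a new table
    // that keeps only the first row for each distinct value in columnName.
    // Assumes columnName exists; the source table is left unchanged.
    public static DataTable DistinctBy(DataTable source, string columnName)
    {
        DataTable result = source.Clone(); // same schema, no rows
        int columnIndex = source.Columns[columnName].Ordinal;
        HashSet<object> seen = new HashSet<object>();

        foreach (DataRow row in source.Rows)
        {
            // HashSet<T>.Add returns false when the value is already present,
            // so only the first row for each value is copied.
            if (seen.Add(row[columnIndex]))
            {
                result.ImportRow(row);
            }
        }
        return result;
    }
}
```

Calling DataTableDedup.DistinctBy(dataTable, "C") would then give a deduplicated copy. This avoids shifting the row collection on every removal, at the cost of holding a second table in memory.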
2013-05-31
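If the table has an identity column id:
DELETE FROM table1 t1
WHERE t1.id > (SELECT MIN(t2.id) FROM table1 t2 WHERE t1.C = t2.C);
Each C value has exactly one row with the minimum id, so deleting every row whose id is greater than that minimum leaves one row per C value. The table-alias syntax allowed in DELETE differs between database systems, so adjust it for yours.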
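The same "keep the smallest id per C value" rule can be sketched in C# against an in-memory DataTable. This is an illustration only: the class MinIdDedup and the method KeepMinIdPerC are hypothetical names, and the sketch assumes the table has a non-null integer column named id alongside column C.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class MinIdDedup
{
    // Hypothetical helper: assumes a non-null integer "id" column and a "C" column.
    // Removes every row whose id is greater than the smallest id for its C value.
    public static void KeepMinIdPerC(DataTable table)
    {
        // Pass 1: find the smallest id for each distinct C value.
        Dictionary<object, int> minId = new Dictionary<object, int>();
        foreach (DataRow row in table.Rows)
        {
            object key = row["C"];
            int id = Convert.ToInt32(row["id"]);
            int current;
            if (!minId.TryGetValue(key, out current) || id < current)
            {
                minId[key] = id;
            }
        }

        // Pass 2: walk backwards so RemoveAt never shifts a row not yet visited.
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            DataRow row = table.Rows[i];
            if (Convert.ToInt32(row["id"]) > minId[row["C"]])
            {
                table.Rows.RemoveAt(i);
            }
        }
    }
}
```

Unlike the first-occurrence approach, this keeps the row with the smallest id regardless of where it sits in the table, which matches what the SQL statement does.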